GeoGPT: Concerns remain

With the continued potential for censorship and a lack of transparency, Paul Cleverley flags persistent concerns around AI in geoscience.

4 April 2025

Image by Gerd Altmann from Pixabay

Dear Editors,

I read with interest the recent helpful update on GeoGPT (Ludden et al., 2025), as well as the related ‘Frequently Asked Questions’ (GeoGPT, 2025). The changes made so far are welcomed but concerns remain.

The creation of a separate international-facing version of the GeoGPT application on servers in Singapore is a positive step. However, the Terms and Conditions for its use still contain clauses prohibiting geoscientists from asking questions that may “tarnish the image of the [Chinese] state” (GeoGPT, 2025). Such a clause seems out of place for AI hosted outside the legal jurisdiction of the People’s Republic of China (PRC) and has the potential for censorship and the erosion of freedom of speech, particularly for the international community researching social geoscience topics.

The future release of several geoscience-specific Large Language Models (LLMs) used by the GeoGPT application (GeoGPT, 2025) is another positive step. These models, together with existing geoscience-specific models, can be compared to the latest generic foundation LLMs, which also have significant geological understanding. As stated by Li (2023), “These [generic] models have been shown to outperform models fine-tuned with domain-specific data on some tasks”. This presents an interesting area for independent research.

The geoscience-specific LLM GeoGalactica, published in December 2023, is acknowledged to have extensively infringed copyright (Cleverley et al., 2024; Ludden et al., 2025). While the Deep-time Digital Earth (DDE) team now distance themselves from GeoGalactica, the public record implies a significant level of involvement and crossover. For example, a 2024 article published by the GeoGalactica team and co-authored by several senior members of DDE states that GeoGalactica was jointly developed by “Shanghai Jiao Tong University and the Deep-time Digital Earth Science Centre” (Lin et al., 2024). Leaders of DDE have worked with Shanghai Jiao Tong University and this research team since at least 2021, including visiting the university in 2023 to “guide the special work of the Deep-time Digital Earth International Science Project” and work on the “excavation and utilization of deep geoscience data” to construct LLMs in the geosciences (Jianbo, 2023). Much of this jointly developed technology is still available on the DDE online platform.

Concerns about copyright infringement extend to other DDE technologies: the Geoscience Academic Knowledge Graph (GAKG) and GeoGPT (as discussed in Cleverley et al., 2024). When these issues were first raised in early 2024, DDE removed GAKG from their online platform and restricted access to GeoGPT. However, these repeated alleged copyright infringements raise questions about the thoroughness of DDE reviews – checks on copyright should have been in place before the technologies were made available.

GeoGPT is promoted as ‘open science’ and ‘open source’ and, following calls from geological societies worldwide, DDE committed to openly releasing their training data (Hawkins, 2024). Yet it now appears that DDE will disclose only the training sources (publisher and journal list) rather than the actual training data (GeoGPT, 2025). It seems that none of the user-facing software code, the underlying knowledge graph (GeoKG), the databases used for Retrieval-Augmented Generation (RAG), or the 100,000 geological question-answer pairs used for LLM training is openly available (GeoGPT, 2025). Without transparency at the individual article level, users cannot assess what geoscientific biases may exist. Geoscientists will not be able to recreate or openly release their own derivatives of GeoGPT to advance science or cater for their own data privacy and security, which hampers international cooperation. Ultimately, GeoGPT is not reproducible. Without this cornerstone of scientific integrity, GeoGPT should not be marketed as open science or open source.

As is common for online-hosted AI tools, GeoGPT appears to have Terms and Conditions that give the proprietor legal rights to use data uploaded by users for the proprietor’s own purposes. I question the ethics of offering free AI capabilities, particularly to geoscientists in the Global South, in exchange for rights to their data, as promoted by the International Union of Geological Sciences (IUGS)-endorsed DDE.

Given these ongoing concerns around use, transparency, and copyright, I argue that, rather than endorsing GeoGPT or any AI application or cyberinfrastructure specifically, the IUGS should take a more impartial role. As an international non-governmental organisation that aims to unite and promote global cooperation in geoscience, IUGS could advocate for fully open science and open-source technologies, thereby encouraging wider agency, equity, innovation and international cooperation.

Author

Prof Paul H Cleverley FGS, Geologist and Computer Scientist, and Visiting Professor of Information Science & Technology at Robert Gordon University, Aberdeen, Scotland

Further Reading

This letter is a response to the column ‘GeoGPT: An Update’ by Ludden and colleagues, available here: https://geoscientist.online/sections/viewpoint/geogpt-an-update/

 

NOTE 8th April 2025: Subsequent to the closed session of the 81st International Union of Geological Sciences (IUGS) Executive Committee meeting in late March, the IUGS published a letter on the 7th April withdrawing their support for Deep-time Digital Earth (DDE) for one year. A range of matters are to be resolved including the relationship of GeoGPT to DDE. The letter is available here:

https://www.linkedin.com/posts/hassina-mouri-2442a812_iugs-decision-re-dde-program-activity-7314990963196620803-BNTx?utm_source=share&utm_medium=member_ios&rcm=ACoAAAJdxjgBSnNfsxghi8atnlNooAgz4mP6AtE
