
Advancing transparent and ethical AI

Paul Cleverley, Silvia Peppoloni, Chuck Bailey and Simon Thompson discuss ongoing concerns around geoscience AI and the need for transparency

10 October 2024

Image by Tung Lam from Pixabay

The development of Artificial Intelligence (AI) in the geosciences continues at pace. There is much to be positive about, but with AI comes a responsibility to ensure these powerful emerging tools are accurate, safe, and adhere to scientific and ethical principles. Recent discussions have highlighted concerns around GeoGPT, a geoscience chatbot being developed by the Deep-time Digital Earth (DDE) project, which is recognised by the International Union of Geological Sciences (IUGS) (e.g. Cleverley, 2024; Stephenson et al., 2024a; Hawkins, 2024). While DDE are working to address some of these issues (Stephenson et al., 2024b), serious concerns around transparency, quality, ethics, state censorship, and safety remain – concerns that must be addressed if GeoGPT is to be endorsed by an international scientific non-governmental and non-political body.

Accelerating AI

Big data and AI techniques have been utilised in geoscience for at least the past decade. Transformer models, a type of deep-learning architecture, have been used since 2019, and various Large Language Models (LLMs) have been fine-tuned using over 20 million geoscience documents (Denli et al., 2021). Many organisations have created LLMs to aid information discovery for the geoscience community, or have applied LLMs to their own content to extract, classify, summarise and answer geoscience questions using text and images.
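To make the kinds of application mentioned above concrete, the sketch below shows, in simplified Python, how an organisation might prompt a general-purpose LLM to classify and summarise a geoscience abstract. It is an illustration only: the abstract is invented, and the llm() function is a hypothetical stand-in for whichever hosted or fine-tuned model an organisation actually uses, not a description of any specific system discussed in this article.

```python
# Minimal, hypothetical sketch of applying an LLM to geoscience text for
# classification and summarisation. llm() is a placeholder for a real model call.

ABSTRACT = (
    "Detrital zircon U-Pb ages from the basin fill indicate a dominant "
    "Ordovician source, consistent with recycling of an older orogenic belt."
)

def llm(prompt: str) -> str:
    """Placeholder for a real model call (e.g. an organisation's fine-tuned LLM)."""
    raise NotImplementedError("connect this stub to your organisation's model")

# Prompt asking the model to assign one label from a fixed list.
classify_prompt = (
    "Classify the following abstract into one of: sedimentology, geochronology, "
    f"volcanology, geophysics. Return the label only.\n\n{ABSTRACT}"
)

# Prompt asking the model for a one-sentence summary grounded in the abstract.
summarise_prompt = f"Summarise the following abstract in one sentence:\n\n{ABSTRACT}"

# label = llm(classify_prompt)     # e.g. "geochronology"
# summary = llm(summarise_prompt)  # a one-sentence summary of the abstract
```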

For example, in June 2024 NASA openly released its INDUS Earth-science LLMs (Koehl, 2024), which build on work from years earlier and were trained using open-access geoscience literature, including publications from the American Geophysical Union (AGU). LinkedEarth, a community for the palaeosciences (www.linked.earth), has recently run workshops to build platforms (environments in which software can run) and open-source tools that can analyse palaeoclimate data in an open and reproducible way. The International Geoscience and Geoparks Programme (IGCP) of UNESCO has activated a two-year project to exploit a cloud platform (GECO, 2024) to analyse geoscience data and literature for locations in East Africa. And recent research used LLMs to generate the top 100 questions for geoscience and Earth science (Hatvani et al., 2024).

Heightened community use of and interest in AI are reflected in the rising number of meetings on these topics. For example, the University of Exeter held the first ‘AI for Geological Modelling and Mapping’ conference in May 2024, and the Geological Society of London will host its second Digital Geoscience conference, ‘Intelligent Solutions in Geoscience’, at Burlington House in October 2024.

With GeoGPT, DDE aims to contribute another AI tool for the geoscience community. However, GeoGPT can only be useful if these serious outstanding issues are addressed, and significantly more work is required to do so.

Ongoing issues

In advertorials, DDE describes GeoGPT as ‘open source’ and ‘open science’ (DDE, 2024), but they have not released the training data or any models in open repositories, and aspects of GeoGPT are proprietary. At an IUGS-hosted meeting in July (IUGS, 2024), which included attendees from DDE and geological organisations across the world, participants asked DDE to release the GeoGPT training data at the article level. DDE have yet to make any firm commitment to release the data.

Transparency and quality go together. For geoscientists to be confident of quality, they need to be able to identify and scrutinise sources of information (there is a difference in the efficacy of an AI tool trained on high-quality peer-reviewed articles and structured data compared to one heavily influenced by Wikipedia and other sources, for example). Unless the training data and models for GeoGPT are released, it is impossible to assess biases (Floridi, 2023; Sun, 2024) – such as the influence of non-peer-reviewed content – in the AI-generated outputs.

GeoGPT has reportedly been modified to adopt Retrieval Augmented Generation (RAG) (IUGS, 2024), a welcome change in architecture that means the chatbot’s answers now include citations. However, the lack of transparency around source material makes it difficult for geoscientists to ensure they are credited for their work and that it is conveyed and interpreted correctly, or for content owners to check that their materials are being used in compliance with their own rights, the rights of authors and third parties, and Intellectual Property (IP) law.
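To illustrate the architectural point, the sketch below shows, in simplified form, how a RAG pipeline grounds an answer in retrieved passages and attaches citations to them. It is a minimal, hypothetical example – the corpus, the word-overlap scoring and the commented-out llm.generate() call are all placeholders, not GeoGPT’s actual implementation – and it also illustrates why transparency matters: the citations a RAG system surfaces can only be verified and credited properly if the underlying document store is itself disclosed.

```python
# Minimal sketch of a Retrieval Augmented Generation (RAG) loop, assuming a
# hypothetical document store and LLM; GeoGPT's actual pipeline is not public.
from collections import Counter

# Hypothetical corpus: each entry carries text plus the citation metadata that
# a RAG answer would surface alongside the generated text.
CORPUS = [
    {"id": "doc-1", "citation": "Smith et al. (2021), J. Example Geol.",
     "text": "Ophiolites record fragments of oceanic lithosphere emplaced onto continents."},
    {"id": "doc-2", "citation": "Jones (2019), Example Rev. Earth Sci.",
     "text": "Zircon U-Pb dating constrains the crystallisation ages of granitic plutons."},
]

def score(query: str, text: str) -> int:
    """Toy relevance score: word overlap between the query and a document."""
    q, t = Counter(query.lower().split()), Counter(text.lower().split())
    return sum((q & t).values())

def retrieve(query: str, k: int = 1) -> list[dict]:
    """Return the k most relevant documents for the query."""
    return sorted(CORPUS, key=lambda d: score(query, d["text"]), reverse=True)[:k]

def build_prompt(query: str, docs: list[dict]) -> str:
    """Ground the model's answer in retrieved passages and ask it to cite them."""
    context = "\n".join(f"[{i+1}] {d['citation']}: {d['text']}" for i, d in enumerate(docs))
    return (f"Answer using only the sources below and cite them as [n].\n"
            f"{context}\n\nQuestion: {query}\nAnswer:")

if __name__ == "__main__":
    question = "What do ophiolites represent?"
    sources = retrieve(question, k=1)
    prompt = build_prompt(question, sources)
    print(prompt)          # this prompt would be sent to the underlying LLM
    # llm.generate(prompt) # hypothetical call; the reply can then carry [1]-style citations
```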

Indeed, the Terms and Conditions (T&Cs) for GeoGPT state, “we do not guarantee that the content of GeoGPT will be… free from any infringement” (GeoGPT, 2024), implying that proprietary content for which DDE does not have IP rights may have been used to train the models. The DDE team acknowledge uncovering an example of copyright abuse by one research group (Stephenson et al., 2024a), while our own testing of an early version of GeoGPT strongly suggests that the model may have been trained on paywalled content (GeoGPT provided answers to highly specific geological questions whose answers appear only to be available in two paywalled articles). Any copyright breach could expose geoscientists to IP violations were they to unintentionally use or distribute infringing content. Malami Uba Saidu, President of the Geological Society of Nigeria, stated, “DDE’s GeoGPT project’s potential breach of UNESCO core values and international copyright law raises serious concerns. The establishment of governance structures that ensure ethical AI is crucial.”

DDE’s relationship with the Chinese government, which provides its funding and within whose territory the DDE research centres are primarily located, gives rise to ethical questions. For example, GeoGPT’s T&Cs state that users are “prohibited from inputting or inducing GeoGPT LLM to generate [content that] … incites subversion of state [the People’s Republic of China] power, jeopardizes the security and interests of the state, tarnishes the image of the state” (GeoGPT, 2024). It is critical to know how these self-imposed constraints will skew or bias the information GeoGPT provides to users.

AI ethics experts agree that GeoGPT’s T&Cs are broad and open to interpretation, with Professor Jon Truby (a country representative in the ‘Global South’ on the UNESCO AI Ethics Committee) stating, “A scientific chatbot prohibiting certain questions because they may tarnish the image of a government poses a serious risk to academic freedom of expression.”

To address concerns around potential state censorship, DDE has now added the Llama and Mistral LLMs to GeoGPT. However, the latest version of GeoGPT (as of July 2024) still appears to offer international users the option of employing Alibaba’s “Qwen” LLM. Llama is built by a US private company (Meta) and Mistral by a French private company, and neither is intentionally programmed to be uncritical of the US or French governments; Alibaba’s “Qwen”, by contrast, is intentionally restricted to be uncritical of the Chinese Communist Party (Lin, 2024). Such restrictions can impact the accuracy and completeness of information, which compromises quality and raises ethical concerns.

DDE’s original vision for its online platform was one that links databases together (e.g., Stephenson, 2023). But this vision doesn’t tally with how AI and LLMs, like GeoGPT, work – they do not link data, they harvest it. And while AI models are often pitched as free tools, the real cost is your data. To take advantage of GeoGPT’s capabilities (such as extraction, geo-referencing and summarisation), DDE encourages geoscientists and organisations to upload their own documents and maps. But there is a lack of clarity around how users’ data – both scientific and personal – will be utilised. The searches and enquiries users make can reveal a great deal about their plans, activities, and proprietary or commercially sensitive information, with both commercial and security implications. Transparency and governance are essential to ensure that GeoGPT users can be certain who monitors their activity and for what purposes, and have confidence in standards of privacy and security.

Finally, while GeoGPT is formally described as not having been released and not in the public domain (e.g. Stephenson et al., 2024a,b; Hawkins, 2024), it is nevertheless already being used by geoscientists, in particular by the oil and gas industry in China, through a joint venture funded by the Chinese national oil companies CNPC, CNOOC and SINOPEC (NEPU, 2024).

IUGS endorsement

Many in the geoscience community envision an important role for IUGS in helping to set ethical standards for AI. However, the relationship between the IUGS and DDE, and IUGS’s participation in the marketplace through GeoGPT, makes this complicated and raises ethical challenges.

Whilst commercial entities may cite competitive advantage as a reason to be opaque, a model endorsed by IUGS should set high standards of transparency. It is in the interest of science and ethics to allow biases within the data to be assessed before GeoGPT is released (which will also maximise usage of, and trust in, the tool within the scientific community).

IUGS is currently conducting a review of DDE, which we welcome, but improvements in transparency should be made immediately. We also welcome the ongoing discussion by members of IUGS about its future role in helping to shape ethical principles in the use of AI, and about whether it should adopt a more arms-length approach to participation in the field.

Maximise potential

AI presents a huge opportunity for geoscience. It is the responsibility of those of us with a voice in the community to ensure we tackle problems early, to maximise AI’s chances of success and to mitigate unintended consequences.

Authors

Prof Paul H Cleverley FGS, Geologist and Computer Scientist, and Visiting Professor of Information Science & Technology at Robert Gordon University, Aberdeen, Scotland.

Dr Silvia Peppoloni, President, International Association for Promoting Geoethics; Istituto Nazionale di Geofisica e Vulcanologia, Rome, Italy.

Prof Christopher ‘Chuck’ M. Bailey, Acting President, Geological Society of America; Chair, Department of Geology, William & Mary, Virginia, USA.

Simon Thompson, Chief Executive, Geological Society of London, UK.
