Geoscience AI in crisis?

Paul Cleverley raises concerns about big-data artificial intelligence projects in the geosciences

17 June 2024

(Credit: Gerd Altmann from Pixabay)

Big data and artificial intelligence could revolutionise the geosciences. Research in this field is rapidly accelerating with programmes such as Deep-time Digital Earth (DDE; www.ddeworld.org) – the first big-data project recognised by the International Union of Geological Sciences – investigating how we can harness the power of large, complex datasets to better understand our planet and its resources (e.g., Stephenson, 2019, 2024), and ultimately assist in achieving the UN’s Sustainable Development Goals.

To fully realise the potential of big data requires the use of artificial intelligence (AI). Over the past year or so, DDE developed a geoscience chatbot “GeoGPT” that provides AI-generated answers similar to ChatGPT. While not yet publicly available, GeoGPT is billed as one of DDE’s core public services (e.g. www.ddeworld.org/events/detail/208; www.ddeworld.org/events/detail/201), and has been promoted and discussed at international conferences including the European Geosciences Union General Assembly in April. In early 2024, GeoGPT was made available for testing. However, my testing revealed serious issues around a lack of transparency, state censorship, and potential copyright infringement – concerns that I have raised with the DDE team and that I feel require wider discussion to ensure that any future generative AI tool created by and for our global community aligns with UNESCO’s recommendations on the ethics of AI (UNESCO, 2022).

The AI Large Language Models (LLMs) that underpin chatbots are trained by data mining from millions of research articles, potentially including copyrighted materials. For GeoGPT, it is unclear what checks DDE has in place to detect illegal data on its platform and to ensure that the copyright of numerous individuals and organisations, including geological societies worldwide, is not infringed through its AI.

Like some other chatbots, GeoGPT does not currently attribute the source or authors for its answers. To not recognise the research contribution of geoscientists is unethical. Any geoscientific chatbot must provide links in its AI-generated answers to the original research for transparency and trust, and this can be achieved using a technique called Retrieval Augmented Generation, which enables the citation of sources at the end of every AI-generated assertion. This would also help minimise hallucinations – false or misleading AI responses caused by factors including biased or insufficient training data, or incorrect assumptions.
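To make the idea concrete, the following is a minimal, illustrative sketch of the retrieve-then-cite pattern behind Retrieval Augmented Generation. It is not GeoGPT's implementation or any published system. The corpus, the source labels ("Author A (2020)" and so on), the keyword-overlap scoring and all function names are assumptions made purely for illustration, and the generation step is replaced by a stand-in that quotes the retrieved passage rather than calling a language model.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    source: str  # hypothetical citation label, not a real reference
    text: str


# Hypothetical mini-corpus; a real system would index properly licensed literature.
CORPUS = [
    Passage("Author A (2020)", "Basalt is a fine-grained volcanic rock rich in iron and magnesium."),
    Passage("Author B (2018)", "Granite is a coarse-grained intrusive rock composed mainly of quartz and feldspar."),
    Passage("Author C (2021)", "Limestone is a sedimentary rock composed largely of calcium carbonate."),
]

# Small illustrative stopword list so common words do not drive the ranking.
STOPWORDS = {"a", "an", "and", "do", "how", "in", "is", "kind", "of", "on", "the", "what"}


def tokenize(text):
    """Lowercase the text and return its set of non-stopword words."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def retrieve(question, corpus, top_k=1):
    """Rank passages by keyword overlap with the question (a stand-in for vector search)."""
    query = tokenize(question)
    scored = [(len(query & tokenize(p.text)), p) for p in corpus]
    scored = [(score, p) for score, p in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:top_k]]


def answer_with_citations(question, corpus, top_k=1):
    """Return retrieved statements, each followed by its source, or decline to answer."""
    passages = retrieve(question, corpus, top_k)
    if not passages:
        # Declining when nothing is retrieved is one way to limit unsupported answers.
        return "No supporting source found."
    # Stand-in for generation: a real system would pass the passages to an LLM
    # and require it to cite them; here each passage is quoted with its label.
    return " ".join(f"{p.text} [{p.source}]" for p in passages)


print(answer_with_citations("What kind of rock is basalt?", CORPUS))
print(answer_with_citations("How do volcanoes on Mars erupt?", CORPUS))
```

The design point is that every statement in the answer is tied to a retrievable source, and a question with no matching source gets an explicit refusal rather than an unsupported answer.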

In the spirit of transparency and open science, and in line with the FAIR (findable, accessible, interoperable, and reusable) principles for scientific data, it is critical that the training data underlying AI models are released to facilitate reproducibility.

Currently, the underlying foundation LLM for GeoGPT is Alibaba’s Tongyi Qianwen “Qwen” (which can be tested online at https://huggingface.co/spaces/Qwen/Qwen-72B-Chat-Demo and comparisons drawn with ChatGPT at https://chatgpt.com/). During my testing, I found that the answers to some of my geoscience questions were state-censored, which wasn’t always made clear. DDE is an international consortium with a governing body that includes geologists from across the globe, and aims to serve an international audience, yet virtually all DDE technology is developed, hosted and funded by sources in China (www.ddeworld.org/news/detail/134), and the Chinese Academy of Sciences cites DDE as critical for enhancing China’s capabilities of detecting and securing global resources (Chen Jun, 2020). Ethically, I argue strongly that any tool developed in the name of and for the international geological community should never be based on AI that could be subject to any government censorship.

While the overarching goal of DDE to “share global geoscience knowledge” is noble, the GeoGPT project risks breaching UNESCO core values of ethical AI and international copyright law. In my opinion, DDE’s approach to GeoGPT requires radical change and a complete redesign, including the emplacement of governance structures that can deliver ethical AI to the international geoscience community. However, the project has identified some critically important issues surrounding the use of AI and the lessons learnt must now be applied to any future AI projects in the geoscientific community.

We all wish to realise the transformative benefits of geoscience AI to society. However, if it is not done ethically, we will lose the trust of the very people we are trying to help.

Author

Professor Paul H Cleverley FGS

Dr Paul H Cleverley FGS is a Geologist and Computer Scientist by background. He has worked in Big Data and Digital Geoscience for over 30 years. He holds a number of voluntary positions including Visiting Professor of Information Science & Technology at Robert Gordon University in Aberdeen, Scotland.

Further reading

 

The DDE reply:

Geoscience AI: opportunities and risks

We’d like to thank Paul Cleverley for his interest in the Deep-time Digital Earth project (DDE). As Paul says, DDE is an International Union of Geological Sciences programme – the first of its kind – to improve access to data and computing capability across the world, but also increasingly focused on the Global South. DDE already has hundreds of users and around thirty Task and Working Groups researching wide fields from palaeogeography to biostratigraphy to dinosaurs.

DDE works closely with UNESCO in implementing UNESCO’s Open Science Recommendation (UNESCO, 2021, 2022), and with publishers such as Springer Nature, with which DDE signed a memorandum of understanding this year.

Perhaps the most successful and clear manifestation of DDE’s goals in open science is the DDE Platform (https://deep-time.org/). This online geoscience computer lab, which is accessible and usable even with a simple laptop, enables individual scientists, research groups and others to build their own models, upload and download data, and create excellent visuals and images. DDE’s Platform has already facilitated some impressive scientific outputs, including more than 200 peer-reviewed articles in journals such as Science and PNAS (e.g. Fan et al., 2020).

Attendance by DDE team members at recent conferences in Namibia and Nigeria to demonstrate the capabilities of the DDE platform showed the possible value of the Platform to the Global South. Research projects initiated in the last few months by Nigerian students and early career researchers reflect the bottom-up approach of the Platform, which enables and gives agency to researchers that wouldn’t otherwise be able to work with sophisticated computing.

The DDE Platform has strict rules on copyright observance for its users. A recent regrettable example of copyright abuse by a research group using the platform was quickly uncovered and the group barred from using the Platform. DDE is trying to democratise access to geoscience information and computing capability but there will be problems like this on the way.

DDE is also developing the first geoscience Large Language Model (LLM), called ‘GeoGPT’. We use both Alibaba’s Tongyi Qianwen “Qwen” and Meta’s Llama as the underlying foundation LLMs for GeoGPT. A few of us have tested GeoGPT and found it astonishing, being able to answer difficult questions, sift through data, and create graphs and charts from text. None of us have noted ‘state censorship’ and this seems unlikely in a system based entirely on geoscience information. An exciting new direction involves using LLMs in identification keys in palaeontology to develop ‘professional’ AI tools that don’t ‘hallucinate’ (make mistakes) and allow a user to be guided to a palaeontological determination.

Paul’s access to the system was many months ago and so his view is not current. Problems with GeoGPT have been largely solved but the team will be working to improve the system even more. It must be stressed that at present GeoGPT has not been released and is not in the public domain. We will discuss the huge opportunities offered by LLMs, including GeoGPT, at a meeting on geological LLMs in July 2024. Organised by IUGS, the meeting will take place at Burlington House, and the DDE/GeoGPT team will be present.

It’s undeniable that LLMs in any specialist area are fraught with difficulties. LLM developers in every scientific and technical field are grappling with this challenge right now. But this is not a challenge that geoscience can duck.

Finally, to funding. DDE has had strong Chinese input since its launch in 2019 under the auspices of the then Chinese IUGS President Qiuming Cheng. This support continues to this day, but DDE is actively looking for more funding. The article that Paul refers to is from a local magazine, ‘The Masses’, which reports progress in economy, science and technology related to Jiangsu Province. The main point of the article was to encourage Chinese scientists to get involved in international science programmes. The publication does not represent the views of DDE or the Chinese Academy of Sciences, and is purely the opinion of the author.

DDE is doing its best to be clear and transparent. Its statutes, midterm plans, longer-term strategies, intellectual property (IP) policies, and minutes of meetings are online (https://www.ddeworld.org/about/archives). However, with a rapidly developing movement such as DDE, we recognise that there are challenges in communication and governance that we are working to solve.

We need to work together to solve hard problems, yet science is suffering from geopolitics and, unfortunately, we live in what many commentators now describe as a ‘low trust’ world (Stephenson, 2024). DDE recognises it needs to build trust and we’ll do this by improving governance, but most of all by making sure that the DDE Platform serves its users well as they grow in number from thousands to tens of thousands across the world.

We recognise that ‘open data’ is more easily said than done, but as the main force in geological open science we hope to bring people along with us. Watch this space!

Authors

Prof. Mike Stephenson, Past President DDE

Prof. Hans Thybo, Chair of the DDE Science Committee and President of the International Lithosphere Program

Prof. Chengshan Wang, President of DDE Executive Committee, and member of the Chinese Academy of Sciences

Dr Ishwaran Natarajan, Head of International Relations, DDE

Further reading


UPDATE (July 2024):

The IUGS-sponsored meeting on Large Language Models in the Geological Sciences took place on 16 July 2024. A summary of the meeting and the latest updates to GeoGPT are available at www.iugs.org.
