AI and historical data access

Dear Editors,
I read the recent debates on AI in geoscience and GeoGPT (Geoscientist, Autumn 2024, Winter 2024) with great interest.
The issue of copyright and clear citations to data sources is critical. Another vitally important aspect is access to historical work and a clear understanding of what is, or is not, under copyright, both for individual use and for training AI tools.
One example is UK colonial survey data. Much excellent geological work was done by assorted versions of the “Overseas Service”, particularly in Commonwealth countries, and published to a high level of professionalism in Annual or Area Reports, often complete with maps. Another example is geological work sponsored by the United Nations after the end of the colonial period. While some original data may have been lost to fire, flooding, or termites, such reports were distributed to libraries and organisations worldwide.
Some organisations, such as the Geological Society, offer a fantastic service, supplying scans of historical documents via post. However, these documents have often already been digitally scanned somewhere in the world yet, frustratingly, only snippets of the reports are available through online search engines and digital libraries (such as Google or the HathiTrust). In other cases, online access is restricted and the full document can only be viewed from a specific country, despite being listed in the National Archives as an open document.
Many geological surveys (including the British Geological Survey, United States Geological Survey, Geological Survey of Brazil, and Indonesia’s Ministry of Energy and Mineral Resources) have downloadable reports and maps. But these are often difficult to find, and the reports rarely appear in search engine results. Even searching the BGS website for downloadable reports on a specific country or area is not easy, though they have some great data.
My understanding is that 50 years after publication, work done by UK government employees is no longer under copyright, so much of the colonial legacy should be publicly and, more importantly, easily available. The relevant authorities should clearly state the copyright status of colonial and post-colonial era reports, and ideally make such documents freely available and searchable via online search engines and digital libraries. The public could also be encouraged to upload data that are out of copyright. Such an approach would help researchers and improve AI tools. Ensuring that tools such as GeoGPT can access relevant non-copyright material as part of their training data seems like an important step in making fantastic historical work more widely available.
Dr Linda Heesterman PhD, FGS
Geological Exploration Consultant and Researcher, UK