AGU Fall Meeting 2023 Abstract

The following is an abstract submitted for an invited talk on the Geoscience Knowledgebase at the AGU Fall Meeting in 2023.

Building the USGS Geoscience Knowledgebase

The U.S. Geological Survey has been in existence since 1879. Over that time, and at an increasing rate, we have produced millions of pieces of data, information, and knowledge. Nothing connects them all together, and nothing conveys everything we know about Earth's systems in a way that links across disciplines. The Geoscience Knowledgebase (GeoKB) is being designed to give USGS the new kind of digital scientific instrument we need to conduct our mission and to share what we know with the world in a way that helps both human and artificial intelligence apply that knowledge.

In 2006, we started developing ScienceBase both to address the immediate digital infrastructure needs of scientific projects and to begin building an all-encompassing scientific database where everything was connected. We patterned the basic model on Freebase, which would eventually lead to the Google Knowledge Graph. ScienceBase never developed the connectivity between everything that we had hoped for, but it has managed to provide a digital representation of over 17.5 million items, everything from documentation of physical samples to discrete datasets.

Our current approach leverages Wikibase. It resembles the ScienceBase concept in that everything is an item defined by its documented characteristics, but it is now grounded in the fundamentals of RDF rather than a bespoke, limited metadata structure. Of particular note is the notion that statements are assertions that are only as useful as the evidence presented, and that a given item may carry multiple competing claims about the same property. This is the reality on the ground in any data integration exercise, but we rarely capture the judgment calls made in practice in a usable form. The Wikibase model lets us record and organize all claims simultaneously, leaving it to the inquirer to decide which suit their need.
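The claims model described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the GeoKB or Wikibase implementation; the item, property, and source names are hypothetical, and a real Wikibase statement carries qualifiers and richer reference structures than shown here.

```python
from dataclasses import dataclass, field

@dataclass
class Reference:
    """Evidence backing a claim (hypothetical source label)."""
    source: str

@dataclass
class Claim:
    """One assertion about a property, with its supporting references."""
    prop: str
    value: str
    references: list[Reference] = field(default_factory=list)

@dataclass
class Item:
    """An item defined by its documented characteristics (claims)."""
    label: str
    claims: list[Claim] = field(default_factory=list)

    def claims_for(self, prop: str) -> list[Claim]:
        return [c for c in self.claims if c.prop == prop]

# A hypothetical deposit with two competing age estimates.
# Both claims are recorded side by side rather than reconciled up front.
deposit = Item("Example Mineral Deposit", claims=[
    Claim("age (Ma)", "66.0", [Reference("USGS report A")]),
    Claim("age (Ma)", "64.5", [Reference("Journal article B")]),
])

# The inquirer decides which evidence suits their need and filters on it.
trusted = [c for c in deposit.claims_for("age (Ma)")
           if any(r.source == "USGS report A" for r in c.references)]
print([c.value for c in trusted])  # → ['66.0']
```

The point of the sketch is that recording both claims, each tied to its evidence, defers the judgment call to query time instead of discarding one value during data integration.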

Knowledge encoded in its most fundamental graph form is vital as AI continues to mature. Large Language Models need external brains. We want AI to conjecture about things we ourselves are still conjecturing about, not to waste energy on things that simply need to be encoded more efficiently. We also need a platform where we record judgments, from both humans and AI, about which claims we trust in which circumstances.