Main Page
The Geoscience Knowledgebase (GeoKB) is an experimental effort to encode all information and knowledge from the U.S. Geological Survey (USGS) earth systems science portfolio. The motivating concept is to have a domain- and institution-specific knowledge graph sitting online and accessible to anyone, adjacent to Wikidata but separate in that it is maintained by a group with a vested interest in building and using it in practice. As a government science institution, we have our own requirements for how we manage our information, and we likely don't want us government folks messing about directly in the "Global Knowledge Commons." Rather, our information is donated into the public domain for anyone else to do with as they like. Read more about motivation and background on the project in the About page.
Entities
Entities are the primary subjects and objects in the knowledgebase. Read specifics about GeoKB Entities and view example queries.
Properties
Properties in the Wikibase instance connect subjects (entities) to objects (other entities or different kinds of values). We sometimes need to query for the properties themselves and can do that via SPARQL. You can also browse the list of all properties.
SELECT ?property ?propertyLabel ?propertyDescription ?property_type WHERE {
  ?property a wikibase:Property .
  ?property wikibase:propertyType ?property_type .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
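For anyone who wants to run a query like this from a script rather than the query service page, here is a minimal Python sketch. The endpoint path (/query/sparql) is an assumption based on how wikibase.cloud instances are usually laid out, and the function names and User-Agent string are made up for illustration. It uses only the standard library and expects the standard SPARQL JSON results layout.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the SPARQL endpoint lives at /query/sparql on this instance.
SPARQL_ENDPOINT = "https://geokb.wikibase.cloud/query/sparql"

PROPERTY_QUERY = """
SELECT ?property ?propertyLabel ?propertyDescription ?property_type WHERE {
  ?property a wikibase:Property .
  ?property wikibase:propertyType ?property_type .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
"""


def build_query_url(query, endpoint=SPARQL_ENDPOINT):
    """Return a GET URL that asks the endpoint for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return endpoint + "?" + params


def flatten_bindings(results):
    """Reduce a SPARQL JSON results document to a list of plain dicts."""
    return [
        {name: cell["value"] for name, cell in row.items()}
        for row in results["results"]["bindings"]
    ]


def run_query(query, endpoint=SPARQL_ENDPOINT):
    """Send the query over HTTP and return flattened rows (needs network)."""
    request = urllib.request.Request(
        build_query_url(query, endpoint),
        headers={
            "Accept": "application/sparql-results+json",
            "User-Agent": "geokb-example/0.1",  # hypothetical client name
        },
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return flatten_bindings(json.load(response))
```

Calling run_query(PROPERTY_QUERY) should return one dictionary per property, keyed by the variable names in the SELECT clause. Variables with no value for a given row are simply absent from that row's dictionary.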
Wiki Pages
Wikibase is a set of extensions on the core MediaWiki platform along with some extra components, such as the Wikidata Query Service (WDQS), which is based on Blazegraph and provides a SPARQL query service. We are experimenting with some of the functionality that being a MediaWiki instance affords through the item and property discussion or "talk" pages.
For entities that have robust source material from some other data or information system, we store the raw content used to generate items as JSON on item talk pages. This provides both a reference source that can be scrutinized and built upon as well as additional content that is either not suitable for organization into the claims/statements structure or has not yet been modeled into the knowledge graph (e.g., full abstracts for publications). This content can be read programmatically for further processing and also contributes to the full text index in the Mediawiki instance.
- Example person entity talk page containing JSON data objects for a USGS Staff Profile (scraped and transformed to schema.org/Person), ORCID source as JSON-LD, and OpenAlex (source for "addresses subject" claims).
- Example publication entity talk page for a journal article containing the schema.org/Article transformation from the USGS Publications Warehouse and OpenAlex source material.
- Search for "biodiversity and climate change" in the item talk namespace.
- See a listing of all items with item talk wiki pages containing content. This includes pages we use as a cache of source data as well as discussions on or documentation of the entity itself.
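Reading the cached source content programmatically could look something like the following Python sketch. It makes two assumptions that are not documented here: that item talk pages are titled "Item talk:" followed by the item ID, and that the JSON objects sit in the page wikitext as ordinary JSON text. The function names are hypothetical. The extractor scans the text for anything that decodes as a JSON object and skips everything else, so surrounding wiki markup does not need to be understood.

```python
import json
import urllib.parse

API_URL = "https://geokb.wikibase.cloud/w/api.php"


def talk_page_request_url(item_id, api_url=API_URL):
    """Build a MediaWiki API URL that returns a talk page's raw wikitext.

    Assumption: item talk pages are titled "Item talk:<item ID>".
    With formatversion=2 the text arrives at response["parse"]["wikitext"].
    """
    params = {
        "action": "parse",
        "page": f"Item talk:{item_id}",
        "prop": "wikitext",
        "format": "json",
        "formatversion": "2",
    }
    return api_url + "?" + urllib.parse.urlencode(params)


def extract_json_objects(text):
    """Yield each top-level JSON object found anywhere in the text."""
    decoder = json.JSONDecoder()
    position = 0
    while True:
        start = text.find("{", position)
        if start == -1:
            return
        try:
            obj, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            position = start + 1  # not JSON at this brace; keep scanning
            continue
        yield obj
        position = end  # resume after the object just decoded
```

Scanning with raw_decode rather than a regular expression means nested objects come back whole, and wiki template braces are passed over because they fail to decode.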
Linking to other knowledge systems
The whole point of what we're doing here in the GeoKB is to develop organized knowledge about the complex earth system and its interconnections as it touches on the domains of science covered by the USGS and our partners. As such, the links we develop between this system and other systems are perhaps the most vital component. We are not attempting to rebuild data, information, and knowledge already organized elsewhere, though we may develop a recasting-in-context here within the GeoKB in order to make better sense of things we link to in other contexts. This section discusses the specific system-to-system linkages we are developing as we continue pursuing use cases and analytical needs.
Mindat
An initial focus of the GeoKB is on use cases related to mineral resource assessment work in the USGS. While USGS is an authoritative source for certain types of geoscientific information related to our mission, there are better global authorities we can link to for other pieces of information we need. Mindat is our source for information on the following:
- minerals
- rock names/classification
- commodities (mostly mineral related)
For the most part, we are only leveraging names (labels) and short descriptions along with the associated identifiers for these entities. We also incorporate relationships such as the classification of rocks. If we need additional details that Mindat provides, we can always go back to the source for more properties. The following query pulls those GeoKB entities that have "same as" relationships with Mindat items.
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>
SELECT ?item ?itemLabel ?same_as
WHERE {
  ?item wdt:P84 ?same_as .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
  FILTER CONTAINS(STR(?same_as), "mindat.org")
}
xDD (aka GeoDeepDive)
The xDD framework from collaborators at the University of Wisconsin-Madison provides a platform where we are processing collections of documents that you will find represented in the GeoKB. This includes a long-standing partnership that continually feeds USGS Series Publications into processing pipelines as well as newer collections coming from our GeoArchive work (e.g., NI 43-101 Technical Reports). xDD processing develops additional claims from both relatively mundane search indexing and more advanced entity recognition and AI-assisted processing. The GeoKB becomes the place where entity recognition becomes entity linking as we confirm recognized entities as viable and record them as claims/links on the entities representing the associated documents. The gddid property houses the persistent, resolvable identifier used in doc_id and other queries with the xDD APIs. We are working on documenting sources associated with xDD and will refine those so they can serve as references for claims.
Wikidata (aka Global Knowledge Commons)
As stated elsewhere, one of the important things we are exploring with this framework is the connection we can build into Wikidata, the largest fully open, public, and collaborative knowledge graph. There is much more work to do in this area, including how we better federate the aspects of Wikidata we trust enough to use directly. In the near term, we are developing "same as" relationships between entities (and some properties) we represent in the GeoKB and similar items in Wikidata. We currently record these with a full URL, but likely need to work out a namespace approach. "Same as" relationships are used when we are reasonably confident in the semantics of the Wikidata instance, and "see also" when we are unsure or haven't dug into things deeply enough. The following queries will surface those basic relationships.
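As a rough illustration only, and not the queries originally maintained for this page, the Python sketch below builds a query of the same shape as the Mindat one for any target host and shows one possible way to compact a full Wikidata URL into a prefixed name. P84 is the "same as" property used in the Mindat query. The "see also" property ID is not given here, so it has to be supplied by the caller. The function names and the choice of a wd: prefix are assumptions.

```python
import re

# "same as" property ID, taken from the Mindat query on this page.
SAME_AS_PROPERTY = "P84"


def linked_items_query(host_fragment, property_id=SAME_AS_PROPERTY):
    """Build a query listing items whose claim value contains host_fragment.

    Pass a different property_id (for example the "see also" property,
    whose ID is not stated here) to list the weaker relationship instead.
    """
    return f"""PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>
SELECT ?item ?itemLabel ?target
WHERE {{
  ?item wdt:{property_id} ?target .
  FILTER CONTAINS(STR(?target), "{host_fragment}")
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}"""


# Matches the entity and wiki-page URL forms for Wikidata items and properties.
WIKIDATA_URL = re.compile(
    r"^https?://www\.wikidata\.org/(?:entity|wiki)/(?:Property:)?([QP]\d+)$"
)


def compact_wikidata_url(url):
    """Return a prefixed name such as 'wd:Q42', or None if the URL is not Wikidata."""
    match = WIKIDATA_URL.match(url)
    return f"wd:{match.group(1)}" if match else None
```

linked_items_query("wikidata.org") yields the Wikidata counterpart of the Mindat query, and compact_wikidata_url could be applied to each returned target when experimenting with a namespace convention.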
OpenAlex
The OpenAlex platform contains additional valuable content on USGS publication products with DOI identifiers and on USGS staff and co-authors with ORCID identifiers. In addition to Works ("W" identifiers) and Authors ("A" identifiers), we pull in select Topics ("T" identifiers) and Concepts ("C" identifiers) from OpenAlex that are connected to other entities via "knows about" and "addresses subject" claims. All GeoKB entities that have a representation in OpenAlex have an OpenAlex ID claim pointing to the associated entity, with the formatter URL providing an actionable link.
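The letter prefixes make it straightforward to tell what kind of OpenAlex entity an ID points at. The Python sketch below maps an ID to its entity type and to the public OpenAlex landing page and API addresses. Whether the formatter URL configured on the OpenAlex ID property matches the landing page pattern shown is an assumption, and the function name is made up for illustration.

```python
# Prefix letters named in the text, mapped to OpenAlex API collection names.
OPENALEX_KINDS = {
    "W": "works",
    "A": "authors",
    "T": "topics",
    "C": "concepts",
}


def describe_openalex_id(openalex_id):
    """Classify an OpenAlex ID and build the addresses it resolves to."""
    key = openalex_id.strip().upper()
    kind = OPENALEX_KINDS.get(key[:1])
    # A usable ID is one known prefix letter followed by digits only.
    if kind is None or not key[1:].isdigit():
        raise ValueError(f"not a recognized OpenAlex ID: {openalex_id!r}")
    return {
        "kind": kind,
        # Assumption: the formatter URL resolves to this landing page form.
        "landing_page": f"https://openalex.org/{key}",
        "api_url": f"https://api.openalex.org/{kind}/{key}",
    }
```

IDs with other prefixes (OpenAlex also has institutions, sources, and so on) raise a ValueError here because only the four types mentioned above are pulled into the GeoKB.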
Some OpenAlex topics may appear a bit far afield from traditional descriptors of the USGS scientific portfolio, though people unfamiliar with the full breadth of the USGS mission (even inside the organization) can be surprised at all of the areas we cover in our research. OpenAlex topics (and the "concepts" that came from the original work in the Microsoft Academic Graph) are built by clustering papers using an AI process and deriving logical groupings of scientific topics. We leveraged the topics assigned to Works and Authors, selectively incorporating those that could be mapped semantically into the overall GeoKB ontology (subclasses of entity). That ontology originated in longer-established sources from the USGS Thesaurus and Geoscience Ontology, in turn mapped to the Basic Formal Ontology and Common Core Ontologies.
GeoKB Logistics/Specifications
The following are notes on managing specific aspects of the Wikibase instance.
Property Ordering
The ordering of properties on item pages is managed through the MediaWiki:Wikibase-SortedProperties page.
SPARQL Examples
Various example SPARQL queries are maintained on Project:SPARQL/examples to populate the examples drop-down in the query service.
MediaWiki Stats
This API call shows some useful statistics on how the MediaWiki instance as a whole is behaving. The number of jobs queued is a useful stat for checking on the progress of a bot in motion.
https://geokb.wikibase.cloud/w/api.php?action=query&meta=siteinfo&siprop=statistics
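To watch the job queue from a script, the same call can be made with format=json appended, as in this minimal Python sketch. The function names and User-Agent string are hypothetical. The payload layout (query, then statistics, then jobs) is the standard MediaWiki siteinfo response.

```python
import json
import urllib.request

STATS_URL = (
    "https://geokb.wikibase.cloud/w/api.php"
    "?action=query&meta=siteinfo&siprop=statistics"
)


def queued_jobs(payload):
    """Pull the queued job count out of a decoded siteinfo response."""
    return int(payload["query"]["statistics"]["jobs"])


def fetch_statistics(url=STATS_URL):
    """Fetch and decode the statistics payload (needs network access)."""
    request = urllib.request.Request(
        url + "&format=json",
        headers={"User-Agent": "geokb-example/0.1"},  # hypothetical client name
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Calling queued_jobs(fetch_statistics()) before and after a bot run gives a rough sense of how much work is still pending. MediaWiki reports the job count as an estimate, so small fluctuations are normal.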
Allowable URLs
The blacklist on URLs catches certain URL patterns on MediaWiki pages such as the item talk pages, which are often used in the GeoKB to house source material. This can be overridden through the MediaWiki:Spam-whitelist page.
Personal Note
The Geoscience Knowledgebase was a passion project initiated and mostly carried out by me, Sky Bristol. I'm leaving 30+ years of government service as of March 31, 2025. Whether this effort carries on in some guise is unclear. I tried to use this experimental effort to demonstrate two things:
- What could be asked and answered if USGS information systems were made more linkable and interoperable with one another rather than working as siloes as they do now.
- How USGS could shift its focus from forcing consumers of our knowledge into our own esoteric and sometimes ill-functioning infrastructure toward projecting our knowledge out into the Global Knowledge Commons where it can be combined with many other sources of knowledge to inform decision analysis at many levels.
My hope is that three other loosely connected knowledge graph efforts may continue and ultimately lead more toward something like what I've started here. These include geoconnex.us (an effort focused on the water resources mission in the USGS along with partners), a collection of R&D projects on expressing The National Map as a knowledge graph, and a nascent effort arising from a USGS/DARPA/ARPA-E project called CriticalMAAS that takes some of what I started here in the mineral resources domain to an exciting new level that I hope will be released publicly in some form at some point. Each of these has similarities in that they are all building data, information, and knowledge into their most fundamental expression (RDF) and are attempting to align with standards and conventions that should make things more interoperable. I would challenge those efforts, however, to constantly look to how they can interconnect with one another and be linked to the broader commons through adherence to standards and deliberate interlinking (owl:sameAs).
This Wikibase will remain online and viable as long as the wonderful folks at Wikimedia Deutschland keep the platform running and this instance online. It will cease active contributions on March 31 unless someone from USGS takes it up and continues the project. In the process of building out the content here, I added as many "same as" links to entities in Wikidata and other knowledge systems as possible. Anyone else is welcome to exploit those to pull content that may be unique to this instance, sourced from public USGS data and information systems. The software code used to build this content is available on GitHub, with some later work ported to another venue. Anything that's public is donated into the public domain and available for anyone to use and build upon. I apologize for my poor documentation, but there are a few "hidden" easter eggs in there that may lend insight into how to better deal with USGS online content.