Main Page

Revision as of 13:46, 30 June 2024 by Sky (talk | contribs) (→‎Properties)

The Geoscience Knowledgebase (GeoKB) is an experimental effort to encode all information and knowledge from the USGS earth systems science portfolio. It is a compelling idea to have domain-specific knowledge graphs online and accessible to anyone, "adjacent" to Wikidata but separate in that they are maintained by a group with a vested interest in building them and using them in practice. As a government science institution, we have our own requirements for how we manage our information, and we likely don't want us government folks messing about directly in the "Global Knowledge Commons." Rather, our information is donated into the public domain for anyone else to do with as they like. Read more about our motivation and background on the project in the About page.

Below, you will find a description of the main entities we are building here. Each section should have an example query that you can use to explore. This is all work in progress, so you'll see new things coming in as we get them worked out. You'll note some links to discussion pages where we are working through the specifics on how to model and encode our information into the graph.

Entities

Entities ("Q" identifiers or items) in the GeoKB are the primary objects in the knowledgebase and the focus of our knowledge organization scheme. The majority of properties ("P" identifiers) have an item type classification, meaning that the object they connect to a subject entity is another entity in the knowledgebase. The majority of our entities are not "native" to this knowledgebase; rather, they are sourced from some other data or information system, with the entity in the knowledgebase serving as a basic representation of the "foreign-sourced" item.

Entities are built iteratively over time, with software code used to introduce them to the knowledgebase. We may start an entity representation with only a couple of pieces of information, with the bare minimum being a label, description, and either an "instance of" or "subclass of" claim, depending on the purpose of the entity in the knowledgebase. We also describe source material as entities classified as "knowledgebase source." Claims (or statements) made about an entity should always have at least one reference that cites the source for the statement. These will often point to a "knowledgebase source" item or may use "reference URL" when the reference source is simply a single linkable online resource. Over time, we may revisit the initial starter code that introduced an entity to the knowledgebase to change how it operates and bring in further claims. We may also change how an entity specifies its source by changing or adding to references.
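
The minimum described above can be checked mechanically before or after a load. The following is a minimal sketch, assuming entities are available in the standard Wikibase JSON layout (`labels`, `descriptions`, `claims`, with `references` on each statement) and that P1 and P2 are the "instance of" and "subclass of" properties, as the queries on this page suggest. The function name and the sample entity are hypothetical.

```python
def check_minimum(entity, lang="en"):
    """Return a list of problems found in a Wikibase-style entity dict."""
    problems = []
    if lang not in entity.get("labels", {}):
        problems.append("missing label")
    if lang not in entity.get("descriptions", {}):
        problems.append("missing description")
    claims = entity.get("claims", {})
    # Assumption: P1 = "instance of", P2 = "subclass of" in this Wikibase.
    if not (claims.get("P1") or claims.get("P2")):
        problems.append("missing instance of / subclass of claim")
    # Every statement should carry at least one reference.
    for prop, statements in claims.items():
        for statement in statements:
            if not statement.get("references"):
                problems.append(f"{prop} statement has no reference")
    return problems


# Hypothetical starter entity: labeled and classified, but not yet
# described or referenced.
starter = {
    "labels": {"en": {"language": "en", "value": "example mine"}},
    "descriptions": {},
    "claims": {"P1": [{"mainsnak": {}, "references": []}]},
}
print(check_minimum(starter))
```

A check like this could run as part of the starter code for a new entity type, so that incomplete items are flagged rather than silently created.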

This page serves as iterative documentation of the entities in the knowledgebase with a focus toward how they can be retrieved and interacted with through SPARQL queries. It is organized into sections describing a few things about the major entity types. Each section may contain a link to an associated "item talk" page in this Wikibase instance that discusses the evolution of the entity. Each section will also contain a link to the associated ShExC schema for the entity type as those are developed iteratively. Schemas facilitate interactions with the knowledgebase such as ongoing multimodal introduction of new entity instances and claims.

Minerals Related Entities

USGS Mineral Resource Assessment work provides the seminal use cases for the GeoKB. We are using the knowledge graph model to transform and modernize a number of older databases, online tools, and information systems into an interconnected framework that can better scale into the future. The following section lays out the entity types in the GeoKB related to mineral resources and provides some example queries.

Mineral Resource Assessments

We are experimenting with how to define the concept of mineral resource assessment in the knowledgebase and how to use it in practice. As a start to that, we are using the concept in the sense of the assessment as a tool or end product of a scientific practice. As such, we assigned it as an instance of classifier for some publications that are considered mineral resource assessment products. The following query pulls their basic details:

PREFIX wd: <https://geokb.wikibase.cloud/entity/>
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>

SELECT ?item ?itemLabel 
(YEAR(?pub_date) AS ?year) 
(CONCAT("https://doi.org/", ?doi) AS ?doi_link)
(CONCAT("https://pubs.er.usgs.gov/publication/", ?indexId) AS ?usgs_link)
WHERE {
  ?item wdt:P1 wd:Q152682 .
  OPTIONAL {
    ?item wdt:P7 ?pub_date .
  }
  OPTIONAL {
    ?item wdt:P74 ?doi .
  }
  OPTIONAL {
    ?item wdt:P114 ?indexId .
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
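
Queries like this one can also be run from code rather than the query page. The sketch below is one way to do that with the Python standard library. The endpoint URL is an assumption (wikibase.cloud instances usually expose SPARQL at `/query/sparql`), and the function names, User-Agent string, and sample response are illustrative only.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint; verify against the query service link on this wiki.
ENDPOINT = "https://geokb.wikibase.cloud/query/sparql"


def run_query(sparql, endpoint=ENDPOINT):
    """Send a SPARQL query over HTTP GET and return the parsed JSON response."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": sparql})
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/sparql-results+json",
            "User-Agent": "geokb-docs-example/0.1",  # hypothetical client name
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def flatten(results):
    """Turn SPARQL JSON results into a list of {variable: value} dicts."""
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]


# A hand-made response in the SPARQL JSON results format (not real data).
sample = {
    "head": {"vars": ["item", "year"]},
    "results": {"bindings": [
        {"item": {"type": "uri", "value": "https://geokb.wikibase.cloud/entity/Q0"},
         "year": {"type": "literal", "value": "1999"}},
        {"item": {"type": "uri", "value": "https://geokb.wikibase.cloud/entity/Q00"}},
    ]},
}
print(flatten(sample))
```

Variables bound inside an OPTIONAL block are simply absent from a row when there is no match, so downstream code should use `row.get("year")` rather than `row["year"]`.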

Mineral Commodity

The following query pulls items classified as mineral commodity and associated code values from the USGS Mineral Resources Data System and/or Mindat. Not all codes are in place for the items and Mindat codes will be changing to the long-form identifier resolvable at mindat.org.

PREFIX wd: <https://geokb.wikibase.cloud/entity/>
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>
PREFIX p: <https://geokb.wikibase.cloud/prop/>
PREFIX pq: <https://geokb.wikibase.cloud/prop/qualifier/>

SELECT ?item ?itemLabel ?mrds_code ?mindat_id
WHERE {
  ?item wdt:P1 wd:Q406 .
  ?item p:P1 ?instance_statement .
  OPTIONAL {
    ?instance_statement pq:P19 ?mrds_code .
  }
  OPTIONAL {
    ?instance_statement pq:P99 ?mindat_id .
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}

Mines

This query searches for the first 100 items representing mines with their names, identifiers, and point coordinates (which are a mappable WKT point that can be pulled into a mapping application).

PREFIX wd: <https://geokb.wikibase.cloud/entity/>
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>

SELECT ?mine ?mineLabel ?coordinate_location
WHERE {
  ?mine wdt:P1 wd:Q3646 .
  ?mine wdt:P6 ?coordinate_location .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 100
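
To pull the returned coordinates into a mapping application, the WKT point usually needs to be parsed first. This is a minimal sketch, assuming the value arrives as a plain `Point(longitude latitude)` string with decimal numbers (no exponent notation, no coordinate system prefix). The function name is hypothetical.

```python
import re

# Matches "Point(lon lat)" with optional whitespace, case-insensitive.
_POINT = re.compile(
    r"^\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE,
)


def wkt_point_to_geojson(wkt):
    """Convert a WKT point string into a GeoJSON Point geometry dict."""
    match = _POINT.match(wkt)
    if match is None:
        raise ValueError(f"not a simple WKT point: {wkt!r}")
    lon, lat = float(match.group(1)), float(match.group(2))
    # GeoJSON, like WKT here, orders coordinates as [longitude, latitude].
    return {"type": "Point", "coordinates": [lon, lat]}


print(wkt_point_to_geojson("Point(-87.107680869 33.10434839)"))
```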

Geospatial Search for Mines

The following query finds mines within a 10 km radius of the specified point. It uses the built-in wikibase:around service available in the GeoKB query service, which is specified in the MediaWiki docs.

PREFIX wd: <https://geokb.wikibase.cloud/entity/>
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>

SELECT ?item ?itemLabel ?location ?distance
WHERE {
  ?item wdt:P1 wd:Q3646 .
  SERVICE wikibase:around { 
      ?item wdt:P6 ?location . 
      bd:serviceParam wikibase:center "POINT(-87.107680869 33.10434839)"^^geo:wktLiteral . 
      bd:serviceParam wikibase:radius "10" . 
      bd:serviceParam wikibase:distance ?distance.
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
ORDER BY ASC(?distance)
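
To sanity-check a radius search on the client side, the same filter can be approximated with a great-circle distance. The sketch below uses the haversine formula on a spherical Earth. It is not a claim about how the query service computes distance, and the point names in the example are made up.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometers between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(points, center, radius_km):
    """Return (name, distance_km) pairs inside the radius, nearest first."""
    hits = []
    for name, lon, lat in points:
        distance = haversine_km(center[0], center[1], lon, lat)
        if distance <= radius_km:
            hits.append((name, distance))
    return sorted(hits, key=lambda hit: hit[1])


center = (-87.107680869, 33.10434839)  # the center point used in the query
points = [  # hypothetical locations for illustration
    ("far", -87.107680869, 34.10434839),
    ("near", -87.107680869, 33.15434839),
    ("here", -87.107680869, 33.10434839),
]
for name, distance in within_radius(points, center, 10):
    print(name, round(distance, 2))
```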

Mining Facility

As we continue to evolve the model for how we organize information and knowledge about mining in the GeoKB, we are homing in on the need to have specific mining facilities classified and organized as top-level entities that can then be associated with a specific "mining project" as a higher-level concept. We established mining facility as a subclass of a more general facility and then added a set of more specific subclasses from the USMIN topographic mine symbol digitization project.

There is a class of mining facility for mine, which is what we used to classify all of the mine feature classes we pulled from GNIS. We will either continue to use that as a specific type of mining facility and use "mining project," "mining prospect," or something else as the higher level container or else elevate and reclassify "mine" to mean something else.

The following query returns the mining facility classification.

PREFIX wd: <https://geokb.wikibase.cloud/entity/>
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>

SELECT ?item ?itemLabel
WHERE {
  ?item wdt:P2* wd:Q44143 . # subclass of (transitive) mining facility
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
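
The `*` on the predicate asks for a path of zero or more "subclass of" steps, so the result includes the starting class itself plus everything beneath it. As a rough illustration of that behavior, here is a small breadth-first traversal over (child, parent) pairs. The class names in the example are placeholders, not the actual GeoKB classification.

```python
from collections import defaultdict, deque


def descendants(edges, root):
    """Return root plus every class reachable via 'subclass of' edges.

    edges is an iterable of (child, parent) pairs.
    """
    children = defaultdict(set)
    for child, parent in edges:
        children[parent].add(child)
    seen = {root}  # zero-length path: the root matches itself
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Placeholder hierarchy for illustration only.
edges = [
    ("mining facility", "facility"),
    ("mine", "mining facility"),
    ("open pit", "mine"),
    ("office", "facility"),
]
print(sorted(descendants(edges, "mining facility")))
```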

Rock Classification

We've started the GeoKB understanding of rock classification via the Mindat system. We will likely augment this over time with other interpretations and classification systems, but since we are pulling minerals from Mindat and want to link to rock types included in those records, starting with Mindat made some sense. The following queries show a bit of how to work with the classification itself in addition to what we link to specific classes.

Igneous Rocks

The following query starts with igneous rock and pulls the full classification from that point (* on the end of the predicate). Because we pull identifiers in this, you can use something like the graph view in Wikibase to visualize and explore the items through their connections.

PREFIX wd: <https://geokb.wikibase.cloud/entity/>
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>

SELECT ?rock ?rockLabel ?subclass_of
WHERE {
  ?rock wdt:P2* wd:Q41459 .
  ?rock wdt:P2 ?subclass_of .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}

Items that are an instance of multiple things at the same time

One of the things we are experimenting with, geared toward making the knowledge content in the GeoKB more transferable to non-expert domains like the global knowledge commons, is somewhat counter to how Mindat has handled the situation. Many of the things that we would give the same basic label are actually different things in different circumstances or contexts. We're experimenting with focusing on the label and then asserting that the entity can be an instance of multiple things. This probably violates some semantic modeling rules, so it may or may not stand the test of time. But it's a thought experiment in motion. The following query focuses on commodities, pulling those that have more than one instance of claim.

PREFIX wd: <https://geokb.wikibase.cloud/entity/>
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>

SELECT ?item ?itemLabel (COUNT(?instance_of) AS ?num_instance_of)
WHERE {
  ?item wdt:P1 wd:Q406 ;
          wdt:P1 ?instance_of .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
GROUP BY ?item ?itemLabel
HAVING (COUNT(?instance_of) > 1)

Mindat Identifiers

We are using Mindat as a key reference for a number of things (rocks, minerals, etc.). The Mindat identifier provides the linkage back to Mindat for gathering additional information on the subject items. Mindat IDs are incorporated as a qualifier on the instance of statements made for items sourced from the Mindat API. The following query is one example of how to return rock items with their Mindat IDs.

PREFIX wd: <https://geokb.wikibase.cloud/entity/>
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>
PREFIX p: <https://geokb.wikibase.cloud/prop/>
PREFIX pq: <https://geokb.wikibase.cloud/prop/qualifier/> 

SELECT ?item ?itemLabel ?mindat_id
WHERE {
  ?item wdt:P1 wd:Q41261 . # "instance of" "rock"
  OPTIONAL {
    ?item p:P1 ?statement . # Get the instance of statement to operate on
    OPTIONAL { ?statement pq:P99 ?mindat_id . } # Pull the Mindat ID qualifier as a value
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 10
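
The same hop from a statement to its qualifier can be made on an entity's JSON record instead of through SPARQL. This is a minimal sketch, assuming the standard Wikibase JSON layout in which each statement carries a `qualifiers` map keyed by property ID. The function name and the sample values are invented for illustration.

```python
def qualifier_values(entity, statement_prop, qualifier_prop):
    """Collect qualifier values attached to an entity's statements."""
    values = []
    for statement in entity.get("claims", {}).get(statement_prop, []):
        for snak in statement.get("qualifiers", {}).get(qualifier_prop, []):
            # Skip "novalue" / "somevalue" snaks, which have no datavalue.
            if snak.get("snaktype") == "value":
                values.append(snak["datavalue"]["value"])
    return values


# Hypothetical entity: one "instance of" (P1) statement qualified with a
# made-up Mindat ID (P99).
entity = {
    "claims": {
        "P1": [
            {
                "mainsnak": {"snaktype": "value", "property": "P1"},
                "qualifiers": {
                    "P99": [
                        {"snaktype": "value", "property": "P99",
                         "datavalue": {"type": "string", "value": "example-id"}}
                    ]
                },
            },
            {"mainsnak": {"snaktype": "value", "property": "P1"}},
        ]
    }
}
print(qualifier_values(entity, "P1", "P99"))
```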

Mineral-related Items Classified Multiple Ways

We've made a design decision in the GeoKB for concepts that share a name in common usage: we declare only one uniquely identified item, which is then classified and characterized to indicate the different ways the concept can be used. An example is an item that can be a mineral, a mineral commodity, and a chemical element. This is essentially dealing with the issue of the same word or phrase meaning different things in different contexts. The alternative would be to declare separate entities, each with its own specific classification and other characteristics, and then use relationships between the different items or disambiguation features in the knowledgebase to distinguish between them. We will have to determine which approach makes the most sense in practical use over time.

The following query searches for items that are classified as chemical element, mineral, and mineral commodity in the same logical labeled entity.

PREFIX wd: <https://geokb.wikibase.cloud/entity/>
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>

SELECT ?commodity ?commodityLabel ?instance_ofLabel
WHERE {
  ?commodity wdt:P1 wd:Q406;
             wdt:P1 wd:Q280;
             wdt:P1 wd:Q24 .
  ?commodity wdt:P1 ?instance_of .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}

Organizations

Entities representing organizations are important in the GeoKB in a couple of areas. We have information on entities such as publications and projects connected to USGS "sub-organizations" such as Science Centers and Labs. We also have information associated with external organizations such as mining companies used to retrieve and organize prospecting history for mineral resource assessments.

People

Items representing people associated with the USGS are another type of entity built out in the GeoKB. We use these as reference points and connections to the overall scientific record captured in this knowledge graph. Person records come from public sources such as our USGS Staff Profiles and are further discussed on the person classification talk page.

People by employer

This query pulls people linked to a given employer (the item wd:Q44210 in this example), along with their email address (already publicly visible), reference URL (pointer to the USGS staff profile), and ORCID where available.

PREFIX wd: <https://geokb.wikibase.cloud/entity/>
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>

SELECT ?item ?itemLabel ?email ?profile_url ?orcid
WHERE {
  ?item wdt:P1 wd:Q3 .
  ?item wdt:P107 wd:Q44210 .
  OPTIONAL {
    ?item wdt:P109 ?email .
  }
  OPTIONAL {
    ?item wdt:P31 ?profile_url .
  }
  OPTIONAL {
    ?item wdt:P106 ?orcid .
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
LIMIT 10000

Occupations and Roles

We organized a number of concepts for the major occupations/professions of USGS staff along with a set of specialized leadership roles that help to understand the organization in capacity assessment use cases.

PREFIX wd: <https://geokb.wikibase.cloud/entity/>
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>

SELECT ?item ?itemLabel ?item_alt_label
WHERE {
  ?item wdt:P2* ?classes .
  VALUES ?classes {wd:Q159568 wd:Q159617}
  OPTIONAL {
    ?item skos:altLabel ?item_alt_label .
    FILTER (lang(?item_alt_label)='en')
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

Places

The GeoKB includes references for many named places that are necessary links from many other items. The following are some examples for this part of the knowledgebase.

U.S. States and Territories

PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>

SELECT ?item ?itemLabel ?fips_alpha
WHERE {
  ?item wdt:P13 ?fips_alpha .  # Both states and territories in the U.S. have two-character FIPS codes
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}

U.S. Counties or Equivalent Subdivision

PREFIX wd: <https://geokb.wikibase.cloud/entity/>
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>

SELECT ?item ?itemLabel ?fips_code ?coordinates
WHERE {
  ?item wdt:P1 wd:Q481 . # "instance of" "county or equivalent"
  ?item wdt:P34 wd:Q256 . # in the state of Colorado
  ?item wdt:P22 ?fips_code .
  ?item wdt:P6 ?coordinates .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}

Publications/Documents

Another fundamental entity type we are bringing together in the GeoKB is the representation of documents of one kind or another. For right now, we put "document" right at the top of the classification as a foundational type of "entity." You can query for the classification structure for documents with the following transitive query from the document root.

PREFIX wd: <https://geokb.wikibase.cloud/entity/>
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>

SELECT ?item ?itemLabel
WHERE {
  ?item wdt:P2* wd:Q5 . # subclass of, transitive "document"
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}

We can build on the query for document classes to get document instances and some of the statements we might need to use.

PREFIX wd: <https://geokb.wikibase.cloud/entity/>
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>

SELECT ?item ?itemLabel ?instance_ofLabel (YEAR(?publication_date) AS ?year) ?doi
WHERE {
  ?classes wdt:P2* wd:Q5 . # subclass of, transitive "document"
  ?item wdt:P1 ?classes ; # get entities that are instances of any of the classes
        wdt:P1 ?instance_of ; # get the item classification to display
        wdt:P7 ?publication_date ; # only get items that have a publication date
        wdt:P74 ?doi . # only get items that have a doi
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 100

USGS Numbered Series

The "USGS Numbered Reports Series" are an important reference source in the GeoKB for many other things we are working on. We are building a process to keep a representation of USGS reports in sync with the USGS Publications Warehouse source, starting with a smaller subset important to current use cases. The different USGS numbered series are part of a classification of documents. The following query uses that classification to pull all USGS reports and specific claims.

PREFIX wd: <https://geokb.wikibase.cloud/entity/>
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>

SELECT ?item ?itemLabel ?itemAltLabel ?pub_year ?doi 
?pw_index_id ?country ?countryLabel ?us_state ?us_stateLabel
?county ?countyLabel
WHERE {
  ?item wdt:P1/wdt:P2* wd:Q11 .
  OPTIONAL {
    ?item wdt:P7 ?pub_year .
  }
  OPTIONAL {
    ?item wdt:P74 ?doi .
  }
  OPTIONAL {
    ?item wdt:P114 ?pw_index_id .
  }
  OPTIONAL {
    ?item wdt:P33 ?country .
  }
  OPTIONAL {
    ?item wdt:P34 ?us_state .
  }
  OPTIONAL {
    ?item wdt:P35 ?county .
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}

USGS-authored Articles

In addition to the USGS Numbered Series, we also build a representation in the GeoKB for journal articles contributed to by USGS staff and linkable through ORCID identifiers in citation metadata. We may eventually need to extend beyond this constraint to include more scientific articles for which we can't make a confirmed ORCID link, but this is the initial criterion for what else we incorporate from the USGS Publications Warehouse.

NI 43-101 Technical Reports

These are a special type of document that is part of an early use case for the GeoKB. We manage the metadata and stored document content for these reports in a Zotero collection as part of the GeoArchive. From that data management foundation, we create a representation of basic identification information in the GeoKB for knowledge graph purposes. The following query is designed to assist operators who may want to use the GeoKB as a route to retrieve metadata for NI 43-101 reports for external processing. It provides the crucial information needed to identify individual reports and their PDF file attachments.

PREFIX wd: <https://geokb.wikibase.cloud/entity/>
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>
PREFIX p: <https://geokb.wikibase.cloud/prop/>
PREFIX pq: <https://geokb.wikibase.cloud/prop/qualifier/>

SELECT ?report ?reportLabel ?meta_url ?content_url ?attachment_key ?file_size
WHERE {
  ?report wdt:P1 wd:Q10 ; # instance of NI 43-101 Technical Report
          wdt:P141 ?meta_url ; # permanent URL to the online representation (responds to application/json content negotiation)
          wdt:P136 ?content_url ; # read URL to attachment content (only accessible if authenticated to Zotero web UI)
          wdt:P143 ?attachment_key ; # attachment key that can be used to download PDF file content
          p:P143 ?attachment_key_statement . # get attachment key statement so we can get the file size qualifier
  ?attachment_key_statement pq:P144 ?file_size . # attachment file size
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 100

Duplicate Pubs Warehouse IDs

The USGS Publications Warehouse is our primary source for items representing scientific publications (USGS reports and journal articles). We tapped this source via a web service to build out a baseline of all USGS Numbered Series reports and all journal articles that had contributors identified with an ORCID (meaning we could reasonably establish those entities in the GeoKB and build in linkages). In doing this work, we discovered a number of cases where different USGS Pubs Warehouse records document the same publication. In these cases, we recorded multiple indexId values in the GeoKB items. The following query pulls out those cases for examination.

PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>

SELECT ?item ?itemLabel 
(GROUP_CONCAT(DISTINCT ?indexId; SEPARATOR=", ") AS ?indexIds)
(GROUP_CONCAT(DISTINCT ?doi; SEPARATOR=", ") AS ?dois)
WHERE {
  ?item wdt:P114 ?indexId .
  OPTIONAL {
    ?item wdt:P74 ?doi .
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
GROUP BY ?item ?itemLabel
HAVING (COUNT(DISTINCT ?indexId) > 1) # DISTINCT so that multiple DOIs on one item do not inflate the count
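
The grouping logic can also be applied to rows already pulled from the knowledgebase. The sketch below groups rows by item and keeps items with more than one distinct indexId. It counts distinct values because the optional DOI join can repeat the same indexId across rows. The function name and sample rows are hypothetical.

```python
from collections import defaultdict


def items_with_multiple_ids(rows):
    """Map each item to its sorted indexIds when it has more than one."""
    ids = defaultdict(set)
    for row in rows:
        ids[row["item"]].add(row["indexId"])
    return {
        item: sorted(values)
        for item, values in ids.items()
        if len(values) > 1
    }


# Made-up rows: item A has one indexId but two DOIs (two rows);
# item B has two different indexIds.
rows = [
    {"item": "A", "indexId": "id-1", "doi": "10.0000/x"},
    {"item": "A", "indexId": "id-1", "doi": "10.0000/y"},
    {"item": "B", "indexId": "id-2"},
    {"item": "B", "indexId": "id-3"},
]
print(items_with_multiple_ids(rows))
```

Only item B is reported here. A plain row count would also have flagged item A.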

Research Methods

Linkable research methods are incorporated into the GeoKB to support cases where we need these as filters or other factors in analyses.

PREFIX wd: <https://geokb.wikibase.cloud/entity/>
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>

SELECT ?method ?methodLabel ?method_alt_label ?subclass_of ?subclass_ofLabel
WHERE {
  ?method wdt:P2* wd:Q152412 ; # Gets all subclasses of research method
          wdt:P2 ?subclass_of . # gets the subclass identifier itself so we can graph this
  OPTIONAL {
    ?method skos:altLabel ?method_alt_label . # gets any aliases individually so we can run name matching
    FILTER (lang(?method_alt_label)='en')
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}

Federated queries and "same as" links

One of the things we are working on in the GeoKB is the establishment of as many relationships as possible to other sources, including formal ontologies (for classes and properties) as well as similar knowledge bases. The latter includes Wikidata with a goal of connecting the "Global Knowledge Commons" with further information and details about items that we have some ownership of from the USGS. We build these linkages using the same as property, pointing to a persistent resolvable identifier in URL form in the foreign system. We are following a loose definition of "same as" here that is somewhere shy of exact match. It essentially means that we consider an item in the GeoKB to be representing the same thing that the linked entity also represents, accepting that neither our representation nor the foreign representation is necessarily complete or completely accurate. However, taken together they help to make up a more complete and useful representation.

Not all same as linkages are actionable in the same way. If the foreign resource is also accessible via a SPARQL endpoint (e.g., Wikidata), then we can build a federated query that pulls additional details about entities from the foreign source. Here's an example that uses the GeoKB representation of U.S. states, which contain same as relationships with Wikidata representations, to pull coordinate locations and capital cities from Wikidata.

PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX geokbe: <https://geokb.wikibase.cloud/entity/>
PREFIX geokbp: <https://geokb.wikibase.cloud/prop/direct/>

SELECT ?item ?itemLabel ?fips_alpha ?same_as
?coordinateLocation ?capital_city ?capital_cityLabel
WHERE {
  ?item geokbp:P13 ?fips_alpha ;
        geokbp:P84 ?same_as .
  FILTER (CONTAINS(STR(?same_as), "wikidata.org")) # Need to filter to same as linkages from specific source
  SERVICE <https://query.wikidata.org/sparql> {
    ?same_as wdt:P625 ?coordinateLocation ;
             wdt:P36 ?capital_city .
    ?capital_city rdfs:label ?capital_cityLabel . # Need to get labels via rdfs:label query
    FILTER(LANG(?capital_cityLabel) = "en") # and filter to English language
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . } # Acts on local query results
}
LIMIT 100
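
The FILTER above selects Wikidata links with a simple substring match on the URL. When working with the results in code, a stricter check on the host name and identifier can be useful. This is a small sketch under the assumption that same as values are full URLs ending in a Q identifier. The function name and the example URLs are illustrative.

```python
import re
from urllib.parse import urlparse


def wikidata_qid(url):
    """Return the Q identifier from a Wikidata URL, or None if not one."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    if host != "wikidata.org" and not host.endswith(".wikidata.org"):
        return None
    match = re.search(r"/(Q\d+)$", parsed.path)
    return match.group(1) if match else None


print(wikidata_qid("http://www.wikidata.org/entity/Q1261"))
print(wikidata_qid("https://www.mindat.org/min-1.html"))
```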

Properties

Properties in the Wikibase instance are what connect subjects (entities) and objects (other entities or different kinds of values). We sometimes need to query for properties themselves and can do that via SPARQL. You can also browse the list of all properties.

SELECT ?property ?propertyLabel ?propertyDescription ?property_type WHERE {
  ?property a wikibase:Property .
  ?property wikibase:propertyType ?property_type .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}

Wiki Pages

Wikibase is a set of extensions on the core MediaWiki platform along with some extra components like WDQS (the Wikidata Query Service), which is based on Blazegraph, to provide a SPARQL query service. We are experimenting with some of the functionality that being a MediaWiki instance affords through the item and property discussion or "talk" pages. In Wikidata, property talk pages are used extensively as the place where the conversation happens about a property as it is proposed, debated, created, and evolved. This is a good idea and something we will likely use as we start getting a bit more sophisticated on property definition.

We are currently using item talk pages in a couple of ways:

  1. For some publication (document) items, we have abstracts and tables of contents available in source metadata. This is important content, and we are working on some methods to identify other entities within the graph (or that need to be added to the graph) from within these texts. We load them programmatically to the item talk pages as a way to make them visible in context for any type of use and to serve as a point in the provenance trace when we are able to leverage them for entity extraction/linking. Here's an example.
  2. As we work with AI processing pipelines to operate on the content represented by items in the knowledgebase, we sometimes need to feed back candidate results for human review before deciding what claims to build out. We are experimenting with a couple of ways of leveraging the item talk pages for this purpose and with the conventions we want to apply for capturing the conversation amongst human actors in a way that AI actors can exploit.

Linkages Beyond USGS

The whole point of what we're doing here in the GeoKB is to develop organized knowledge about the complex earth system and its interconnections as it touches on the domains of science covered by the USGS and our partners. As such, the links we develop between this system and other systems are perhaps the most vital component. We are not attempting to rebuild data, information, and knowledge already organized elsewhere, though we may develop a recasting-in-context here within the GeoKB in order to make better sense of things we link to in other contexts. This section discusses the specific system to system linkages we are developing as we continue pursuing use cases and analytical needs.

Mindat

An initial focus of the GeoKB is on use cases related to mineral resource assessment work in the USGS. While USGS is an authoritative source for certain types of geoscientific information related to our mission, there are better global authorities we can connect with to leverage the pieces of information we need to link with. Mindat is our source for information on the following:

For the most part, we are only leveraging names (labels) and short descriptions along with the associated identifiers for these entities. We also incorporate relationships such as the classification of rocks. If we need additional details that Mindat provides, we can always go back to the source for more properties. The following query pulls those GeoKB entities that have same as relationships with Mindat items.

PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>

SELECT ?item ?itemLabel ?same_as
WHERE {
  ?item wdt:P84 ?same_as .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
  FILTER CONTAINS(STR(?same_as), "mindat.org")
}

xDD (aka GeoDeepDive)

The xDD framework from collaborators at the University of Wisconsin-Madison provides a platform where we are processing collections of documents that you will find represented in the GeoKB. This includes a long-standing partnership that continually feeds USGS Series Publications into processing pipelines as well as newer collections coming from our GeoArchive work (e.g., NI 43-101 Technical Reports). xDD processing develops additional claims from both relatively mundane search indexing as well as more advanced entity recognition and AI assisted processing. The GeoKB becomes the place where entity recognition becomes entity linking as we confirm recognized entities as viable and record them as claims/links on the entities representing the associated documents. The gddid property houses the persistent, resolvable identifier used in doc_id and other queries with the xDD APIs. We are working on documenting sources associated with xDD and will get those refined to be associated as references for claims.

This query returns 100 items that have a GDDID. The formatter URL for the GDDID as an external identifier points to the xDD API end point for the "article" - a basic rendering of the citation metadata.

PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>

SELECT ?item ?itemLabel ?gddid
WHERE {
  ?item wdt:P93 ?gddid .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 100

Wikidata (aka Global Knowledge Commons)

As stated elsewhere, one of the important things we are exploring with this framework is the connection we can build to Wikidata, the largest fully open, public, and collaborative knowledge graph. There is much more work to do in this area, including how we better federate aspects of Wikidata we trust to use directly. In the near term, we are developing same as relationships between entities (and some properties) we represent in the GeoKB and similar items in Wikidata. We record these currently with a full URL, but likely need to work out a namespace approach. Same as relationships are used when we are reasonably confident in the semantics of the Wikidata instance, and see also is used when we are unsure or haven't dug into things deeply enough. The following queries will surface those basic relationships.

same as

PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>

SELECT ?item ?itemLabel ?same_as
WHERE {
  ?item wdt:P84 ?same_as .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
  FILTER CONTAINS(STR(?same_as), "wikidata.org")
}

GeoKB Logistics/Specifications

Prefixes/Namespaces

SPARQL queries, and work with RDF in general, use the concept of a short prefix string representing a URI to resolve an identifier, creating the ability to refer to different resolvable knowledge representations with a short string. Prefix declaration in a query lets us use the short strings in place of constantly putting in a fully qualified and resolvable path. The wikibase.cloud instances all use a standard set of built-in prefixes and resolvers, some of which are well known and in common use.

They also include the standard Wikidata prefixes referring to Wikidata resolution paths. Initially, in this Wikibase instance, we documented queries in a way that might be confusing, in that we used the same standard "wd" prefixes but declared them as pointing to this Wikibase. We've since changed that convention, but you may still find references that use the older style rather than the conventions shown below. The built-in query service for this Wikibase instance is not federated to send queries to endpoints other than its own. Some typical SPARQL syntax, such as asking for primary labels as "rdfs:label" (in any language without a filter), will function.

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix wikibase: <http://wikiba.se/ontology#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix schema: <http://schema.org/> .
@prefix cc: <http://creativecommons.org/ns#> .
@prefix geo: <http://www.opengis.net/ont/geosparql#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix gkd: <https://geokb.wikibase.cloud/entity/> .
@prefix gkdata: <https://geokb.wikibase.cloud/wiki/Special:EntityData/> .
@prefix gks: <https://geokb.wikibase.cloud/entity/statement/> .
@prefix gkref: <https://geokb.wikibase.cloud/reference/> .
@prefix gkv: <https://geokb.wikibase.cloud/value/> .
@prefix gkwdt: <https://geokb.wikibase.cloud/prop/direct/> .
@prefix gkwdtn: <https://geokb.wikibase.cloud/prop/direct-normalized/> .
@prefix gkp: <https://geokb.wikibase.cloud/prop/> .
@prefix gkps: <https://geokb.wikibase.cloud/prop/statement/> .
@prefix gkpsv: <https://geokb.wikibase.cloud/prop/statement/value/> .
@prefix gkpsn: <https://geokb.wikibase.cloud/prop/statement/value-normalized/> .
@prefix gkpq: <https://geokb.wikibase.cloud/prop/qualifier/> .
@prefix gkpqv: <https://geokb.wikibase.cloud/prop/qualifier/value/> .
@prefix gkpqn: <https://geokb.wikibase.cloud/prop/qualifier/value-normalized/> .
@prefix gkpr: <https://geokb.wikibase.cloud/prop/reference/> .
@prefix gkprv: <https://geokb.wikibase.cloud/prop/reference/value/> .
@prefix gkprn: <https://geokb.wikibase.cloud/prop/reference/value-normalized/> .
@prefix gkno: <https://geokb.wikibase.cloud/prop/novalue/> .
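
The list above is written in Turtle syntax, while SPARQL queries need `PREFIX` declarations. The following is a small sketch for converting one form to the other so that the same prefix list can be pasted at the top of a query. The function name is hypothetical, and the parser only handles simple one-line `@prefix` declarations like those shown here.

```python
import re

# Matches lines of the form: @prefix name: <iri> .
_PREFIX_LINE = re.compile(r"^@prefix\s+([A-Za-z][\w-]*):\s*<([^>]+)>\s*\.$")


def turtle_to_sparql_prefixes(turtle_text):
    """Convert Turtle @prefix lines into SPARQL PREFIX declarations."""
    lines = []
    for line in turtle_text.splitlines():
        match = _PREFIX_LINE.match(line.strip())
        if match:
            lines.append(f"PREFIX {match.group(1)}: <{match.group(2)}>")
    return "\n".join(lines)


turtle = """
@prefix gkd: <https://geokb.wikibase.cloud/entity/> .
@prefix gkwdt: <https://geokb.wikibase.cloud/prop/direct/> .
"""
print(turtle_to_sparql_prefixes(turtle))
```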


Property Ordering

The ordering of properties on item pages is managed through the MediaWiki:Wikibase-SortedProperties page.

Wiki Management

This API call will show some useful statistics on how the Mediawiki instance as a whole is behaving. The number of jobs queued is a useful stat to check on the process of a bot in motion.

https://geokb.wikibase.cloud/w/api.php?action=query&meta=siteinfo&siprop=statistics
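
To watch the job queue while a bot is running, the same call can be made from a script. This is a minimal sketch using only the standard library. It adds `format=json` to the URL above and assumes the response carries the job count under `query.statistics.jobs`. The function names and User-Agent string are illustrative.

```python
import json
import urllib.parse
import urllib.request

API = "https://geokb.wikibase.cloud/w/api.php"


def statistics_url(api=API):
    """Build the siteinfo statistics URL, asking for a JSON response."""
    params = {
        "action": "query",
        "meta": "siteinfo",
        "siprop": "statistics",
        "format": "json",
    }
    return api + "?" + urllib.parse.urlencode(params)


def queued_jobs(payload):
    """Read the number of queued jobs from a parsed statistics response."""
    return payload["query"]["statistics"]["jobs"]


def fetch_statistics(api=API):
    """Fetch and parse the statistics response (requires network access)."""
    request = urllib.request.Request(
        statistics_url(api),
        headers={"User-Agent": "geokb-docs-example/0.1"},  # hypothetical name
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


print(statistics_url())
```

Calling `queued_jobs(fetch_statistics())` in a loop with a pause between calls gives a rough view of whether the queue is draining.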

After running into some issues with URLs being posted to Item_talk pages being caught by the spam filter, I discovered that there is a universal blacklist filter configured into each wikibase.cloud instance but that this can be overridden by adding regex fragments to MediaWiki:Spam-whitelist.