SPARQL examples: Difference between revisions
(27 intermediate revisions by 2 users not shown) | |||
Line 1: | Line 1: | ||
= | = Entities = | ||
Entities ("Q" identifiers or items) in the GeoKB are the primary object in the knowledgebase. They are the focus for our knowledge organization scheme. The majority of properties ("P" identifiers) have an item type classification meaning that the object they connect to a subject entity is another entity in the knowledgebase. The majority of our entities are not "native" to this knowledgebase; rather they are sourced from some other data or information system with the entity in the knowledgebase a basic representation of the "foreign-sourced" item. | |||
== Mines == | Entities are built iteratively over time with software codes used to introduce them to the knowledgebase. We may start an entity representation with only a couple of pieces of information, with the bare minimum being a label, description, and either an "[[Property:P1|instance of]]" or "[[Property:P2|subclass of]]" claim, depending on the purpose of the entity in the knowledgebase. We also describe source material as entities classified as "[[Item:Q26267|knowledgebase source]]." Claims (or statements) made about an entity should always have at least one reference that essentially cites the source for the statement. These will often point to a "knowledgebase source" item or may use "[[Property:P31|reference URL]]" when the reference source is simply a single linkable online resource. Over time, we may revisit the initial starter code that introduced an entity to the knowledgebase to change how it operates and bring in further claims. We may also change the nature of how an entity specifies its source by changing or adding to references. | ||
This page serves as iterative documentation of the entities in the knowledgebase with a focus toward how they can be retrieved and interacted with in use through SPARQL queries. It is organized into sections describing a few things about the major entity types. Each section may contain a link to an associated "item talk" page in this Wikibase instance that discusses the evolution of the entity. Each section will also contain a link to the associated ShExC schema for the entity type as those are developed iteratively. Schemas facilitate interactions with the knowledgebase such as ongoing multimodal introduction of new entity instances and claims. | |||
== Minerals Related Entities == | |||
USGS Mineral Resource Assessment work provides the seminal use cases for the GeoKB. We are using the knowledge graph model to transform and modernize a number of older databases, online tools, and information systems into an interconnected framework that can better scale into the future. The following section lays out the entity types in the GeoKB related to mineral resources and provides some example queries. | |||
=== Mineral Commodity === | |||
The following query pulls items classified as mineral commodity and associated code values from the USGS Mineral Resources Data System and/or Mindat. Not all codes are in place for the items and Mindat codes will be changing to the long-form identifier resolvable at mindat.org. | |||
<sparql tryit="1"> | |||
PREFIX wd: <https://geokb.wikibase.cloud/entity/> | |||
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/> | |||
PREFIX p: <https://geokb.wikibase.cloud/prop/> | |||
PREFIX pq: <https://geokb.wikibase.cloud/prop/qualifier/> | |||
SELECT ?item ?itemLabel ?mrds_code ?mindat_id | |||
WHERE { | |||
?item wdt:P1 wd:Q406 . | |||
?item p:P1 ?instance_statement . | |||
OPTIONAL { | |||
?instance_statement pq:P19 ?mrds_code . | |||
} | |||
OPTIONAL { | |||
?instance_statement pq:P99 ?mindat_id . | |||
} | |||
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . } | |||
} | |||
</sparql> | |||
=== Mines === | |||
This query searches for the first 100 items representing mines with their names, identifiers, and point coordinates (which are a mappable WKT point that can be pulled into a mapping application). | This query searches for the first 100 items representing mines with their names, identifiers, and point coordinates (which are a mappable WKT point that can be pulled into a mapping application). | ||
* [[Item_talk:Q3646|Mine discussion]] | |||
<sparql tryit="1"> | <sparql tryit="1"> | ||
Line 18: | Line 50: | ||
</sparql> | </sparql> | ||
== Rock Classification == | ==== Geospatial Search for Mines ==== | ||
The following query finds mines within a 10 km radius of the specified point. (This uses built-in functionality found in the GeoKB, described in the [https://www.mediawiki.org/wiki/Wikidata_Query_Service/User_Manual/el#Geospatial_search MediaWiki docs].) | |||
<sparql tryit="1"> | |||
PREFIX wd: <https://geokb.wikibase.cloud/entity/> | |||
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/> | |||
SELECT ?item ?itemLabel ?location ?distance | |||
WHERE { | |||
?item wdt:P1 wd:Q3646 . | |||
SERVICE wikibase:around { | |||
?item wdt:P6 ?location . | |||
bd:serviceParam wikibase:center "POINT(-87.107680869 33.10434839)"^^geo:wktLiteral . | |||
bd:serviceParam wikibase:radius "10" . | |||
bd:serviceParam wikibase:distance ?distance. | |||
} | |||
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . } | |||
} | |||
ORDER BY ASC(?distance) | |||
</sparql> | |||
=== Mining Facility === | |||
As we continue to evolve the model for how we organize information and knowledge about mining in the GeoKB, we are homing in on the need to have specific mining facilities classified and organized as top-level entities that can then be associated with a specific "mining project" as a higher-level concept. We established [[Item:Q44143|mining facility]] as a subclass of a more general facility and then a set of more specific subclasses from the USMIN topographic mine symbol digitization project. | |||
There is a class of mining facility for [[Item:Q36466|mine]], which is what we used to classify all of the mine feature classes we pulled from GNIS. We will either continue to use that as a specific type of mining facility and use "mining project," "mining prospect," or something else as the higher-level container or else elevate and reclassify "mine" to mean something else. | |||
* [[EntitySchema:E3|Mining Facility Schema]] | |||
* [[Item_talk:Q44143|Mining Facility discussion page]] | |||
The following query returns the mining facility classification. | |||
<sparql tryit="1"> | |||
PREFIX wd: <https://geokb.wikibase.cloud/entity/> | |||
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/> | |||
SELECT ?item ?itemLabel | |||
WHERE { | |||
?item wdt:P2* wd:Q44143 . # subclass of (transitive) mining facility | |||
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . } | |||
} | |||
</sparql> | |||
=== Rock Classification === | |||
We've started the GeoKB understanding of rock classification via the Mindat system. We will likely augment this over time with other interpretations and classification systems, but since we are pulling minerals from Mindat and want to link to rock types included in those records, starting with Mindat made some sense. The following queries show a bit of how to work with the classification itself in addition to what we link to specific classes. | We've started the GeoKB understanding of rock classification via the Mindat system. We will likely augment this over time with other interpretations and classification systems, but since we are pulling minerals from Mindat and want to link to rock types included in those records, starting with Mindat made some sense. The following queries show a bit of how to work with the classification itself in addition to what we link to specific classes. | ||
=== Igneous Rocks === | ==== Igneous Rocks ==== | ||
The following query starts with igneous rock and pulls the full classification from that point (* on the end of the predicate). Because we pull identifiers in this, you can use something like the graph view in Wikibase to visualize and explore the items through their connections. | The following query starts with igneous rock and pulls the full classification from that point (* on the end of the predicate). Because we pull identifiers in this, you can use something like the graph view in Wikibase to visualize and explore the items through their connections. | ||
Line 36: | Line 110: | ||
</sparql> | </sparql> | ||
== Mineral-related Items Classified Multiple Ways == | ==== Items that are an instance of multiple things at the same time ==== | ||
One of the things we are experimenting with that is geared toward making the knowledge content in the GeoKB more transferable to non-expert domains like the global knowledge commons is somewhat counter to how Mindat has handled the situation. Many of the things that we would give the same basic label are actually different things in different circumstances/contexts. We're experimenting with focusing on the label and then asserting that the entity can be an instance of multiple things. This probably violates some semantic modeling rules, so it may or may not stand the test of time. But it's a thought experiment in motion. The following query focuses in on commodities, pulling those that have more than one "instance of" claim. | |||
<sparql tryit="1"> | |||
PREFIX wd: <https://geokb.wikibase.cloud/entity/> | |||
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/> | |||
SELECT ?item ?itemLabel (COUNT(?instance_of) AS ?num_instance_of) | |||
WHERE { | |||
?item wdt:P1 wd:Q406 ; | |||
wdt:P1 ?instance_of . | |||
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . } | |||
} | |||
GROUP BY ?item ?itemLabel | |||
HAVING (COUNT(?instance_of) > 1) | |||
</sparql> | |||
==== Mindat Identifiers ==== | |||
We are using Mindat as a key reference for a number of things (rocks, minerals, etc.). The Mindat identifier provides the linkage back to Mindat for gathering additional information on the subject items. Mindat IDs are incorporated as a qualifier on the "instance of" statements made for items sourced from the Mindat API. The following query is one example of how to return rock items with their Mindat IDs. | |||
<sparql tryit="1"> | |||
PREFIX wd: <https://geokb.wikibase.cloud/entity/> | |||
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/> | |||
PREFIX p: <https://geokb.wikibase.cloud/prop/> | |||
PREFIX pq: <https://geokb.wikibase.cloud/prop/qualifier/> | |||
SELECT ?item ?itemLabel ?mindat_id | |||
WHERE { | |||
?item wdt:P1 wd:Q41261 . # "instance of" "rock" | |||
OPTIONAL { | |||
?item p:P1 ?statement . # Get the instance of statement to operate on | |||
OPTIONAL { ?statement pq:P99 ?mindat_id . } # Pull the Mindat ID qualifier as a value | |||
} | |||
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . } | |||
} | |||
LIMIT 10 | |||
</sparql> | |||
=== Mineral-related Items Classified Multiple Ways === | |||
We've made a design decision in the GeoKB with items that we refer to in common usage to only declare or instantiate these entities with one, uniquely identified item that is then classified and characterized to indicate the different ways the concept can be used. An example of this are items that can be a mineral, a mineral commodity, and a chemical element. This is essentially dealing with the issue of the same word or phrase meaning different things in different contexts. The alternative would be to declare separate entities, each with their own specific classification and other characteristics, and then use relationships between the different items or disambiguation features in the knowledgebase to distinguish between them. We will have to determine exactly which approach makes the most sense in practical use over time. | We've made a design decision in the GeoKB with items that we refer to in common usage to only declare or instantiate these entities with one, uniquely identified item that is then classified and characterized to indicate the different ways the concept can be used. An example of this are items that can be a mineral, a mineral commodity, and a chemical element. This is essentially dealing with the issue of the same word or phrase meaning different things in different contexts. The alternative would be to declare separate entities, each with their own specific classification and other characteristics, and then use relationships between the different items or disambiguation features in the knowledgebase to distinguish between them. We will have to determine exactly which approach makes the most sense in practical use over time. | ||
Line 45: | Line 158: | ||
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/> | PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/> | ||
SELECT ?commodity ?commodityLabel | SELECT ?commodity ?commodityLabel ?instance_ofLabel | ||
WHERE { | WHERE { | ||
?commodity wdt:P1 wd:Q406; | |||
wdt:P1 wd:Q280; | |||
wdt:P1 wd:Q24 . | |||
?commodity wdt:P1 ?instance_of . | ?commodity wdt:P1 ?instance_of . | ||
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . } | SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . } | ||
} | } | ||
</sparql> | </sparql> | ||
= Organizations = | == Organizations == | ||
Entities representing organizations are important in the GeoKB in a couple of areas. We have information on entities such as publications and projects connected to USGS "sub-organizations" such as Science Centers and Labs. We also have information associated with external organizations such as mining companies used to retrieve and organize prospecting history for mineral resource assessments. | Entities representing organizations are important in the GeoKB in a couple of areas. We have information on entities such as publications and projects connected to USGS "sub-organizations" such as Science Centers and Labs. We also have information associated with external organizations such as mining companies used to retrieve and organize prospecting history for mineral resource assessments. | ||
The following query retrieves information on organization entities that are characterized as "part of" the [[Item:Q44210|U.S. Geological Survey]]. (This is still a work in progress as we have an incomplete record of USGS organizations of interest and are working to harmonize across several sources.) | * [[EntitySchema:E2|Organization EntitySchema]] | ||
The following query retrieves information on organization entities that are characterized as "part of" the [[Item:Q44210|U.S. Geological Survey]]. Because we often need to look up an organization based on imperfect identifiers like name and URL variations, this query may return multiple records for each entity with different URLs and aliases. This allows us to efficiently figure out if we already have a label or URL variant included for linking purposes. (This is still a work in progress as we have an incomplete record of USGS organizations of interest and are working to harmonize across several sources.) | |||
<sparql tryit=1> | <sparql tryit=1> | ||
Line 62: | Line 179: | ||
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/> | PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/> | ||
SELECT ?item ?itemLabel ? | SELECT ?item ?itemLabel ?item_alt_label ?instance_ofLabel ?url | ||
WHERE { | WHERE { | ||
? | ?org_types wdt:P2 wd:Q50862 . # Gets subclasses of USGS organization | ||
?item wdt: | ?item wdt:P1 ?org_types . # Gets items in those classes | ||
?item wdt:P1 ?instance_of . # Gets the individual instance of classification | |||
OPTIONAL { | |||
?item skos:altLabel ?item_alt_label . | |||
FILTER (lang(?item_alt_label)='en') | |||
} | |||
OPTIONAL { | |||
?item wdt:P31 ?url . # Get all reference URLs when available | |||
} | |||
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . } | SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . } | ||
} | } | ||
</sparql> | </sparql> | ||
= People = | == People == | ||
Items representing people associated with the USGS are another type of entity built out in the GeoKB. We use these as reference points and connections to the overall scientific record captured in this knowledge graph. Person records come from public sources such as our USGS Staff Profiles and are further discussed on the [[Item_talk:Q3|person classification talk page]]. | Items representing people associated with the USGS are another type of entity built out in the GeoKB. We use these as reference points and connections to the overall scientific record captured in this knowledge graph. Person records come from public sources such as our USGS Staff Profiles and are further discussed on the [[Item_talk:Q3|person classification talk page]]. | ||
== People by employer == | * [[EntitySchema:E1|Person EntitySchema]] | ||
=== People by employer === | |||
This query pulls all people along with their email address (already publicly visible) and reference URL (pointer to USGS staff profile). | This query pulls all people along with their email address (already publicly visible) and reference URL (pointer to USGS staff profile). | ||
Line 98: | Line 225: | ||
</sparql> | </sparql> | ||
= Prefixes/Namespaces = | == Places == | ||
The GeoKB includes references for many named places that are necessary links from many other items. The following are some examples for this part of the knowledgebase. | |||
=== U.S. States and Territories === | |||
<sparql tryit="1"> | |||
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/> | |||
SELECT ?item ?itemLabel ?fips_alpha | |||
WHERE { | |||
?item wdt:P13 ?fips_alpha . # Both states and territories in the U.S. have two-character FIPS codes | |||
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . } | |||
} | |||
</sparql> | |||
=== U.S. Counties or Equivalent Subdivision === | |||
<sparql tryit="1"> | |||
PREFIX wd: <https://geokb.wikibase.cloud/entity/> | |||
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/> | |||
SELECT ?item ?itemLabel ?fips_code ?coordinates | |||
WHERE { | |||
?item wdt:P1 wd:Q481 . # "instance of" "county or equivalent" | |||
?item wdt:P34 wd:Q256 . # in the state of Colorado | |||
?item wdt:P22 ?fips_code . | |||
?item wdt:P6 ?coordinates . | |||
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . } | |||
} | |||
</sparql> | |||
== Documents == | |||
Another fundamental entity type we are bringing together in the GeoKB is the representation of documents of one kind or another. For right now, we put "document" right at the top of the classification as a foundational type of "entity." You can query for the classification structure for documents with the following transitive query from the document root. | |||
<sparql tryit="1"> | |||
PREFIX wd: <https://geokb.wikibase.cloud/entity/> | |||
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/> | |||
SELECT ?item ?itemLabel | |||
WHERE { | |||
?item wdt:P2* wd:Q5 . # subclass of, transitive "document" | |||
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . } | |||
} | |||
</sparql> | |||
You'll see here that we've laid out things like the classification of USGS Report Series as we work to bring in a representation for all USGS reports as one suite of document assets we often need to reference and link to from other entities. | |||
=== USGS Series Reports === | |||
The "USGS Numbered Reports Series" are an important reference source in the GeoKB for many other things we are working on. We are building a process to keep a representation of USGS reports in sync from the USGS Publications Warehouse source, starting with a smaller subset important to current use cases. The different USGS numbered series are part of a classification of documents. The following query uses that classification to pull all USGS reports and specific claims. | |||
* [[Item_talk:Q11|USGS Series Reports discussion]] | |||
<sparql tryit="1"> | |||
PREFIX wd: <https://geokb.wikibase.cloud/entity/> | |||
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/> | |||
SELECT ?item ?itemLabel ?itemAltLabel ?pub_year ?doi | |||
?pw_index_id ?country ?countryLabel ?us_state ?us_stateLabel | |||
?county ?countyLabel | |||
WHERE { | |||
?item wdt:P1/wdt:P2* wd:Q11 . | |||
OPTIONAL { | |||
?item wdt:P7 ?pub_year . | |||
} | |||
OPTIONAL { | |||
?item wdt:P74 ?doi . | |||
} | |||
OPTIONAL { | |||
?item wdt:P114 ?pw_index_id . | |||
} | |||
OPTIONAL { | |||
?item wdt:P33 ?country . | |||
} | |||
OPTIONAL { | |||
?item wdt:P34 ?us_state . | |||
} | |||
OPTIONAL { | |||
?item wdt:P35 ?county . | |||
} | |||
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . } | |||
} | |||
</sparql> | |||
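The same classification path can be combined with a place filter. The following is a minimal sketch rather than a definitive query: it assumes Q256 is the Colorado item and P34 the state property, as they are used in the county query on this page, and it limits results to reports that carry that state claim.
<sparql tryit="1">
PREFIX wd: <https://geokb.wikibase.cloud/entity/>
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>
SELECT ?item ?itemLabel ?pub_year
WHERE {
  ?item wdt:P1/wdt:P2* wd:Q11 .   # instance of any class under USGS series reports
  ?item wdt:P34 wd:Q256 .         # assumed: state claim pointing to Colorado
  OPTIONAL {
    ?item wdt:P7 ?pub_year .      # publication year when available
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
ORDER BY DESC(?pub_year)
LIMIT 100
</sparql>
Swapping wd:Q256 for another state item should narrow the results to a different state, provided the reports carry a P34 claim.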
=== NI 43-101 Technical Reports === | |||
These are a special type of document that is part of an early use case for the GeoKB. We manage the metadata and stored document content for these reports in a Zotero collection as part of the GeoArchive, and create a representation of basic identification information in the GeoKB for knowledge graph purposes. The following query pulls 100 NI 43-101 Technical Report entities and a couple of pieces from their claims as an example. | |||
<sparql tryit="1"> | |||
PREFIX wd: <https://geokb.wikibase.cloud/entity/> | |||
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/> | |||
SELECT ?item ?itemLabel ?url ?addresses_place ?addresses_placeLabel | |||
WHERE { | |||
?item wdt:P1 wd:Q10 . # instance of NI 43-101 Technical Report | |||
?item wdt:P31 ?url . | |||
OPTIONAL { | |||
?item wdt:P95 ?addresses_place . | |||
} | |||
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . } | |||
} | |||
LIMIT 100 | |||
</sparql> | |||
= Properties = | |||
Properties in the Wikibase instance are what connects subjects (entities) and objects (other objects or different kinds of values). We sometimes have the need to query for properties themselves and can do that via SPARQL. You can also browse the list of [[Special:ListProperties|all properties]]. | |||
<sparql tryit="1"> | |||
SELECT ?property ?propertyLabel ?property_type WHERE { | |||
?property a wikibase:Property . | |||
?property wikibase:propertyType ?property_type . | |||
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . } | |||
} | |||
</sparql> | |||
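As a further sketch, the same pattern can summarize how many properties exist for each property type, which gives a quick check on the observation that most properties are of the item type. It uses only the wikibase:propertyType predicate from the query above; the variable names are illustrative.
<sparql tryit="1">
SELECT ?property_type (COUNT(?property) AS ?num_properties)
WHERE {
  ?property a wikibase:Property .
  ?property wikibase:propertyType ?property_type .   # e.g., item, string, external identifier
}
GROUP BY ?property_type
ORDER BY DESC(?num_properties)
</sparql>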
= Linkages = | |||
The whole point of what we're doing here in the GeoKB is to develop organized knowledge about the complex earth system and its interconnections as it touches on the domains of science covered by the USGS and our partners. As such, the links we develop between this system and other systems are perhaps the most vital component. We are not attempting to rebuild data, information, and knowledge already organized elsewhere, though we may develop a recasting-in-context here within the GeoKB in order to make better sense of things we link to in other contexts. This section discusses the specific system to system linkages we are developing as we continue pursuing use cases and analytical needs. | |||
== Mindat == | |||
An initial focus of the GeoKB is on use cases related to mineral resource assessment work in the USGS. While USGS is an authoritative source for certain types of geoscientific information related to our mission, there are better global authorities we can connect with to leverage the pieces of information we need to link with. [https://mindat.org Mindat] is our source for information on the following: | |||
* [https://www.mindat.org/minerals.php minerals] | |||
* [https://www.mindat.org/min-50468.html rock names/classification] | |||
* commodities (mostly mineral related) | |||
For the most part, we are only leveraging names (labels) and short descriptions along with the associated identifiers for these entities. We also incorporate relationships such as the classification of rocks. If we need additional details that Mindat provides, we can always go back to the source for more properties. | |||
== xDD (aka GeoDeepDive) == | |||
The [https://geodeepdive.org xDD framework] from collaborators at the University of Wisconsin-Madison provides a platform where we are processing collections of documents that you will find represented in the GeoKB. This includes a long-standing partnership that continually feeds USGS Series Publications into processing pipelines as well as newer collections coming from our GeoArchive work (e.g., NI 43-101 Technical Reports). xDD processing develops additional claims from both relatively mundane search indexing as well as more advanced entity recognition and AI assisted processing. The GeoKB becomes the place where entity recognition becomes entity linking as we confirm recognized entities as viable and record them as claims/links on the entities representing the associated documents. The [[Property:P93|gddid property]] houses the persistent, resolvable identifier used in doc_id and other queries with the xDD APIs. We are working on documenting sources associated with xDD and will get those refined to be associated as references for claims. | |||
This query returns 100 items that have a GDDID. The formatter URL for the GDDID as an external identifier points to the xDD API end point for the "article" - a basic rendering of the citation metadata. | |||
<sparql tryit="1"> | |||
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/> | |||
SELECT ?item ?itemLabel ?gddid | |||
WHERE { | |||
?item wdt:P93 ?gddid . | |||
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . } | |||
} | |||
LIMIT 100 | |||
</sparql> | |||
= GeoKB Logistics/Specifications = | |||
== Prefixes/Namespaces == | |||
SPARQL queries, and work with RDF in general, use the concept of a short prefix string representing a URI to resolve an identifier. Prefix declaration in a query lets us use the short strings in place of constantly putting in a fully qualified and resolvable path. The wikibase.cloud folks have taken the stance that prefixes should always be explicitly declared as opposed to being in the default configuration for queries, which is good practice but a bit of a pain. Here is a rundown of the full set of prefixes for this Wikibase instance. You'll see the applicable prefixes from this exhaustive list in the query examples. | SPARQL queries, and work with RDF in general, use the concept of a short prefix string representing a URI to resolve an identifier. Prefix declaration in a query lets us use the short strings in place of constantly putting in a fully qualified and resolvable path. The wikibase.cloud folks have taken the stance that prefixes should always be explicitly declared as opposed to being in the default configuration for queries, which is good practice but a bit of a pain. Here is a rundown of the full set of prefixes for this Wikibase instance. You'll see the applicable prefixes from this exhaustive list in the query examples. | ||
Latest revision as of 17:44, 28 July 2023
Entities
Entities ("Q" identifiers or items) in the GeoKB are the primary object in the knowledgebase. They are the focus for our knowledge organization scheme. The majority of properties ("P" identifiers) have an item type classification meaning that the object they connect to a subject entity is another entity in the knowledgebase. The majority of our entities are not "native" to this knowledgebase; rather they are sourced from some other data or information system with the entity in the knowledgebase a basic representation of the "foreign-sourced" item.
Entities are built iteratively over time with software codes used to introduce them to the knowledgebase. We may start an entity representation with only a couple of pieces of information, with the bare minimum being a label, description, and either an "instance of" or "subclass of" claim, depending on the purpose of the entity in the knowledgebase. We also describe source material as entities classified as "knowledgebase source." Claims (or statements) made about an entity should always have at least one reference that essentially cites the source for the statement. These will often point to a "knowledgebase source" item or may use "reference URL" when the reference source is simply a single linkable online resource. Over time, we may revisit the initial starter code that introduced an entity to the knowledgebase to change how it operates and bring in further claims. We may also change the nature of how an entity specifies its source by changing or adding to references.
This page serves as iterative documentation of the entities in the knowledgebase with a focus toward how they can be retrieved and interacted with in use through SPARQL queries. It is organized into sections describing a few things about the major entity types. Each section may contain a link to an associated "item talk" page in this Wikibase instance that discusses the evolution of the entity. Each section will also contain a link to the associated ShExC schema for the entity type as those are developed iteratively. Schemas facilitate interactions with the knowledgebase such as ongoing multimodal introduction of new entity instances and claims.
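The reference pattern described above can be inspected directly with a query. The following is a minimal sketch, not a definitive query. It assumes this instance exposes the standard Wikibase statement and reference namespaces at the prop/statement/ and prop/reference/ paths (the ps: and pr: prefix URIs below are assumptions that follow the pattern of the other prefixes on this page) and that "reference URL" (P31) is used inside references as described. It lists mineral commodity items along with any reference URLs attached to their "instance of" claim.
<sparql tryit="1">
PREFIX wd: <https://geokb.wikibase.cloud/entity/>
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>
PREFIX p: <https://geokb.wikibase.cloud/prop/>
PREFIX ps: <https://geokb.wikibase.cloud/prop/statement/>   # assumed prefix URI
PREFIX pr: <https://geokb.wikibase.cloud/prop/reference/>   # assumed prefix URI
PREFIX prov: <http://www.w3.org/ns/prov#>
SELECT ?item ?itemLabel ?reference_url
WHERE {
  ?item wdt:P1 wd:Q406 .                          # "instance of" "mineral commodity"
  ?item p:P1 ?statement .                         # the full statement node
  ?statement ps:P1 wd:Q406 .                      # keep only the mineral commodity statement
  OPTIONAL {
    ?statement prov:wasDerivedFrom ?reference .   # reference node(s) on the statement
    ?reference pr:P31 ?reference_url .            # "reference URL" inside the reference
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 25
</sparql>
Rows with an empty ?reference_url indicate statements that have no reference URL, either because they cite a "knowledgebase source" item instead or because no reference has been recorded yet.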
Minerals Related Entities
USGS Mineral Resource Assessment work provides the seminal use cases for the GeoKB. We are using the knowledge graph model to transform and modernize a number of older databases, online tools, and information systems into an interconnected framework that can better scale into the future. The following section lays out the entity types in the GeoKB related to mineral resources and provides some example queries.
Mineral Commodity
The following query pulls items classified as mineral commodity and associated code values from the USGS Mineral Resources Data System and/or Mindat. Not all codes are in place for the items and Mindat codes will be changing to the long-form identifier resolvable at mindat.org.
PREFIX wd: <https://geokb.wikibase.cloud/entity/>
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>
PREFIX p: <https://geokb.wikibase.cloud/prop/>
PREFIX pq: <https://geokb.wikibase.cloud/prop/qualifier/>
SELECT ?item ?itemLabel ?mrds_code ?mindat_id
WHERE {
?item wdt:P1 wd:Q406 .
?item p:P1 ?instance_statement .
OPTIONAL {
?instance_statement pq:P19 ?mrds_code .
}
OPTIONAL {
?instance_statement pq:P99 ?mindat_id .
}
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
Mines
This query searches for the first 100 items representing mines with their names, identifiers, and point coordinates (which are a mappable WKT point that can be pulled into a mapping application).
PREFIX wd: <https://geokb.wikibase.cloud/entity/>
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>
SELECT ?mine ?mineLabel ?coordinate_location
WHERE {
?mine wdt:P1 wd:Q3646 .
?mine wdt:P6 ?coordinate_location .
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 100
Geospatial Search for Mines
The following query finds mines within a 10 km radius of the specified point. (This uses built-in functionality found in the GeoKB, described in the MediaWiki docs.)
PREFIX wd: <https://geokb.wikibase.cloud/entity/>
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>
SELECT ?item ?itemLabel ?location ?distance
WHERE {
?item wdt:P1 wd:Q3646 .
SERVICE wikibase:around {
?item wdt:P6 ?location .
bd:serviceParam wikibase:center "POINT(-87.107680869 33.10434839)"^^geo:wktLiteral .
bd:serviceParam wikibase:radius "10" .
bd:serviceParam wikibase:distance ?distance.
}
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
ORDER BY ASC(?distance)
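A related sketch, assuming the query service behind the GeoKB also supports the wikibase:box search documented alongside wikibase:around in the Wikidata Query Service manual: instead of a radius, it returns mines inside a bounding box. The corner coordinates are illustrative values chosen to surround the point used above; they are not taken from the GeoKB.
<sparql tryit="1">
PREFIX wd: <https://geokb.wikibase.cloud/entity/>
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>
SELECT ?item ?itemLabel ?location
WHERE {
  ?item wdt:P1 wd:Q3646 .   # "instance of" "mine"
  SERVICE wikibase:box {
    ?item wdt:P6 ?location .
    # Illustrative corners (longitude latitude), not values from the GeoKB
    bd:serviceParam wikibase:cornerSouthWest "POINT(-87.3 33.0)"^^geo:wktLiteral .
    bd:serviceParam wikibase:cornerNorthEast "POINT(-86.9 33.2)"^^geo:wktLiteral .
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
</sparql>
A box search does not return a distance value, so results are unordered unless an ORDER BY on another variable is added.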
Mining Facility
As we continue to evolve the model for how we organize information and knowledge about mining in the GeoKB, we are homing in on the need to have specific mining facilities classified and organized as top-level entities that can then be associated with a specific "mining project" as a higher-level concept. We established mining facility as a subclass of a more general facility and then a set of more specific subclasses from the USMIN topographic mine symbol digitization project.
There is a class of mining facility for mine, which is what we used to classify all of the mine feature classes we pulled from GNIS. We will either continue to use that as a specific type of mining facility and use "mining project," "mining prospect," or something else as the higher-level container or else elevate and reclassify "mine" to mean something else.
The following query returns the mining facility classification.
PREFIX wd: <https://geokb.wikibase.cloud/entity/>
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>
SELECT ?item ?itemLabel
WHERE {
?item wdt:P2* wd:Q44143 . # subclass of (transitive) mining facility
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
Rock Classification
We've started the GeoKB understanding of rock classification via the Mindat system. We will likely augment this over time with other interpretations and classification systems, but since we are pulling minerals from Mindat and want to link to rock types included in those records, starting with Mindat made some sense. The following queries show a bit of how to work with the classification itself in addition to what we link to specific classes.
Igneous Rocks
The following query starts with igneous rock and pulls the full classification from that point (* on the end of the predicate). Because we pull identifiers in this, you can use something like the graph view in Wikibase to visualize and explore the items through their connections.
PREFIX wd: <https://geokb.wikibase.cloud/entity/>
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>
SELECT ?rock ?rockLabel ?subclass_of
WHERE {
?rock wdt:P2* wd:Q41459 .
?rock wdt:P2 ?subclass_of .
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
Items that are an instance of multiple things at the same time
One of the things we are experimenting with, geared toward making the knowledge content in the GeoKB more transferable to non-expert domains like the global knowledge commons, runs somewhat counter to how Mindat has handled the situation. Many of the things that we would give the same basic label are actually different things in different circumstances or contexts. We're experimenting with focusing on the label and then asserting that the entity can be an instance of multiple things. This probably violates some semantic modeling rules, so it may or may not stand the test of time. But it's a thought experiment in motion. The following query focuses on commodities, pulling those that have more than one "instance of" claim.
PREFIX wd: <https://geokb.wikibase.cloud/entity/>
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>
SELECT ?item ?itemLabel (COUNT(?instance_of) AS ?num_instance_of)
WHERE {
?item wdt:P1 wd:Q406 ;
wdt:P1 ?instance_of .
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
GROUP BY ?item ?itemLabel
HAVING (COUNT(?instance_of) > 1)
Mindat Identifiers
We are using Mindat as a key reference for a number of things (rocks, minerals, etc.). The Mindat identifier provides the linkage back to Mindat for gathering additional information on the subject items. Mindat IDs are incorporated as a qualifier on the "instance of" statements made for items sourced from the Mindat API. The following query is one example of how to return rock items with their Mindat IDs.
PREFIX wd: <https://geokb.wikibase.cloud/entity/>
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>
PREFIX p: <https://geokb.wikibase.cloud/prop/>
PREFIX pq: <https://geokb.wikibase.cloud/prop/qualifier/>
SELECT ?item ?itemLabel ?mindat_id
WHERE {
?item wdt:P1 wd:Q41261 . # "instance of" "rock"
OPTIONAL {
?item p:P1 ?statement . # Get the instance of statement to operate on
OPTIONAL { ?statement pq:P99 ?mindat_id . } # Pull the Mindat ID qualifier as a value
}
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 10
We've made a design decision in the GeoKB for concepts whose common name is used in several senses: we declare a single, uniquely identified item and then classify and characterize it to indicate the different ways the concept can be used. An example is an item that can be a mineral, a mineral commodity, and a chemical element. This is essentially dealing with the issue of the same word or phrase meaning different things in different contexts. The alternative would be to declare separate entities, each with their own specific classification and other characteristics, and then use relationships between the different items or disambiguation features in the knowledgebase to distinguish between them. We will have to determine exactly which approach makes the most sense in practical use over time.
The following query searches for items that are classified as chemical element, mineral, and mineral commodity in the same logical labeled entity.
PREFIX wd: <https://geokb.wikibase.cloud/entity/>
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>
SELECT ?commodity ?commodityLabel ?instance_ofLabel
WHERE {
?commodity wdt:P1 wd:Q406;
wdt:P1 wd:Q280;
wdt:P1 wd:Q24 .
?commodity wdt:P1 ?instance_of .
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
Organizations
Entities representing organizations are important in the GeoKB in a couple of areas. We have information on entities such as publications and projects connected to USGS "sub-organizations" such as Science Centers and Labs. We also have information associated with external organizations such as mining companies used to retrieve and organize prospecting history for mineral resource assessments.
The following query retrieves information on organization entities that are characterized as "part of" the U.S. Geological Survey. Because we often need to look up an organization based on imperfect identifiers like name and URL variations, this query may return multiple records for each entity with different URLs and aliases. This allows us to efficiently figure out if we already have a label or URL variant included for linking purposes. (This is still a work in progress as we have an incomplete record of USGS organizations of interest and are working to harmonize across several sources.)
PREFIX wd: <https://geokb.wikibase.cloud/entity/>
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>
SELECT ?item ?itemLabel ?item_alt_label ?instance_ofLabel ?url
WHERE {
?org_types wdt:P2 wd:Q50862 . # Gets subclasses of USGS organization
?item wdt:P1 ?org_types . # Gets items in those classes
?item wdt:P1 ?instance_of . # Gets the individual instance of classification
OPTIONAL {
?item skos:altLabel ?item_alt_label .
FILTER (lang(?item_alt_label)='en')
}
OPTIONAL {
?item wdt:P31 ?url . # Get all reference URLs when available
}
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
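Because the query above can return several rows per organization, it can help to collapse the rows into one record per item before doing lookups. The following is a minimal sketch, assuming the rows are the "bindings" list from a standard SPARQL JSON results document and use the variable names from the query. The function name group_org_rows is hypothetical.

```python
from collections import defaultdict


def group_org_rows(bindings):
    """Collapse result rows into one record per item with sets of aliases and URLs."""
    grouped = defaultdict(lambda: {"label": None, "aliases": set(), "urls": set()})
    for row in bindings:
        record = grouped[row["item"]["value"]]
        if "itemLabel" in row:
            record["label"] = row["itemLabel"]["value"]
        # OPTIONAL variables are simply absent from a row when unbound.
        if "item_alt_label" in row:
            record["aliases"].add(row["item_alt_label"]["value"])
        if "url" in row:
            record["urls"].add(row["url"]["value"])
    return dict(grouped)
```

Checking whether a name or URL variant is already recorded then becomes a set membership test on the grouped record.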
People
Items representing people associated with the USGS are another type of entity built out in the GeoKB. We use these as reference points and connections to the overall scientific record captured in this knowledge graph. Person records come from public sources such as our USGS Staff Profiles and are further discussed on the person classification talk page.
People by employer
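This query pulls up to 10,000 people, limited to those whose P107 claim points to Q44210 (the employer in this example), along with their email address (already publicly visible), reference URL (pointer to USGS staff profile), and ORCID where available.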
PREFIX wd: <https://geokb.wikibase.cloud/entity/>
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>
SELECT ?item ?itemLabel ?email ?profile_url ?orcid
WHERE {
?item wdt:P1 wd:Q3 .
?item wdt:P107 wd:Q44210 .
OPTIONAL {
?item wdt:P109 ?email .
}
OPTIONAL {
?item wdt:P31 ?profile_url .
}
OPTIONAL {
?item wdt:P106 ?orcid .
}
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
LIMIT 10000
Places
The GeoKB includes references for many named places that are necessary links from many other items. The following are some examples for this part of the knowledgebase.
U.S. States and Territories
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>
SELECT ?item ?itemLabel ?fips_alpha
WHERE {
?item wdt:P13 ?fips_alpha . # Both states and territories in the U.S. have two-character FIPS codes
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
U.S. Counties or Equivalent Subdivision
PREFIX wd: <https://geokb.wikibase.cloud/entity/>
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>
SELECT ?item ?itemLabel ?fips_code ?coordinates
WHERE {
?item wdt:P1 wd:Q481 . # "instance of" "county or equivalent"
?item wdt:P34 wd:Q256 . # in the state of Colorado
?item wdt:P22 ?fips_code .
?item wdt:P6 ?coordinates .
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
Documents
Representations of documents of one kind or another are another fundamental entity type we are bringing together in the GeoKB. For right now, we put "document" right at the top of the classification as a foundational type of "entity." You can query for the classification structure for documents with the following transitive query from the document root.
PREFIX wd: <https://geokb.wikibase.cloud/entity/>
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>
SELECT ?item ?itemLabel
WHERE {
?item wdt:P2* wd:Q5 . # subclass of, transitive "document"
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
You'll see here that we've laid out things like the classification of USGS Report Series as we work to bring in a representation for all USGS reports as one suite of document assets we often need to reference and link to from other entities.
USGS Series Reports
The "USGS Numbered Reports Series" are an important reference source in the GeoKB for many other things we are working on. We are building a process to keep a representation of USGS reports in sync from the USGS Publications Warehouse source, starting with a smaller subset important to current use cases. The different USGS numbered series are part of a classification of documents. The following query uses that classification to pull all USGS reports and specific claims.
PREFIX wd: <https://geokb.wikibase.cloud/entity/>
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>
SELECT ?item ?itemLabel ?itemAltLabel ?pub_year ?doi
?pw_index_id ?country ?countryLabel ?us_state ?us_stateLabel
?county ?countyLabel
WHERE {
?item wdt:P1/wdt:P2* wd:Q11 .
OPTIONAL {
?item wdt:P7 ?pub_year .
}
OPTIONAL {
?item wdt:P74 ?doi .
}
OPTIONAL {
?item wdt:P114 ?pw_index_id .
}
OPTIONAL {
?item wdt:P33 ?country .
}
OPTIONAL {
?item wdt:P34 ?us_state .
}
OPTIONAL {
?item wdt:P35 ?county .
}
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
NI 43-101 Technical Reports
These are a special type of document that is part of an early use case for the GeoKB. We manage the metadata and stored document content for these reports in a Zotero collection as part of the GeoArchive, and create a representation of basic identification information in the GeoKB for knowledge graph purposes. The following query pulls 100 NI 43-101 Technical Report entities and a couple of pieces from their claims as an example.
PREFIX wd: <https://geokb.wikibase.cloud/entity/>
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>
SELECT ?item ?itemLabel ?url ?addresses_place ?addresses_placeLabel
WHERE {
?item wdt:P1 wd:Q10 . # instance of NI 43-101 Technical Report
?item wdt:P31 ?url .
OPTIONAL {
?item wdt:P95 ?addresses_place .
}
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 100
Properties
Properties in the Wikibase instance are what connect subjects (entities) and objects (other entities or different kinds of values). We sometimes need to query for properties themselves and can do that via SPARQL. You can also browse the list of all properties.
SELECT ?property ?propertyLabel ?property_type WHERE {
?property a wikibase:Property .
?property wikibase:propertyType ?property_type .
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
Linkages
The whole point of what we're doing here in the GeoKB is to develop organized knowledge about the complex earth system and its interconnections as it touches on the domains of science covered by the USGS and our partners. As such, the links we develop between this system and other systems are perhaps the most vital component. We are not attempting to rebuild data, information, and knowledge already organized elsewhere, though we may develop a recasting-in-context here within the GeoKB in order to make better sense of things we link to in other contexts. This section discusses the specific system to system linkages we are developing as we continue pursuing use cases and analytical needs.
Mindat
An initial focus of the GeoKB is on use cases related to mineral resource assessment work in the USGS. While USGS is an authoritative source for certain types of geoscientific information related to our mission, there are better global authorities we can connect with to leverage the pieces of information we need to link with. Mindat is our source for information on the following:
- minerals
- rock names/classification
- commodities (mostly mineral related)
For the most part, we are only leveraging names (labels) and short descriptions along with the associated identifiers for these entities. We also incorporate relationships such as the classification of rocks. If we need additional details that Mindat provides, we can always go back to the source for more properties.
xDD (aka GeoDeepDive)
The xDD framework from collaborators at the University of Wisconsin-Madison provides a platform where we are processing collections of documents that you will find represented in the GeoKB. This includes a long-standing partnership that continually feeds USGS Series Publications into processing pipelines as well as newer collections coming from our GeoArchive work (e.g., NI 43-101 Technical Reports). xDD processing develops additional claims from both relatively mundane search indexing and more advanced entity recognition and AI-assisted processing. The GeoKB becomes the place where entity recognition becomes entity linking as we confirm recognized entities as viable and record them as claims/links on the entities representing the associated documents. The gddid property houses the persistent, resolvable identifier used in doc_id and other queries with the xDD APIs. We are working on documenting sources associated with xDD and will get those refined to be associated as references for claims.
This query returns up to 100 items that have a GDDID. The formatter URL for the GDDID as an external identifier points to the xDD API endpoint for the "article" - a basic rendering of the citation metadata.
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>
SELECT ?item ?itemLabel ?gddid
WHERE {
?item wdt:P93 ?gddid .
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 100
GeoKB Logistics/Specifications
Prefixes/Namespaces
SPARQL queries, and work with RDF in general, use short prefix strings that stand in for a URI namespace when resolving an identifier. Declaring prefixes in a query lets us use the short strings instead of constantly writing out fully qualified, resolvable paths. The wikibase.cloud folks have taken the stance that prefixes should always be explicitly declared as opposed to being in the default configuration for queries, which is good practice but a bit of a pain. Here is a rundown of the full set of prefixes for this Wikibase instance. You'll see the applicable prefixes from this exhaustive list in the query examples.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix wikibase: <http://wikiba.se/ontology#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix schema: <http://schema.org/> .
@prefix cc: <http://creativecommons.org/ns#> .
@prefix geo: <http://www.opengis.net/ont/geosparql#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix wd: <https://geokb.wikibase.cloud/entity/> .
@prefix data: <https://geokb.wikibase.cloud/wiki/Special:EntityData/> .
@prefix s: <https://geokb.wikibase.cloud/entity/statement/> .
@prefix ref: <https://geokb.wikibase.cloud/reference/> .
@prefix v: <https://geokb.wikibase.cloud/value/> .
@prefix wdt: <https://geokb.wikibase.cloud/prop/direct/> .
@prefix wdtn: <https://geokb.wikibase.cloud/prop/direct-normalized/> .
@prefix p: <https://geokb.wikibase.cloud/prop/> .
@prefix ps: <https://geokb.wikibase.cloud/prop/statement/> .
@prefix psv: <https://geokb.wikibase.cloud/prop/statement/value/> .
@prefix psn: <https://geokb.wikibase.cloud/prop/statement/value-normalized/> .
@prefix pq: <https://geokb.wikibase.cloud/prop/qualifier/> .
@prefix pqv: <https://geokb.wikibase.cloud/prop/qualifier/value/> .
@prefix pqn: <https://geokb.wikibase.cloud/prop/qualifier/value-normalized/> .
@prefix pr: <https://geokb.wikibase.cloud/prop/reference/> .
@prefix prv: <https://geokb.wikibase.cloud/prop/reference/value/> .
@prefix prn: <https://geokb.wikibase.cloud/prop/reference/value-normalized/> .
@prefix wdno: <https://geokb.wikibase.cloud/prop/novalue/> .
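Since the instance-specific prefixes have to be declared in every query, a small client-side helper can prepend only the ones a query actually uses. The following is a minimal sketch that carries a subset of the list above. The function name with_prefixes is hypothetical, and the regular expression is a heuristic for spotting prefixed names rather than a real SPARQL parser.

```python
import re

# A subset of the GeoKB prefixes listed above; extend as needed.
GEOKB_PREFIXES = {
    "wd": "https://geokb.wikibase.cloud/entity/",
    "wdt": "https://geokb.wikibase.cloud/prop/direct/",
    "p": "https://geokb.wikibase.cloud/prop/",
    "ps": "https://geokb.wikibase.cloud/prop/statement/",
    "pq": "https://geokb.wikibase.cloud/prop/qualifier/",
    "pr": "https://geokb.wikibase.cloud/prop/reference/",
}


def with_prefixes(query, prefixes=GEOKB_PREFIXES):
    """Prepend PREFIX lines for known prefixes that are used but not yet declared."""
    declared = set(re.findall(r"(?im)^\s*PREFIX\s+(\w+):", query))
    # Heuristic: a word followed by ":" and a letter, e.g. "wdt:P1" (not "https://").
    used = set(re.findall(r"(?<![\w:])(\w+):(?=[A-Za-z_])", query))
    header = [
        f"PREFIX {name}: <{uri}>"
        for name, uri in prefixes.items()
        if name in used and name not in declared
    ]
    return "\n".join(header + [query])
```

Prefixes that are not in the dictionary, such as wikibase: or bd:, are left untouched.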