SPARQL examples

Revision as of 19:26, 13 June 2023 by Sky (talk | contribs) (Added new section for linkages)

Entities

Entities ("Q" identifiers, or items) are the primary objects in the GeoKB and the focus of our knowledge organization scheme. The majority of properties ("P" identifiers) are of the item type, meaning that the object they connect to a subject entity is another entity in the knowledgebase. The majority of our entities are not "native" to this knowledgebase; rather, they are sourced from some other data or information system, and the entity in the knowledgebase is a basic representation of the "foreign-sourced" item.

Entities are built iteratively over time, with software code used to introduce them to the knowledgebase. We may start an entity representation with only a couple of pieces of information, with the bare minimum being a label, a description, and either an "instance of" or "subclass of" claim, depending on the purpose of the entity in the knowledgebase. We also describe source material as entities classified as "knowledgebase source." Claims (or statements) made about an entity should always have at least one reference that cites the source for the statement. These will often point to a "knowledgebase source" item or may use "reference URL" when the reference source is simply a single linkable online resource. Over time, we may revisit the initial starter code that introduced an entity to the knowledgebase to change how it operates and bring in further claims. We may also change how an entity specifies its source by changing or adding to references.
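
The minimum described above (label, description, an "instance of" or "subclass of" claim, and a reference on every statement) can be expressed as a small check. The following Python sketch assumes an entity shaped like the standard Wikibase entity JSON (labels, descriptions, and claims keyed by property); the function name check_minimum and the sample entity are hypothetical and are not part of the GeoKB codebase.

```python
from typing import Any, Dict, List


def check_minimum(entity: Dict[str, Any], lang: str = "en") -> List[str]:
    """List the ways an entity falls short of the bare minimum for the GeoKB."""
    problems = []
    if lang not in entity.get("labels", {}):
        problems.append(f"missing {lang} label")
    if lang not in entity.get("descriptions", {}):
        problems.append(f"missing {lang} description")
    claims = entity.get("claims", {})
    # P1 is "instance of" and P2 is "subclass of" in the GeoKB.
    if not (claims.get("P1") or claims.get("P2")):
        problems.append("missing P1 (instance of) or P2 (subclass of) claim")
    # Every statement should carry at least one reference.
    for prop, statements in claims.items():
        for statement in statements:
            if not statement.get("references"):
                problems.append(f"{prop} statement has no reference")
    return problems


# Hypothetical entity: the text and structure are placeholders, not a real GeoKB item.
draft = {
    "labels": {"en": {"language": "en", "value": "Example report"}},
    "descriptions": {},
    "claims": {"P1": [{"mainsnak": {"property": "P1"}, "references": []}]},
}
print(check_minimum(draft))
```

A check like this could run before new entities are written, so that incomplete starter records are caught early rather than discovered through queries later.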

This page serves as iterative documentation of the entities in the knowledgebase with a focus toward how they can be retrieved and interacted with in use through SPARQL queries. It is organized into sections describing a few things about the major entity types. Each section may contain a link to an associated "item talk" page in this Wikibase instance that discusses the evolution of the entity. Each section will also contain a link to the associated ShExC schema for the entity type as those are developed iteratively. Schemas facilitate interactions with the knowledgebase such as ongoing multimodal introduction of new entity instances and claims.

Minerals Related Entities

The following section provides common queries for items in the GeoKB related to mineral resources, one of the principal uses of the knowledgebase.

Mineral Commodity

The following query pulls items classified as mineral commodity and associated code values from the USGS Mineral Resources Data System and/or Mindat. Not all codes are in place for the items and Mindat codes will be changing to the long-form identifier resolvable at mindat.org.

PREFIX wd: <https://geokb.wikibase.cloud/entity/>
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>
PREFIX p: <https://geokb.wikibase.cloud/prop/>
PREFIX pq: <https://geokb.wikibase.cloud/prop/qualifier/>

SELECT ?item ?itemLabel ?mrds_code ?mindat_id
WHERE {
  ?item wdt:P1 wd:Q406 .
  ?item p:P1 ?instance_statement .
  OPTIONAL {
    ?instance_statement pq:P19 ?mrds_code .
  }
  OPTIONAL {
    ?instance_statement pq:P99 ?mindat_id .
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
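
When a query like this one is run programmatically and JSON is requested, results arrive in the standard SPARQL JSON results format, and variables inside an unmatched OPTIONAL block are simply absent from a row. The following Python sketch flattens such a response into plain dictionaries. The function name flatten_bindings and the sample payload (including the placeholder identifier and code value) are illustrative assumptions, not actual GeoKB data.

```python
from typing import Any, Dict, List, Optional


def flatten_bindings(response: Dict[str, Any]) -> List[Dict[str, Optional[str]]]:
    """Turn a SPARQL JSON results document into a list of flat rows.

    Every variable named in the header appears in every row; variables that
    were unbound (for example inside an unmatched OPTIONAL) become None.
    """
    variables = response["head"]["vars"]
    rows = []
    for binding in response["results"]["bindings"]:
        row = {var: binding[var]["value"] if var in binding else None for var in variables}
        rows.append(row)
    return rows


# Hypothetical payload shaped like the W3C SPARQL 1.1 JSON results format.
sample = {
    "head": {"vars": ["item", "itemLabel", "mrds_code", "mindat_id"]},
    "results": {
        "bindings": [
            {
                "item": {"type": "uri", "value": "https://geokb.wikibase.cloud/entity/Q0000"},
                "itemLabel": {"type": "literal", "xml:lang": "en", "value": "example commodity"},
                "mrds_code": {"type": "literal", "value": "XX"},
            }
        ]
    },
}

rows = flatten_bindings(sample)
print(rows[0]["mrds_code"], rows[0]["mindat_id"])
```

Filling missing variables with None keeps every row the same shape, which makes it easier to see which commodities still lack an MRDS or Mindat code.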


Mines

This query returns the first 100 items representing mines, with their names, identifiers, and point coordinates. The coordinates are returned as WKT points that can be pulled into a mapping application.

PREFIX wd: <https://geokb.wikibase.cloud/entity/>
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>

SELECT ?mine ?mineLabel ?coordinate_location
WHERE {
  ?mine wdt:P1 wd:Q3646 .
  ?mine wdt:P6 ?coordinate_location .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 100
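
To use the coordinates in a mapping application, the WKT point text has to be split into numbers. The following Python sketch assumes the coordinate values look like Point(longitude latitude), which is how Wikibase normally serializes coordinates; the function names and the sample mine are hypothetical.

```python
import re
from typing import Tuple

# A signed decimal number, optionally with an exponent.
_NUMBER = r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?"

# Matches WKT point text such as "Point(-105.5 39.25)"; case-insensitive because
# WKT writers differ on "POINT" versus "Point".
_WKT_POINT = re.compile(
    rf"^\s*POINT\s*\(\s*({_NUMBER})\s+({_NUMBER})\s*\)\s*$",
    re.IGNORECASE,
)


def parse_wkt_point(wkt: str) -> Tuple[float, float]:
    """Return (longitude, latitude) from a WKT point string."""
    match = _WKT_POINT.match(wkt)
    if match is None:
        raise ValueError(f"not a WKT point: {wkt!r}")
    return float(match.group(1)), float(match.group(2))


def to_geojson_feature(label: str, wkt: str) -> dict:
    """Wrap a labelled WKT point as a GeoJSON Feature (x = lon, y = lat)."""
    lon, lat = parse_wkt_point(wkt)
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"name": label},
    }


# Hypothetical mine label and location.
feature = to_geojson_feature("Example Mine", "Point(-105.5 39.25)")
print(feature["geometry"]["coordinates"])
```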


Mining Facility

As we continue to evolve the model for how we organize information and knowledge about mining in the GeoKB, we are homing in on the need to have specific mining facilities classified and organized as top-level entities that can then be associated with a specific "mining project" as a higher-level concept. We established mining facility as a subclass of a more general facility, and then added a set of more specific subclasses from the USMIN topographic mine symbol digitization project.

There is a class of mining facility for mine, which is what we used to classify all of the mine feature classes we pulled from GNIS. We will either continue to use that as a specific type of mining facility and use "mining project," "mining prospect," or something else as the higher level container or else elevate and reclassify "mine" to mean something else.

The following query returns the mining facility classification.

PREFIX wd: <https://geokb.wikibase.cloud/entity/>
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>

SELECT ?item ?itemLabel
WHERE {
  ?item wdt:P2* wd:Q44143 . # subclass of (transitive) mining facility
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}


Rock Classification

We've started the GeoKB understanding of rock classification via the Mindat system. We will likely augment this over time with other interpretations and classification systems, but since we are pulling minerals from Mindat and want to link to rock types included in those records, starting with Mindat made some sense. The following queries show a bit of how to work with the classification itself in addition to what we link to specific classes.

Igneous Rocks

The following query starts with igneous rock and pulls the full classification below that point; the * at the end of the predicate makes the "subclass of" match transitive. Because the results include identifiers, you can use something like the graph view in Wikibase to visualize and explore the items through their connections.

PREFIX wd: <https://geokb.wikibase.cloud/entity/>
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>

SELECT ?rock ?rockLabel ?subclass_of
WHERE {
  ?rock wdt:P2* wd:Q41459 .
  ?rock wdt:P2 ?subclass_of .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
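
Each result row of this query pairs a rock class with one of its direct parents, so the rows are enough to rebuild the hierarchy outside the query service. The following Python sketch does that for a handful of rows; the function names are hypothetical, and the labels are illustrative stand-ins rather than the actual Mindat-derived classification.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Tuple


def build_children(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each parent class to the sorted list of its direct subclasses."""
    children = defaultdict(set)
    for child, parent in pairs:
        children[parent].add(child)
    return {parent: sorted(kids) for parent, kids in children.items()}


def descendants(children: Dict[str, List[str]], root: str) -> List[str]:
    """Collect everything below root, breadth-first, visiting each class once."""
    seen = set()
    order = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for kid in children.get(node, []):
            if kid not in seen:
                seen.add(kid)
                order.append(kid)
                queue.append(kid)
    return order


# Illustrative (rock, subclass_of) rows. The starting class appears too, because
# the * path also matches the zero-length case.
pairs = [
    ("igneous rock", "rock"),
    ("plutonic rock", "igneous rock"),
    ("volcanic rock", "igneous rock"),
    ("granite", "plutonic rock"),
    ("basalt", "volcanic rock"),
]
tree = build_children(pairs)
print(descendants(tree, "igneous rock"))
```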


Mindat Identifiers

We are using Mindat as a key reference for a number of things (rocks, minerals, etc.). The Mindat identifier provides the linkage back to Mindat for gathering additional information on the subject items. Mindat IDs are incorporated as a qualifier on the "instance of" statements made for items sourced from the Mindat API. The following query is one example of how to return rock items with their Mindat IDs.

PREFIX wd: <https://geokb.wikibase.cloud/entity/>
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>
PREFIX p: <https://geokb.wikibase.cloud/prop/>
PREFIX pq: <https://geokb.wikibase.cloud/prop/qualifier/> 

SELECT ?item ?itemLabel ?mindat_id
WHERE {
  ?item wdt:P1 wd:Q41261 . # "instance of" "rock"
  OPTIONAL {
    ?item p:P1 ?statement . # Get the instance of statement to operate on
    OPTIONAL { ?statement pq:P99 ?mindat_id . } # Pull the Mindat ID qualifier as a value
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 10


Mineral-related Items Classified Multiple Ways

We've made a design decision in the GeoKB to declare each concept we refer to in common usage as one uniquely identified item, which is then classified and characterized to indicate the different ways the concept can be used. Examples are items that can be a mineral, a mineral commodity, and a chemical element at the same time. This addresses the issue of the same word or phrase meaning different things in different contexts. The alternative would be to declare separate entities, each with its own specific classification and other characteristics, and then use relationships between the different items or disambiguation features in the knowledgebase to distinguish between them. We will have to determine which approach makes the most sense in practical use over time.

The following query searches for items that are classified as chemical element, mineral, and mineral commodity in the same logical labeled entity.

PREFIX wd: <https://geokb.wikibase.cloud/entity/>
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>

SELECT ?commodity ?commodityLabel ?instance_ofLabel
WHERE {
  ?commodity wdt:P1 wd:Q406;
             wdt:P1 wd:Q280;
             wdt:P1 wd:Q24 .
  ?commodity wdt:P1 ?instance_of .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
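
The same pattern works for any combination of classes: one "instance of" (P1) triple per class, all on the same subject. The following Python sketch builds such a query from a list of class identifiers. The function name multi_class_query is hypothetical, and the generated text is only a string; it is not sent to the query service here.

```python
import re
from typing import Sequence

PREFIXES = (
    "PREFIX wd: <https://geokb.wikibase.cloud/entity/>\n"
    "PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>\n"
)


def multi_class_query(class_qids: Sequence[str]) -> str:
    """Build a query for items that are an instance of every listed class."""
    if not class_qids:
        raise ValueError("at least one class QID is required")
    for qid in class_qids:
        # Reject anything that is not a plain Q identifier before interpolating it.
        if not re.fullmatch(r"Q[0-9]+", qid):
            raise ValueError(f"not a Q identifier: {qid!r}")
    patterns = " ;\n        ".join(f"wdt:P1 wd:{qid}" for qid in class_qids)
    return (
        f"{PREFIXES}\n"
        "SELECT ?item ?itemLabel ?instance_ofLabel\n"
        "WHERE {\n"
        f"  ?item {patterns} .\n"
        "  ?item wdt:P1 ?instance_of .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }\n'
        "}\n"
    )


# The three class identifiers used in the query above.
print(multi_class_query(["Q406", "Q280", "Q24"]))
```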


Organizations

Entities representing organizations are important in the GeoKB in a couple of areas. We have information on entities such as publications and projects connected to USGS "sub-organizations" such as Science Centers and Labs. We also have information associated with external organizations such as mining companies used to retrieve and organize prospecting history for mineral resource assessments.

The following query retrieves information on organization entities that are characterized as "part of" the U.S. Geological Survey. Because we often need to look up an organization based on imperfect identifiers like name and URL variations, this query may return multiple records for each entity with different URLs and aliases. This allows us to efficiently figure out if we already have a label or URL variant included for linking purposes. (This is still a work in progress, as we have an incomplete record of USGS organizations of interest and are working to harmonize across several sources.)

PREFIX wd: <https://geokb.wikibase.cloud/entity/>
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?item ?itemLabel ?item_alt_label ?url ?instance_ofLabel
WHERE {
  ?item wdt:P62* wd:Q44210 . # "part of" (transitive) "USGS"
  ?item wdt:P1 ?instance_of . # We're only going to look at entities that have an instance of classification
  OPTIONAL {
    ?item skos:altLabel ?item_alt_label . # Get the aliases this way to avoid issues with alternate labels containing commas
    FILTER (lang(?item_alt_label)='en')
  }
  OPTIONAL {
    ?item wdt:P31 ?url . # Get all reference URLs when available
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
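
Because the query returns one row per combination of alias and URL, a lookup step usually needs the rows grouped back into one record per organization. The following Python sketch shows one way to do that with rows that have already been flattened into dictionaries keyed by the SELECT variable names. The function names and all sample values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set

Row = Dict[str, Optional[str]]


def collect_variants(rows: Iterable[Row]) -> Dict[str, Dict[str, Set[str]]]:
    """Group result rows by item URI, gathering distinct names and URLs."""
    grouped = defaultdict(lambda: {"names": set(), "urls": set()})
    for row in rows:
        entry = grouped[row["item"]]
        for key in ("itemLabel", "item_alt_label"):
            if row.get(key):
                # Case-fold so lookups ignore capitalization differences.
                entry["names"].add(row[key].casefold())
        if row.get("url"):
            # Treat URLs that differ only by a trailing slash as the same.
            entry["urls"].add(row["url"].rstrip("/"))
    return dict(grouped)


def lookup_by_name(grouped: Dict[str, Dict[str, Set[str]]], name: str) -> List[str]:
    """Return the item URIs whose label or aliases match the given name."""
    wanted = name.casefold()
    return sorted(item for item, entry in grouped.items() if wanted in entry["names"])


# Hypothetical rows; the identifiers, names, and URLs are placeholders.
rows = [
    {"item": "Q0001", "itemLabel": "Example Science Center",
     "item_alt_label": "ESC", "url": "https://example.org/esc"},
    {"item": "Q0001", "itemLabel": "Example Science Center",
     "item_alt_label": "Example Sci Ctr", "url": "https://example.org/esc/"},
    {"item": "Q0002", "itemLabel": "Another Lab", "item_alt_label": None, "url": None},
]
grouped = collect_variants(rows)
print(lookup_by_name(grouped, "esc"))
```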


People

Items representing people associated with the USGS are another type of entity built out in the GeoKB. We use these as reference points and connections to the overall scientific record captured in this knowledge graph. Person records come from public sources such as our USGS Staff Profiles and are further discussed on the person classification talk page.

People by employer

This query pulls people whose employer is the USGS, along with their email address (already publicly visible), reference URL (pointer to the USGS staff profile), and ORCID where available.

PREFIX wd: <https://geokb.wikibase.cloud/entity/>
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>

SELECT ?item ?itemLabel ?email ?profile_url ?orcid
WHERE {
  ?item wdt:P1 wd:Q3 .
  ?item wdt:P107 wd:Q44210 .
  OPTIONAL {
    ?item wdt:P109 ?email .
  }
  OPTIONAL {
    ?item wdt:P31 ?profile_url .
  }
  OPTIONAL {
    ?item wdt:P106 ?orcid .
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
LIMIT 10000


Places

The GeoKB includes references for many named places that are necessary links from many other items. The following are some examples for this part of the knowledgebase.

U.S. States and Territories

PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>

SELECT ?item ?itemLabel ?fips_alpha
WHERE {
  ?item wdt:P13 ?fips_alpha .  # Both states and territories in the U.S. have two-character FIPS codes
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}


U.S. Counties or Equivalent Subdivision

PREFIX wd: <https://geokb.wikibase.cloud/entity/>
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>

SELECT ?item ?itemLabel ?fips_code ?coordinates
WHERE {
  ?item wdt:P1 wd:Q481 . # "instance of" "county or equivalent"
  ?item wdt:P34 wd:Q256 . # in the state of Colorado
  ?item wdt:P22 ?fips_code .
  ?item wdt:P6 ?coordinates .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}


Documents

Another fundamental entity type we are bringing together in the GeoKB is the representation of documents of one kind or another. For right now, we put "document" right at the top of the classification as a foundational type of "entity." You can query for the classification structure for documents with the following transitive query from the document root.

PREFIX wd: <https://geokb.wikibase.cloud/entity/>
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>

SELECT ?item ?itemLabel
WHERE {
  ?item wdt:P2* wd:Q5 . # subclass of, transitive "document"
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}


You'll see here that we've laid out things like the classification of USGS Report Series as we work to bring in a representation for all USGS reports as one suite of document assets we often need to reference and link to from other entities.

USGS Series Reports

The "USGS Numbered Reports Series" are an important reference source in the GeoKB for many other things we are working on. We are building a process to keep a representation of USGS reports in sync with the USGS Publications Warehouse source, starting with a smaller subset important to current use cases. The different USGS numbered series are part of a classification of documents. The following query uses that classification to pull all USGS reports and specific claims.

PREFIX wd: <https://geokb.wikibase.cloud/entity/>
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>

SELECT ?item ?itemLabel ?itemAltLabel ?pub_year ?doi
       ?pw_index_id ?country ?countryLabel ?us_state ?us_stateLabel
       ?county ?countyLabel
WHERE {
  ?item wdt:P1/wdt:P2* wd:Q11 .
  OPTIONAL {
    ?item wdt:P7 ?pub_year .
  }
  OPTIONAL {
    ?item wdt:P74 ?doi .
  }
  OPTIONAL {
    ?item wdt:P114 ?pw_index_id .
  }
  OPTIONAL {
    ?item wdt:P33 ?country .
  }
  OPTIONAL {
    ?item wdt:P34 ?us_state .
  }
  OPTIONAL {
    ?item wdt:P35 ?county .
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
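
The path wdt:P1/wdt:P2* in this query means "instance of" a class that is either the target itself or reaches it through any number of "subclass of" links. The following Python sketch evaluates the same path over a small in-memory list of triples to make that behavior concrete. Apart from P1, P2, and Q11, every identifier is a hypothetical placeholder, and the function is an illustration of the semantics rather than how the query service evaluates paths.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def instances_of_class_tree(triples: Iterable[Triple], root: str) -> Set[str]:
    """Return subjects matching `?s P1/P2* root` over an in-memory triple list."""
    triples = list(triples)
    # P2*: the root itself plus every class that reaches it through subclass-of links.
    classes = {root}
    changed = True
    while changed:
        changed = False
        for subject, predicate, obj in triples:
            if predicate == "P2" and obj in classes and subject not in classes:
                classes.add(subject)
                changed = True
    # P1 step: any subject that is an instance of one of those classes.
    return {s for s, p, o in triples if p == "P1" and o in classes}


# Toy data: two nested report series under Q11 and one unrelated item.
triples = [
    ("series-X", "P2", "Q11"),
    ("series-Y", "P2", "series-X"),
    ("report-A", "P1", "series-X"),
    ("report-B", "P1", "series-Y"),
    ("report-C", "P1", "Q11"),
    ("memo-D", "P1", "other-class"),
]
print(sorted(instances_of_class_tree(triples, "Q11")))
```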


NI 43-101 Technical Reports

These are a special type of document that is part of an early use case for the GeoKB. We manage the metadata and stored document content for these reports in a Zotero collection as part of the GeoArchive, and create a representation of basic identification information in the GeoKB for knowledge graph purposes. The following query pulls 100 NI 43-101 Technical Report entities and a couple of pieces from their claims as an example.

PREFIX wd: <https://geokb.wikibase.cloud/entity/>
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>

SELECT ?item ?itemLabel ?url ?addresses_place ?addresses_placeLabel
WHERE {
  ?item wdt:P1 wd:Q10 . # instance of NI 43-101 Technical Report
  ?item wdt:P31 ?url .
  OPTIONAL {
    ?item wdt:P95 ?addresses_place .
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 100


Properties

Properties in the Wikibase instance are what connect subjects (entities) to objects (other entities or different kinds of values). We sometimes need to query for the properties themselves and can do that via SPARQL. You can also browse the list of all properties.

SELECT ?property ?propertyLabel ?property_type WHERE {
  ?property a wikibase:Property .
  ?property wikibase:propertyType ?property_type .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
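
One simple use of these results is to see how the properties are distributed across datatypes. The following Python sketch tallies result rows by the local name of the property type. It assumes the type values are URIs in the Wikibase ontology namespace (http://wikiba.se/ontology#); the function name and the sample rows are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def tally_property_types(rows: Iterable[Dict[str, str]]) -> Counter:
    """Count properties per datatype, keeping only the local name after '#'."""
    return Counter(row["property_type"].rsplit("#", 1)[-1] for row in rows)


# Hypothetical rows; the property identifiers are placeholders.
rows = [
    {"property": "P-a", "property_type": "http://wikiba.se/ontology#WikibaseItem"},
    {"property": "P-b", "property_type": "http://wikiba.se/ontology#WikibaseItem"},
    {"property": "P-c", "property_type": "http://wikiba.se/ontology#ExternalId"},
]
counts = tally_property_types(rows)
print(counts.most_common())
```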


Linkages

The whole point of what we're doing here in the GeoKB is to develop organized knowledge about the complex earth system and its interconnections as it touches on the domains of science covered by the USGS and our partners. As such, the links we develop between this system and other systems are perhaps the most vital component. We are not attempting to rebuild data, information, and knowledge already organized elsewhere, though we may develop a recasting-in-context here within the GeoKB in order to make better sense of things we link to in other contexts. This section discusses the specific system to system linkages we are developing as we continue pursuing use cases and analytical needs.

xDD (aka GeoDeepDive)

The xDD framework from collaborators at the University of Wisconsin-Madison provides a platform where we are processing collections of documents that you will find represented in the GeoKB. This includes a long-standing partnership that continually feeds USGS Series Publications into processing pipelines as well as newer collections coming from our GeoArchive work (e.g., NI 43-101 Technical Reports). xDD processing develops additional claims from both relatively mundane search indexing and more advanced entity recognition and AI-assisted processing. The GeoKB becomes the place where entity recognition becomes entity linking as we confirm recognized entities as viable and record them as claims/links on the entities representing the associated documents. The gddid property houses the persistent, resolvable identifier used in doc_id and other queries with the xDD APIs. We are working on documenting sources associated with xDD and will refine those so they can be associated as references for claims.

This query returns 100 items that have a GDDID. The formatter URL for the GDDID as an external identifier points to the xDD API endpoint for the "article," a basic rendering of the citation metadata.

PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>

SELECT ?item ?itemLabel ?gddid
WHERE {
  ?item wdt:P93 ?gddid .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 100


GeoKB Logistics/Specifications

Prefixes/Namespaces

SPARQL queries, and work with RDF in general, rely on short prefix strings that stand for a URI namespace used to resolve an identifier. Declaring prefixes in a query lets us use the short strings instead of repeatedly writing out a fully qualified, resolvable path. The wikibase.cloud folks have taken the stance that prefixes should always be explicitly declared rather than being part of the default query configuration, which is good practice but a bit of a pain. Here is a rundown of the full set of prefixes for this Wikibase instance. You'll see the applicable prefixes from this exhaustive list in the query examples.

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix wikibase: <http://wikiba.se/ontology#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix schema: <http://schema.org/> .
@prefix cc: <http://creativecommons.org/ns#> .
@prefix geo: <http://www.opengis.net/ont/geosparql#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix wd: <https://geokb.wikibase.cloud/entity/> .
@prefix data: <https://geokb.wikibase.cloud/wiki/Special:EntityData/> .
@prefix s: <https://geokb.wikibase.cloud/entity/statement/> .
@prefix ref: <https://geokb.wikibase.cloud/reference/> .
@prefix v: <https://geokb.wikibase.cloud/value/> .
@prefix wdt: <https://geokb.wikibase.cloud/prop/direct/> .
@prefix wdtn: <https://geokb.wikibase.cloud/prop/direct-normalized/> .
@prefix p: <https://geokb.wikibase.cloud/prop/> .
@prefix ps: <https://geokb.wikibase.cloud/prop/statement/> .
@prefix psv: <https://geokb.wikibase.cloud/prop/statement/value/> .
@prefix psn: <https://geokb.wikibase.cloud/prop/statement/value-normalized/> .
@prefix pq: <https://geokb.wikibase.cloud/prop/qualifier/> .
@prefix pqv: <https://geokb.wikibase.cloud/prop/qualifier/value/> .
@prefix pqn: <https://geokb.wikibase.cloud/prop/qualifier/value-normalized/> .
@prefix pr: <https://geokb.wikibase.cloud/prop/reference/> .
@prefix prv: <https://geokb.wikibase.cloud/prop/reference/value/> .
@prefix prn: <https://geokb.wikibase.cloud/prop/reference/value-normalized/> .
@prefix wdno: <https://geokb.wikibase.cloud/prop/novalue/> .
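
Since prefixes have to be declared explicitly, it can be convenient to generate the PREFIX lines from the query text instead of typing them each time. The following Python sketch scans a query body for prefixed names and prepends declarations for the ones it recognizes. The helper names are hypothetical, only a subset of the prefixes above is included, and the scan is a simple heuristic rather than a SPARQL parser. Prefixes that are not in the table (such as bd: in the label service call) are left undeclared, on the assumption that the query service resolves them itself.

```python
import re
from typing import List

# Namespace URIs copied from the prefix list above (subset).
GEOKB_PREFIXES = {
    "wd": "https://geokb.wikibase.cloud/entity/",
    "wdt": "https://geokb.wikibase.cloud/prop/direct/",
    "p": "https://geokb.wikibase.cloud/prop/",
    "ps": "https://geokb.wikibase.cloud/prop/statement/",
    "pq": "https://geokb.wikibase.cloud/prop/qualifier/",
    "pr": "https://geokb.wikibase.cloud/prop/reference/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "wikibase": "http://wikiba.se/ontology#",
}

# Full IRIs and double-quoted strings are blanked out so "https:" is not seen as a prefix.
_IRI_OR_STRING = re.compile(r"<[^>]*>|\"[^\"]*\"")
# A name followed by a colon that is not itself preceded by a word character or colon.
_PREFIXED_NAME = re.compile(r"(?<![\w:])([A-Za-z][\w-]*):")


def used_prefixes(query_body: str) -> List[str]:
    """Return the known prefixes used in the query, in order of first appearance."""
    found = []
    for line in query_body.splitlines():
        line = _IRI_OR_STRING.sub(" ", line)
        line = line.split("#", 1)[0]  # drop trailing comments
        for name in _PREFIXED_NAME.findall(line):
            if name in GEOKB_PREFIXES and name not in found:
                found.append(name)
    return found


def with_prefixes(query_body: str) -> str:
    """Prepend PREFIX declarations for every known prefix the query uses."""
    header = "".join(
        f"PREFIX {name}: <{GEOKB_PREFIXES[name]}>\n" for name in used_prefixes(query_body)
    )
    return f"{header}\n{query_body}" if header else query_body


body = """SELECT ?item ?itemLabel ?mindat_id
WHERE {
  ?item wdt:P1 wd:Q41261 . # "instance of" "rock"
  ?item p:P1 ?statement .
  OPTIONAL { ?statement pq:P99 ?mindat_id . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}"""
print(with_prefixes(body))
```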