Item talk:Q44143: Difference between revisions

From geokb
Latest revision as of 23:20, 24 April 2023

Mining Facility in SWEET

One of the things I'm experimenting with is aligning the concepts we are bringing into the GeoKB with the SWEET ontology. SWEET is a formal classification system originally developed by Bob Raskin and others at NASA and now maintained as a community resource via ESIP. It provides a broad basis for describing the world and could be a useful rallying point for this work, both for pulling in related concepts we need to link with and for contributing concepts from our work here.

This particular concept of Mining Facility ties in reasonably well with the Structure > Facility > Mining Facility set of concepts in SWEET, though we might differ a bit from how SWEET classifies concepts above that level. Working what we are doing into SWEET is useful because we may want to tie in with other types of facilities from other domains that also rally around this community semantic resource.

Mining Classification

One of the things we need to work out is how the concepts we need in the GeoKB related to mines/mining facilities are mapped in the GeoKB and related to formal ontologies. As we look to incorporate information about mines from the USMIN work and MRDS, we will need to develop some further linked concepts here in the GeoKB that can accommodate data/information in those systems.

A key aspect of this is the Ftr_Type attribute in the USMIN dataset of prospect- and mine-related features. The metadata lists the domain values for this key classification property with definitions. Most of them point to the reference, American Geological Institute, 1997, Dictionary of mining, mineral, and related terms, 2nd Ed. Some definitions for these terms include notes about the way USMIN producers aligned features pulled from topographic map symbols/annotation. There are also several domain values for Ftr_Type that USMIN authors introduced themselves, with definitions included.

The AGI source does not represent a fully qualified and organized classification system, but is rather a vocabulary source. I did find a few cases of "mineral exploration ontologies" being developed to support semantic reasoning experiments, but I have not yet found a mature resource to operate against for GeoKB purposes. In the near term, incorporating the most mature USGS source (USMIN) seems a reasonable course.

We may start with simply pulling in the Ftr_Type domain values as stated, organizing them simply as subclasses of mining facilities for the time being. Some of these are more specific classifications than the very simplistic "mine" features that we pulled in from the GNIS source and now need to refine within the GeoKB.

Semantic Clarification - MRDS and GNIS

We ran an initial processing of "mining features" from the Geographic Names Information System (GNIS) to introduce new items into the GeoKB. This introduced potentially useful items and, especially, an important identifier we will find in other sources. However, it also introduced imperfectly described and characterized mining-related concepts. Classifying all of these simply as instances of "mine" limits the utility of the items. The work in the USMIN project to capture mining facilities with a more detailed type classification, along with more accurate/specific geospatial location information, will be a useful integration into the GeoKB, starting with the imperfectly described GNIS items.

On the MRDS side, we need to do some different kinds of work, and we'll be pulling in a whole other slew of interesting information as claims. For one thing, we need to examine the type classification used in MRDS against the more specific classification we are trying to handle in the GeoKB. For instance, there are "sites" in MRDS that appear to be company names. While there likely are mining operations that are simply known by the name of a company that operated the mine at one time, we need to work this out. A given item can be classified in multiple ways (instance of claims). It may or may not be appropriate to mix something that is inherently a corporate entity with a particular project operated by that entity; we'll have to see how that works out in practice.

SPARQL Query

To pull the items we want to use, we need a specific query that selects not only items that are a "mine" based on instance of claims but also, among those, the ones that used the GNIS as their reference. The following query will pull the items we want to operate against with the USMIN integration/improvement workflow.

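PREFIX wd: <https://geokb.wikibase.cloud/entity/>
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>
PREFIX p: <https://geokb.wikibase.cloud/prop/>
PREFIX pr: <https://geokb.wikibase.cloud/prop/reference/>
# Standard prefixes used below, declared so the query is self-contained
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX bd: <http://www.bigdata.com/rdf#>

SELECT ?mine ?mineLabel ?loc_typeLabel ?locationAltLabel
WHERE {
  # Items whose instance of (P1) value is the target class
  ?mine wdt:P1 wd:Q3646 .
  # Keep only items whose instance of statement carries the expected reference (P3)
  ?mine p:P1 ?instance_of_statement .
  ?instance_of_statement prov:wasDerivedFrom ?ref .
  ?ref pr:P3 wd:Q3624 .
  # Linked location (P11), restricted to the two location types listed in VALUES
  ?mine wdt:P11 ?location .
  VALUES ?location_type { wd:Q229 wd:Q481 }
  ?location wdt:P1 ?location_type .
  ?location wdt:P1 ?loc_type .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}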


Reconciling USMIN features with GeoKB mines

The reconciliation service we are experimenting with (an out-of-the-box capability with Wikibase) will take a starting point like P1 (instance of) = Q3624 (mine) and attempt to match a column of names (Ftr_Name), along with other columns for US State and County, to reconcile features with existing items. When executed from within OpenRefine, this process will attempt to confirm the more certain matches automatically while leaving the operator to confirm or reject the remaining suggestions.
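
As a rough illustration of the matching idea only (this is not the Wikibase reconciliation service or OpenRefine itself), the following Python sketch builds a normalized key from name, state, and county and separates unambiguous matches from rows that need operator review. Ftr_Name comes from USMIN; the State and County column names, the dictionary layout of the GeoKB items, and the sample identifiers are assumptions made for the example.

```python
import re


def match_key(name, state, county):
    """Build a normalized comparison key from name, state, and county."""
    def norm(text):
        # Lowercase, turn punctuation into spaces, collapse whitespace.
        return " ".join(re.sub(r"[^a-z0-9]+", " ", text.lower()).split())
    return (norm(name), norm(state), norm(county))


def reconcile(usmin_rows, geokb_items):
    """Split USMIN rows into unambiguous matches and rows needing review."""
    index = {}
    for item in geokb_items:
        key = match_key(item["label"], item["state"], item["county"])
        index.setdefault(key, []).append(item["qid"])

    matched, review = [], []
    for row in usmin_rows:
        # "State" and "County" are assumed column names for this sketch.
        key = match_key(row["Ftr_Name"], row["State"], row["County"])
        qids = index.get(key, [])
        if len(qids) == 1:
            matched.append((row["Ftr_Name"], qids[0]))
        else:
            # Zero or several candidates: leave the decision to a person.
            review.append(row["Ftr_Name"])
    return matched, review
```

Rows that land in the review list correspond to the suggestions an operator would confirm or reject by hand.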

Another option is to go a bit deeper on the geospatial information and check for proximity between the GNIS points and the geometry in USMIN. This might be a method to pick up further cases where we do not have a connection on name for any number of reasons.
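
A minimal sketch of such a proximity check, assuming both sources are reduced to latitude/longitude points in decimal degrees (USMIN polygons would need a geometry library and are not handled here). The 500 m threshold, the function names, and the input layout are illustrative assumptions, not values taken from USMIN or GNIS.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearby_candidates(gnis_points, usmin_points, max_distance_m=500.0):
    """Return (gnis_id, usmin_id, distance) pairs within the threshold.

    Both inputs map an identifier to a (lat, lon) tuple. The threshold
    is an assumed value that would need tuning against real data.
    """
    pairs = []
    for gnis_id, (glat, glon) in gnis_points.items():
        for usmin_id, (ulat, ulon) in usmin_points.items():
            distance = haversine_m(glat, glon, ulat, ulon)
            if distance <= max_distance_m:
                pairs.append((gnis_id, usmin_id, distance))
    # Closest pairs first, so the most likely matches are reviewed first.
    return sorted(pairs, key=lambda pair: pair[2])
```

The nested loop compares every pair, which is fine for a sketch; a spatial index would be the natural next step for full datasets.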

What to use from USMIN

Assuming we can reconcile USMIN records to records for mining prospects/projects/properties already in the GeoKB, we need to work out what we want to leverage and how it should be encoded in the knowledgebase model. In a basic matching process run with code, I found a small number (3,525) of USMIN features that link to GeoKB records originally created from GNIS. I mocked up one approach to encoding the potentially most relevant bits of information in an item for the "Northernmost Mines" in Alabama. The main thing we get from USMIN is a feature type tied to a point coordinate or a polygon. The example shows adding two additional point coordinates that should be for specific features/structures associated with the mine. I added the feature type as an "instance of" qualifier (for now, but we may need to rework the semantics) along with the year of the topo map from which the two adits, in this case, were derived. I referenced an item for the particular version of USMIN used as the data source.

With this, we have an interesting working example of an additive knowledge-building process through time, recorded in a way that carries a provenance trace showing where the information in each claim comes from. Whether this adds any real value remains to be tested. All three points here are really close to each other, so adding the two USMIN-derived point features likely doesn't do much for our core mineral resource assessment use cases. But there will be other cases where integrating multiple sources of information in this way will bring new value.
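
The shape of the mocked-up claims described above (a coordinate value, an "instance of" qualifier, a map year qualifier, and a reference to the USMIN version item) can be sketched in Python as follows. This is a simplified stand-in for illustration, not the Wikibase JSON format; the property names "coordinate location", "point in time", and "stated in" are hypothetical labels, and the actual GeoKB property identifiers are not assumed here.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    """Simplified claim: a value plus qualifiers and references."""
    prop: str
    value: object
    qualifiers: dict = field(default_factory=dict)
    references: list = field(default_factory=list)


def usmin_point_claim(lat, lon, ftr_type, map_year, usmin_version_item):
    """Build a coordinate claim for a USMIN-derived point feature."""
    return Claim(
        prop="coordinate location",          # hypothetical property label
        value=(lat, lon),
        qualifiers={
            "instance of": ftr_type,         # USMIN feature type
            "point in time": map_year,       # year of the source topo map
        },
        references=[{"stated in": usmin_version_item}],
    )


def sources(claims):
    """List the distinct source items referenced across a set of claims."""
    return sorted({ref["stated in"] for c in claims for ref in c.references})
```

Calling sources() on all claims of an item gives a quick view of the provenance trace, that is, which source versions contributed to the item over time.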

We also need to work through what happens when USMIN (and other sources) seems to indicate a completely new mine that the GeoKB doesn't yet know about. This is where an OpenRefine approach may prove useful in recording those judgments made by a person, but we will also need to look into how bots can record their "new item" actions with appropriate annotation.