Item talk:Q158700

USGS has a project to develop a database of Geological Heritage Sites, and we are experimenting with how these entities can be represented in knowledge graph form in the GeoKB. The current data are in geodatabase form, published through an ArcGIS Online service. Building the knowledge representation forces us to exercise more discipline than a standalone dataset requires, because we are integrating the concept of a "geoheritage site" into a larger construct that represents many other types of entities.

Classification

A geoheritage site is a new type of thing, and we need to consider how it will be classified and characterized. We are talking about a type of geographic feature whose base class is a spatio-temporal entity. Following the guiding principle of treating Wikidata as a global knowledge commons that we want to contribute to, we've followed the Wikidata classification scheme to some extent as a way to organize these sites. The classes object, heritage, natural heritage, and geoheritage all have "same as" claims pointing to Wikidata entities that match GeoKB semantics well enough. Wikidata also has an entity for IUGS Geological Heritage Sites that follows this same basic classification scheme, though it is not yet fleshed out in Wikidata.

   * object
       * heritage
           * natural heritage
               * geoheritage
                   * USGS Geoheritage Site
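The hierarchy above can be sketched as a chain of "subclass of" links that is walked from the most specific class upward. This is only an illustration: the dictionary, the function names, and the use of labels in place of GeoKB item identifiers are all assumptions made for the example, not the way the GeoKB actually stores these claims.

```python
# Hypothetical "subclass of" links, keyed by label for readability.
# In the GeoKB these would be claims between item identifiers.
SUBCLASS_OF = {
    "USGS Geoheritage Site": "geoheritage",
    "geoheritage": "natural heritage",
    "natural heritage": "heritage",
    "heritage": "object",
}


def class_chain(cls, subclass_of=SUBCLASS_OF):
    """Return cls followed by its ancestors, most specific first."""
    chain = [cls]
    seen = {cls}
    while chain[-1] in subclass_of:
        parent = subclass_of[chain[-1]]
        if parent in seen:
            # Guard against an accidental loop in the class links.
            raise ValueError(f"cycle in subclass hierarchy at {parent!r}")
        chain.append(parent)
        seen.add(parent)
    return chain


def is_subclass_of(cls, ancestor, subclass_of=SUBCLASS_OF):
    """True if ancestor appears in the chain for cls (a class counts as its own ancestor)."""
    return ancestor in class_chain(cls, subclass_of)
```

A site classified as a USGS Geoheritage Site would then also answer to queries for natural heritage or heritage without carrying those classes explicitly.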

It is highly likely that the entities we initially establish as instances of USGS Geoheritage Site will also be classified in other ways (e.g., glacier, volcano), and those are classes we also need to build out in the GeoKB as we progress. We may do that in the near term or later. One thing that may push this along is alignment with the Geographic Names Information System, which has its own set of feature classes that we need to keep working through beyond the "mines" we started with in another use case. We will look to the work the USGS Center for Geospatial Information Science has been doing on its own linked data representation.

Location/Context

One of the attributes being used in the Geoheritage data model is the location context for the feature, such as a National Park or National Forest. This was a good prompt to continue fleshing out reference data in the GeoKB by introducing all managed units for the National Park Service, Forest Service, etc. (work in progress). The question then becomes: what property do we use to link Geoheritage Sites to these units? For at least some cases, Wikidata uses a special property, "located in protected area", for this purpose.

This is something I've wrestled with in the GeoKB: do we go the same route and use many highly specialized properties that constrain the meaning of a linkage to something very specific, or do we work with higher-level predicates such as "located in" and let the nature of the object linked to the subject provide the semantics of the linkage? I have already followed Wikidata in using the "located in the administrative territorial entity" property for linking subjects to political boundaries like states or counties, so we've been experimenting with both approaches. However, we need to nail this down fairly soon so as not to introduce undue confusion as the system sees more live use.

My inclination is to start simplifying the property space to some extent, though this needs further discussion. In practice, many people work with data models for tabular datasets like spreadsheets and geodatabases, where they have even more specialized properties like "state" and "county", and a generalized catch-all property name like "located in" would likely not resonate with them. When building a tabular representation from the graph, however, we can use the classification and other characteristics of the linked objects to produce a more familiar representation.
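As a minimal sketch of that last idea, generic "located in" claims can be pivoted into familiar columns by using the class of each linked object as the column name. Everything here is hypothetical: the placeholder entities, the two lookup structures, and the function name are assumptions for illustration, not GeoKB data or an existing GeoKB function.

```python
# Hypothetical classification of linked objects ("instance of").
INSTANCE_OF = {
    "State A": "state",
    "County B": "county",
    "National Park C": "national park",
}

# Hypothetical generic "located in" claims as (subject, object) pairs.
LOCATED_IN = [
    ("Example Site", "State A"),
    ("Example Site", "County B"),
    ("Example Site", "National Park C"),
]


def flatten_located_in(claims, instance_of):
    """Build one row per subject, with a column per class of linked object."""
    columns_by_subject = {}
    for subject, obj in claims:
        # The class of the object supplies the column name; objects with
        # no known class fall back to the generic predicate label.
        column = instance_of.get(obj, "located in")
        columns = columns_by_subject.setdefault(subject, {})
        columns.setdefault(column, []).append(obj)
    rows = []
    for subject, columns in columns_by_subject.items():
        row = {"site": subject}
        for column, values in columns.items():
            # A subject may sit in several units of the same class.
            row[column] = "; ".join(values)
        rows.append(row)
    return rows
```

The graph keeps a single predicate, while a spreadsheet user still sees "state", "county", and "national park" columns. A class that happened to be labeled "site" would collide with the subject column in this sketch, so a real implementation would need to reserve or rename that key.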

Geospatial

The geodatabase form of the USGS Geoheritage data has reasonably precise geometry for the features themselves, beyond the basic context of a National Park or other protected area. These can be point representations or polygon/multi-polygon features. The GeoKB structure can accommodate point representations (one or many), and these will be useful to include in the knowledge representation. More complex geospatial data would be best handled in a different representation, but we do ultimately need to think about something like the Wikidata approach using Wikimedia Commons and GeoJSON.
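One way to get from polygon geometry to a point that the graph can hold is to reduce each GeoJSON geometry to a single representative coordinate. The sketch below is an assumption about how that could be done, not a description of the current workflow: it averages the exterior-ring vertices, which is not a true area-weighted centroid, can land outside an irregular feature, and ignores features that cross the antimeridian. The function names are hypothetical.

```python
def representative_point(geometry):
    """Reduce a GeoJSON Point, Polygon, or MultiPolygon to one [lon, lat] pair."""
    gtype = geometry["type"]
    coords = geometry["coordinates"]
    if gtype == "Point":
        return list(coords[:2])
    if gtype == "Polygon":
        rings = [coords[0]]  # exterior ring only; holes are ignored
    elif gtype == "MultiPolygon":
        rings = [polygon[0] for polygon in coords]
    else:
        raise ValueError(f"unsupported geometry type: {gtype}")
    # GeoJSON rings repeat the first vertex at the end; drop the repeat
    # so it is not counted twice in the average.
    vertices = [vertex for ring in rings for vertex in ring[:-1]]
    lon = sum(vertex[0] for vertex in vertices) / len(vertices)
    lat = sum(vertex[1] for vertex in vertices) / len(vertices)
    return [lon, lat]


def to_point_feature(name, geometry):
    """Wrap the representative point in a GeoJSON Feature."""
    return {
        "type": "Feature",
        "properties": {"name": name},
        "geometry": {
            "type": "Point",
            "coordinates": representative_point(geometry),
        },
    }
```

The full polygon would stay in the geodatabase or a GeoJSON file, and only the derived point would be recorded on the item.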

Linkage to Wikidata

Many of the Geoheritage Sites we have so far have representation in Wikidata.

Other Characteristics