Item talk:Q11

From geokb

USGS Series Reports are a longstanding suite of government scientific reports that the USGS has published since our founding. They range from highly scientific and technical documents to products designed more for general science communication. All reports undergo peer review and a Bureau approval process, and they are considered products of the institution that stands behind their individual authors.

USGS Series Reports are an important asset to represent in the GeoKB because we need to point to them as source material behind many claims throughout the knowledgebase. The reports are provided through the USGS Publications Warehouse via a web interface and an API. Most reports are assigned a DOI under the USGS 10.3133 CrossRef prefix. Report landing pages contain some structured data (embedded machine-readable metadata), but the technology used there is out of date and does not support modern linked open data standards and methods.
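Because the DOI is the persistent hook the GeoKB relies on, it can help to normalize DOIs before storing or comparing them. The following is a minimal Python sketch, assuming DOIs may arrive bare, with a `doi:` prefix, or as a `doi.org` URL. The function names and the sample report number are hypothetical; only the 10.3133 prefix comes from the text above.

```python
import re
from typing import Optional
from urllib.parse import quote

# DOI prefix used for USGS Series Reports (from the text above).
USGS_DOI_PREFIX = "10.3133"

# Accepts a bare DOI, "doi:<DOI>", or an http(s)://(dx.)doi.org/<DOI> URL.
_DOI_RE = re.compile(
    r"^(?:https?://(?:dx\.)?doi\.org/|doi:)?(10\.\d{4,9}/\S+)$", re.IGNORECASE
)


def normalize_doi(raw: str) -> Optional[str]:
    """Return the bare DOI in lowercase, or None if raw does not look like a DOI."""
    match = _DOI_RE.match(raw.strip())
    if match is None:
        return None
    # DOI names are case-insensitive, so lowercase them for stable comparisons.
    return match.group(1).lower()


def is_usgs_series_doi(raw: str) -> bool:
    """True when raw is a DOI under the USGS 10.3133 prefix."""
    doi = normalize_doi(raw)
    return doi is not None and doi.startswith(USGS_DOI_PREFIX + "/")


def resolver_url(raw: str) -> str:
    """Build a resolvable https://doi.org/ link for a DOI."""
    doi = normalize_doi(raw)
    if doi is None:
        raise ValueError(f"not a DOI: {raw!r}")
    return "https://doi.org/" + quote(doi, safe="/")
```

For example, `is_usgs_series_doi("https://doi.org/10.3133/OFR20151234")` is true, while a DOI under another registrant's prefix is not.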

In some ways, we do not need entities in the GeoKB for every report, since reports can be referenced through persistent, resolvable identifiers that federate, to a degree, with the foreign system that hosts them. However, we are using the GeoKB to capture new information about the publications from a variety of sources. That content has nowhere else to live at the moment, so we instantiate a very basic entity for each report here as a rallying point. We can then add claims derived directly from report content, such as subjects identified through machine learning processes and linkages to other concepts in the GeoKB.
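To make the "very basic entity" idea concrete, here is a minimal sketch of what such an item could look like in the general Wikibase entity JSON shape (labels plus a list of statements). It assumes a single string-valued DOI statement; `DOI_PROPERTY` is a hypothetical placeholder, since the actual GeoKB property ID is not given here, and the helper names are illustrative only.

```python
from typing import Any, Dict

# HYPOTHETICAL placeholder: the real GeoKB property ID for a DOI is not stated here.
DOI_PROPERTY = "P_DOI_PLACEHOLDER"


def string_claim(prop: str, value: str) -> Dict[str, Any]:
    """One statement whose main snak holds a plain string value."""
    return {
        "mainsnak": {
            "snaktype": "value",
            "property": prop,
            "datavalue": {"value": value, "type": "string"},
        },
        "type": "statement",
        "rank": "normal",
    }


def minimal_report_entity(title: str, doi: str) -> Dict[str, Any]:
    """A deliberately thin item: an English label plus the persistent identifier.

    Claims derived later (for example, machine-learned subjects) would be
    appended to the "claims" list; the entity itself stays a rallying point.
    """
    return {
        "labels": {"en": {"language": "en", "value": title}},
        "claims": [string_claim(DOI_PROPERTY, doi)],
    }
```

Keeping the item to a label and an identifier means the authoritative metadata stays in the Pubs Warehouse, while the GeoKB holds only what has nowhere else to live.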


The following section outlines specific project work we are doing within the GeoKB to improve the knowledge representation for USGS Series Reports.

Report Digital Assets

USGS Series Reports are made up of a variety of digital content. Depending on the type of publication, there may or may not be a core document in PDF format, along with various ancillary files. The typical route to report materials is to go to the USGS Pubs Warehouse landing page (via DOI or another link) and then download or otherwise work with whatever is available there. In the GeoKB, we need to make this situation more explicit so that we can route AI processes to the digital materials they are able to process.

We have access to a set of links from the Pubs Warehouse API that we'll work through first. From examination, I know there are other special cases where the only link in the metadata points to some type of custom web page that will need to be parsed to work out what's behind it. The end goal is to have one or more statements for each publication that point directly to HTTP-accessible locations for its digital assets. We'll use qualifiers to indicate the MIME type or other details of the digital asset at the other end of each link.
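A first pass over the API links could look something like the Python sketch below. It assumes each link record is a dict with a `url` key and an optional `type` dict carrying a `text` label; that shape is an assumption, not a documented contract. The MIME type is only guessed from the URL's file extension with the standard-library `mimetypes` module, and links with no recognizable extension, or that resolve to HTML, are flagged for the custom-page parsing described above. `AssetStatement` and the helper names are hypothetical.

```python
from dataclasses import dataclass
from mimetypes import guess_type
from typing import Any, Dict, Iterable, List, Optional
from urllib.parse import urlparse


@dataclass
class AssetStatement:
    url: str                  # main value: HTTP-accessible location of the asset
    mime_type: Optional[str]  # qualifier: MIME type guessed from the file extension
    link_type: Optional[str]  # qualifier: the link's type label, if the record has one
    needs_inspection: bool    # True when the target must be fetched/parsed to classify


def guess_mime(url: str) -> Optional[str]:
    """Guess a MIME type from the URL path, ignoring query string and fragment."""
    clean = urlparse(url)._replace(query="", fragment="").geturl()
    mime, _encoding = guess_type(clean)
    return mime


def asset_statements(links: Iterable[Dict[str, Any]]) -> List[AssetStatement]:
    """Turn (assumed-shape) link records into candidate asset statements."""
    statements = []
    for link in links:
        url = link.get("url")
        if not url:
            continue  # skip malformed entries that carry no URL
        mime = guess_mime(url)
        link_type = (link.get("type") or {}).get("text")
        # No extension, or an HTML page, means we can't tell what is behind it yet.
        needs_inspection = mime is None or mime == "text/html"
        statements.append(AssetStatement(url, mime, link_type, needs_inspection))
    return statements
```

Guessing from extensions is cheap and works offline, but it is only a heuristic; a later step could confirm types with an HTTP `HEAD` request before the qualifiers are written.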