In some ways, we do not need to have entities for all reports in the GeoKB itself as they can be referenced by persistent, resolvable identifiers with a certain degree of federation to a foreign system. However, we are using the GeoKB as a place to capture new information from a variety of sources associated directly with the publications. This content does not have anywhere else to live at the moment, and so we instantiate a very basic entity for each report here as a rallying point. We can then add claims derived directly from the content such as subjects derived through machine learning processes and derived linkages to other concepts in the GeoKB.
= Projects =
The following section outlines specific project work we are doing within the GeoKB to improve the knowledge representation for USGS Series Reports.
== Report Digital Assets ==
USGS Series Reports are made up of a variety of digital content. Depending on the type of publication, there may or may not be a core document in PDF format along with various ancillary files. The typical route to retrieve report materials is to go to the USGS Pubs Warehouse landing page (via DOI or other link) and then navigate to download or work with whatever is available. In the GeoKB, we need to bring some level of clarification to this situation such that we can route AI processes to the digital materials they can process and work with.
We have access to a set of links from the Pubs Warehouse API that we'll work through first. I know from examination that there are other special cases where the only link in the metadata is to some type of custom web page that will need to be parsed to work out what's behind it. The end goal here will be to have one or more statements for each publication that point directly to an HTTP-accessible location for digital assets. We'll use qualifiers to indicate the MIME type or other details for the digital asset at the other end of the link.
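As a rough illustration of that end goal, the sketch below turns a list of link records into statement-like dictionaries, each carrying a MIME type qualifier. It is a minimal sketch under stated assumptions, not the actual GeoKB workflow: the property identifiers, the shape of the link records (a dictionary with a <code>url</code> key), and the example URLs are all hypothetical, and the MIME type is only guessed from the file extension. Links without a recognizable extension, such as custom web pages, are set aside for later parsing.

```python
import mimetypes
from urllib.parse import urlparse

# Hypothetical property identifiers; the real GeoKB properties may differ.
ASSET_URL_PROPERTY = "P_ASSET_URL"
MIME_TYPE_QUALIFIER = "P_MIME_TYPE"


def build_asset_statements(links):
    """Split link records into asset statements and URLs needing review.

    Each link record is assumed to be a dict with a "url" key. The MIME
    type is guessed from the file extension in the URL path only.
    """
    statements = []
    needs_review = []
    for link in links:
        url = link.get("url")
        if not url:
            continue  # nothing to point at
        mime_type, _ = mimetypes.guess_type(urlparse(url).path)
        if mime_type is None:
            # Likely a landing page or custom page that must be parsed.
            needs_review.append(url)
            continue
        statements.append({
            "property": ASSET_URL_PROPERTY,
            "value": url,
            "qualifiers": {MIME_TYPE_QUALIFIER: mime_type},
        })
    return statements, needs_review


# Made-up example records, not real Pubs Warehouse output.
sample_links = [
    {"url": "https://example.org/reports/report.pdf"},
    {"url": "https://example.org/reports/custom-page/"},
]
statements, needs_review = build_asset_statements(sample_links)
```

Guessing from the extension is only a first pass; a more reliable approach would probably check the <code>Content-Type</code> header returned by the server for each link.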