Item talk:Q164044: Difference between revisions

From geokb

Latest revision as of 23:43, 15 February 2024

Introduction

This class of government report represents an archive collection of reports from the U.S. Bureau of Mines (disbanded in 1996). The items that are instances of this class represent digital scans of the original reports with basic bibliographic metadata and pointers to the file download locations in the ScienceBase repository.

Query

The following query pulls most of the relevant details needed to build basic bibliographic metadata for Bureau of Mines reports and fetch the report content via URL.

PREFIX wd: <https://geokb.wikibase.cloud/entity/>
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>
PREFIX p: <https://geokb.wikibase.cloud/prop/>
PREFIX ps: <https://geokb.wikibase.cloud/prop/statement/>
PREFIX pq: <https://geokb.wikibase.cloud/prop/qualifier/>

SELECT ?report ?title ?publisher ?publisherLabel
(YEAR(?publication_date) AS ?date)
?author_name
?meta_url ?content_url ?mime_type ?checksum
WHERE {
  ?report wdt:P1 wd:Q164044 ;
          wdt:P66 ?title ;
          wdt:P141 ?meta_url ;
          wdt:P136 ?content_url ;
          wdt:P7 ?publication_date ;
          p:P136 ?content_url_statement .
  OPTIONAL {
    ?report wdt:P198 ?publisher .
  }
  OPTIONAL {
    ?report wdt:P196 ?author_name .
  }
  ?content_url_statement ps:P136 ?content_url ;
                         pq:P65 ?mime_type ;
                         pq:P197 ?checksum .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
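
The query returns one row per combination of publisher, author, and content URL, so downstream code usually has to regroup rows by report. The following is a minimal Python sketch of that step, assuming the results were retrieved in the standard SPARQL 1.1 JSON results format. The function name group_by_report and the shape of the output records are illustrative choices, not part of GeoKB, and the sketch does not special-case the unknown-value publisher.

```python
def group_by_report(sparql_json):
    """Collapse SPARQL JSON result rows into one record per report IRI."""
    reports = {}
    for row in sparql_json["results"]["bindings"]:
        # Each binding maps a variable name to {"type": ..., "value": ...}.
        values = {name: cell["value"] for name, cell in row.items()}
        record = reports.setdefault(values["report"], {
            "title": values.get("title"),
            "date": values.get("date"),
            "meta_url": values.get("meta_url"),
            "publishers": set(),
            "authors": set(),
            "files": set(),
        })
        # Publisher and author come from OPTIONAL patterns and may be unbound.
        publisher = values.get("publisherLabel") or values.get("publisher")
        if publisher:
            record["publishers"].add(publisher)
        if "author_name" in values:
            record["authors"].add(values["author_name"])
        # One (URL, MIME type, checksum) tuple per content file.
        record["files"].add(
            (values["content_url"], values.get("mime_type"), values.get("checksum"))
        )
    return reports
```

Sets are used so that the repeated values produced by the row multiplication collapse automatically; convert them to sorted lists if a stable order is needed.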


Notes

  • Publisher values are either a link to the GeoKB entity for the organization listed as publisher or the explicit "unknown value" type in Wikibase. The latter marks cases where we have not resolved the publisher to a defined entity and are unsure of metadata completeness and quality.
  • The authors from this collection are all name-only values that we may not formalize in the GeoKB. We use a separate string property to hold this information.
  • Qualifiers for content URL include an MD5 checksum that comes from the original ScienceBase Item information. Initially, some duplicate checksum values will be found. These represent a problem in the original content that we are working to clean up at the source.
  • The query does no grouping, so results may need to be grouped in further processing of this content. Items may have multiple publishers, authors, and content URLs. Each unique GeoKB item corresponds to a unique ScienceBase Item, which should represent and contain an individual report.
  • The meta URL values here, pointing to ScienceBase Items, are unique and should be reasonably persistent for the long term. They can be accessed as HTML as well as JSON via content negotiation (or "?format=json" added to the URL) to get full content.
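
The checksum qualifier can be used both to verify a downloaded file and to spot the duplicate checksums mentioned above. The following is a minimal Python sketch, assuming the checksum is stored as a hexadecimal MD5 digest string. That encoding is an assumption to check against the data, and the function names are illustrative.

```python
import hashlib
from collections import defaultdict


def md5_hex(data: bytes) -> str:
    """Return the hexadecimal MD5 digest of the given bytes."""
    return hashlib.md5(data).hexdigest()


def checksum_matches(data: bytes, expected: str) -> bool:
    """Compare downloaded bytes against a recorded checksum.

    Assumes the recorded value is a hex digest; case and surrounding
    whitespace are ignored.
    """
    return md5_hex(data) == expected.strip().lower()


def duplicate_checksums(files):
    """Map each checksum shared by more than one content URL to those URLs.

    `files` is an iterable of (content_url, checksum) pairs, for example
    taken from the ?content_url and ?checksum columns of the query.
    """
    urls_by_checksum = defaultdict(set)
    for url, checksum in files:
        urls_by_checksum[checksum.strip().lower()].add(url)
    return {c: urls for c, urls in urls_by_checksum.items() if len(urls) > 1}
```

A checksum shared by several content URLs suggests the same file is attached to more than one ScienceBase Item, which is the kind of source problem the notes describe.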