Item talk:Q50862: Difference between revisions

From geokb
(4 intermediate revisions by the same user not shown)
For the GeoKB, we need some representation of USGS organizational units that associated information can link to. There is no fully comprehensive or absolutely correct representation of the USGS organizational structure online, but the USGS website does show how the organization presents itself to the rest of the world. In keeping with our constraint of processing and organizing only publicly available information, the GeoKB organizes this web-based representation of USGS organizational units, taking a little liberty with classification so that the structure makes the most sense within the knowledgebase.
The liberties taken in classification here aim to give a somewhat deeper sense of the different kinds of organizational units. They build on the USGS website's effort to show units such as "laboratories" and "observatories" beyond the basic designation of "Science Center," which has more to do with budgetary organization. We will refine this over time as we discover the best ways of linking everything together in the graph.
== Temporality of Organizations ==
The USGS organizational structure changes through time, from how Programs (appropriated funding line items) are organized to how Regions and Science Centers are defined. Wikibase lets us record this changing structure with time-based qualifiers that indicate when a given organizational arrangement is or was current. For example, the Mission Area under which a Program is organized, or the Region in which a Science Center is defined, can change. Currently, we use "point in time" qualifiers to indicate the year in which a claim is or was considered true. We are not attempting to "recreate history" with this approach; instead, we use it going forward to keep some track of organizational change.
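A point-in-time qualifier records only the year in which a claim was considered true. Answering a question such as "which Region was this Science Center under as of a given year?" therefore needs an interpretation step. The minimal Python sketch below assumes a claim stays current until a later point-in-time claim supersedes it. That carry-forward rule, the `TimedClaim` record, and the `claim_as_of` helper are all illustrative assumptions, not part of the GeoKB.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TimedClaim:
    """A claim value plus its "point in time" year (hypothetical record shape)."""
    value: str          # e.g. the label of the Region a Science Center sits in
    point_in_time: int  # year the claim was recorded as true


def claim_as_of(claims: list[TimedClaim], year: int) -> Optional[TimedClaim]:
    """Return the latest claim recorded at or before `year`, or None.

    Assumption: a claim remains current until a later point-in-time claim
    replaces it. The GeoKB stores only the year a claim was observed to be
    true, so this carry-forward is an interpretation, not recorded data.
    """
    eligible = [c for c in claims if c.point_in_time <= year]
    if not eligible:
        return None  # nothing had been recorded as of that year
    return max(eligible, key=lambda c: c.point_in_time)
```

Consider two hypothetical claims, `TimedClaim("Region A", 2021)` and `TimedClaim("Region B", 2023)`. Asking about 2022 returns the Region A claim. Asking about 2020 returns `None` because nothing was recorded that early.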
== Identifier Challenges ==
There is no comprehensive identifier for USGS organizational units. Internal identifiers are used in several circumstances, but they serve business logic such as budget or IT management. They are not very useful as persistent identifiers and are certainly not resolvable, either technically or publicly. Some USGS organizational units, such as Science Centers that issue funding opportunities, have DUNS numbers, which identify them to funding agents. Some of these also have CrossRef organizational identifiers (DOIs used to trace scientific publications back to funding sources). A handful of USGS organizations have a newer Research Organization Registry (ROR) identifier, which is perhaps the most promising external identifier system for the future.

At present, however, we have only label-based identification, with each information system using its own source for identifying organizational units. What we can do in the GeoKB is record the different labels as aliases. We can then build up aggregate records for organizational units over time as we represent records from source systems in the knowledgebase. This approach has its challenges, but it is the best we can do until there is a comprehensive mechanism for persistent, resolvable identifiers that respects and records organizational change through time.
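The label-based approach described under Identifier Challenges can be sketched in Python. Each source system's labels are recorded as aliases, and aggregate records build up over time. The input shape (tuples of source system, label, and GeoKB item ID) and the case- and whitespace-insensitive matching rule are assumptions for illustration. The sketch also flags labels that point at more than one item, which is where label-based identification is most fragile.

```python
from __future__ import annotations

from collections import defaultdict


def normalize(label: str) -> str:
    """Assumed matching rule: ignore case and collapse runs of whitespace."""
    return " ".join(label.split()).casefold()


def aggregate_aliases(observations):
    """Aggregate (source_system, label, item_id) observations.

    Returns a pair:
      records:   item_id -> {label: sorted source systems that used it}
      ambiguous: normalized label -> sorted item_ids, only for labels
                 that were used for more than one item
    """
    sources_by_label = defaultdict(lambda: defaultdict(set))
    items_by_key = defaultdict(set)
    for source, label, item_id in observations:
        sources_by_label[item_id][label].add(source)
        items_by_key[normalize(label)].add(item_id)

    records = {
        item_id: {label: sorted(srcs) for label, srcs in labels.items()}
        for item_id, labels in sources_by_label.items()
    }
    ambiguous = {
        key: sorted(items) for key, items in items_by_key.items() if len(items) > 1
    }
    return records, ambiguous
```

An ambiguous label is a cue for human review rather than automatic merging, because it may mark a renamed unit whose old name was reused.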

Latest revision as of 15:47, 22 November 2023

USGS Organizations

In the GeoKB, we have organized items representing USGS organizational units as well as we can, based on public information on the USGS website. The USGS website lists the following major organization types:

  • Mission Areas
  • Programs
  • Regions
  • Science Centers

The USGS website also lists two additional types of "sub-organizations" (Laboratories and Observatories), though these listings are not necessarily complete. As with many organizations, USGS organizational units change over time and do not always have persistent, resolvable identifiers that stand the test of time. Older organization names may no longer be found online in a reliable form. In the GeoKB, we are attempting to provide a platform where these changes can be recorded in a way that serves as an enduring reference and shows where things changed.

Classification

The concept of organization aligns with several standard ontologies, including FOAF and schema.org. In the GeoKB, government organization is a subclass of organization, and USGS organization is in turn a subclass of government organization. Beneath the USGS organization item, we place subclasses that follow the USGS website's public presentation of its organizational structure, with some additional interpretation to help reflect the organizations as presented. The following query assembles the entire classification graph used for USGS organizations in the GeoKB:

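PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>
PREFIX wd: <https://geokb.wikibase.cloud/entity/>

# DISTINCT: USGS organization itself matches both branches (zero-length P2* path)
SELECT DISTINCT ?item ?itemLabel ?subclass_of ?subclass_ofLabel
WHERE {
  {
    # Upward: USGS organization and its superclasses, each with its direct parent class
    wd:Q50862 wdt:P2* ?item .
    ?item wdt:P2 ?subclass_of .
  } UNION {
    # Downward: every subclass beneath USGS organization, each with its direct parent class
    ?item wdt:P2* wd:Q50862 ;
          wdt:P2 ?subclass_of .
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}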

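The query returns one row per class and its direct parent class, so the rows can be folded into a tree client-side. The Python sketch below assumes the results were downloaded in the standard SPARQL 1.1 Query Results JSON format. The helper names (`local_id`, `build_tree`, `render`) are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def local_id(uri: str) -> str:
    """Strip the entity namespace: .../entity/Q50862 -> Q50862."""
    return uri.rsplit("/", 1)[-1]


def build_tree(results: dict):
    """Turn item/subclass_of rows into a parent -> children map plus labels.

    `results` is assumed to be the SPARQL 1.1 JSON results document returned
    for the classification query.
    """
    children = defaultdict(set)
    labels = {}
    for row in results["results"]["bindings"]:
        child = local_id(row["item"]["value"])
        parent = local_id(row["subclass_of"]["value"])
        children[parent].add(child)
        labels[child] = row.get("itemLabel", {}).get("value", child)
        labels.setdefault(parent, row.get("subclass_ofLabel", {}).get("value", parent))
    return children, labels


def render(node: str, children, labels, depth: int = 0) -> list[str]:
    """Indented outline of `node` and everything classified beneath it.

    Assumes the subclass graph is acyclic, as a class hierarchy should be.
    """
    lines = [f"{'  ' * depth}{labels.get(node, node)} ({node})"]
    for child in sorted(children.get(node, ())):
        lines.extend(render(child, children, labels, depth + 1))
    return lines
```

Starting `render` at organization (Q4) should produce an outline that runs through government organization (Q50861) and USGS organization (Q50862) down to the USGS-specific subclasses.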


Names and Identifiers

Organization names change through time. Sometimes a name change reflects a fundamental change in organizational structure, form, and function. In other cases, the change is less substantive, and the new name is essentially just a different identifier for the same unit. In the former case, we attempt to reflect the change with a new entity in the GeoKB and a relationship to the former entity. In the latter case, we simply add alternate labels so that an item can be found or referred to by a former name that may still be in use.

There is no single persistent identifier system for all USGS organizations. Internal codes are used in business management systems, but they do not uniquely identify an organizational unit through time. Some USGS organizational units do have external identifiers, such as DUNS numbers (for units that act as granting institutions) or Research Organization Registry (ROR) identifiers.

The following query retrieves all items that have a ROR identifier (P193). ROR may prove over time to be a reasonable way for USGS to persistently identify its organizational units through an external resolver system.

PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>

SELECT ?item ?itemLabel ?ror ?ror_url
WHERE {
  ?item wdt:P193 ?ror . # ROR identifier
  BIND (CONCAT("https://ror.org/", STR(?ror)) AS ?ror_url) # Prefix the ror.org resolver to get a resolvable URL
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}

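The `BIND(CONCAT(...))` line builds a resolvable URL from the stored identifier. When the same step is done client-side, for example while cleaning a spreadsheet before reconciliation, a little validation helps catch malformed values. The Python sketch below is a hypothetical helper. Its ID pattern follows the published ROR identifier format (a leading 0, six Crockford base32 characters, and a two-digit checksum), treated here as an assumption. The checksum digits themselves are not verified.

```python
import re
from typing import Optional

# Assumed ROR ID shape: a leading "0", six Crockford base32 characters
# (digits plus letters other than i, l, o, u) and a two-digit checksum.
# Only the shape is checked; the checksum itself is NOT verified.
ROR_ID_PATTERN = re.compile(r"0[0-9a-hjkmnp-tv-z]{6}[0-9]{2}")
ROR_PREFIX = "https://ror.org/"


def ror_url(value: str) -> Optional[str]:
    """Mirror the query's BIND(CONCAT(...)) step, with a sanity check.

    Accepts a bare ROR ID or a full ror.org URL and returns the ror.org URL,
    or None when the value does not look like a ROR ID.
    """
    ror_id = value.strip().lower()
    if ror_id.startswith(ROR_PREFIX):
        ror_id = ror_id[len(ROR_PREFIX):]  # tolerate IDs stored as full URLs
    if ROR_ID_PATTERN.fullmatch(ror_id) is None:
        return None
    return ROR_PREFIX + ror_id
```

Returning `None` instead of raising makes it easy to filter out suspect values in bulk and review them separately.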


All USGS Organizations

The GeoKB can be a useful tool for reconciling USGS organization names to other identifiers. URLs for organizations are neither persistent nor stable through time, but they can be useful in some cases and are included in the GeoKB as reference URL claims (P31). The GeoKB can also serve as a reconciliation service for, say, a spreadsheet of records processed through OpenRefine. The following query returns all USGS organizations, including alternate labels (such as acronyms), in a format suitable for lookup and name resolution. Because some organizations have multiple URLs listed as their official website, these are grouped into a single comma-separated value.

PREFIX wd: <https://geokb.wikibase.cloud/entity/>
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>

SELECT ?item ?itemLabel ?item_alt_label ?instance_ofLabel 
(GROUP_CONCAT(?url; separator=",") AS ?urls)
WHERE {
  ?org_types wdt:P2 wd:Q50862 . # Gets subclasses of USGS organization
  ?item wdt:P1 ?org_types ; # Gets items in those classes
        wdt:P1 ?instance_of . # Gets the individual instance of classification
  OPTIONAL {
    ?item skos:altLabel ?item_alt_label . # Retrieves alternate labels into separate rows
    FILTER (lang(?item_alt_label)='en')
  }
  OPTIONAL {
    ?item wdt:P145 ?url . # Get all reference URLs when available
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
GROUP BY ?item ?itemLabel ?item_alt_label ?instance_ofLabel

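For name resolution outside OpenRefine, the rows returned by the All USGS Organizations query can be turned into a simple lookup table. The Python sketch below assumes the results are in the standard SPARQL 1.1 Query Results JSON format and uses a hypothetical case- and whitespace-insensitive matching rule. The query returns one row per alternate label, so every label and alias of an item ends up pointing at the same item ID. The function names are illustrative only.

```python
from __future__ import annotations

from collections import defaultdict


def normalize(name: str) -> str:
    """Case- and whitespace-insensitive lookup key (an assumed matching rule)."""
    return " ".join(name.split()).casefold()


def build_lookup(bindings: list[dict]):
    """Index SPARQL JSON result rows by label and alternate label."""
    index: dict[str, set[str]] = defaultdict(set)
    urls: dict[str, set[str]] = defaultdict(set)
    for row in bindings:
        item = row["item"]["value"].rsplit("/", 1)[-1]
        for var in ("itemLabel", "item_alt_label"):
            if var in row:  # item_alt_label is OPTIONAL and may be unbound
                index[normalize(row[var]["value"])].add(item)
        # GROUP_CONCAT joined the URLs with ","; the value may be empty or
        # unbound when an item has no URL, so guard both cases
        raw = row.get("urls", {}).get("value", "")
        urls[item].update(u for u in raw.split(",") if u)
    return index, urls


def reconcile(name: str, index) -> list[str]:
    """Candidate item IDs for a name; more than one means the name is ambiguous."""
    return sorted(index.get(normalize(name), set()))
```

A name that returns several candidates should go to manual review rather than being matched automatically.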


Organizational Hierarchy

In the GeoKB, we use the inverse properties has subsidiary (P189) and is subsidiary of (P190) to describe relationships between organizational entities, such as USGS units, that are organized into a hierarchy. These properties can be queried with SPARQL to construct a graph.

PREFIX wd: <https://geokb.wikibase.cloud/entity/>
PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>

SELECT ?item ?itemLabel 
?hasSubsidiary ?hasSubsidiaryLabel
?isSubsidiaryOf ?isSubsidiaryOfLabel
WHERE {
  ?org_types wdt:P2 wd:Q50862 . # subclasses of USGS organization
  ?item wdt:P1 ?org_types . # items in those classes
  OPTIONAL {
    ?item wdt:P189 ?hasSubsidiary . # has subsidiary relationships
  }
  OPTIONAL {
    ?item wdt:P190 ?isSubsidiaryOf . # is subsidiary of relationships
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}

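The hierarchy query returns one row per relationship, and its two OPTIONAL patterns can multiply rows, so it is usually easier to assemble the graph client-side. The Python sketch below assumes SPARQL 1.1 JSON results from that query. It folds P189 and P190 into one parent-to-children map, so a link recorded on only one side is not lost. It then walks the map to list every unit beneath a given organization. The function names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Optional


def local_id(row: dict, var: str) -> Optional[str]:
    """Local ID (e.g. "Q50862") of a URI variable, or None if it is unbound."""
    if var not in row:  # OPTIONAL variables are simply absent when unbound
        return None
    return row[var]["value"].rsplit("/", 1)[-1]


def build_hierarchy(bindings: list[dict]) -> dict[str, set[str]]:
    """Fold both inverse properties into a single parent -> children map.

    has subsidiary (P189) and is subsidiary of (P190) are inverses, but a
    given link may be stated on only one side, so both directions are used.
    """
    children: dict[str, set[str]] = defaultdict(set)
    for row in bindings:
        item = local_id(row, "item")
        subsidiary = local_id(row, "hasSubsidiary")
        parent = local_id(row, "isSubsidiaryOf")
        if subsidiary:
            children[item].add(subsidiary)
        if parent:
            children[parent].add(item)
    return children


def all_subsidiaries(root: str, children: dict[str, set[str]]) -> set[str]:
    """Every unit below `root`, following subsidiary links transitively."""
    found: set[str] = set()
    stack = [root]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in found:  # skip repeats and guard against cycles
                found.add(child)
                stack.append(child)
    return found
```

A comparable traversal can also run server-side with the SPARQL 1.1 property path `(wdt:P189|^wdt:P190)+`. That path follows has subsidiary links forward and is subsidiary of links in reverse.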