= Caching raw data as schema.org documents =
The information we are using to build representations of people comes from several sources.
* USGS Staff Profile pages (via a web scraping routine)
* ORCID records
* OpenAlex records
USGS Staff Profiles are our primary source for personnel information in this knowledge graph, but they offer no programmatic or structured data access path, so we must use a web scraper to pull from the pages periodically. Working toward the structured access we would like to see in the future, we have started organizing all of the scraped content into notional [https://schema.org/Person schema.org/Person] documents. These are cached, encoded in YAML, on the "item talk" page associated with the person entity, and the cached document is then used to set labels, descriptions, aliases, and claims.
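
The caching step described above can be illustrated with a minimal Python sketch. It is not the project's actual scraper or cache format: the scraped field names (<code>title</code>, <code>profile_url</code>, <code>other_names</code>), the field mapping, and the helper functions are all hypothetical, and only the schema.org property names (<code>name</code>, <code>jobTitle</code>, <code>url</code>, <code>alternateName</code>, <code>email</code>) come from the schema.org/Person vocabulary. To stay dependency-free, the sketch writes a small subset of YAML by hand (flat keys, strings, and lists of strings); a real pipeline would more likely use a YAML library.

```python
import json


def build_person_document(scraped: dict) -> dict:
    """Map scraped profile fields onto schema.org/Person property names."""
    doc = {"@context": "https://schema.org", "@type": "Person"}
    # Hypothetical scraped keys (left) mapped to schema.org properties (right).
    field_map = {
        "name": "name",
        "title": "jobTitle",
        "email": "email",
        "profile_url": "url",
        "other_names": "alternateName",
    }
    for source_key, schema_key in field_map.items():
        value = scraped.get(source_key)
        if value:  # skip fields that are missing or empty
            doc[schema_key] = value
    return doc


def to_yaml(doc: dict) -> str:
    """Serialize a flat dict of strings and lists of strings as YAML.

    json.dumps is used to quote keys and values, because a JSON string
    is also a valid YAML double-quoted scalar.
    """
    lines = []
    for key, value in doc.items():
        if isinstance(value, list):
            lines.append(f"{json.dumps(key)}:")
            lines.extend(f"  - {json.dumps(item)}" for item in value)
        else:
            lines.append(f"{json.dumps(key)}: {json.dumps(value)}")
    return "\n".join(lines) + "\n"


# Illustrative record only; not real scraped data.
scraped = {
    "name": "Jane Doe",
    "title": "Research Hydrologist",
    "email": "",
    "profile_url": "https://example.org/staff-profiles/jane-doe",
    "other_names": ["J. Doe"],
}
person_doc = build_person_document(scraped)
yaml_text = to_yaml(person_doc)
print(yaml_text)
```

Under these assumptions, the resulting text is what would be written to the item talk page, and the same document could later be read back to derive the label (from <code>name</code>), aliases (from <code>alternateName</code>), and claims.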