Item talk:Q3



= Caching raw data as schema.org documents =
The information we use to build representations of people comes from several online sources:
* USGS Staff Profile pages (via a web scraping routine)
* ORCID records
* OpenAlex records


USGS Staff Profiles are our primary source for personnel information in this knowledge graph. Because they offer no programmatic or structured data access path, we must use a web scraper to pull from the pages periodically. As a step toward the ideal we would like to see in the future, we have started organizing all of the scraped content into notional [https://schema.org/Person schema.org/Person] documents. These are cached, encoded in YAML, to the "item talk" page associated with each person entity, and that cached state is then used to set labels, descriptions, aliases, and claims.
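
As a rough illustration of the caching step described above, the sketch below maps a scraped profile onto a schema.org/Person document and renders it as YAML text. It is not the actual scraper or cache code: the input field names (<code>name</code>, <code>profile_url</code>, <code>title</code>, <code>email</code>, <code>orcid</code>), the two helper functions, and the sample person are all hypothetical. The small hand-written YAML emitter is an assumption made to keep the example dependency-free; in practice a library call such as PyYAML's <code>yaml.safe_dump</code> would probably be used instead.

```python
import json


def build_person_document(scraped: dict) -> dict:
    """Map hypothetical scraped fields onto schema.org/Person properties."""
    doc = {
        "@context": "https://schema.org",
        "@type": "Person",
        "name": scraped["name"],
        "url": scraped["profile_url"],
    }
    # Optional properties are added only when the scraper found a value.
    if scraped.get("title"):
        doc["jobTitle"] = scraped["title"]
    if scraped.get("email"):
        doc["email"] = scraped["email"]
    if scraped.get("orcid"):
        doc["sameAs"] = ["https://orcid.org/" + scraped["orcid"]]
    return doc


def to_yaml(doc: dict) -> str:
    """Render a flat document (string or list-of-string values) as YAML.

    Keys and scalars are written as JSON strings, which are valid YAML
    double-quoted scalars, so keys such as "@type" need no special casing.
    """
    lines = []
    for key, value in doc.items():
        if isinstance(value, list):
            lines.append(f"{json.dumps(key)}:")
            lines.extend(f"  - {json.dumps(item)}" for item in value)
        else:
            lines.append(f"{json.dumps(key)}: {json.dumps(value)}")
    return "\n".join(lines) + "\n"


# Made-up sample record; none of these values come from a real profile.
sample = {
    "name": "Jane Example",
    "profile_url": "https://www.usgs.gov/staff-profiles/jane-example",
    "title": "Research Hydrologist",
    "orcid": "0000-0000-0000-0000",
}
person_doc = build_person_document(sample)
cached_text = to_yaml(person_doc)
print(cached_text)
```

The YAML text produced this way is what would be written to the person's "item talk" page. Labels, descriptions, aliases, and claims would then be derived from the cached document rather than from the live profile page.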