<span id="reconciling-usmin-dataset-to-createupdate-entries-with-openrefine"></span>
= Reconciling USMIN Dataset to Create/Update Entries With OpenRefine =

<span id="purpose"></span>
== Purpose ==

| | "doi": "10.5066/f78w3chg", |
| | | "identifiers": [], |
| Given that there are datasets related to USMIN within the sciencebase [https://www.sciencebase.gov/catalog/item/5a1492c3e4b09fc93dcfd574 website], we want to transfer related mine information into the GeoScience KnowledgeBase to add more related information into the knowledge graph already availailable. By doing so, new information and insights can be gathered and processed as desired. We will be using a tool called [https://openrefine.org/ OpenRefine], a GUI meant for reconciliation with many different wikibase instances as well as upserting any relevant data to the desired instance.
| | "creators": [ |
| | | { |
| <span id="prerequisites"></span>
| | "name": "Horton, John D", |
| | | "nameType": "Personal", |
| == Prerequisites ==
| | "givenName": "John D", |
| | | "familyName": "Horton", |
| This discussion is assuming you have the following installed
| | "affiliation": [], |
| | | "nameIdentifiers": [ |
| * Docker
| | { |
| | | "schemeUri": "https://orcid.org", |
| <span id="retrieving-the-usmin-dataset"></span>
| | "nameIdentifier": "https://orcid.org/0000-0003-2969-9073", |
| | | "nameIdentifierScheme": "ORCID" |
| == Retrieving the USMIN dataset ==
| | } |
| | | ] |
| For this example, we will be using the <code>USGS_TopoMineSymbols_ver9_Geodatabase.zip</code> downloadable file from the sciencebase item above. Once unzipped we can use the <code>USGS_TopoMineSymbols_ver9.gdb</code> folder.
<blockquote>As of this writing, version 10 of the dataset has been released, with <code>USGS_TopoMineSymbols_ver10_Geodatabase.zip</code> replacing the file used in this example. We will have to determine whether to update to the new version depending on whether it contains more relevant data that we need to insert into the GeoKB.
</blockquote>

The desired format for processing data with OpenRefine is either a <code>csv</code> or <code>xlsx</code> file, so we will need to convert this <code>gdb</code> into <code>csv</code> format. We can do this using the OpenGIS Simple Features Reference Implementation (OGR), specifically the <code>ogr2ogr</code> command found in the Geospatial Data Abstraction Library (GDAL). GDAL also provides the <code>ogrinfo</code> command, which we can use to find out the metadata of the gdb file that was just downloaded. This is helpful for getting an understanding of the feature layers, fields, and property types.

| | "givenName": "Carma A", |
| | | "familyName": "San Juan", |
| Example command used for <code>topominesymbols (24k)</code> using ogrinfo:
| | "affiliation": [], |
| | | "nameIdentifiers": [ |
| <syntaxhighlight lang="bash">docker run --rm -v $(pwd):/home ghcr.io/osgeo/gdal:alpine-normal-latest ogrinfo -ro -so -al /home/USGS_TopoMineSymbols_ver9.gdb</syntaxhighlight>
where <code>-ro</code> opens the data source in read-only mode, <code>-so</code> outputs the info in summary mode, and <code>-al</code> lists all the layers found within. More options can be found in the [https://gdal.org/programs/ogrinfo.html#ogrinfo GDAL documentation]. Because this is just for demonstration purposes, the layer that will be used is <code>USGS_TopoMineSymbols_24k_Points</code>, although there are larger layers with much more information that could be processed.

| | "schemeUri": "https://orcid.org", |
| | | "nameIdentifier": "https://orcid.org/0000-0002-9151-1919", |
| <blockquote>The command above will pull the GDAL docker image automatically if the latest version is not found locally.
| | "nameIdentifierScheme": "ORCID" |
| </blockquote>
| | } |
| Now that we know which layer needs to be converted, we can used <code>ogr2ogr</code> to convert from gdb to csv. This can be done using the following command in the terminal.
<syntaxhighlight lang="bash">docker run --rm -v $(pwd):/home ghcr.io/osgeo/gdal:alpine-normal-latest ogr2ogr -f CSV -lco GEOMETRY=AS_XY /home/USGS_TopoMineSymbols_24k_Points.csv /home/USGS_TopoMineSymbols_ver9.gdb USGS_TopoMineSymbols_24k_Points</syntaxhighlight>

<blockquote>Notice that this command passes the layer creation option (<code>-lco</code>) <code>GEOMETRY=AS_XY</code> so that the point coordinates are written out as <code>X</code> and <code>Y</code> columns. This is a crucial step: without it the geometry is dropped from the outputted csv, which would make the rest of the process much harder.
</blockquote>

Now that the csv file is ready, we can open the site https://openrefine.demo5280.com/ and create a new project. Once that is done, we can start the reconciliation and preprocessing needed to upsert any new and relevant information.

| | "title": "Prospect- and Mine-Related Features from U.S. Geological Survey 7.5- and 15-Minute Topographic Quadrangle Maps of the United States (ver. 10.0, May 2023)" |
| | | } |
| After evaluating the data we wanted to accomplish these goals: - For each record - find an existing mine within 10km of the provided location in the CSV with a known location - associate the location to the existing mine - associate additional feature properties - associating wikidata <code>same as</code> items for feature properties (ie. <code>Ftr_type</code>) - reference USMIN as the Knowledge Source to these claims
| | ], |
| | | "publisher": "U.S. Geological Survey", |
| <span id="matching-data-to-closest-mines-by-location"></span>
| | "container": {}, |
| == Matching Data to Closest Mines by Location ==
| | "publicationYear": 2016, |
| | | "subjects": [ |
| First thing of note is that the <code>ftr_name</code> field does not have much info on what the mines are called or associated to. To solve this issue, we need to find the closest match to a mine that already exists within the GeoKB. Specifically, we use SPARQL and the built-in wikibase functionalities to determine the best candidates to match the data to. In OpenRefine, select the dropdown arrow of a column (doesn’t really matter which one), then select <code>Edit column</code> → <code>Add column by fetching URLs</code>. This will give a pop up window where you can enter a command or script to make a URL request. Because Python is more well known than the standard selection of General Refine Expression Language(GREL), we first need to switch the language to <code>Python/Jython</code>.
<blockquote>Note: Some standard libraries are already built in to the system, so we can use the following script without any additional work to create a SPARQL request against the https://geokb.wikibase.cloud/query/sparql endpoint.
</blockquote>

Python to paste into the textarea to retrieve the item closest to the specified coordinates:

| | "contributors": [], |
| | | "dates": [ |
| <syntaxhighlight lang="python">from urllib import quote_plus
| | { |
| x=cells['X'].value
| | "date": "2019-11-25", |
| y=cells['Y'].value
| | "dateType": "Updated" |
| query='''
| | }, |
| PREFIX wd: <https://geokb.wikibase.cloud/entity/>
| | { |
| PREFIX wdt: <https://geokb.wikibase.cloud/prop/direct/>
| | "date": "2020-07-29", |
| | | "dateType": "Updated" |
| SELECT ?item ?itemLabel ?location ?distance
| | }, |
| WHERE {
| | { |
| ?item wdt:P1 wd:Q3646 .
| | "date": "2021-04-12", |
| SERVICE wikibase:around {
| | "dateType": "Updated" |
| ?item wdt:P6 ?location .
| | }, |
| bd:serviceParam wikibase:center "POINT('''+x+''' '''+y+''')"^^geo:wktLiteral .
| | { |
| bd:serviceParam wikibase:radius "10" .
| | "date": "2022-04-19", |
| bd:serviceParam wikibase:distance ?distance.
| | "dateType": "Updated" |
| }
| | }, |
| SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
| | { |
| } | | "date": "2022-09-29", |
| ORDER BY ASC(?distance)
| | "dateType": "Updated" |
| LIMIT 1
| | }, |
| '''
| | { |
| encoded_query=quote_plus(query)
| | "date": "2023-01-30", |
| return 'https://geokb.wikibase.cloud/query/sparql?query='+encoded_query+'&format=json'</syntaxhighlight>
| | "dateType": "Updated" |
| This will output the following or something similar for each entry of the csv.
| | }, |
<syntaxhighlight lang="json">{
  "head" : {
    "vars" : [ "item", "itemLabel", "location", "distance" ]
  },
  "results" : {
    "bindings" : [ {
      "item" : {
        "type" : "uri",
        "value" : "https://geokb.wikibase.cloud/entity/Q5640"
      },
      "location" : {
        "datatype" : "http://www.opengis.net/ont/geosparql#wktLiteral",
        "type" : "literal",
        "value" : "Point(-87.1258288 33.1237301)"
      },
      "distance" : {
        "datatype" : "http://www.w3.org/2001/XMLSchema#double",
        "type" : "literal",
        "value" : "2.739"
      },
      "itemLabel" : {
        "xml:lang" : "en",
        "type" : "literal",
        "value" : "Hill Creek Mine (138378)"
      }
    } ]
  }
}</syntaxhighlight>

To parse the response into new columns, we can use the following Python script to read the json and extract the relevant data. Select <code>Edit column</code> → <code>Add column based on this column</code> from the dropdown menu of the <code>closest_item_response</code> column that was just created, and add the following Python script in the textarea to create the <code>closest_item_label</code> column:

| <syntaxhighlight lang="python">import json
| |
| val_dict = json.loads(value)
| |
| res = val_dict['results']['bindings'][0]['itemLabel']['value']
| |
| return res</syntaxhighlight>
| |
Similarly, we can create <code>item</code>, <code>location</code>, and <code>distance</code> columns by replacing the <code>itemLabel</code> key in the script above with the corresponding variable name.

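For example, a minimal sketch of the <code>location</code> variant, assuming the new column is named <code>closest_item_loc</code> (the name referenced later in this walkthrough):

<syntaxhighlight lang="python">import json

# Same pattern as above, but reading the 'location' binding instead of 'itemLabel'
val_dict = json.loads(value)
return val_dict['results']['bindings'][0]['location']['value']</syntaxhighlight>
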
<span id="additional-preprocessing"></span>
== Additional Preprocessing ==

One thing of note is that the data only has the abbreviation of the state where each mine is located. This is not as useful as we need it to be because the corresponding items in the GeoKB are listed under their full names. Fortunately, there is an easy and efficient fix: we can create a new <code>state_name_full</code> column that returns the full name using Python and a dictionary (with the same <code>Add column based on this column</code> method used previously).

<syntaxhighlight lang="python">state_name = {
    'AL': 'Alabama',
    'AK': 'Alaska',
    'AZ': 'Arizona',
    'AR': 'Arkansas',
    'CA': 'California',
    'CO': 'Colorado',
    'CT': 'Connecticut',
    'DE': 'Delaware',
    'FL': 'Florida',
    'GA': 'Georgia',
    'HI': 'Hawaii',
    'ID': 'Idaho',
    'IL': 'Illinois',
    'IN': 'Indiana',
    'IA': 'Iowa',
    'KS': 'Kansas',
    'KY': 'Kentucky',
    'LA': 'Louisiana',
    'ME': 'Maine',
    'MD': 'Maryland',
    'MA': 'Massachusetts',
    'MI': 'Michigan',
    'MN': 'Minnesota',
    'MS': 'Mississippi',
    'MO': 'Missouri',
    'MT': 'Montana',
    'NE': 'Nebraska',
    'NV': 'Nevada',
    'NH': 'New Hampshire',
    'NJ': 'New Jersey',
    'NM': 'New Mexico',
    'NY': 'New York',
    'NC': 'North Carolina',
    'ND': 'North Dakota',
    'OH': 'Ohio',
    'OK': 'Oklahoma',
    'OR': 'Oregon',
    'PA': 'Pennsylvania',
    'RI': 'Rhode Island',
    'SC': 'South Carolina',
    'SD': 'South Dakota',
    'TN': 'Tennessee',
    'TX': 'Texas',
    'UT': 'Utah',
    'VT': 'Vermont',
    'VA': 'Virginia',
    'WA': 'Washington',
    'WV': 'West Virginia',
    'WI': 'Wisconsin',
    'WY': 'Wyoming',
    'DC': 'District of Columbia',
    'AS': 'American Samoa',
    'GU': 'Guam',
    'MP': 'Northern Mariana Islands',
    'PR': 'Puerto Rico',
    'UM': 'United States Minor Outlying Islands',
    'VI': 'U.S. Virgin Islands'
}
return state_name[cells['State'].value]</syntaxhighlight>

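Note that this expression raises a <code>KeyError</code> for any abbreviation missing from the dictionary (for example, a blank cell). If that is a concern, a defensive variant of the last line, our suggestion rather than part of the original walkthrough, falls back to the raw abbreviation:

<syntaxhighlight lang="python"># Fall back to the raw abbreviation when no full name is known
return state_name.get(cells['State'].value, cells['State'].value)</syntaxhighlight>
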
According to the OpenRefine [https://www.wikidata.org/wiki/Wikidata:Tools/OpenRefine/Editing/Schema_alignment#Globe_coordinates documentation] for schema formatting, we will need to parse the WKT returned in the <code>closest_item_loc</code> column that was generated. There are two ways to solve this:

* parsing the coordinate value when it is retrieved from the response
* creating a new column from the coordinates specified in the USMIN dataset

To do the former, we can use the following script.

<syntaxhighlight lang="python">import json
import re

val_dict = json.loads(cells['closest_item_response'].value)
res = val_dict['results']['bindings'][0]['location']['value']

# WKT looks like "Point(-87.1258288 33.1237301)": longitude first, then latitude
lon, lat = re.findall(r'-?\d+\.\d+', res)
return lat + ',' + lon</syntaxhighlight>

If we just want to create a new column from the <code>X</code> and <code>Y</code> columns instead, go to <code>Edit column</code> → <code>Add column based on this column</code> and add this Python script in the textarea.

<syntaxhighlight lang="python">return cells['Y'].value + ',' + cells['X'].value</syntaxhighlight>

<span id="reconcile-the-data"></span>
== Reconcile the Data ==

Now that we have the parsed information on the closest mine added alongside each entry of the csv, we can reconcile the relevant information back to the GeoKB. For example, to reconcile the state name to the corresponding item found in the GeoKB, we select <code>Reconcile</code> → <code>Start reconciling</code>. Under Services, select <code>Reconcile for GeoKB (en)</code>, ensure that the <code>U.S. State (Q229)</code> bullet is checked, then click the <code>Start reconciling</code> button at the bottom right of the pop-up window. This will find the corresponding items and match them to each entry in the column. Once it has finished processing, double check that the items were matched correctly. If needed, select <code>Search for match</code> under a reconciled cell and select the best fit; this will automatically change the other similar rows in the dataset unless the <code>Match this cell only</code> option is selected.

Other columns were reconciled this way as well, e.g. the <code>closest_item_label</code> column created from the URL request, which was reconciled against the <code>mine (Q3646)</code> item.

A benefit of reconciling in OpenRefine is that different columns can be reconciled using different wiki services. This was done with <code>Ftr_Type</code> to retrieve the QID values found in Wikidata. To do this, select the dropdown for the reconciled <code>Ftr_Type</code> column, then select <code>Edit column</code> → <code>Add columns from reconciled values</code> and click <code>Qid</code> in the <code>Suggested properties</code> area.

The QID of the closest mine was also returned by the URL request, inside the <code>item</code> value as a URI. Using a regular expression, we can extract that value with the following Python code to create the <code>closest_QID</code> column.

<syntaxhighlight lang="python">import re

# Extract e.g. 'Q5640' from 'https://geokb.wikibase.cloud/entity/Q5640'
return re.search(r'Q\d+', str(value)).group()</syntaxhighlight>

<span id="building-the-schema"></span>
== Building the schema ==

Now that the data has been processed and reconciled, we can build a schema before submitting the changes to the GeoKB. This is done in the <code>schema</code> section of OpenRefine, located at the top middle of the GUI. (If it doesn’t show for some reason, another way to get there is to select <code>Wikibase</code> → <code>Edit Wikibase schema</code> under the Wikibase extension on the top right of the GUI.) Below is the exported schema, which can be imported back in using <code>Wikibase</code> → <code>Manage schemas</code> → <code>Choose file</code>.

<syntaxhighlight lang="json">{
  "name": "USMIN schema",
  "schema": {
    "entityEdits": [
      {
        "type": "wbitemeditexpr",
        "subject": {
          "type": "wbentityvariable",
          "columnName": "closest_item_label"
        },
        "statementGroups": [
          {
            "property": {
              "type": "wbpropconstant",
              "pid": "P11",
              "label": "located in the administrative territorial entity",
              "datatype": "wikibase-item"
            },
            "statements": [
              {
                "value": {
                  "type": "wbentityvariable",
                  "columnName": "state_name_full"
                },
                "qualifiers": [],
                "references": [
                  {
                    "snaks": [
                      {
                        "prop": {
                          "type": "wbpropconstant",
                          "pid": "P70",
                          "label": "knowledge source",
                          "datatype": "wikibase-item"
                        },
                        "value": {
                          "type": "wbentityvariable",
                          "columnName": "kb_source"
                        }
                      }
                    ]
                  }
                ],
                "mode": "add_or_merge",
                "mergingStrategy": {
                  "type": "snak",
                  "valueMatcher": {
                    "type": "lax"
                  }
                }
              },
              {
                "value": {
                  "type": "wbentityvariable",
                  "columnName": "County"
                },
                "qualifiers": [],
                "references": [
                  {
                    "snaks": [
                      {
                        "prop": {
                          "type": "wbpropconstant",
                          "pid": "P70",
                          "label": "knowledge source",
                          "datatype": "wikibase-item"
                        },
                        "value": {
                          "type": "wbentityvariable",
                          "columnName": "kb_source"
                        }
                      }
                    ]
                  }
                ],
                "mode": "add_or_merge",
                "mergingStrategy": {
                  "type": "snak",
                  "valueMatcher": {
                    "type": "lax"
                  }
                }
              }
            ]
          },
          {
            "property": {
              "type": "wbpropconstant",
              "pid": "P6",
              "label": "coordinate location",
              "datatype": "globe-coordinate"
            },
            "statements": [
              {
                "value": {
                  "type": "wblocationvariable",
                  "columnName": "lat_lon_coords"
                },
                "qualifiers": [
                  {
                    "prop": {
                      "type": "wbpropconstant",
                      "pid": "P7",
                      "label": "publication date",
                      "datatype": "time"
                    },
                    "value": {
                      "type": "wbdatevariable",
                      "columnName": "Topo_Date"
                    }
                  },
                  {
                    "prop": {
                      "type": "wbpropconstant",
                      "pid": "P120",
                      "label": "Remarks",
                      "datatype": "string"
                    },
                    "value": {
                      "type": "wbstringvariable",
                      "columnName": "Remarks"
                    }
                  }
                ],
                "references": [
                  {
                    "snaks": [
                      {
                        "prop": {
                          "type": "wbpropconstant",
                          "pid": "P70",
                          "label": "knowledge source",
                          "datatype": "wikibase-item"
                        },
                        "value": {
                          "type": "wbentityvariable",
                          "columnName": "kb_source"
                        }
                      }
                    ]
                  }
                ],
                "mode": "add_or_merge",
                "mergingStrategy": {
                  "type": "snak",
                  "valueMatcher": {
                    "type": "lax"
                  }
                }
              }
            ]
          },
          {
            "property": {
              "type": "wbpropconstant",
              "pid": "P118",
              "label": "GDA ID",
              "datatype": "external-id"
            },
            "statements": [
              {
                "value": {
                  "type": "wbstringvariable",
                  "columnName": "GDA_ID"
                },
                "qualifiers": [],
                "references": [
                  {
                    "snaks": [
                      {
                        "prop": {
                          "type": "wbpropconstant",
                          "pid": "P70",
                          "label": "knowledge source",
                          "datatype": "wikibase-item"
                        },
                        "value": {
                          "type": "wbentityvariable",
                          "columnName": "kb_source"
                        }
                      }
                    ]
                  }
                ],
                "mode": "add_or_merge",
                "mergingStrategy": {
                  "type": "snak",
                  "valueMatcher": {
                    "type": "lax"
                  }
                }
              }
            ]
          },
          {
            "property": {
              "type": "wbpropconstant",
              "pid": "P119",
              "label": "Scan ID",
              "datatype": "external-id"
            },
            "statements": [
              {
                "value": {
                  "type": "wbstringvariable",
                  "columnName": "ScanID"
                },
                "qualifiers": [],
                "references": [
                  {
                    "snaks": [
                      {
                        "prop": {
                          "type": "wbpropconstant",
                          "pid": "P70",
                          "label": "knowledge source",
                          "datatype": "wikibase-item"
                        },
                        "value": {
                          "type": "wbentityvariable",
                          "columnName": "kb_source"
                        }
                      }
                    ]
                  }
                ],
                "mode": "add_or_merge",
                "mergingStrategy": {
                  "type": "snak",
                  "valueMatcher": {
                    "type": "lax"
                  }
                }
              }
            ]
          },
          {
            "property": {
              "type": "wbpropconstant",
              "pid": "P1",
              "label": "instance of",
              "datatype": "wikibase-item"
            },
            "statements": [
              {
                "value": {
                  "type": "wbentityvariable",
                  "columnName": "Ftr_Type"
                },
                "qualifiers": [],
                "references": [
                  {
                    "snaks": [
                      {
                        "prop": {
                          "type": "wbpropconstant",
                          "pid": "P70",
                          "label": "knowledge source",
                          "datatype": "wikibase-item"
                        },
                        "value": {
                          "type": "wbentityvariable",
                          "columnName": "kb_source"
                        }
                      }
                    ]
                  }
                ],
                "mode": "add_or_merge",
                "mergingStrategy": {
                  "type": "snak",
                  "valueMatcher": {
                    "type": "lax"
                  }
                }
              }
            ]
          }
        ],
        "nameDescs": [
          {
            "type": "wbnamedescexpr",
            "name_type": "LABEL_IF_NEW",
            "value": {
              "type": "wbmonolingualexpr",
              "language": {
                "type": "wblanguageconstant",
                "id": "en",
                "label": "en"
              },
              "value": {
                "type": "wbstringvariable",
                "columnName": "closest_item_label"
              }
            }
          }
        ]
      }
    ],
    "siteIri": "https://geokb.wikibase.cloud/entity/",
    "entityTypeSiteIRI": {
      "item": "https://geokb.wikibase.cloud/entity/",
      "property": "https://geokb.wikibase.cloud/entity/",
      "mediainfo": "https://geokb.wikibase.cloud/entity/"
    },
    "mediaWikiApiEndpoint": "https://geokb.wikibase.cloud/w/api.php"
  }
}</syntaxhighlight>

Now that the schema is set the way we want, select <code>Wikibase</code> → <code>Upload edits to Wikibase</code>, and log in if prompted. If any warnings pop up, the values being inserted need to be adjusted to follow the correct format. Once all the warnings and errors are taken care of, add a summary of the upsert being made in the textbox, then click <code>Upload edits</code> to add the data into the GeoKB.

Some final steps need to be taken to ensure the schema is saved and you have logged out of your account. To save a schema, click the <code>Save new</code> button found in the schema section of the GUI. From there you can either save it as a new schema by providing a name in the text box, or overwrite an existing schema that was previously saved.

<blockquote>Warning: Simply clicking the <code>Save schema</code> button on the top right does not save your changes unless you loaded a previously saved schema. Any schema created from scratch will not be saved and may be lost if overwritten.
</blockquote>

To log out, go to <code>Wikibase</code> in the extensions section on the top right and select <code>Manage Wikibase account</code>. From there, click <code>Log out</code> in the pop-up window.