Item talk:Q146485
Beware of spatial autocorrelation when applying machine learning algorithms to borehole geophysical logs
Although many of the algorithms now considered machine learning algorithms (MLAs) have existed for nearly a century (e.g., Rosenblatt 1958), interest in applying MLAs to data-driven problems has recently grown rapidly across a variety of fields. This growth reflects the expanded availability of large, complex datasets that may be difficult to interrogate using other methods, increases in computing power, and a growing library of easily implemented machine learning tools. While MLAs are often similar to statistical methods, the two differ in their approach to problem solving. Statistical methods are more concerned with generating informative models from “long” data (i.e., many more observations than explanatory variables), whereas MLAs are typically concerned with generating accurate predictions from “wide” data (i.e., a large number of variables with relatively few observations; Bzdok et al. 2018). In hydrogeologic studies, such wide datasets may be available from boreholes, where various types of geophysical, geochemical, and lithological information may exist. Borehole datasets are therefore a tempting target for MLAs, both to reveal hidden relations between the collected data and parameters of interest (e.g., contaminant concentration) and as a method of parameter reduction (e.g., reducing costs by collecting fewer datasets).