Daily surface temperatures for 185,549 lakes in the conterminous United States estimated using deep learning (1980–2020)
The dataset described here includes estimates of historical (1980–2020) daily surface water temperature, lake metadata, and daily weather conditions for lakes larger than 4 ha in the conterminous United States (n = 185,549), as well as in situ temperature observations for a subset of lakes (n = 12,227). Estimates were generated using a long short-term memory (LSTM) deep learning model and compared with existing process-based and linear regression models. Model training was optimized for prediction on unmonitored lakes through cross-validation that held out entire lakes to assess generalizability and estimate error. On the held-out lakes with in situ observations, the median lake-specific error was 1.24°C and the overall root mean squared error (RMSE) was 1.61°C. Compared with existing datasets, this dataset increases the number of lakes with daily temperature predictions and substantially improves predictive accuracy over a prior empirical model and a debiased process-based approach (2.01°C and 1.79°C median error, respectively).
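The abstract reports two different error summaries: a median lake-specific error and an overall RMSE. These two numbers can diverge because lakes differ greatly in how many observations they have. The following is a minimal Python sketch, not the authors' code. It assumes that "lake-specific error" means each lake's own RMSE and that cross-validation folds are formed by randomly assigning whole lakes to folds. The function names `lake_holdout_folds` and `error_summary` are hypothetical illustrations.

```python
import math
import random
from collections import defaultdict
from statistics import median


def lake_holdout_folds(lake_ids, k, seed=0):
    """Assign each distinct lake to one of k folds (hypothetical scheme).

    Every observation from a given lake lands in the same fold, so a model
    evaluated on fold i never saw any data from those lakes during training.
    """
    lakes = sorted(set(lake_ids))
    rng = random.Random(seed)
    rng.shuffle(lakes)
    return {lake: i % k for i, lake in enumerate(lakes)}


def rmse(errors):
    """Root mean squared error of a list of prediction errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def error_summary(records):
    """records: iterable of (lake_id, predicted_C, observed_C).

    Returns (median of per-lake RMSEs, pooled RMSE over all observations).
    The pooled value weights heavily sampled lakes more; the median weights
    every lake equally, regardless of how many observations it has.
    """
    by_lake = defaultdict(list)
    all_errors = []
    for lake, pred, obs in records:
        err = pred - obs
        by_lake[lake].append(err)
        all_errors.append(err)
    per_lake = [rmse(errs) for errs in by_lake.values()]
    return median(per_lake), rmse(all_errors)


if __name__ == "__main__":
    # Toy data: lake A is well sampled with small errors, lakes B and C
    # each have a single observation with larger errors.
    records = [("A", 11.0, 10.0)] * 4 + [("B", 23.0, 20.0), ("C", 17.0, 15.0)]
    med, pooled = error_summary(records)
    print(f"median per-lake RMSE: {med:.2f} C, pooled RMSE: {pooled:.2f} C")
    print(lake_holdout_folds([r[0] for r in records], k=2))
```

In this toy example, the median per-lake RMSE is 2.0 °C, while the pooled RMSE is about 1.68 °C. The pooled value is lower because the four small errors from lake A dominate it. This is why the abstract reports both a lake-level median and an overall figure.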