Machine Learning Predicts Reach-Scale Channel Types From Coarse-Scale Geospatial Data in a Large River Basin

Bagnold R. A.; Borut S.; Breiman L.; Brierley G. J.; Gesch D.; Gibbs J. W.; Haan C. T.; Ho T. K.; Homer C.; Lane B. A.; Platt J.; Renard K. G.

Publication date: 27 February 2020
Publisher: Hosted by Utah State University Libraries

Abstract

Hydrologic and geomorphic classifications have gained traction in response to the increasing need for basin-wide water resources management. Regardless of the selected classification scheme, an open scientific challenge is how to extend information from limited field sites to classify tens of thousands to millions of channel reaches across a basin. To address this spatial scaling challenge, this study leverages machine learning to predict reach-scale geomorphic channel types using publicly available geospatial data. A bottom-up machine learning approach selects the most accurate and stable model among ∼20,000 combinations of 287 coarse geospatial predictors, preprocessing methods, and algorithms in a three-tiered framework to (i) define a tractable problem and reduce predictor noise, (ii) assess model performance in statistical learning, and (iii) assess model performance in prediction. This study also addresses key issues related to the design, interpretation, and diagnosis of machine learning models in hydrologic sciences. In an application to the Sacramento River basin (California, USA), the developed framework selects a Random Forest model to predict 10 channel types, previously determined from 290 field surveys, over 108,943 200-m reaches. Performance in statistical learning is reasonable, with a 61% median cross-validation accuracy, a sixfold increase over the 10% accuracy of the baseline random model, and the predictions coherently capture the large-scale geomorphic organization of the landscape. Interestingly, in the study area, the persistent roughness of the topography partially controls channel types, and the variation in the entropy-based predictive performance is explained by imperfect training information and scale mismatch between labels and predictors.
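The "sixfold increase" quoted in the abstract can be checked with a short calculation. The sketch below is illustrative only and assumes that the baseline random model guesses uniformly among the 10 channel types, which the abstract does not spell out. The function names and the synthetic, imbalanced label set are hypothetical and are not taken from the study.

```python
import random
from statistics import median


def uniform_baseline_accuracy(n_classes: int) -> float:
    """Expected accuracy of guessing uniformly at random among n_classes labels."""
    if n_classes < 1:
        raise ValueError("n_classes must be a positive integer")
    return 1.0 / n_classes


def simulate_uniform_guessing(labels, n_classes, n_trials=200, seed=0):
    """Median accuracy of uniform random guesses over repeated trials."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(n_trials):
        hits = sum(rng.randrange(n_classes) == y for y in labels)
        accuracies.append(hits / len(labels))
    return median(accuracies)


# Figures quoted in the abstract.
N_CLASSES = 10             # channel types
N_SURVEYS = 290            # field surveys used as labels
MEDIAN_CV_ACCURACY = 0.61  # median cross-validation accuracy

baseline = uniform_baseline_accuracy(N_CLASSES)
fold_increase = MEDIAN_CV_ACCURACY / baseline

# Hypothetical, deliberately imbalanced labels (NOT the study's data):
# uniform guessing is expected to score 1/k whatever the class balance.
label_rng = random.Random(1)
labels = [label_rng.choice([0, 0, 0, 1, 1, 2, 3, 4, 5, 6, 7, 8, 9])
          for _ in range(N_SURVEYS)]
simulated = simulate_uniform_guessing(labels, N_CLASSES)

print(f"analytic baseline accuracy: {baseline:.2f}")
print(f"simulated baseline accuracy: {simulated:.3f}")
print(f"fold increase over baseline: {fold_increase:.1f}")
```

Under this assumption the ratio is 0.61 / 0.10, or about 6.1, consistent with the sixfold figure. A baseline that always predicts the most frequent channel type would score higher wherever the classes are imbalanced, so the ratio depends on which baseline is meant.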