SAHRA Integrated Modeling Approach Towards Basin-Scale Water Resources Management
Water resources decisions in the 21st Century will have strong economic and environmental components and can therefore benefit from scenario analyses that make use of integrated river basin models. SAHRA (the National Science Foundation Science and Technology Center for Sustainability of semi-Arid Hydrology and Riparian Areas) is developing an integrated modeling framework based on four hierarchical levels: a physical systems model (including surface, subsurface and atmospheric components where appropriate), an engineering systems model (including agriculture, reservoirs, etc.), a human systems behavioral model (socio-economic components) and an institutional systems model (laws, compacts, etc.). This integrated framework is rooted in a perceptual-conceptual systems model of the river basin and a database support structure. This paper describes the SAHRA approach to linking the various hierarchical levels and discusses how it is being applied to answer the question: under what conditions are water markets and water banking feasible? Integration of the four hierarchical levels will allow water resource managers to consider the trading of water rights and third-party impacts in evaluating the potential for market-based mechanisms to allocate water resources effectively.
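As a purely illustrative aside, the kind of hierarchical coupling described above can be sketched as a chain of components that pass a shared state through each level once per time step. The Python sketch below is hypothetical (all class names, variables, and rules are invented for illustration) and does not reproduce the actual SAHRA framework.

```python
# Hypothetical sketch of coupling four hierarchical model levels by passing a
# shared state dictionary through each level once per time step.
class PhysicalSystem:
    def step(self, state):
        state["streamflow"] = 0.8 * state.get("precip", 0.0)  # placeholder physics
        return state

class EngineeringSystem:
    def step(self, state):
        state["reservoir_release"] = min(state["streamflow"], 5.0)  # simple operating rule
        return state

class HumanSystem:
    def step(self, state):
        state["demand"] = 3.0  # placeholder socio-economic demand
        return state

class InstitutionalSystem:
    def step(self, state):
        # a stand-in "rule" capping allocation at the available release
        state["allocation"] = min(state["demand"], state["reservoir_release"])
        return state

levels = [PhysicalSystem(), EngineeringSystem(), HumanSystem(), InstitutionalSystem()]
state = {"precip": 10.0}
for level in levels:  # one coupled time step through the hierarchy
    state = level.step(state)
print(state)
```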
A Mass-Conserving-Perceptron for Machine Learning-Based Modeling of Geoscientific Systems
Although decades of effort have been devoted to building Physical-Conceptual (PC) models for predicting the time-series evolution of geoscientific systems, recent work shows that Machine Learning (ML) based Gated Recurrent Neural Network (GRNN) technology can be used to develop models that are much more accurate. However, the difficulty of extracting physical understanding from ML-based models complicates their utility for enhancing scientific knowledge regarding system structure and function. Here, we propose a physically interpretable Mass Conserving Perceptron (MCP) as a way to bridge the gap between PC-based and ML-based modeling approaches. The MCP exploits the inherent isomorphism between the directed graph structures underlying both PC models and GRNNs to explicitly represent the mass-conserving nature of physical processes while enabling the functional nature of such processes to be directly learned (in an interpretable manner) from available data using off-the-shelf ML technology. As a proof of concept, we investigate the functional expressivity (capacity) of the MCP, explore its ability to parsimoniously represent the rainfall-runoff (RR) dynamics of the Leaf River Basin, and demonstrate its utility for scientific hypothesis testing. To conclude, we discuss extensions of the concept to enable ML-based physical-conceptual representation of the coupled nature of mass-energy-information flows through geoscientific systems.
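As a rough illustration of the mass-conserving gating idea described in this abstract (this is not the authors' MCP architecture; the gate form and parameter values below are hypothetical), a single gated node can partition the available water between outflow and retained storage so that mass balance holds by construction:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mc_node_step(state, inflow, params):
    """One step of a single mass-conserving gated node (illustrative only).
    All of (state + inflow) is split between outflow and retained storage,
    so mass is conserved by construction."""
    total = state + inflow                                   # water available this step
    out_frac = sigmoid(params["w"] * total + params["b"])    # learned gate in (0, 1)
    outflow = out_frac * total                               # released this step
    new_state = total - outflow                              # remainder stays in storage
    return new_state, outflow

# toy run: route a rainfall pulse through the node with hypothetical parameters
params = {"w": 0.05, "b": -2.0}
state, hydrograph = 0.0, []
rain = [0, 5, 20, 3, 0, 0, 0]
for p in rain:
    state, q = mc_node_step(state, p, params)
    hydrograph.append(q)

# mass balance check: total input equals total output plus final storage
assert abs(sum(rain) - (sum(hydrograph) + state)) < 1e-9
print(hydrograph, state)
```

In an ML setting, the gate parameters would be learned from data; here they are fixed purely to show that the mass balance closes regardless of their values.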
Do Nash values have value?
How Do We Communicate Model Performance? The process of model performance evaluation is of primary importance, not only in the model development and calibration process, but also when communicating the results to other researchers and to stakeholders. The basic "rule" is that every modelling result should be put into context, for example, by indicating the model performance using appropriate indicators and by highlighting potential sources of uncertainty, and this practice has found its way into the large majority of papers and conference presentations. While the question of how to communicate the performance of a model to potential end-users is currently receiving increasing interest (e.g. Pappenberger and Beven, 2006), we, as well as many other colleagues, regularly observe that researchers take much less care when communicating model performance amongst ourselves. We seem to assume that we are speaking about familiar performance concepts and that they have comparable significance for various types of model applications and case studies. In doing so, we do not pay sufficient attention to making clear what the values represented by our performance measures really mean. Even a concept as simple as the bias between an observed and a simulated time series needs to be put into proper context: whereas a 10% bias in simulated discharge may be unacceptable in a climate change impact assessment, it may be of less concern in the context of real-time flood forecasting. While some performance measures can have an absolute meaning, such as the common measure of linear correlation, the vast majority of performance measures, and in particular quadratic-error-based measures, can only be properly interpreted when viewed in the context of a reference value (...)
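One way to make the "reference value" point concrete is a benchmark efficiency that replaces the mean-of-observations reference implicit in NSE with an explicit benchmark series. The sketch below is a minimal illustration with invented numbers, not a formulation taken verbatim from the paper.

```python
import numpy as np

def benchmark_efficiency(obs, sim, bench):
    """Skill of a simulation relative to an explicit benchmark series:
    1 - MSE(sim) / MSE(bench). NSE is the special case bench = mean(obs)."""
    obs, sim, bench = (np.asarray(a, float) for a in (obs, sim, bench))
    mse_sim = np.mean((sim - obs) ** 2)
    mse_bench = np.mean((bench - obs) ** 2)
    return 1.0 - mse_sim / mse_bench

# example: compare a model against a persistence benchmark (yesterday's flow);
# obs and sim are hypothetical daily discharge values
obs = np.array([1.0, 1.2, 3.5, 2.8, 2.0, 1.6])
sim = np.array([1.1, 1.0, 3.0, 3.1, 2.2, 1.5])
persistence = np.r_[obs[0], obs[:-1]]
print(benchmark_efficiency(obs, sim, persistence))
```

The same model score can look very different depending on whether the benchmark is the observed mean, persistence, or a seasonal climatology, which is exactly why the reference needs to be stated.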
On the Accurate Estimation of Information-Theoretic Quantities from Multi-Dimensional Sample Data
Using information-theoretic quantities in practical applications with continuous data is often hindered by the fact that probability density functions need to be estimated in higher dimensions, which can become unreliable or even computationally unfeasible. To make these useful quantities more accessible, alternative approaches such as binned frequencies using histograms and k-nearest neighbors (k-NN) have been proposed. However, a systematic comparison of the applicability of these methods has been lacking. We wish to fill this gap by comparing kernel-density-based estimation (KDE) with these two alternatives in carefully designed synthetic test cases. Specifically, we wish to estimate the information-theoretic quantities entropy, Kullback-Leibler divergence, and mutual information from sample data. As a reference, the results are compared to closed-form solutions or numerical integrals. We generate samples from distributions of various shapes in dimensions ranging from one to ten. We evaluate the estimators' performance as a function of sample size, distribution characteristics, and chosen hyperparameters. We further compare the required computation time and specific implementation challenges. Notably, k-NN estimation tends to outperform the other methods, considering algorithmic implementation, computational efficiency, and estimation accuracy, especially with sufficient data. This study provides valuable insights into the strengths and limitations of the different estimation methods for information-theoretic quantities. It also highlights the significance of considering the characteristics of the data, as well as the targeted information-theoretic quantity, when selecting an appropriate estimation technique. These findings will assist scientists and practitioners in choosing the most suitable method, considering their specific application and available data. We have collected the compared estimation methods in a ready-to-use open-source Python 3 toolbox and thereby hope to promote the use of information-theoretic quantities by researchers and practitioners to evaluate the information in data and models in various disciplines.
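For readers who want to try the k-NN route, one common form of the Kozachenko-Leonenko entropy estimator can be written in a few lines of Python. This is a generic sketch (it is not the toolbox referred to above), checked against the closed-form entropy of a multivariate standard normal.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln

def knn_entropy(x, k=4):
    """Kozachenko-Leonenko k-NN entropy estimate (in nats), Euclidean norm."""
    x = np.asarray(x, dtype=float)
    n, d = x.shape
    tree = cKDTree(x)
    # distance to the k-th neighbour (column 0 is the point itself, distance 0)
    r = tree.query(x, k=k + 1)[0][:, -1]
    log_vd = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)  # log volume of the unit d-ball
    return digamma(n) - digamma(k) + log_vd + d * np.mean(np.log(r))

# sanity check against the closed form for a d-dimensional standard normal
rng = np.random.default_rng(0)
d, n = 3, 5000
sample = rng.standard_normal((n, d))
true_h = 0.5 * d * np.log(2 * np.pi * np.e)
print(knn_entropy(sample), true_h)
```

With a few thousand samples in low dimensions the estimate should land close to the analytic value; as the abstract notes, the relative behaviour of histogram, KDE, and k-NN estimators changes with dimension and sample size.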
Model Calibration in Watershed Hydrology
Hydrologic models use relatively simple mathematical equations to conceptualize and aggregate the complex, spatially distributed, and highly interrelated water, energy, and vegetation processes in a watershed. A consequence of process aggregation is that the model parameters often do not represent directly measurable entities and must, therefore, be estimated using measurements of the system inputs and outputs. During this process, known as model calibration, the parameters are adjusted so that the behavior of the model approximates, as closely and consistently as possible, the observed response of the hydrologic system over some historical period of time. This chapter reviews the current state of the art of model calibration in watershed hydrology, with special emphasis on our own contributions in the last few decades. We discuss the historical background that has led to current perspectives, and review different approaches for manual and automatic single- and multi-objective parameter estimation. In particular, we highlight recent developments in the calibration of distributed hydrologic models using parameter dimensionality reduction sampling, parameter regularization, and parallel computing.
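To make the calibration loop concrete, the sketch below fits a deliberately simple, hypothetical two-parameter bucket model to synthetic observations by minimizing the mean squared error with a global optimizer. It illustrates the workflow only and is not any specific model or algorithm from the chapter.

```python
import numpy as np
from scipy.optimize import differential_evolution

def bucket_model(params, rain):
    """Toy single-bucket rainfall-runoff model: a linear reservoir with
    recession coefficient k and a constant quickflow fraction c."""
    k, c = params
    state, flows = 0.0, []
    for p in rain:
        state += (1 - c) * p          # infiltration into storage
        q = c * p + k * state         # quickflow plus baseflow release
        state -= k * state            # storage depletion by the released baseflow
        flows.append(q)
    return np.array(flows)

def mse(params, rain, obs):
    return np.mean((bucket_model(params, rain) - obs) ** 2)

# synthetic "observations" generated from known parameters, then recovered by calibration
rng = np.random.default_rng(1)
rain = rng.gamma(0.6, 4.0, size=200)
obs = bucket_model([0.15, 0.3], rain) + rng.normal(0, 0.05, size=200)
result = differential_evolution(mse, bounds=[(0.01, 0.9), (0.0, 1.0)],
                                args=(rain, obs), seed=1)
print(result.x)  # calibrated (k, c); should lie near the synthetic truth (0.15, 0.3)
```

Real applications replace the toy model with a watershed model, the synthetic data with observed forcing and discharge, and often the single MSE objective with multiple criteria, but the structure of the loop is the same.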
Decomposition of the Mean Squared Error and NSE Performance Criteria: Implications for Improving Hydrological Modelling
The mean squared error (MSE) and its related normalization, the Nash-Sutcliffe efficiency (NSE), are the two criteria most widely used for calibration and evaluation of hydrological models with observed data. Here, we present a diagnostically interesting decomposition of NSE (and hence MSE), which facilitates analysis of the relative importance of its different components in the context of hydrological modelling, and show how model calibration problems can arise due to interactions among these components. The analysis is illustrated by calibrating a simple conceptual precipitation-runoff model to daily data for a number of Austrian basins having a broad range of hydro-meteorological characteristics. Evaluation of the results clearly demonstrates the problems that can be associated with any calibration based on the NSE (or MSE) criterion. While we propose and test an alternative criterion that can help to reduce model calibration problems, the primary purpose of this study is not to present an improved measure of model performance. Instead, we seek to show that there are systematic problems inherent with any optimization based on formulations related to the MSE. The analysis and results have implications for the manner in which we calibrate and evaluate environmental models; we discuss these and suggest possible ways forward that may move us towards an improved and diagnostically meaningful approach to model performance evaluation and identification.
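The decomposition discussed here expresses NSE in terms of the linear correlation r, the ratio of simulated to observed standard deviations alpha, and the bias normalized by the observed standard deviation beta_n, via NSE = 2*alpha*r - alpha^2 - beta_n^2. A minimal check that the direct and decomposed forms agree:

```python
import numpy as np

def nse_decomposition(obs, sim):
    """NSE computed directly and via its components: correlation r,
    variability ratio alpha, and normalized bias beta_n."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    r = np.corrcoef(obs, sim)[0, 1]
    alpha = sim.std() / obs.std()
    beta_n = (sim.mean() - obs.mean()) / obs.std()
    nse_direct = 1 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)
    nse_decomposed = 2 * alpha * r - alpha ** 2 - beta_n ** 2
    return nse_direct, nse_decomposed, r, alpha, beta_n

# small worked example with hypothetical discharge values
obs = np.array([2.1, 3.4, 8.9, 6.2, 4.0, 3.1])
sim = np.array([2.5, 3.0, 7.5, 6.8, 4.4, 2.9])
print(nse_decomposition(obs, sim))  # the two NSE values agree to machine precision
```

Writing NSE this way makes explicit which aspect of model behavior (timing via r, variability via alpha, or bias via beta_n) dominates a given score.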
Estimating epistemic and aleatory uncertainties during hydrologic modeling: An information theoretic approach
Peer reviewed: http://deepblue.lib.umich.edu/bitstream/2027.42/98239/1/wrcr20161.pd