19 research outputs found

    State tagging for improved Earth and environmental data quality assurance

    Environmental data allow us to monitor the constantly changing environment that we live in. They allow us to study trends and help us to develop better models of environmental processes, which in turn can provide information to improve management practices. To ensure that the data are reliable for analysis and interpretation, they must undergo quality assurance procedures. Such procedures generally include standard operating procedures during sampling and laboratory measurement (if applicable), as well as data validation upon entry to databases. The latter usually involves compliance (i.e., format) and conformity (i.e., value) checks, most often in the form of single-parameter range tests. Such tests take no account of the system state in which each measurement is made, and give the user little contextual information on the probable cause of an out-of-range flag. We propose the use of data science techniques to tag each measurement with an identified system state. The term “state” is defined loosely here; states are identified using k-means clustering, an unsupervised machine learning method, and their meaning is open to specialist interpretation. Once the states are identified, state-dependent prediction intervals can be calculated for each observational variable. This approach gives the user more contextual information to resolve out-of-range flags and yields prediction intervals for observational variables that account for changes in system state. Users can then apply further analysis and filtering as they see fit. We illustrate our approach with two well-established long-term monitoring datasets in the UK: moth and butterfly data from the UK Environmental Change Network (ECN), and the UK CEH Cumbrian Lakes monitoring scheme. Our work contributes to the ongoing development of a better data science framework that allows researchers and other stakeholders to find and use the data they need more readily.
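The state-tagging idea in this abstract can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the observations (water temperature, dissolved oxygen) are made-up values, the helper names `kmeans`, `state_intervals` and `flag` are hypothetical, the centroid initialisation is chosen only for determinism, and the interval is a simple mean ± 1.96 standard deviations approximation, not necessarily the prediction interval used in the paper.

```python
import math
import statistics


def kmeans(points, k, n_iter=50):
    """Plain Lloyd's k-means on a list of equal-length tuples.

    Initial centroids are k points spread evenly through the input
    (an assumption made for determinism; requires len(points) >= k).
    """
    step = max(1, len(points) // k)
    centroids = [points[i * step] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(map(statistics.fmean, zip(*members))))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


def state_intervals(values, labels, z=1.96):
    """Mean +/- z * sd of `values` within each state (normal approximation)."""
    intervals = {}
    for state in sorted(set(labels)):
        group = [v for v, lab in zip(values, labels) if lab == state]
        mu = statistics.fmean(group)
        sd = statistics.stdev(group) if len(group) > 1 else 0.0
        intervals[state] = (mu - z * sd, mu + z * sd)
    return intervals


def flag(value, state, intervals):
    """True when `value` falls outside the interval of its tagged state."""
    low, high = intervals[state]
    return not (low <= value <= high)


# Hypothetical (water temperature in deg C, dissolved oxygen in mg/L) pairs.
observations = [
    (4.0, 12.1), (5.0, 11.8), (6.0, 11.5), (5.5, 12.0),
    (17.0, 8.2), (18.0, 8.0), (19.0, 7.9), (18.5, 8.1),
]
labels, centroids = kmeans(observations, k=2)
oxygen = [obs[1] for obs in observations]
intervals = state_intervals(oxygen, labels)

# The same oxygen reading is flagged in the cold state but not in the warm one.
flag_in_cold_state = flag(8.0, 0, intervals)
flag_in_warm_state = flag(8.0, 1, intervals)
```

A single global range test covering both clusters would accept a reading of 8.0 mg/L in either case; the state-dependent interval is what makes the cold-state reading stand out.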

    River reach-level machine learning estimation of nutrient concentrations in Great Britain

    Nitrogen (N) and phosphorus (P) are essential nutrients that are necessary for plant growth and support life in aquatic ecosystems. However, excessive N and P can lead to algal blooms that deplete oxygen, causing fish deaths and the release of toxins that are harmful to humans. Estimates of N and P levels in rivers are typically calculated at station or grid (>1 km) scale; therefore, it is difficult to visualise the evolution of water quality as water travels downstream. Using a high-resolution reach-scale river network and associating each reach with land cover fractions and catchment descriptors, we trained random forest models on aggregated data (2010–2020) from the Environment Agency Open Water Quality Data Archive for 2,343 stations to predict long-term nitrate and orthophosphate concentrations at each river reach in Great Britain (GB). We trained the models and made predictions separately for different seasons to investigate potential differences in feature importance. Using 5-fold cross-validation, our models predicted concentrations with an average testing coefficient of determination (R²) of 0.71 for nitrate and 0.58 for orthophosphate. Performance was slightly better for higher Strahler stream orders, highlighting the challenges of making predictions in small streams. Our results revealed that arable and horticultural land use is the strongest and most reliable predictor for nitrate, while floodplain extents and standard percentage runoff are stronger predictors for orthophosphate. Nationally, higher orthophosphate concentrations were observed in urbanised areas. This study shows how combining a river network model with machine learning can readily provide a network-wide understanding of the spatial distribution of water quality.
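The evaluation metric quoted above, an average testing R² over 5 folds, can be illustrated as follows. This is only a sketch of the scoring procedure: a least-squares line stands in for the random forest to keep the example dependency-free, the data are synthetic, and the names `r_squared`, `fit_line` and `kfold_r2` are hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean_y) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(x, y):
    """Ordinary least-squares slope and intercept (stand-in for the real model)."""
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    slope = (
        sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
        / sum((a - x_bar) ** 2 for a in x)
    )
    return slope, y_bar - slope * x_bar


def kfold_r2(x, y, k=5):
    """Average testing R² over k folds; fold f holds out samples f, f+k, f+2k, ..."""
    scores = []
    for fold in range(k):
        test_idx = set(range(fold, len(x), k))
        train = [(a, b) for i, (a, b) in enumerate(zip(x, y)) if i not in test_idx]
        test = [(a, b) for i, (a, b) in enumerate(zip(x, y)) if i in test_idx]
        slope, intercept = fit_line(*zip(*train))
        preds = [slope * a + intercept for a, _ in test]
        scores.append(r_squared([b for _, b in test], preds))
    return sum(scores) / k


# Synthetic predictor and response with a small deterministic wiggle.
x = [float(i) for i in range(20)]
y = [3.0 * xi + 1.0 + 0.5 * (-1) ** i for i, xi in enumerate(x)]
score = kfold_r2(x, y, k=5)
```

Each sample is used for testing exactly once, so the averaged score reflects performance on data the model was not fitted to.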

    Ensemble Kalman inversion of induced polarization data

    This paper explores the applicability of Ensemble Kalman Inversion (EKI) with level-set parameterization for solving geophysical inverse problems. In particular, we focus on its extension to induced polarization (IP) data with uncertainty quantification. IP data may provide rich information on the characteristics of geological materials because of their sensitivity to the properties of the pore-grain interface. In many IP studies, different geological units are juxtaposed and the goal is to delineate these units and obtain estimates of unit properties with uncertainty bounds. Conventional inversion of IP data does not resolve sharp interfaces well and tends to reduce and smooth resistivity variations, while not readily providing uncertainty estimates. Recently, it has been shown for DC resistivity that EKI is an efficient solver for inverse problems that provides uncertainty quantification, and that its combination with level-set parameterization can delineate arbitrary interfaces well. In this contribution, we demonstrate the extension of EKI to IP data using a sequential approach, where the mean field obtained from DC resistivity inversion is used as input for a separate phase angle inversion. We illustrate our workflow using a series of synthetic and field examples. Variations with uncertainty bounds in both DC resistivity and phase angles are recovered by EKI, which provides useful information for hydrogeological site characterization. While phase angles are less well resolved than DC resistivity, partly because of their smaller range and higher percentage data errors, they complement DC resistivity for site characterization. Overall, EKI with level-set parameterization provides a practical way forward for efficient hydrogeophysical imaging under uncertainty.
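The core ensemble Kalman update behind EKI can be sketched for the simplest possible case: one scalar parameter and one scalar datum. This is a toy illustration under stated assumptions, not the paper's workflow: there is no level-set parameterization, the linear `forward` function is a hypothetical stand-in for a resistivity or IP forward solver, and the update is the deterministic variant without perturbed observations.

```python
from statistics import fmean


def eki_step(ensemble, forward, y_obs, noise_var):
    """One deterministic EKI update for a scalar parameter and scalar datum."""
    preds = [forward(u) for u in ensemble]
    u_bar, g_bar = fmean(ensemble), fmean(preds)
    n = len(ensemble)
    # Sample cross-covariance C_ug and prediction variance C_gg.
    c_ug = sum((u - u_bar) * (g - g_bar) for u, g in zip(ensemble, preds)) / (n - 1)
    c_gg = sum((g - g_bar) ** 2 for g in preds) / (n - 1)
    gain = c_ug / (c_gg + noise_var)
    # Each member moves toward the data in proportion to its own misfit.
    return [u + gain * (y_obs - g) for u, g in zip(ensemble, preds)]


def forward(u):
    """Hypothetical forward model mapping a parameter to a predicted datum."""
    return 2.0 * u


ensemble = [0.0, 1.0, 2.0, 4.0, 5.0]  # prior ensemble of parameter values
for _ in range(5):
    ensemble = eki_step(ensemble, forward, y_obs=6.0, noise_var=0.01)
posterior_mean = fmean(ensemble)
```

No derivative of the forward model is needed; the ensemble covariances play that role. Because this variant omits observation perturbations, the ensemble collapses quickly and its final spread should not be read as an uncertainty bound.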

    Advancing reproducible research by publishing R markdown notebooks as interactive sandboxes using the learnr package

    Various R packages and best practices have played a pivotal role in promoting the Findability, Accessibility, Interoperability, and Reusability (FAIR) principles of open science. For example, (1) well-documented R scripts and notebooks with rich narratives are deposited at a trusted data centre, (2) R Markdown interactive notebooks can be run on demand as a web service, and (3) R Shiny web apps provide convenient user interfaces for exploring research outputs. However, notebooks require users to go through the entire analysis, while Shiny apps do not expose the underlying code and require extra work on UI design. We propose using the learnr package to expose selected code chunks in R Markdown so that users can readily experiment with them in guided, editable, isolated, executable, and resettable code sandboxes. Our approach does not replace the existing use of notebooks and Shiny apps, but adds a level of abstraction between them to promote reproducible science.

    Deep learning integrating scale conversion and pedo-transfer function to avoid potential errors in cross-scale transfer

    Pedo-transfer functions (PTFs) relate static soil/landscape properties to a wide range of model inputs (e.g., soil hydraulic parameters) that are essential to soil hydrological modeling. Combining PTFs with hydrological models is a powerful strategy that allows static soil/landscape properties to be used to generalize large-scale modeling. However, since the spatial scales of the soil hydraulic parameters required as model inputs and of the static soil/landscape properties are often not identical, cross-scale transfer is required, which can be a significant source of error. Here, we investigate uncertainties in cross-scale transfer and develop an approach that avoids them. The proposed method uses a convolutional neural network (CNN) as a cross-scale transfer approach to map static soil/landscape properties directly to soil hydraulic parameters across different spatial scales. The CNN approach is applied under two different estimation strategies to invert the hydraulic parameters of a soil-water balance model, and the quality of the resulting parameters is then assessed. Both synthetic and real-world results across the conterminous United States indicate that, in general, the end-to-end strategy is superior to the two-step strategy. The CNN-based integrated model successfully reduces potential errors in cross-scale transfer and can be applied to other areas lacking information on hydraulic parameters or observations. The proposed method can be extended to improve parameter estimation in earth system models and enhance our understanding of key hydrological processes.

    Multi-product characterization of surface soil moisture drydowns in the UK

    The persistence or memory of soil moisture (θ) after rainfall has substantial environmental implications. Much work has been done to study soil moisture drydown for in-situ and satellite data separately. In this work, we present a comparison of drydown characteristics across multiple UK soil moisture products, including satellite-merged (i.e. TCM), in-situ (i.e. COSMOS-UK), hydrological model (i.e. G2G), statistical model (i.e. SMUK) and land surface model (LSM) (i.e. CHESS) data. The drydown decay time scale (τ) for all gridded products is computed at an unprecedented resolution of 1-2 km, a scale relevant to weather and climate models. While the ranges of τ differ between products (except for SMUK and CHESS, which are similar) because of differences such as sensing depth, their spatial patterns are correlated with land cover and soil type. We further analyse the occurrence of drydown events at COSMOS-UK sites. We show that soil moisture drydown regimes exhibit strong seasonal dependencies, whereby the soil dries out more quickly in summer than in winter. These seasonal dependencies are important to consider during model benchmarking and evaluation. We show that fitted τ values based on COSMOS and LSM are well correlated, with a bias towards lower τ for COSMOS. Our findings contribute to a growing body of literature characterizing τ, with the aim of developing a method to systematically validate model soil moisture products at a range of scales.
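To make the decay time scale τ concrete, the sketch below fits it to a single drydown event. It assumes the commonly used exponential form θ(t) = θ_min + Δθ·exp(−t/τ), which the abstract itself does not state, and it assumes θ_min is known so that the fit reduces to a straight line in log space. The function name `fit_drydown_tau` and the synthetic series are hypothetical.

```python
import math


def fit_drydown_tau(times, theta, theta_min):
    """Estimate tau by least squares on ln(theta - theta_min) = ln(delta) - t / tau."""
    logs = [math.log(th - theta_min) for th in theta]
    t_bar = sum(times) / len(times)
    l_bar = sum(logs) / len(logs)
    slope = (
        sum((t - t_bar) * (l - l_bar) for t, l in zip(times, logs))
        / sum((t - t_bar) ** 2 for t in times)
    )
    return -1.0 / slope


# Synthetic drydown: theta_min = 0.10, delta = 0.20, true tau = 5 days.
days = list(range(10))
theta_series = [0.10 + 0.20 * math.exp(-d / 5.0) for d in days]
tau_hat = fit_drydown_tau(days, theta_series, theta_min=0.10)
```

With real data θ_min is usually unknown and the series is noisy, so a nonlinear fit of all three parameters would be the more typical choice; the log-linear form is used here only because it is easy to check by hand.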

    The relative importance of head, flux, and prior information in hydraulic tomography analysis

    Using cross-correlation analysis, we demonstrate that flux measurements at observation locations during hydraulic tomography (HT) surveys carry nonredundant information about heterogeneity that is complementary to head measurements at the same locations. We then hypothesize that a joint interpretation of head and flux data, even when the flux data come from the same observation network as the head data, can enhance the resolution of HT estimates. Subsequently, we use numerical experiments to test this hypothesis and investigate the impact of flux conditioning and prior information (such as correlation lengths and initial mean models, i.e., uniform or distributed means) on the HT estimates of a nonstationary, layered medium. We find that adding flux conditioning to HT analysis improves the estimates for all of the prior models tested. While prior information on geologic structures can be useful, its influence on the estimates diminishes as more nonredundant data (i.e., flux) are used in the HT analysis. Lastly, recommendations for conducting HT surveys and analysis are presented.
