Using measurements close to a detection limit in a geostatistical case study to predict selenium concentration in topsoil

Abstract

Data on environmental variables are subject to measurement error (ME), and it is important that this ME is considered in any statistical analysis. Environmental datasets commonly consist of positive random variables that have skewed distributions. Measurements are then usually reported with a theoretical detection limit (DL); measurements less than this DL are deemed not to be statistically different from zero, and these data are then treated by setting them to an arbitrary value of half of the DL. The skew of the data is dealt with by taking logarithms, and the geostatistical analysis is performed on the transformed variable. The DL approach, however, is somewhat ad hoc, and in this paper we investigate an alternative approach to incorporate such measurements in a geostatistical analysis, namely Bayesian hierarchical modelling. This approach incorporates ‘soft’ data (i.e., imprecise information), and we use soft data to represent the information that each measurement provides. We can use this approach to combine a lognormal model to describe the spatial variability with a Gaussian model for the measurement error. We apply the methods to a dataset on the selenium (Se) concentration in the topsoil throughout the East Anglia region of the UK. We compare the maps of predictions produced by the approaches, and compare the methods based on their ability to predict the Se concentration and the associated uncertainty. We also consider how the geostatistical predictions might be used to aid the effective management of Se-deficient soils, and compare the methods based on the costs that might be incurred from the selected management strategies. We found that the Bayesian approach based on soft data resulted in smoother maps, reduced the errors of the predictions, and provided a better representation of the associated uncertainty. The cost resulting from Se-deficient soils was generally lower when we used the soft data approach, and we conclude that this provides a more effective and interpretable model for the data in this case study, and possibly for other environmental datasets with measurements close to a DL.
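
The two treatments of near-DL measurements contrasted in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function names, the example concentrations, the DL of 0.2 and the ME standard deviation are all hypothetical, and the assumption that the Gaussian ME is additive on the untransformed scale is ours, since the abstract does not specify it.

```python
import math

def half_dl_log_transform(measurements, detection_limit):
    """Conventional DL approach: set values below the DL to DL/2, then take logs."""
    if detection_limit <= 0:
        raise ValueError("detection_limit must be positive")
    substituted = [
        m if m >= detection_limit else detection_limit / 2.0
        for m in measurements
    ]
    return [math.log(v) for v in substituted]

def gaussian_me_density(measured, true_conc, me_sd):
    """Density of a reported measurement given the true concentration,
    assuming additive Gaussian measurement error (an illustrative assumption)."""
    z = (measured - true_conc) / me_sd
    return math.exp(-0.5 * z * z) / (me_sd * math.sqrt(2.0 * math.pi))

# Hypothetical Se concentrations and detection limit, for illustration only.
se_values = [0.05, 0.31, 0.12, 0.80]
dl = 0.2

# DL approach: both sub-DL values collapse to log(DL / 2).
log_se = half_dl_log_transform(se_values, dl)

# Soft-data view: a sub-DL measurement still carries graded information
# about which true concentrations are plausible.
candidates = [0.02, 0.05, 0.10, 0.20]
soft_weights = [gaussian_me_density(0.05, c, me_sd=0.05) for c in candidates]
```

Under the DL substitution, the measurements 0.05 and 0.12 become indistinguishable, whereas the soft-data view retains the information in each reported value as a likelihood over the true concentration, which a hierarchical model could then combine with a lognormal spatial prior.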

This paper was published in NERC Open Research Archive.
