
    State tagging for improved Earth and environmental data quality assurance

    Environmental data allow us to monitor the constantly changing environment that we live in. They allow us to study trends and help us to develop better models to describe processes in our environment, and these models can, in turn, provide information to improve management practices. To ensure that the data are reliable for analysis and interpretation, they must undergo quality assurance procedures. Such procedures generally include standard operating procedures during sampling and laboratory measurement (if applicable), as well as data validation upon entry to databases. The latter usually involves compliance (i.e., format) and conformity (i.e., value) checks, most often in the form of single-parameter range tests. Such tests take no account of the system state in which each measurement is made, and provide the user with little contextual information on the probable cause of a measurement being flagged as out of range. We propose the use of data science techniques to tag each measurement with an identified system state. The term “state” is defined loosely here, and states are identified using k-means clustering, an unsupervised machine learning method. The meaning of the states is open to specialist interpretation. Once the states are identified, state-dependent prediction intervals can be calculated for each observational variable. This approach provides the user with more contextual information to resolve out-of-range flags, and yields prediction intervals for observational variables that take changes in system state into account. Users can then apply further analysis and filtering as they see fit. We illustrate our approach with two well-established long-term monitoring datasets in the UK: moth and butterfly data from the UK Environmental Change Network (ECN), and the UK CEH Cumbrian Lakes monitoring scheme.
Our work contributes to the ongoing development of a better data science framework that allows researchers and other stakeholders to find and use the data they need more readily.
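
A minimal sketch of the two steps described above, in Python with NumPy: cluster standardised measurements into states with k-means, then derive a per-state interval for one variable and flag values that fall outside the interval of their own state. The farthest-first initialisation, the use of empirical 2.5–97.5% quantiles as a stand-in for the paper's state-dependent prediction intervals, the number of states and the synthetic two-state data are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def standardise(X):
    # k-means is scale-sensitive, so z-score each variable first.
    return (X - X.mean(axis=0)) / X.std(axis=0)

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's k-means with farthest-first initialisation; returns labels, centroids."""
    rng = np.random.default_rng(seed)
    centroids = [X[rng.integers(len(X))]]
    while len(centroids) < k:
        # Next initial centroid: the point farthest from all centroids chosen so far.
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        # Assign every measurement to its nearest centroid (its "state").
        dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

def state_intervals(values, labels, lower=0.025, upper=0.975):
    # Empirical quantile interval per state (assumed stand-in for a prediction interval).
    return {int(s): (np.quantile(values[labels == s], lower),
                     np.quantile(values[labels == s], upper))
            for s in np.unique(labels)}

def flag(values, labels, intervals):
    # True where a value lies outside the interval of its own state.
    lo = np.array([intervals[int(s)][0] for s in labels])
    hi = np.array([intervals[int(s)][1] for s in labels])
    return (values < lo) | (values > hi)

# Synthetic example: two well-separated states described by two variables.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([4.0, 2.0], 1.0, size=(50, 2)),
               rng.normal([18.0, 12.0], 1.0, size=(50, 2))])
labels, _ = kmeans(standardise(X), k=2)
values = X[:, 1]                     # the observational variable being checked
intervals = state_intervals(values, labels)
flags = flag(values, labels, intervals)
print(intervals, int(flags.sum()))
```

In practice the number of states k would still need to be chosen, and the clustering variables would be whichever ones a specialist considers descriptive of the system state.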

    Predicting the habitat expansion of the invasive roach Rutilus rutilus (Actinopterygii, Cyprinidae), in Great Britain

    The roach is ecologically influential and prefers water temperatures above 12°C. In this study, we attempted to predict its habitat expansion in response to global warming, hypothesising that its habitat would increase in Great Britain. Historical data on air temperature over different time scales (annual, seasonal, monthly and daily) and on the presence of roach in Great Britain were used to create four Ecological Niche Models. Mean seasonal air temperature (EncRoach-S) was the best predictor. Using EncRoach-S, two future climate scenarios were tested: a sensitivity test (i.e. incrementally increasing temperature values by 1°C), and air temperature data from the UKCIP 11-member ensemble of climate change projections for 2031–2040, 2061–2070 and 2091–2100. Both approaches predicted an increase in habitat suitability in Great Britain with rising air temperatures, but the extent of change differed between England, Wales and Scotland. In England, the rate of expansion was initially slow but increased rapidly mid-century, leading to 88% coverage by the end of the century. In Wales, there was a greater increase by the end of the century, and a similar trend in Scotland. This study supports the conjecture that a rise in air temperature over the next few decades will lead to an increase in potential roach habitat.
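
The sensitivity test can be illustrated with a deliberately crude stand-in: instead of the fitted EncRoach-S niche model, a single hypothetical rule treats a grid cell as suitable when its mean seasonal air temperature exceeds 12°C, and all temperatures are raised in 1°C steps. Borrowing the 12°C water-temperature preference as an air-temperature threshold, the cell temperatures and the function name are all assumptions for illustration only.

```python
def suitable_fraction(cell_temps, warming, threshold=12.0):
    """Share of grid cells whose warmed temperature exceeds the threshold.

    Stand-in for the fitted niche model (assumption): one fixed 12 °C threshold.
    """
    warmed = [t + warming for t in cell_temps]
    return sum(t > threshold for t in warmed) / len(warmed)

# Hypothetical mean seasonal air temperatures (°C) for six grid cells.
cells = [9.5, 10.8, 11.6, 12.4, 13.1, 14.0]

# Sensitivity test: raise every cell by 0, 1, 2 and 3 °C.
sensitivity = {dT: suitable_fraction(cells, dT) for dT in range(4)}
for dT, share in sensitivity.items():
    print(f"+{dT} °C: {share:.0%} of cells suitable")
```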

    Quantifying uncertainty in land cover mappings: an adaptive approach to sampling reference data using Bayesian inference

    Mappings play an important role in environmental science applications by allowing practitioners to monitor changes at national and global scales. Over the last decade, it has become increasingly popular to use satellite imagery data and machine learning techniques (MLTs) to construct such maps. Given the black-box nature of many of these MLTs, though, quantifying uncertainty in these maps often relies on sampling reference data under stricter conditions. However, practical constraints can make sampling such data expensive, which forces stakeholders to make a trade-off between the degree of uncertainty in predictions and the costs of collecting appropriately sampled reference data. Furthermore, quantifying any trade-off is often difficult, as it will depend on many interdependent factors that cannot be fully understood until more data is collected. This paper investigates how a combination of Bayesian inference and an adaptive approach to sampling reference data can offer a generalizable way of managing such trade-offs. The approach is illustrated and evaluated using a woodland mapping of England as a case study in which reference data is collected under constraints motivated by COVID-19 travel restrictions. The key findings of this paper are as follows: (a) an adaptive approach to sampling reference data allows an informed approach when quantifying this trade-off; and (b) Bayesian inference is naturally suited to adaptive sampling and can make use of Monte Carlo methods when dealing with more advanced problems and analytical techniques.
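
One way to picture the adaptive loop is a conjugate Beta-Binomial model for the proportion of reference points that agree with the map: after each batch of field visits the posterior is updated, a Monte Carlo credible interval is computed, and sampling stops once the interval is narrow enough or the budget runs out. This is a minimal sketch assuming a uniform Beta(1, 1) prior, a simulated `sample_batch` callable and a target interval width; none of these are taken from the paper.

```python
import random

def credible_interval(a, b, level=0.95, draws=20000, seed=0):
    """Equal-tailed credible interval of a Beta(a, b) posterior, by Monte Carlo."""
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    lo = samples[int((1 - level) / 2 * draws)]
    hi = samples[int((1 + level) / 2 * draws) - 1]
    return lo, hi

def adaptive_sampling(sample_batch, target_width, max_batches, a=1.0, b=1.0):
    """Collect reference data batch by batch until the interval is narrow enough.

    sample_batch() is a hypothetical callable returning (n_agree, n_checked):
    how many of the newly visited reference points agreed with the map.
    """
    history = []
    for _ in range(max_batches):
        agree, checked = sample_batch()
        a += agree                      # conjugate Beta-Binomial update
        b += checked - agree
        lo, hi = credible_interval(a, b)
        history.append((a, b, lo, hi))
        if hi - lo < target_width:      # precision target reached: stop paying for visits
            break
    return history

# Simulated field campaign (assumption): 20 points per batch, true agreement 0.85.
sim = random.Random(42)
def simulated_batch(n=20, true_accuracy=0.85):
    return sum(sim.random() < true_accuracy for _ in range(n)), n

history = adaptive_sampling(simulated_batch, target_width=0.15, max_batches=50)
a, b, lo, hi = history[-1]
print(f"{len(history)} batches, posterior mean {a / (a + b):.3f}, 95% CI ({lo:.3f}, {hi:.3f})")
```

Constraints such as travel restrictions would enter through which squares `sample_batch` is able to visit; the stopping rule makes the cost/uncertainty trade-off explicit at every step.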

    Technical note: A bootstrapped LOESS regression approach for comparing soil depth profiles

    Understanding the consequences of different land uses for the soil system is important to make better informed decisions based on sustainability. The ability to assess change in soil properties, throughout the soil profile, is a critical step in this process. We present an approach to examine differences in soil depth profiles between land uses using bootstrapped LOESS regressions (BLRs). This non-parametric approach is data-driven, unconstrained by distributional model parameters and provides the ability to determine significant effects of land use at specific locations down a soil profile. We demonstrate an example of the BLR approach using data from a study examining the impacts of bioenergy land use change on soil organic carbon (SOC). While this straightforward non-parametric approach may be most useful in comparing SOC profiles between land uses, it can be applied to any soil property that has been measured at satisfactory resolution down the soil profile. It is hoped that further studies of land use and land management, based on new or existing data, can make use of this approach to examine differences in soil profiles.
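
The BLR idea can be sketched in NumPy as follows: fit a degree-1 LOESS curve (tricube weights, no robustness iterations) to each land use, repeat the fit on bootstrap resamples of the (depth, value) pairs, and call a depth significantly different where the 95% percentile interval of the between-land-use difference excludes zero. The span `frac`, the number of bootstrap replicates, the evaluation depths and the synthetic SOC profiles are assumptions; the paper's exact smoothing and resampling settings may differ.

```python
import numpy as np

def loess(x, y, x_eval, frac=0.6):
    """Degree-1 LOESS: local linear fit with tricube weights, no robustness iterations."""
    n = len(x)
    k = max(3, int(np.ceil(frac * n)))        # neighbours used in each local fit
    fitted = np.empty(len(x_eval))
    for i, x0 in enumerate(x_eval):
        d = np.abs(x - x0)
        h = max(np.sort(d)[k - 1], 1e-12)     # bandwidth: distance to k-th nearest point
        w = np.clip(1 - (d / h) ** 3, 0, None) ** 3
        A = np.column_stack([np.ones(n), x - x0])
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        fitted[i] = coef[0]                   # intercept = fitted value at x0
    return fitted

def bootstrapped_loess_difference(x1, y1, x2, y2, depths, n_boot=200, frac=0.6, seed=0):
    """Percentile bootstrap interval for LOESS(land use 1) - LOESS(land use 2) at each depth."""
    rng = np.random.default_rng(seed)
    diffs = np.empty((n_boot, len(depths)))
    for b in range(n_boot):
        i1 = rng.integers(0, len(x1), len(x1))    # resample (depth, value) pairs
        i2 = rng.integers(0, len(x2), len(x2))
        diffs[b] = loess(x1[i1], y1[i1], depths, frac) - loess(x2[i2], y2[i2], depths, frac)
    lo, hi = np.percentile(diffs, [2.5, 97.5], axis=0)
    significant = (lo > 0) | (hi < 0)             # 95% interval excludes zero
    return lo, hi, significant

# Synthetic SOC profiles (%): land use 1 is richer near the surface only.
rng = np.random.default_rng(3)
x1, x2 = rng.uniform(0, 100, 40), rng.uniform(0, 100, 40)       # depth, cm
y1 = 4.0 * np.exp(-x1 / 30) + 1.0 + rng.normal(0, 0.2, 40)
y2 = 2.5 * np.exp(-x2 / 30) + 1.0 + rng.normal(0, 0.2, 40)
depths = np.array([5.0, 15.0, 30.0, 50.0, 80.0])
lo, hi, significant = bootstrapped_loess_difference(x1, y1, x2, y2, depths)
print(significant)
```

Because nothing beyond the local smoother and the resampling is assumed, the same function could be applied to any soil property measured at several depths.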

    Integration of ground survey and remote sensing derived data: producing robust indicators of habitat extent and condition

    The availability of suitable habitat is a key predictor of the changing status of biodiversity. Quantifying habitat availability over large spatial scales is, however, challenging. Although remote sensing techniques have high spatial coverage, there is uncertainty associated with these estimates due to errors in classification. Alternatively, the extent of habitats can be estimated from ground-based field survey. Financial and logistical constraints mean that on-the-ground surveys have much lower coverage, but they can produce much higher-quality estimates of habitat extent in the areas that are surveyed. Here, we demonstrate a new combined model which uses both types of data to produce unified national estimates of the extent of four key habitats across Great Britain based on Countryside Survey and Land Cover Map. The approach treats the true proportion of habitat per km² (Zᵢ) as unobserved, but both ground survey and remote sensing can be used to estimate Zᵢ. The model allows the relationship between remote sensing data and Zᵢ to be spatially biased, while ground survey is assumed to be unbiased. Taking a statistical model-based approach to integrating field survey and remote sensing data allows information on bias and precision to be captured and propagated, so that the estimates produced and parameters estimated are robust and interpretable. A simulation study shows that the combined model should perform best when error in the ground survey data is low. We use repeat surveys to parameterize the variance of ground survey data and demonstrate that error in this data source is small. The model produced revised national estimates of the extent of broadleaved woodland, arable land, bog, and fen, marsh and swamp across Britain in 2007.
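
The paper's combined model allows spatially varying bias; the sketch below is a much simpler, non-spatial stand-in that conveys the core idea. Remote-sensing proportions are calibrated against the (assumed unbiased) ground survey on surveyed squares, inverted to a bias-corrected estimate of Zᵢ and, where both sources exist, combined by inverse-variance weighting. The linear calibration, the constant bias and the synthetic data are assumptions, and the sketch ignores the attenuation that ground-survey error introduces into the calibration slope.

```python
import numpy as np

def combine(rs, gs, surveyed, var_gs):
    """Per-square estimate of the true habitat proportion Z from two data sources.

    rs: remote-sensing proportion for every km square (full coverage, possibly biased)
    gs: ground-survey proportion, np.nan where not surveyed (assumed unbiased)
    surveyed: boolean mask of surveyed squares
    var_gs: ground-survey error variance (e.g. estimated from repeat surveys)
    """
    # Calibrate rs ≈ alpha + beta * Z on surveyed squares, using gs in place of Z.
    beta, alpha = np.polyfit(gs[surveyed], rs[surveyed], 1)
    z_rs = (rs - alpha) / beta                          # bias-corrected RS estimate of Z
    resid = rs[surveyed] - (alpha + beta * gs[surveyed])
    var_rs = resid.var(ddof=2) / beta ** 2              # RS error variance on the Z scale
    z = z_rs.copy()
    w_gs, w_rs = 1.0 / var_gs, 1.0 / var_rs
    # Where both sources exist, take the inverse-variance weighted mean.
    z[surveyed] = (w_gs * gs[surveyed] + w_rs * z_rs[surveyed]) / (w_gs + w_rs)
    return np.clip(z, 0.0, 1.0)                         # proportions live in [0, 1]

# Synthetic landscape of 1000 km squares; 100 of them ground-surveyed.
rng = np.random.default_rng(7)
n = 1000
Z = rng.beta(2, 5, n)                                   # true (unobserved) proportions
rs = 0.15 + 0.7 * Z + rng.normal(0, 0.05, n)            # biased remote-sensing signal
surveyed = np.zeros(n, dtype=bool)
surveyed[rng.choice(n, 100, replace=False)] = True
gs = np.full(n, np.nan)
gs[surveyed] = Z[surveyed] + rng.normal(0, 0.01, surveyed.sum())
z_hat = combine(rs, gs, surveyed, var_gs=0.01 ** 2)
print(f"true mean {Z.mean():.3f}, raw RS {rs.mean():.3f}, combined {z_hat.mean():.3f}")
```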

    Feather corticosterone content in predatory birds in relation to body condition and hepatic metal concentration

    This study investigated the feasibility of measuring corticosterone (CORT) in feathers from cryo-archived raptor specimens, in order to provide a retrospective assessment of the activity of the stress axis in relation to contaminant burden. Feather samples were taken from sparrowhawk Accipiter nisus, kestrel Falco tinnunculus, buzzard Buteo buteo, barn owl Tyto alba, and tawny owl Strix aluco, and the variation in feather CORT concentrations with respect to species, age, sex, feather position, and body condition was assessed. In sparrowhawks only, variation in feather CORT content was compared with hepatic metal concentrations. Within individuals, CORT concentration (pg mm⁻¹) in adjacent primary flight feathers (P5 and P6), and in left and right wing primaries (P5), was statistically indistinguishable. The lowest concentrations of CORT were found in sparrowhawk feathers, and CORT concentrations did not vary systematically with age or sex for any species. Significant relationships between feather CORT content and condition were observed only in tawny owl and kestrel. In sparrowhawks, feather CORT concentration was positively related to the hepatic concentrations of five metals (Cd, Mn, Co, Cu, Mo) and the metalloid As. There was also a negative relationship between measures of condition and total hepatic metal concentration in males. The results suggest that some factors affecting CORT uptake by feathers remain to be resolved, but feather CORT content from archived specimens has the potential to provide a simple effects biomarker for exposure to environmental contaminants.
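
Because feather CORT is expressed per unit length, a within-bird comparison such as P5 versus P6 reduces to paired differences of length-normalised values. The sketch below uses a paired t statistic as one plausible test; the test actually used in the study is not stated here, and the function names and numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def cort_per_mm(cort_pg, length_mm):
    # Feather CORT expressed per unit feather length (pg mm^-1).
    return cort_pg / length_mm

def paired_t(a, b):
    """Paired t statistic for within-bird differences, e.g. P5 versus P6."""
    d = [x - y for x, y in zip(a, b)]
    return mean(d) / (stdev(d) / sqrt(len(d)))

# Hypothetical values for four birds: (CORT in pg, feather length in mm) per feather.
p5 = [cort_per_mm(c, l) for c, l in [(410, 205), (380, 190), (455, 210), (300, 160)]]
p6 = [cort_per_mm(c, l) for c, l in [(400, 200), (395, 195), (430, 205), (310, 165)]]
print(round(paired_t(p5, p6), 3))
```

The statistic would then be compared against a t distribution with n − 1 degrees of freedom, where n is the number of birds.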

    Dependence of ombrotrophic peat nitrogen on phosphorus and climate

    Nitrogen (N) is a key, possibly limiting, nutrient in ombrotrophic peat ecosystems, and enrichment by pollutant N in atmospheric deposition (Ndep, g m⁻² a⁻¹) is of concern with regard to peatland damage. We collated data on the N content of surface (depth ≤ 25 cm, mean 15 cm) ombrotrophic peat (Nsp) for 215 sites in the UK and 62 other sites around the world, including boreal, temperate and tropical locations (wider global data), and found Nsp to range from 0.5% to 4%. We examined the dependences of Nsp on surface peat phosphorus (P) content (Psp), mean annual precipitation (MAP), mean annual temperature (MAT) and Ndep. Linear regression on individual independent variables showed highly significant (p < 0.001) correlations of Nsp with Psp (r² = 0.23) and MAP (r² = 0.14), and significant (p < 0.01) but weaker correlations with MAT (r² = 0.03) and Ndep (r² = 0.03). A multiple regression model using log-transformed values explained 36% of the variance of the UK data, 84% of the variance of the wider global data, and 47% of the variance of the combined data, all with high significance (p < 0.001). In all three cases, most of the variance was explained by Psp and MAP, but in view of a positive correlation between MAP and MAT for many of the sites, a role for MAT in controlling Nsp cannot be ruled out. There is little evidence for an effect of Ndep on Nsp. The results point to a key role of P in N fixation, and thereby C fixation, in ombrotrophic peats.
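
A multiple regression on log-transformed values amounts to ordinary least squares on the logs, with R² as the proportion of variance explained. The sketch assumes base-10 logs (R² does not depend on the base) and uses synthetic data generated inside the example; it does not reproduce the study's coefficients or variance fractions.

```python
import numpy as np

def log_regression(n_sp, predictors):
    """OLS of log10(Nsp) on log10 of each predictor; returns coefficients and R^2."""
    y = np.log10(n_sp)
    X = np.column_stack([np.ones(len(y))] + [np.log10(p) for p in predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r2 = 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
    return coef, r2

# Synthetic sites only: P content (%), mean annual precipitation (mm), N content (%).
rng = np.random.default_rng(0)
psp = rng.uniform(0.03, 0.15, 200)
map_mm = rng.uniform(600, 3000, 200)
nsp = 10 ** (-0.5 + 0.4 * np.log10(psp) + 0.3 * np.log10(map_mm) + rng.normal(0, 0.05, 200))
coef, r2 = log_regression(nsp, [psp, map_mm])
print(coef.round(3), round(r2, 3))
```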

    Is more data always better? A simulation study of benefits and limitations of integrated distribution models

    Species distribution models are popular and widely applied ecological tools. Recent increases in data availability have led to opportunities and challenges for species distribution modelling. Each data source has different qualities, determined by how it was collected. Although several data sources can inform on a single species, ecologists have often analysed just one of them, which loses information because the other data sources are discarded. Integrated distribution models (IDMs) were developed to enable inclusion of multiple datasets in a single model, whilst accounting for different data collection protocols. This is advantageous because it allows efficient use of all available data, can improve estimation and can account for biases in data collection. What is not yet known is under which conditions integrating different data sources brings no advantage. Here, for the first time, we explore the potential limits of IDMs using a simulation study integrating a spatially biased, opportunistic, presence-only dataset with a structured, presence–absence dataset. We explore four scenarios based on real ecological problems: small sample sizes, low levels of detection probability, correlations between covariates and a lack of knowledge of the drivers of bias in data collection. For each scenario we ask: do we see improvements in parameter estimation or the accuracy of spatial pattern prediction in the IDM versus modelling either data source alone? We found integration alone was unable to correct for spatial bias in presence-only data. Including a covariate to explain bias or adding a flexible spatial term improved IDM performance beyond single-dataset models, with the models including a flexible spatial term producing the most accurate and robust estimates. Increasing the sample size of presence–absence data and having no correlated covariates also improved estimation.
These results demonstrate under which conditions integrated models provide benefits over modelling single data sources.
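
The core of an IDM is a joint likelihood in which both datasets share the same ecological parameters. The sketch below assumes a gridded landscape, a log-linear intensity, a Poisson model for presence-only counts thinned by a sampling-bias covariate, and a complementary log-log Bernoulli model for presence–absence sites with a known detection probability. These are common modelling choices, not necessarily those of the simulation study, and all names are hypothetical.

```python
import numpy as np

def idm_loglik(beta, alpha, x_cells, z_cells, po_counts, cell_area,
               x_sites, pa_obs, site_area, p_detect):
    """Joint log-likelihood of a simple integrated distribution model.

    Shared ecology: intensity lambda(s) = exp(beta[0] + beta[1] * x(s)).
    Presence-only: counts per grid cell ~ Poisson(lambda * bias * cell_area), where
    bias = exp(alpha[0] + alpha[1] * z(s)) describes uneven recording effort.
    Presence-absence: P(recorded present) = p_detect * (1 - exp(-lambda * site_area)).
    """
    lam_cells = np.exp(beta[0] + beta[1] * x_cells)
    mu = lam_cells * np.exp(alpha[0] + alpha[1] * z_cells) * cell_area
    ll_po = np.sum(po_counts * np.log(mu) - mu)              # Poisson, dropping log(count!)
    lam_sites = np.exp(beta[0] + beta[1] * x_sites)
    p = p_detect * (1.0 - np.exp(-lam_sites * site_area))    # cloglog occupancy
    ll_pa = np.sum(pa_obs * np.log(p) + (1 - pa_obs) * np.log1p(-p))
    return ll_po + ll_pa, ll_po, ll_pa

# Tiny worked example: flat intensity of 1 per unit area, no sampling bias.
total, ll_po, ll_pa = idm_loglik(
    beta=(0.0, 0.0), alpha=(0.0, 0.0),
    x_cells=np.array([0.2, -0.4]), z_cells=np.array([1.0, 0.0]),
    po_counts=np.array([0, 2]), cell_area=1.0,
    x_sites=np.array([0.5, -0.1]), pa_obs=np.array([1, 0]),
    site_area=np.log(2.0), p_detect=1.0)
print(total, ll_po, ll_pa)
```

In the presence-only term, beta[0] and alpha[0] appear only as a sum, so presence-only data alone cannot separate abundance from recording effort; the presence–absence term is what identifies beta[0]. To fit the model, the negative of the first return value could be passed to a general-purpose optimiser such as `scipy.optimize.minimize`.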

    Distribution of Ash trees (Fraxinus excelsior) in Countryside Survey data

    Countryside Survey is a unique study or ‘audit’ of the natural resources of the UK’s countryside. The Survey has been carried out at regular intervals since 1978. The countryside is sampled and studied using rigorous scientific methods, allowing us to compare new results with those from previous surveys. In this way we can detect the gradual and subtle changes that occur in the UK’s countryside over time. This report provides estimates of the area of ash trees (Fraxinus excelsior) in Great Britain in woods <0.5 ha in size, the number of individual ash trees, the extent of ash in linear features and the trends in ash distribution in fixed vegetation plots. The areal extent of other common broadleaf tree species is also provided.

    Distribution of oak trees (Quercus sp.) in GB and Wales

    This report provides estimates of oak trees (Quercus sp.) from two national surveys covering (1) Great Britain, including Wales (2007), and (2) Wales only (2013–2016). Oak estimates were calculated for all woodland, all woodland areas <0.5 ha in size, individual oak trees, the extent of oak in linear features and oak in fixed vegetation plots. The data used in this report come from Countryside Survey (CS), which is a unique study or ‘audit’ of the natural resources of the UK’s countryside, and from the Glastir Monitoring and Evaluation project (GMEP), which collected data across Wales using a methodology similar to that of CS to determine the impacts of the Welsh agri-environment scheme ‘Glastir’. Countryside Survey has been carried out at regular intervals since 1978. The countryside is mapped and sampled using rigorous scientific methods, allowing us to compare new results with those from previous surveys. In this way we can detect the gradual and subtle changes that occur in the UK’s countryside over time. GMEP was carried out between 2013 and 2016, and all of the data from this period have been amalgamated.