
    Towards a unified approach to formal risk of bias assessments for causal and descriptive inference

    Statistics is sometimes described as the science of reasoning under uncertainty. Statistical models provide one view of this uncertainty, but what is frequently neglected is the invisible portion of uncertainty: that assumed not to exist once a model has been fitted to some data. Systematic errors, i.e. bias, in data relative to some model and inferential goal can seriously undermine research conclusions, and qualitative and quantitative techniques have been created across several disciplines to quantify and generally appraise such potential biases. Perhaps best known are so-called risk of bias assessment instruments used to investigate the likely quality of randomised controlled trials in medical research. However, the logic of assessing the risks caused by various types of systematic error to statistical arguments applies far more widely. This logic applies even when statistical adjustment strategies for potential biases are used, as these frequently make assumptions (e.g. data missing at random) that can never be guaranteed in finite samples. Mounting concern about such situations can be seen in the increasing calls for greater consideration of biases caused by nonprobability sampling in descriptive inference (i.e. survey sampling), and the statistical generalisability of in-sample causal effect estimates in causal inference; both of which relate to the consideration of model-based and wider uncertainty when presenting research conclusions from models. Given that model-based adjustments are never perfect, we argue that qualitative risk of bias reporting frameworks for both descriptive and causal inferential arguments should be further developed and made mandatory by journals and funders. It is only through clear statements of the limits to statistical arguments that consumers of research can fully judge their value for any specific application.
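
    Because the missing-at-random point above is load-bearing, a minimal simulation sketch may help (not from the paper; every variable and number below is invented): an inverse-probability-weighting adjustment built on an observed covariate shrinks, but cannot remove, the bias when missingness also depends on an unobserved variable.

        import numpy as np

        rng = np.random.default_rng(1)
        n = 500_000
        x = rng.normal(size=n)              # observed covariate
        u = rng.normal(size=n)              # unobserved cause of missingness
        y = x + u + rng.normal(size=n)      # outcome; true mean is 0

        # Response depends on both x and the unobserved u, so MAR-given-x fails.
        p_obs = 1 / (1 + np.exp(-(x + 2 * u)))
        obs = rng.random(n) < p_obs

        # MAR-style inverse-probability weighting, with P(obs | x) estimated
        # by binning x into deciles (a stand-in for a fitted response model).
        edges = np.quantile(x, np.linspace(0, 1, 11))[1:-1]
        idx = np.digitize(x, edges)         # decile index, 0..9
        p_hat = np.array([obs[idx == k].mean() for k in range(10)])
        w = 1 / p_hat[idx[obs]]

        print(f"true mean:          {y.mean():+.3f}")
        print(f"complete-case mean: {y[obs].mean():+.3f}")
        print(f"IPW (MAR) adjusted: {np.average(y[obs], weights=w):+.3f}")
        # The adjusted estimate improves on the naive one but stays biased,
        # because weighting on x cannot account for selection on u.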

    We need to talk about nonprobability samples

    In most circumstances, probability sampling is the only way to ensure unbiased inference about population quantities where a complete census is not possible. As we enter the era of ‘big data’, however, nonprobability samples, whose sampling mechanisms are unknown, are undergoing a renaissance. We explain why the use of nonprobability samples can lead to spurious conclusions, and why seemingly large nonprobability samples can be (effectively) very small. We also review some recent controversies surrounding the use of nonprobability samples in biodiversity monitoring. These points notwithstanding, we argue that nonprobability samples can be useful, provided that their limitations are assessed, mitigated where possible and clearly communicated. Ecologists can learn much from other disciplines on each of these fronts.
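
    The "effectively very small" claim has a concrete form in Meng's data-defect framing, which this line of work draws on. A minimal sketch (population, selection mechanism and all numbers invented): a weakly selective mechanism over a million units yields a 50,000-record sample whose error is comparable to that of a simple random sample of roughly a hundred.

        import numpy as np

        rng = np.random.default_rng(7)
        N = 1_000_000                        # population size
        y = rng.normal(size=N)               # variable of interest; true mean 0

        # Nonprobability sample: inclusion is weakly correlated with y.
        z = y + rng.normal(scale=20, size=N)
        keep = z > np.quantile(z, 0.95)      # a 50,000-record "big data" sample
        n = keep.sum()

        rho = np.corrcoef(keep.astype(float), y)[0, 1]  # data-defect correlation
        f = n / N
        n_eff = f / (1 - f) / rho**2   # Meng's effective-sample-size approximation

        print(f"n = {n}, rho = {rho:.4f}, effective n ~ {n_eff:.0f}")
        print(f"biased-sample error:  {y[keep].mean() - y.mean():+.4f}")
        srs = rng.choice(y, size=int(n_eff), replace=False)
        print(f"SRS(n_eff) std error: {srs.std(ddof=1) / np.sqrt(len(srs)):.4f}")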

    Descriptive inference using large, unrepresentative nonprobability samples: an introduction for ecologists

    Biodiversity monitoring usually involves drawing inferences about some variable of interest across a defined landscape from observations made at a sample of locations within that landscape. If the variable of interest differs between sampled and non-sampled locations, and no mitigating action is taken, then the sample is unrepresentative and inferences drawn from it will be biased. It is possible to adjust unrepresentative samples so that they more closely resemble the wider landscape in terms of “auxiliary variables”. A good auxiliary variable is a common cause of sample inclusion and the variable of interest, and if it explains an appreciable portion of the variance in both, then inferences drawn from the adjusted sample will be closer to the truth. We applied six types of survey sample adjustment—subsampling, quasi-randomisation, poststratification, superpopulation modelling, a “doubly robust” procedure, and multilevel regression and poststratification—to a simple two-part biodiversity monitoring problem. The first part was to estimate mean occupancy of the plant Calluna vulgaris in Great Britain in two time-periods (1987-1999 and 2010-2019); the second was to estimate the difference between the two (i.e. the trend). We estimated the means and trend using large, but (originally) unrepresentative, samples from a citizen science dataset. Compared to the unadjusted estimates, the means and trends estimated using most adjustment methods were more accurate, although standard uncertainty intervals generally did not cover the true values. Completely unbiased inference is not possible from an unrepresentative sample without knowing and having data on all relevant auxiliary variables. Adjustments can reduce the bias if auxiliary variables are available and selected carefully, but the potential for residual bias should be acknowledged and reported.
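
    To make one of the six adjustments concrete, here is a minimal poststratification sketch on invented data (not the paper's implementation): a single binary auxiliary variable drives both sample inclusion and occupancy, and reweighting stratum means by known population shares removes most of the bias.

        import numpy as np

        rng = np.random.default_rng(3)
        N = 100_000

        # Hypothetical landscape: the auxiliary variable (say, upland vs
        # lowland) is a common cause of sample inclusion and occupancy.
        upland = rng.random(N) < 0.3                  # 30% of sites are upland
        occ = np.where(upland, rng.random(N) < 0.6,   # occupied more often
                               rng.random(N) < 0.1)   # in the uplands

        # Recorders oversample uplands, so the raw sample is unrepresentative.
        s = rng.random(N) < np.where(upland, 0.05, 0.005)

        # Poststratification: estimate occupancy within each stratum, then
        # weight by the *known* population share of that stratum.
        shares = np.array([1 - upland.mean(), upland.mean()])
        means = np.array([occ[s & ~upland].mean(), occ[s & upland].mean()])

        print(f"true occupancy:        {occ.mean():.3f}")
        print(f"naive sample estimate: {occ[s].mean():.3f}")
        print(f"poststratified:        {(shares * means).sum():.3f}")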

    occAssess: an R package for assessing potential biases in species occurrence data

    Species occurrence records from a variety of sources are increasingly aggregated into heterogeneous databases and made available to ecologists for immediate analytical use. However, these data are typically biased, i.e. they are not a probability sample of the target population of interest, meaning that the information they provide may not be an accurate reflection of reality. It is therefore crucial that species occurrence data are properly scrutinised before they are used for research. In this article, we introduce occAssess, an R package that enables straightforward screening of species occurrence data for potential biases. The package contains a number of discrete functions, each of which returns a measure of the potential for bias in one or more of the taxonomic, temporal, spatial, and environmental dimensions. Users can opt to provide a set of time periods into which the data will be split; in this case separate outputs will be provided for each period, making the package particularly useful for assessing the suitability of a dataset for estimating temporal trends in species' distributions. The outputs are provided visually (as ggplot2 objects) and do not include a formal recommendation as to whether data are of sufficient quality for any given inferential use. Instead, they should be used as ancillary information and viewed in the context of the question that is being asked, and the methods that are being used to answer it. We demonstrate the utility of occAssess by applying it to data on two key pollinator taxa in South America: leaf-nosed bats (Phyllostomidae) and hoverflies (Syrphidae). In this worked example, we briefly assess the degree to which various aspects of data coverage appear to have changed over time. We then discuss additional applications of the package, highlight its limitations, and point to future development opportunities.
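
    occAssess's actual function names and signatures are not reproduced here; the following stand-alone sketch only illustrates the kind of screening described, computing grid-cell coverage per user-supplied time period from invented occurrence records (a fall in coverage flags growing spatial bias rather than proving it).

        import numpy as np

        rng = np.random.default_rng(5)
        n = 5_000

        # Invented occurrence records: (x, y) in the unit square plus a year;
        # later records are far more spatially clustered.
        year = rng.integers(1970, 2020, size=n)
        centre = np.where(year >= 2000, 0.8, 0.4)
        spread = np.where(year >= 2000, 0.05, 0.25)
        xy = rng.normal(loc=centre[:, None], scale=spread[:, None], size=(n, 2))
        xy = np.clip(xy, 0, 1)

        def grid_coverage(points, cells=10):
            """Proportion of a cells x cells grid containing >= 1 record."""
            gx = np.clip((points[:, 0] * cells).astype(int), 0, cells - 1)
            gy = np.clip((points[:, 1] * cells).astype(int), 0, cells - 1)
            return len(set(zip(gx, gy))) / cells**2

        for lo, hi in [(1970, 2000), (2000, 2020)]:
            m = (year >= lo) & (year < hi)
            print(f"{lo}-{hi}: {m.sum():5d} records, "
                  f"grid coverage = {grid_coverage(xy[m]):.2f}")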

    On the trade-off between accuracy and spatial resolution when estimating species occupancy from geographically biased samples

    Species occupancy is often defined as the proportion of areal units (sites) in a landscape that the focal species occupies, but it is usually estimated from the subset of sites that have been sampled. Assuming no measurement error, we show that three quantities–the degree of sampling bias (in terms of site selection), the proportion of sites that have been sampled and the variability of true occupancy across sites–determine the extent to which a sample-based estimate of occupancy differs from its true value across the wider landscape. That these are the only three quantities (measurement error notwithstanding) to affect the accuracy of estimates of species occupancy is the fundamental insight of the “Meng equation”, an algebraic re-expression of statistical error. We use simulations to show how each of the three quantities vary with the spatial resolution of the analysis and that absolute estimation error is lower at coarser resolutions. Absolute error scales similarly with resolution regardless of the size and clustering of the virtual species’ distribution. Finely resolved estimates of species occupancy have the potential to be more useful than coarse ones, but this potential is only realised if the estimates are at least reasonably accurate. Consequently, wherever there is the potential for sampling bias, there is a trade-off between spatial resolution and accuracy, and the Meng equation provides a theoretical framework in which analysts can consider the balance between the two. An obvious next step is to consider the implications of the Meng equation for estimating a time trend in species occupancy, where it is the confounding of error and true change that is of most interest.
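
    For reference, the identity in question can be written as sample mean minus true mean = rho_{R,Y} * sqrt((1 - f) / f) * sigma_Y, where R indicates whether a site was sampled, rho_{R,Y} is the correlation between inclusion and true occupancy (the sampling bias), f is the proportion of sites sampled, and sigma_Y is the variability of true occupancy across sites. A short numeric check on invented data (a sketch, not the paper's code):

        import numpy as np

        rng = np.random.default_rng(11)
        N = 200_000
        occ = rng.binomial(1, 0.3, size=N).astype(float)  # true occupancy per site

        # Biased site selection: sampled sites tend to be occupied.
        score = occ + rng.normal(size=N)
        R = (score > np.quantile(score, 0.9)).astype(float)  # 10% of sites sampled

        f = R.mean()                      # proportion of sites sampled
        rho = np.corrcoef(R, occ)[0, 1]   # degree of sampling bias
        sigma = occ.std()                 # variability of true occupancy

        error = occ[R == 1].mean() - occ.mean()
        meng = rho * np.sqrt((1 - f) / f) * sigma
        print(f"actual estimation error:     {error:+.6f}")
        print(f"rho * sqrt((1-f)/f) * sigma: {meng:+.6f}")
        # The two agree to floating-point precision: the identity is algebraic,
        # not an approximation.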

    Protected areas support more species than unprotected areas in Great Britain, but lose them equally rapidly

    Protected areas are a key conservation tool, yet their effectiveness at maintaining biodiversity through time is rarely quantified. Here, we assess protected area effectiveness across sampled portions of Great Britain (primarily England) using regionalized (protected vs unprotected areas) Bayesian occupancy-detection models for 1238 invertebrate species at 1 km resolution, based on ~1 million occurrence records between 1990 and 2018. We quantified species richness, species trends, and compositional change (temporal beta diversity, decomposed into losses and gains). We report results overall, for two functional groups (pollinators and predators), and for rare and common species. Whilst we found that protected areas have 15% more species on average than unprotected ones, declines in occupancy are of similar magnitude, and species composition has changed by 27% across protected and unprotected areas, with losses dominating gains. Pollinators have suffered particularly severe declines. Still, protected areas are colonized by more locally-novel pollinator species than unprotected areas, suggesting that they might act as ‘landing pads’ for range-shifting pollinators. We find almost double the number of rare species in protected areas (although rare species trends are similar in protected and unprotected areas), whereas we uncover disproportionately steep declines for common species within protected areas. Our results highlight strong invertebrate reorganization and loss across both protected and unprotected areas. We therefore call for more effective protected areas, in combination with wider action, to bend the curve of biodiversity loss, and we provide a toolkit to quantify effectiveness. We must grasp the opportunity to effectively conserve biodiversity through time.
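
    As a sketch of the loss/gain decomposition of compositional change (invented presence-absence matrices; a Jaccard-type dissimilarity is assumed here, which need not match the paper's exact formulation):

        import numpy as np

        rng = np.random.default_rng(2)
        n_sites, n_species = 100, 50

        # Hypothetical presence/absence matrices for two periods.
        t1 = rng.random((n_sites, n_species)) < 0.3
        kept = rng.random((n_sites, n_species)) < 0.8   # 20% chance of local loss
        new = rng.random((n_sites, n_species)) < 0.05   # 5% chance of local gain
        t2 = (t1 & kept) | (~t1 & new)

        # Per-site temporal dissimilarity, split into loss- and gain-driven parts.
        lost = (t1 & ~t2).sum(axis=1)
        gained = (~t1 & t2).sum(axis=1)
        shared = (t1 & t2).sum(axis=1)
        denom = shared + lost + gained

        print(f"mean compositional change: {((lost + gained) / denom).mean():.2f}")
        print(f"  driven by losses: {(lost / denom).mean():.2f}")
        print(f"  driven by gains:  {(gained / denom).mean():.2f}")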

    An evidence‐base for developing ambitious yet realistic national biodiversity targets

    Biodiversity targets are a key tool, used at a global and national policy level, to align biodiversity goals, promote conservation action, and recover nature. Yet most biodiversity targets are not met. In England, the government has committed to legally binding targets to halt and recover the decline in species abundance by 2030 and 2042. We present evidence from recent population trends of 670 terrestrial animal species (for which abundance time series are available) as a species abundance indicator, together with a synthesis of case studies on species recovery, to assess the degree to which these targets are achievable. The case studies demonstrate that recovery is possible through a range of approaches. The indicator demonstrates that, in theory, the targets can be achieved by addressing severe declines in a relatively small number of species, as well as by creating smaller benefits for many species through landscape-scale interventions. The fact that multiple pathways exist to achieve the species abundance targets in England presents choices, but also raises the possibility that targets might be reached with perverse consequences. We demonstrate that evidence on achievability is a necessary but not sufficient condition for determining what is required to deliver conservation outcomes and restore biodiversity.
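
    The "multiple pathways" point can be made concrete with a toy calculation (hypothetical numbers; a geometric-mean composite indicator is assumed, a common construction that the paper does not necessarily use): recovering a handful of severely declined species moves the indicator about as much as spreading a small gain across every species.

        import numpy as np

        idx = np.ones(670)       # 670 species, baseline abundance index 1.0
        idx[:20] = 0.1           # 20 species in severe decline

        def indicator(v):
            """Geometric mean of per-species abundance indices."""
            return np.exp(np.log(v).mean())

        few = idx.copy()
        few[:20] *= 4.0          # pathway 1: recover the worst 20 species
        many = idx * 1.042       # pathway 2: ~4% gain for all 670 species

        print(f"baseline indicator:  {indicator(idx):.3f}")
        print(f"recover 20 species:  {indicator(few):.3f}")
        print(f"small gains for all: {indicator(many):.3f}")
        # Both pathways lift the indicator by roughly the same amount.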

    A Path Toward the Use of Trail Users’ Tweets to Assess Effectiveness of the Environmental Stewardship Scheme: An Exploratory Analysis of the Pennine Way National Trail

    Large and unofficial data sets, for instance those gathered from social media, are increasingly being used in geographical research and explored as decision support tools for policy development. Social media data have the potential to provide new insight into phenomena about which there is little information from conventional sources. Within this context, this paper explores the potential of social media data to evaluate the aesthetic management of landscape. Specifically, this project utilises the perceptions of visitors to the Pennine Way National Trail, which passes through land managed under the Environmental Stewardship Scheme (ESS). The method analyses sentiment in trail users’ public Twitter messages (tweets) with the aim of assessing the extent to which the ESS maintains landscape character within the trail corridor. The method demonstrates the importance of filtering social media data to convert it into useful information. After filtering, the results are based on 161 messages directly related to the trail. Although small, this sample illustrates the potential for social media to be used as a cheap and increasingly abundant source of information. We suggest that social media data in this context should be seen as a resource that can complement, rather than replace, conventional data sources such as questionnaires and interviews. Furthermore, we provide guidance on how social media could be effectively used by conservation bodies, such as Natural England, which are charged with the management of areas of environmental value worldwide.
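
    A minimal sketch of the filter-then-score pipeline described above (the keywords, lexicon and example messages are all invented; the paper's actual filtering and sentiment analysis are not reproduced here):

        # Hypothetical keyword filter and lexicon; real messages would come
        # from the Twitter API and a proper sentiment model.
        TRAIL_TERMS = {"pennine way", "pennineway"}
        POSITIVE = {"beautiful", "stunning", "great", "love", "scenic"}
        NEGATIVE = {"boggy", "eroded", "litter", "awful", "ruined"}

        def relevant(tweet: str) -> bool:
            """Keep only messages that mention the trail explicitly."""
            t = tweet.lower()
            return any(term in t for term in TRAIL_TERMS)

        def sentiment(tweet: str) -> int:
            """Crude lexicon score: +1 per positive word, -1 per negative."""
            words = set(tweet.lower().split())
            return len(words & POSITIVE) - len(words & NEGATIVE)

        tweets = [
            "Stunning views on the Pennine Way today, love this landscape",
            "Path so eroded and boggy near the top #pennineway",
            "Nothing to do with walking at all",
        ]
        kept = [t for t in tweets if relevant(t)]
        print(f"{len(kept)} of {len(tweets)} messages kept after filtering")
        for t in kept:
            print(sentiment(t), "|", t)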

    On-the-fly selection of cell-specific enhancers, genes, miRNAs and proteins across the human body using SlideBase

    Genomics consortia have produced large datasets profiling the expression of genes, micro-RNAs, enhancers and more across human tissues or cells. There is a need for intuitive tools to select the subsets of such data that are most relevant for specific studies. To this end, we present SlideBase, a web tool which offers a new way of selecting genes, promoters, enhancers and microRNAs that are preferentially expressed/used in a specified set of cells/tissues, based on the use of interactive sliders. With the help of sliders, SlideBase enables users to define custom expression thresholds for individual cell types/tissues, producing sets of genes, enhancers, etc. that satisfy these constraints. Changes in slider settings result in simultaneous changes in the selected sets, updated in real time. SlideBase is linked to major databases from genomics consortia, including FANTOM, GTEx, The Human Protein Atlas and BioGPS. Database URL: http://slidebase.binf.ku.d
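
    The slider mechanic amounts to a per-tissue threshold filter over an expression matrix, re-applied every time a slider moves. A minimal sketch with invented gene names and expression values (not SlideBase's actual code):

        import numpy as np

        # Toy expression matrix: rows = genes; columns = liver, brain, heart.
        genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
        expr = np.array([[9.1, 0.2, 0.3],
                         [8.4, 7.9, 6.5],
                         [0.1, 9.8, 0.2],
                         [5.0, 0.1, 4.7]])

        def select(expr, lo, hi):
            """Rows whose expression lies within [lo, hi] in every tissue."""
            ok = (expr >= lo) & (expr <= hi)
            return np.where(ok.all(axis=1))[0]

        # "Liver-specific" slider settings: high in liver, low elsewhere.
        lo = np.array([7.0, 0.0, 0.0])
        hi = np.array([np.inf, 1.0, 1.0])
        for i in select(expr, lo, hi):
            print(genes[i])          # -> GENE_A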