Towards a unified approach to formal risk of bias assessments for causal and descriptive inference
Statistics is sometimes described as the science of reasoning under
uncertainty. Statistical models provide one view of this uncertainty, but what
is frequently neglected is the invisible portion of uncertainty: that assumed
not to exist once a model has been fitted to some data. Systematic errors, i.e.
bias, in data relative to some model and inferential goal can seriously
undermine research conclusions, and qualitative and quantitative techniques
have been created across several disciplines to quantify and generally appraise
such potential biases. Perhaps best known are so-called risk of bias assessment
instruments used to investigate the likely quality of randomised controlled
trials in medical research. However, the logic of assessing the risks caused by
various types of systematic error to statistical arguments applies far more
widely. This logic applies even when statistical adjustment strategies for
potential biases are used, as these frequently make assumptions (e.g. data
missing at random) that can never be guaranteed in finite samples. Mounting
concern about such situations can be seen in the increasing calls for greater
consideration of biases caused by nonprobability sampling in descriptive
inference (i.e. survey sampling), and the statistical generalisability of
in-sample causal effect estimates in causal inference; both of which relate to
the consideration of model-based and wider uncertainty when presenting research
conclusions from models. Given that model-based adjustments are never perfect,
we argue that qualitative risk of bias reporting frameworks for both
descriptive and causal inferential arguments should be further developed and
made mandatory by journals and funders. It is only through clear statements of
the limits to statistical arguments that consumers of research can fully judge
their value for any specific application.
We need to talk about nonprobability samples
In most circumstances, probability sampling is the only way to ensure unbiased inference about population quantities where a complete census is not possible. As we enter the era of "big data", however, nonprobability samples, whose sampling mechanisms are unknown, are undergoing a renaissance. We explain why the use of nonprobability samples can lead to spurious conclusions, and why seemingly large nonprobability samples can be (effectively) very small. We also review some recent controversies surrounding the use of nonprobability samples in biodiversity monitoring. These points notwithstanding, we argue that nonprobability samples can be useful, provided that their limitations are assessed, mitigated where possible and clearly communicated. Ecologists can learn much from other disciplines on each of these fronts.
Descriptive inference using large, unrepresentative nonprobability samples: an introduction for ecologists
Biodiversity monitoring usually involves drawing inferences about some variable of interest across a defined landscape from observations made at a sample of locations within that landscape. If the variable of interest differs between sampled and non-sampled locations, and no mitigating action is taken, then the sample is unrepresentative and inferences drawn from it will be biased. It is possible to adjust unrepresentative samples so that they more closely resemble the wider landscape in terms of "auxiliary variables". A good auxiliary variable is a common cause of sample inclusion and the variable of interest, and if it explains an appreciable portion of the variance in both, then inferences drawn from the adjusted sample will be closer to the truth. We applied six types of survey sample adjustment (subsampling, quasi-randomisation, poststratification, superpopulation modelling, a "doubly robust" procedure, and multilevel regression and poststratification) to a simple two-part biodiversity monitoring problem. The first part was to estimate mean occupancy of the plant Calluna vulgaris in Great Britain in two time periods (1987-1999 and 2010-2019); the second was to estimate the difference between the two (i.e. the trend). We estimated the means and trend using large, but (originally) unrepresentative, samples from a citizen science dataset. Compared to the unadjusted estimates, the means and trends estimated using most adjustment methods were more accurate, although standard uncertainty intervals generally did not cover the true values. Completely unbiased inference is not possible from an unrepresentative sample without knowing and having data on all relevant auxiliary variables. Adjustments can reduce the bias if auxiliary variables are available and selected carefully, but the potential for residual bias should be acknowledged and reported.
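Of the six adjustment methods listed, poststratification is perhaps the simplest to sketch. The fragment below is an illustrative toy, not the paper's analysis: the site records, stratum labels and population shares are invented, and the auxiliary variable is assumed to be a single categorical land-cover class. The method simply reweights within-stratum sample means by the strata's known shares of the landscape.

```python
# Minimal poststratification sketch (hypothetical data, not the
# paper's Calluna vulgaris analysis). Each sampled site carries a
# stratum label (the auxiliary variable) and an observed occupancy
# of 0 or 1.

def poststratified_mean(samples, pop_shares):
    """Weight each stratum's sample mean by its known population share."""
    by_stratum = {}
    for stratum, y in samples:
        by_stratum.setdefault(stratum, []).append(y)
    return sum(share * (sum(by_stratum[s]) / len(by_stratum[s]))
               for s, share in pop_shares.items())

# The sample over-represents the "upland" stratum, where the species
# is common, so the naive mean is biased upward.
samples = [("upland", 1), ("upland", 1), ("upland", 0),
           ("lowland", 0), ("lowland", 1)]
pop_shares = {"upland": 0.2, "lowland": 0.8}  # known landscape shares

naive = sum(y for _, y in samples) / len(samples)    # 3/5 = 0.6
adjusted = poststratified_mean(samples, pop_shares)  # 0.2*(2/3) + 0.8*0.5
```

The adjustment only removes the bias explained by the stratum variable; as the abstract stresses, bias from unmeasured auxiliary variables remains.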
occAssess: an R package for assessing potential biases in species occurrence data
Species occurrence records from a variety of sources are increasingly aggregated into heterogeneous databases and made available to ecologists for immediate analytical use. However, these data are typically biased, i.e. they are not a probability sample of the target population of interest, meaning that the information they provide may not be an accurate reflection of reality. It is therefore crucial that species occurrence data are properly scrutinised before they are used for research. In this article, we introduce occAssess, an R package that enables straightforward screening of species occurrence data for potential biases. The package contains a number of discrete functions, each of which returns a measure of the potential for bias in one or more of the taxonomic, temporal, spatial, and environmental dimensions. Users can opt to provide a set of time periods into which the data will be split; in this case separate outputs will be provided for each period, making the package particularly useful for assessing the suitability of a dataset for estimating temporal trends in species' distributions. The outputs are provided visually (as ggplot2 objects) and do not include a formal recommendation as to whether data are of sufficient quality for any given inferential use. Instead, they should be used as ancillary information and viewed in the context of the question that is being asked, and the methods that are being used to answer it. We demonstrate the utility of occAssess by applying it to data on two key pollinator taxa in South America: leaf-nosed bats (Phyllostomidae) and hoverflies (Syrphidae). In this worked example, we briefly assess the degree to which various aspects of data coverage appear to have changed over time. We then discuss additional applications of the package, highlight its limitations, and point to future development opportunities.
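The kind of per-period screening described above can be sketched package-agnostically. occAssess itself is an R package; the Python fragment below is illustrative only, with a made-up record format, and reports for each user-supplied period the record count and the number of distinct grid cells visited, a crude index of spatial coverage.

```python
# Hedged sketch of per-period coverage screening (not occAssess's API).
# records: iterable of (year, x, y); periods: list of inclusive
# (start, end) year ranges; cell_size: grid resolution in map units.

def coverage_by_period(records, periods, cell_size=1.0):
    out = {}
    for start, end in periods:
        cells, n = set(), 0
        for year, x, y in records:
            if start <= year <= end:
                n += 1
                cells.add((int(x // cell_size), int(y // cell_size)))
        out[(start, end)] = {"n_records": n, "n_cells": len(cells)}
    return out

records = [(1990, 0.2, 0.3), (1995, 0.8, 0.1), (1995, 3.5, 2.2),
           (2012, 0.4, 0.6), (2015, 0.5, 0.2)]
periods = [(1987, 1999), (2010, 2019)]
summary = coverage_by_period(records, periods)
# Here the later period visits fewer grid cells, a hint (not proof)
# that spatial coverage, and hence potential bias, changed over time.
```

As with occAssess's outputs, such summaries are ancillary information, not a verdict on whether the data support a given inference.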
On the trade-off between accuracy and spatial resolution when estimating species occupancy from geographically biased samples
Species occupancy is often defined as the proportion of areal units (sites) in a landscape that the focal species occupies, but it is usually estimated from the subset of sites that have been sampled. Assuming no measurement error, we show that three quantities (the degree of sampling bias in terms of site selection, the proportion of sites that have been sampled, and the variability of true occupancy across sites) determine the extent to which a sample-based estimate of occupancy differs from its true value across the wider landscape. That these are the only three quantities (measurement error notwithstanding) to affect the accuracy of estimates of species occupancy is the fundamental insight of the "Meng equation", an algebraic re-expression of statistical error. We use simulations to show how each of the three quantities vary with the spatial resolution of the analysis and that absolute estimation error is lower at coarser resolutions. Absolute error scales similarly with resolution regardless of the size and clustering of the virtual species' distribution. Finely resolved estimates of species occupancy have the potential to be more useful than coarse ones, but this potential is only realised if the estimates are at least reasonably accurate. Consequently, wherever there is the potential for sampling bias, there is a trade-off between spatial resolution and accuracy, and the Meng equation provides a theoretical framework in which analysts can consider the balance between the two. An obvious next step is to consider the implications of the Meng equation for estimating a time trend in species occupancy, where it is the confounding of error and true change that is of most interest.
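The "Meng equation" is usually written as follows (notation follows Meng's 2018 paper rather than this abstract; $R$ is the sample-inclusion indicator, $Y$ the variable of interest, and $f = n/N$ the sampled fraction):

```latex
\bar{Y}_n - \bar{Y}_N
  = \underbrace{\rho_{R,Y}}_{\text{data defect (sampling bias)}}
    \times
    \underbrace{\sqrt{\tfrac{1-f}{f}}}_{\text{data quantity}}
    \times
    \underbrace{\sigma_Y}_{\text{problem difficulty}}
```

The three factors correspond, in order, to the abstract's three quantities: the degree of sampling bias, the proportion of sites sampled, and the variability of true occupancy across sites.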
Protected areas support more species than unprotected areas in Great Britain, but lose them equally rapidly
Protected areas are a key conservation tool, yet their effectiveness at maintaining biodiversity through time is rarely quantified. Here, we assess protected area effectiveness across sampled portions of Great Britain (primarily England) using regionalized (protected vs unprotected areas) Bayesian occupancy-detection models for 1238 invertebrate species at 1 km resolution, based on ~1 million occurrence records between 1990 and 2018. We quantified species richness, species trends, and compositional change (temporal beta diversity, decomposed into losses and gains). We report results overall, for two functional groups (pollinators and predators), and for rare and common species. Whilst we found that protected areas have 15% more species on average than unprotected ones, declines in occupancy are of similar magnitude, and species composition has changed by 27% across protected and unprotected areas, with losses dominating gains. Pollinators have suffered particularly severe declines. Still, protected areas are colonized by more locally novel pollinator species than unprotected areas, suggesting that they might act as "landing pads" for range-shifting pollinators. We find almost double the number of rare species in protected areas (although rare species trends are similar in protected and unprotected areas), whereas we uncover disproportionately steep declines for common species within protected areas. Our results highlight strong invertebrate reorganization and loss across both protected and unprotected areas. We therefore call for more effective protected areas, in combination with wider action, to bend the curve of biodiversity loss, and we provide a toolkit to quantify effectiveness. We must grasp the opportunity to effectively conserve biodiversity through time.
An evidence-base for developing ambitious yet realistic national biodiversity targets
Biodiversity targets are a key tool, used at global and national policy levels, to align biodiversity goals, promote conservation action, and recover nature. Yet most biodiversity targets are not met. In England, the government has committed to legally binding targets to halt and recover the decline in species abundance by 2030 and 2042. We present evidence from recent population trends of 670 terrestrial animal species (for which abundance time series are available) as a species abundance indicator, together with a synthesis of case studies on species recovery, to assess the degree to which these targets are achievable. The case studies demonstrate that recovery is possible through a range of approaches. The indicator demonstrates that, theoretically, the targets can be achieved by addressing severe declines in a relatively small number of species, as well as by creating smaller benefits for many species through landscape-scale interventions. The fact that multiple pathways exist to achieve the species abundance targets in England presents choices, but also raises the possibility that targets might be reached with perverse consequences. We demonstrate that evidence on achievability is a necessary but not sufficient condition for determining what is required to deliver conservation outcomes and restore biodiversity.
A Path Toward the Use of Trail Users' Tweets to Assess Effectiveness of the Environmental Stewardship Scheme: An Exploratory Analysis of the Pennine Way National Trail
Large and unofficial data sets, for instance those gathered from social media, are increasingly being used in geographical research and explored as decision support tools for policy development. Social media data have the potential to provide new insight into phenomena about which there is little information from conventional sources. Within this context, this paper explores the potential of social media data to evaluate the aesthetic management of landscape. Specifically, this project utilises the perceptions of visitors to the Pennine Way National Trail, which passes through land managed under the Environmental Stewardship Scheme (ESS). The method analyses sentiment in trail users' public Twitter messages (tweets) with the aim of assessing the extent to which the ESS maintains landscape character within the trail corridor. The method demonstrates the importance of filtering social media data to convert it into useful information. After filtering, the results are based on 161 messages directly related to the trail. Although small, this sample illustrates the potential for social media to be used as a cheap and increasingly abundant source of information. We suggest that social media data in this context should be seen as a resource that can complement, rather than replace, conventional data sources such as questionnaires and interviews. Furthermore, we provide guidance on how social media could be effectively used by conservation bodies, such as Natural England, which are charged with the management of areas of environmental value worldwide.
On-the-fly selection of cell-specific enhancers, genes, miRNAs and proteins across the human body using SlideBase
Genomics consortia have produced large datasets profiling the expression of genes, micro-RNAs, enhancers and more across human tissues or cells. There is a need for intuitive tools to select the subsets of such data that are most relevant for specific studies. To this end, we present SlideBase, a web tool which offers a new way of selecting genes, promoters, enhancers and microRNAs that are preferentially expressed/used in a specified set of cells/tissues, based on the use of interactive sliders. With the help of sliders, SlideBase enables users to define custom expression thresholds for individual cell types/tissues, producing sets of genes, enhancers etc. which satisfy these constraints. Changes in slider settings result in simultaneous changes in the selected sets, updated in real time. SlideBase is linked to major databases from genomics consortia, including FANTOM, GTEx, The Human Protein Atlas and BioGPS. Database URL: http://slidebase.binf.ku.d
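The slider logic the abstract describes can be sketched as a threshold filter over an expression matrix. This is an illustrative fragment, not SlideBase's actual implementation: the gene names, tissue labels and expression values below are invented.

```python
# Hedged sketch of per-tissue threshold selection (hypothetical data,
# not SlideBase's code). Each slider position is a minimum expression
# threshold for one tissue; the selected set is recomputed whenever
# any threshold changes.

def select_genes(expression, thresholds):
    """Return genes whose expression meets every per-tissue threshold."""
    return sorted(
        gene for gene, by_tissue in expression.items()
        if all(by_tissue.get(tissue, 0.0) >= t
               for tissue, t in thresholds.items())
    )

expression = {
    "GENE_A": {"liver": 9.0, "brain": 0.5},
    "GENE_B": {"liver": 7.5, "brain": 6.0},
    "GENE_C": {"liver": 1.0, "brain": 8.0},
}

# Slider positions favouring genes expressed in liver:
selected = select_genes(expression, {"liver": 5.0, "brain": 0.0})
# Raising the "brain" slider shrinks the selected set in real time:
refined = select_genes(expression, {"liver": 5.0, "brain": 5.0})
```

In the web tool the same recomputation is triggered interactively and linked back to the consortium databases; the sketch only shows the selection rule itself.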