
    Bayesian logistic regression for presence-only data

    Presence-only data refer to situations in which a censoring mechanism acts on a binary response that can be partially observed only with respect to one outcome, usually denoting the presence of an attribute of interest. A typical example is the recording of species presence in ecological surveys. In this work a Bayesian approach to the analysis of presence-only data based on a two-level scheme is presented. A probability law and a case-control design are combined to handle the double source of uncertainty: one due to censoring and the other due to sampling. Through the use of a stratified sampling design with non-overlapping strata, a new formulation of the logistic model for presence-only data is proposed; in particular, the logistic regression with linear predictor is considered. Estimation is carried out with a new Markov Chain Monte Carlo algorithm with data augmentation, which does not require a priori knowledge of the population prevalence. The performance of the new algorithm is validated by means of extensive simulation experiments using three scenarios and comparison with optimal benchmarks. An application to data from the literature is reported in order to discuss the model behaviour in real-world situations, together with the results of an original study on termite occurrence data.
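    The core difficulty the abstract describes can be illustrated with a small simulation (this is a hypothetical sketch of the classic case-control result, not the paper's MCMC algorithm): sampling presences and absences separately leaves the logistic slope recoverable while the intercept absorbs the unknown sampling fractions, which is exactly the quantity tied to the population prevalence.

```python
# Hypothetical sketch (NOT the paper's data-augmentation MCMC): under a
# case-control design, a plain logistic fit recovers the slope b1 while
# the intercept b0 is shifted by the unknown sampling fractions.
import math
import random

random.seed(1)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# True model: P(y = 1 | x) = sigmoid(b0 + b1 * x)
b0_true, b1_true = -1.0, 2.0
pop = []
for _ in range(20000):
    x = random.gauss(0, 1)
    y = 1 if random.random() < sigmoid(b0_true + b1_true * x) else 0
    pop.append((x, y))

# Case-control design: presences and absences sampled separately
cases = random.sample([x for x, y in pop if y == 1], 500)
controls = random.sample([x for x, y in pop if y == 0], 500)
data = [(x, 1) for x in cases] + [(x, 0) for x in controls]

def fit_logistic(data, iters=3000, lr=0.5):
    """Plain gradient-ascent logistic regression (no libraries)."""
    b0 = b1 = 0.0
    n = len(data)
    for _ in range(iters):
        g0 = g1 = 0.0
        for x, y in data:
            err = y - sigmoid(b0 + b1 * x)
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

b0_hat, b1_hat = fit_logistic(data)
# The slope is roughly recovered; the intercept absorbs the sampling
# rates, which is the bias the paper's approach addresses without
# assuming the prevalence is known.
print(b0_hat, b1_hat)
```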

    PF-OLA: A High-Performance Framework for Parallel On-Line Aggregation

    Online aggregation provides estimates of the final result of a computation during the actual processing. The user can stop the computation as soon as the estimate is accurate enough, typically early in the execution. This allows for interactive exploration of even the largest datasets. In this paper we introduce the first framework for parallel online aggregation in which the estimation incurs virtually no overhead on top of the actual execution. We define a generic interface for expressing any estimation model that completely abstracts the execution details, and we design a novel estimator specifically targeted at parallel online aggregation. When executed by the framework over a massive 8TB TPC-H instance, the estimator provides accurate confidence bounds early in the execution, even when the cardinality of the final result is seven orders of magnitude smaller than the dataset size, and without incurring overhead.
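    The online-aggregation idea itself is easy to sketch (this is a generic illustration, not PF-OLA's estimator): scan the tuples in random order, so that every prefix is a random sample, and maintain a running estimate with a CLT-based confidence interval the user can monitor to decide when to stop.

```python
# Generic online-aggregation sketch (not PF-OLA's actual estimator):
# a running SUM estimate with a CLT-based ~95% confidence interval,
# ignoring the finite-population correction for brevity.
import math
import random

random.seed(7)
table = [random.expovariate(1.0) for _ in range(100_000)]
true_sum = sum(table)

random.shuffle(table)  # a random scan order makes every prefix a random sample
n_total = len(table)
s = sq = 0.0
checkpoints = []
for i, v in enumerate(table, 1):
    s += v
    sq += v * v
    if i % 10_000 == 0:
        mean = s / i
        var = max(sq / i - mean * mean, 0.0)
        est = mean * n_total                        # scale up to a SUM estimate
        half = 1.96 * n_total * math.sqrt(var / i)  # ~95% CI half-width
        checkpoints.append((i, est, half))

for i, est, half in checkpoints:
    print(f"{i:>7} rows: {est:12.1f} +/- {half:.1f}")
```

The interval shrinks as the scan progresses, so an accurate bound is available long before the full table has been read.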

    A simple variance estimator of change for rotating repeated surveys: an application to the EU-SILC household surveys

    A common problem is to compare two cross-sectional estimates of the same study variable taken on two different waves or occasions, and to judge whether the observed change is statistically significant. This involves estimating the sampling variance of the estimator of change. Estimating this variance would be relatively straightforward if the cross-sectional estimates were based on the same sample. Unfortunately, the samples do not completely overlap, because of the rotations used in repeated surveys. We propose a simple approach based on a multivariate (general) linear regression model. The variance estimator proposed is not a model-based estimator. We show that it is design-consistent when the sampling fractions are negligible, and that it can accommodate stratified and two-stage sampling designs. The main advantage of the proposed approach is its simplicity and flexibility: it can be applied to a wide class of sampling designs and can be implemented with standard statistical regression techniques. Because of this flexibility, the approach is well suited for variance estimation in the European Union Statistics on Income and Living Conditions surveys, as it allows a common approach to variance estimation across the different types of design. The approach is a useful tool, because it involves only modelling skills and requires limited knowledge of survey sampling theory.

    Post-drought decline of the Amazon carbon sink

    Amazon forests have experienced frequent and severe droughts in the past two decades. However, little is known about the large-scale legacy of droughts on the carbon stocks and dynamics of these forests. Using systematic sampling of forest structure measured by LiDAR waveforms from 2003 to 2008, here we show a significant loss of carbon over the entire Amazon basin at a rate of 0.3 ± 0.2 (95% CI) PgC yr⁻¹ after the 2005 mega-drought, which persisted over the next three years (2005–2008). The changes in forest structure, captured by average LiDAR forest height and converted to above-ground biomass carbon density, show an average loss of 2.35 ± 1.80 MgC ha⁻¹ in the year after the drought (2006) in its epicenter. With more frequent droughts expected in the future, Amazon forests may lose their role as a robust carbon sink, leading to a significant positive climate feedback and exacerbating warming trends.

    The research was partially supported by a NASA Terrestrial Ecology grant at the Jet Propulsion Laboratory, California Institute of Technology, and by partial funding to the UCLA Institute of Environment and Sustainability from previous National Aeronautics and Space Administration and National Science Foundation grants. The authors thank NSIDC, BYU, USGS, and the NASA Land Processes Distributed Active Archive Center (LP DAAC) for making their data available.

    A Grouping Genetic Algorithm for Joint Stratification and Sample Allocation Designs

    Predicting the cheapest sample size for the optimal stratification in multivariate survey design is a hard problem when the population frame is large. An existing solution iteratively searches for the minimum sample size necessary to meet accuracy constraints in partitions of atomic strata, created by the Cartesian product of auxiliary variables, into larger strata. The optimal stratification can be found by testing all possible partitions; however, the number of possible partitions grows exponentially with the number of initial strata. There are alternative ways of modelling this problem, one of the most natural being Genetic Algorithms (GAs). These evolutionary algorithms use recombination, mutation and selection to search for optimal solutions, and they often converge on optimal or near-optimal solutions more quickly than exact methods. We propose a new GA approach to this problem using grouping genetic operators instead of traditional operators. The results show a significant improvement in solution quality for similar computational effort, corresponding to large monetary savings.
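    The search space the abstract describes can be sketched in miniature (a simplified, mutation-only illustration; the paper's GA also uses grouping crossover and population-based selection, and the strata, target variance and fitness here are hypothetical): each atomic stratum carries a group label, and the fitness of a partition is the Neyman-allocation sample size needed to meet a variance target for the population mean.

```python
# Simplified, mutation-only sketch of the grouping idea (the paper's GA
# also uses grouping crossover and selection): a genome assigns each
# atomic stratum a group label; fitness = Neyman-allocation sample size
# meeting a target variance for the mean. All data here are hypothetical.
import random
import statistics

random.seed(11)

# Hypothetical atomic strata: 12 groups of 50 units with distinct means
atoms = [[random.gauss(mu, 3) for _ in range(50)] for mu in range(12)]
TARGET_VAR = 0.05

def sample_size(labels, k):
    """Neyman-allocation n for the stratification induced by the labels."""
    N = sum(len(a) for a in atoms)
    groups = [[] for _ in range(k)]
    for atom, g in zip(atoms, labels):
        groups[g].extend(atom)
    total_ws = wss = 0.0
    for g in groups:
        if len(g) < 2:
            return float("inf")  # degenerate stratification
        w, s = len(g) / N, statistics.stdev(g)
        total_ws += w * s
        wss += w * s * s
    # n = (sum W_h S_h)^2 / (V + sum W_h S_h^2 / N)
    return total_ws**2 / (TARGET_VAR + wss / N)

k = 3
best = [random.randrange(k) for _ in atoms]
best_n = sample_size(best, k)
for _ in range(2000):  # mutate one atom's group label per step
    cand = best[:]
    cand[random.randrange(len(cand))] = random.randrange(k)
    n = sample_size(cand, k)
    if n < best_n:
        best, best_n = cand, n
print(best, best_n)
```

Even this crude hill climb groups similar atomic strata together and cuts the required sample size relative to a single stratum; the grouping genetic operators in the paper are designed to explore exactly this space of partitions far more effectively.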