
    Random effects compound Poisson model to represent data with extra zeros

    This paper describes a compound Poisson-based random effects structure for modeling zero-inflated data. Data with a large proportion of zeros are found in many fields of applied statistics, for example in ecology when modeling and predicting species counts (discrete data) or abundance distributions (continuous data). Standard methods for modeling such data include mixture and two-part conditional models. In contrast to these methods, the stochastic models proposed here behave coherently under a change of scale, since they mimic the harvesting of a marked Poisson process in the modeling steps. Random effects are used to account for inhomogeneity. In this paper, model design and inference both rely on conditional thinking to understand the links between the various layers of quantities: parameters, latent variables (including random effects), and zero-inflated observations. The potential of these parsimonious hierarchical models for zero-inflated data is exemplified using two marine macroinvertebrate abundance datasets from a large-scale scientific bottom-trawl survey. The EM algorithm with a Monte Carlo step based on importance sampling is checked for this model structure on a simulated dataset: it proves to work well for parameter estimation, but parameter values matter when re-assessing the actual coverage level of the confidence regions far from the asymptotic conditions.
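
    As a sketch of the generative mechanism described above, the snippet below simulates zero-inflated continuous abundances from a compound Poisson model with a lognormal random effect on the Poisson rate. The parameter names and values (lam, shape, re_sd) are illustrative assumptions, not the authors' choices.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_compound_poisson(n_sites, lam=2.0, shape=1.5, scale=0.8, re_sd=0.5):
    """Simulate zero-inflated continuous abundances: each site draws a
    latent count N ~ Poisson(lam * u), with u a lognormal random effect,
    then sums N gamma-distributed marks. N = 0 yields an exact zero,
    which is how the model produces extra zeros without a separate
    zero-inflation component."""
    u = rng.lognormal(mean=0.0, sigma=re_sd, size=n_sites)  # random effects
    n = rng.poisson(lam * u)                                # latent counts
    # Sum of n iid Gamma(shape, scale) marks is Gamma(n * shape, scale).
    marks = rng.gamma(np.maximum(n, 1) * shape, scale)
    return np.where(n > 0, marks, 0.0)

y = simulate_compound_poisson(1000)
print(f"proportion of exact zeros: {np.mean(y == 0):.2f}")
```

    Note that scaling the observation window simply rescales lam, which is the coherence-under-change-of-scale property the abstract emphasizes.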

    Fuzzy Supernova Templates I: Classification

    Modern supernova (SN) surveys are now uncovering stellar explosions at rates that far surpass what the world's spectroscopic resources can handle. To make full use of these SN datasets, it is necessary to use analysis methods that depend only on the survey photometry. This paper presents two methods for using a set of SN light curve templates to classify SN objects. In the first case we present an updated version of the Bayesian Adaptive Template Matching program (BATM). To address some shortcomings of that strictly Bayesian approach, we introduce a method for Supernova Ontology with Fuzzy Templates (SOFT), which utilizes fuzzy set theory for the definition and combination of SN light curve models. For well-sampled light curves with a modest signal-to-noise ratio (S/N > 10), the SOFT method can correctly separate thermonuclear (Type Ia) SNe from core collapse SNe with 98% accuracy. In addition, the SOFT method has the potential to classify supernovae into sub-types, providing photometric identification of very rare or peculiar explosions. The accuracy and precision of the SOFT method are verified using Monte Carlo simulations as well as real SN light curves from the Sloan Digital Sky Survey and the SuperNova Legacy Survey. In a subsequent paper the SOFT method is extended to address the problem of parameter estimation, providing estimates of redshift, distance, and host galaxy extinction without any spectroscopy.
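
    To illustrate the fuzzy-template idea, the sketch below grades an observed light curve against a set of templates and returns normalized membership values. The exp(-chi2/2) grading and the toy templates are assumptions made for illustration, not the published SOFT membership functions.

```python
import numpy as np

def fuzzy_memberships(flux, flux_err, templates):
    """Grade a light curve against each template with a fuzzy membership
    value in [0, 1], normalized over the template set. `templates` maps
    a class name (e.g. 'Ia', 'II') to model fluxes evaluated at the same
    epochs as `flux`."""
    scores = {name: np.exp(-0.5 * np.sum(((flux - model) / flux_err) ** 2))
              for name, model in templates.items()}
    total = sum(scores.values())
    return {k: v / total for k, v in scores.items()} if total > 0 else scores

# Toy usage with two hypothetical templates sampled at five epochs.
obs = np.array([0.1, 0.8, 1.0, 0.6, 0.3])
err = np.full(5, 0.1)
templates = {"Ia": np.array([0.1, 0.7, 1.0, 0.7, 0.3]),
             "II": np.array([0.3, 0.5, 0.6, 0.5, 0.4])}
print(fuzzy_memberships(obs, err, templates))
```

    Unlike a hard Bayesian classification, the membership vector retains graded evidence for each class, which is what allows flagging of ambiguous or peculiar objects.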

    Methods for Bayesian power spectrum inference with galaxy surveys

    We derive and implement a full Bayesian large-scale structure inference method aimed at precision recovery of the cosmological power spectrum from galaxy redshift surveys. Our approach improves over previous Bayesian methods by performing a joint inference of the three-dimensional density field, the cosmological power spectrum, luminosity-dependent galaxy biases, and the corresponding normalizations. We account for all joint and correlated uncertainties between all inferred quantities. Classes of galaxies with different biases are treated as separate sub-samples. The method therefore also allows the combined analysis of more than one galaxy survey. In particular, it solves the problem of inferring the power spectrum from galaxy surveys with non-trivial survey geometries by exploring the joint posterior distribution with efficient implementations of multiple-block Markov chain and Hybrid Monte Carlo methods. Our Markov sampler achieves high statistical efficiency in low signal-to-noise regimes by using a deterministic reversible jump algorithm. We test our method on an artificial mock galaxy survey emulating characteristic features of the Sloan Digital Sky Survey data release 7, such as its survey geometry and luminosity-dependent biases. These tests demonstrate the numerical feasibility of our large-scale Bayesian inference framework when the parameter space has millions of dimensions. The method reveals and correctly treats the anti-correlation between bias amplitudes and the power spectrum, which is not taken into account in current approaches to power spectrum estimation and amounts to a 20 percent effect across large ranges in k-space. In addition, the method yields constrained realizations of density fields obtained without assuming the power spectrum or bias parameters in advance.
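
    The multiple-block sampling structure can be illustrated on a drastically simplified one-dimensional analogue: alternate a (field | power) draw from the Wiener posterior with a (power | field) draw from its inverse-gamma conditional. Everything below (the toy model, the Jeffreys prior, the settings) is an assumption for illustration; the actual method operates on three-dimensional fields with additional bias and normalization blocks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: data d = s + noise, where s has n_modes independent modes
# with a common unknown power P. This mimics the block structure
# (field | power, then power | field); it is a schematic, not the
# paper's implementation.
n_modes, noise_var, true_p = 256, 1.0, 4.0
s_true = rng.normal(0.0, np.sqrt(true_p), n_modes)
d = s_true + rng.normal(0.0, np.sqrt(noise_var), n_modes)

p, p_chain = 1.0, []
for step in range(3000):
    # Block 1: Wiener posterior for the field given the current power.
    post_var = 1.0 / (1.0 / p + 1.0 / noise_var)
    s = rng.normal(post_var * d / noise_var, np.sqrt(post_var))
    # Block 2: inverse-gamma conditional for the power given the field
    # (Jeffreys prior), drawn as sum(s^2) / chi2(n_modes).
    p = np.sum(s ** 2) / rng.chisquare(n_modes)
    p_chain.append(p)

print("posterior mean power:", np.mean(p_chain[500:]))  # close to true_p
```

    The same alternation, with many correlated blocks and millions of field values, is what the full sampler scales up.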

    Shrinkage Estimation of the Power Spectrum Covariance Matrix

    We seek to improve estimates of the power spectrum covariance matrix from a limited number of simulations by employing a novel statistical technique known as shrinkage estimation. The shrinkage technique optimally combines an empirical estimate of the covariance with a model (the target) to minimize the total mean squared error compared to the true underlying covariance. We test this technique on N-body simulations and evaluate its performance by estimating cosmological parameters. Using a simple diagonal target, we show that the shrinkage estimator significantly outperforms both the empirical covariance and the target individually when using a small number of simulations. We find that reducing noise in the covariance estimate is essential for properly estimating the values of cosmological parameters as well as their confidence intervals. We extend our method to the jackknife covariance estimator and again find significant improvement, though the simulation-based covariance still gives better results. Even for thousands of simulations we still find evidence that our method improves estimation of the covariance matrix. Because our method is simple, requires negligible additional numerical effort, and produces superior results, we advocate shrinkage estimation for the covariance of the power spectrum and other large-scale structure measurements whenever purely theoretical modeling of the covariance is insufficient.
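
    The estimator itself is compact: a convex combination lam * target + (1 - lam) * empirical with an analytically estimated intensity lam. The sketch below uses a diagonal target and a standard Schaefer-Strimmer-type intensity formula; this is a common textbook form and may differ in detail from the authors' exact expressions.

```python
import numpy as np

def shrinkage_covariance(samples, target=None):
    """Shrinkage covariance estimate lam * target + (1 - lam) * empirical.

    `samples` is (n_sims, n_bins), e.g. power spectrum measurements from
    N-body realizations. The default target is the diagonal of the
    empirical covariance, matching the paper's simple diagonal target."""
    n, p = samples.shape
    x = samples - samples.mean(axis=0)
    emp = x.T @ x / (n - 1)                      # empirical covariance
    if target is None:
        target = np.diag(np.diag(emp))           # diagonal target
    # Per-entry sampling variance of the empirical covariance entries.
    w = np.einsum('ki,kj->kij', x, x)            # (n, p, p) cross-products
    var_emp = n / (n - 1) ** 3 * ((w - w.mean(axis=0)) ** 2).sum(axis=0)
    # Intensity from off-diagonal entries (diagonal matches the target).
    off = ~np.eye(p, dtype=bool)
    lam = np.clip(var_emp[off].sum() / (emp[off] ** 2).sum(), 0.0, 1.0)
    return lam * target + (1.0 - lam) * emp, lam

sims = np.random.default_rng(3).normal(size=(50, 20))  # 50 sims, 20 bins
c_shrunk, lam = shrinkage_covariance(sims)
print("shrinkage intensity:", lam)
```

    With few simulations lam is pushed toward 1 (trust the target); with many, toward 0 (trust the data), which is the behaviour the abstract reports.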

    Evaluation of advanced optimisation methods for estimating Mixed Logit models

    The performance of different simulation-based estimation techniques for Mixed Logit modeling is evaluated. A quasi-Monte Carlo method (modified Latin hypercube sampling) is compared with a Monte Carlo algorithm with dynamic accuracy. The classic line-search approach based on the Broyden-Fletcher-Goldfarb-Shanno (BFGS) optimization algorithm is also compared with trust-region methods, which have proved extremely powerful in nonlinear programming. Numerical tests are performed on two real data sets: stated preference data for parking type collected in the United Kingdom, and revealed preference data for mode choice collected as part of a German travel diary survey. Several criteria are used to evaluate the approximation quality of the log-likelihood function, the accuracy of the results, and the associated estimation runtime. Results suggest that the trust-region approach outperforms the BFGS approach and that Monte Carlo methods remain competitive with quasi-Monte Carlo methods in high-dimensional problems, especially when an adaptive optimization algorithm is used.
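
    A minimal sketch of the comparison on a toy binary mixed logit follows: the simulated log-likelihood averages choice probabilities over coefficient draws, and scipy's BFGS and trust-region optimizers are run on both plain Monte Carlo and Latin hypercube draws. The model, data, and starting values are invented for illustration and are unrelated to the paper's datasets.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm, qmc

rng = np.random.default_rng(1)

# Toy binary mixed logit: utility b_i * x with b_i ~ N(mu, sigma^2).
n_obs, n_draws = 500, 200
x = rng.normal(size=n_obs)
b_true = rng.normal(1.0, 0.5, n_obs)
y = (rng.random(n_obs) < 1 / (1 + np.exp(-b_true * x))).astype(float)

# Quasi-Monte Carlo (Latin hypercube) vs plain Monte Carlo draws.
lhs = qmc.LatinHypercube(d=1, seed=1).random(n_draws).ravel()
z_qmc = norm.ppf(lhs)
z_mc = rng.normal(size=n_draws)

def neg_sim_loglik(theta, z):
    """Negative simulated log-likelihood averaged over coefficient draws."""
    mu, log_sigma = theta
    b = mu + np.exp(log_sigma) * z              # (n_draws,) coefficients
    p = 1 / (1 + np.exp(-np.outer(x, b)))       # (n_obs, n_draws) probs
    p_sim = np.where(y[:, None] == 1, p, 1 - p).mean(axis=1)
    return -np.sum(np.log(p_sim + 1e-300))

# Line-search BFGS vs a trust-region method, on MC and QMC draws.
for method in ("BFGS", "trust-constr"):
    for label, z in (("MC", z_mc), ("QMC", z_qmc)):
        res = minimize(neg_sim_loglik, [0.5, -1.0], args=(z,), method=method)
        print(method, label, res.x)
```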

    Subsampling MCMC - An introduction for the survey statistician

    The rapid development of computing power and efficient Markov Chain Monte Carlo (MCMC) simulation algorithms has revolutionized Bayesian statistics, making it a highly practical inference method in applied work. However, MCMC algorithms tend to be computationally demanding, and are particularly slow for large datasets. Data subsampling has recently been suggested as a way to make MCMC methods scalable on very large data, utilizing efficient sampling schemes and estimators from the survey sampling literature. These developments tend to be unknown to many survey statisticians, who traditionally work with non-Bayesian methods and rarely use MCMC. Our article explains the idea of data subsampling in MCMC by reviewing one strand of work, Subsampling MCMC, a so-called pseudo-marginal MCMC approach to speeding up MCMC through data subsampling. The review is written for a survey statistician without previous knowledge of MCMC methods, since our aim is to motivate survey sampling experts to contribute to the growing Subsampling MCMC literature.
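
    A bare-bones sketch of the idea: estimate the full-data log-likelihood from a random subsample and plug the estimate into a Metropolis-Hastings acceptance step. The toy below uses a simple Hansen-Hurwitz-style expansion estimator and deliberately omits the variance-reduction (difference estimator) and pseudo-marginal correction machinery that makes Subsampling MCMC well-behaved; all names and settings are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy problem: posterior for a normal mean with a large dataset.
n, m = 100_000, 1_000                    # full data size, subsample size
data = rng.normal(2.0, 1.0, n)

def loglik_hat(theta, idx):
    """Expansion estimate of the full-data log-likelihood from a simple
    random subsample of size m (constants cancel in the MH ratio)."""
    return (n / m) * np.sum(-0.5 * (data[idx] - theta) ** 2)

# Subsampled Metropolis-Hastings: the current state keeps its estimate,
# each proposal is scored on a fresh subsample. Without the
# pseudo-marginal correction this chain is only approximately correct.
theta = 0.0
ll = loglik_hat(theta, rng.choice(n, m, replace=False))
samples = []
for _ in range(5000):
    prop = theta + rng.normal(0, 0.05)
    ll_prop = loglik_hat(prop, rng.choice(n, m, replace=False))
    if np.log(rng.random()) < ll_prop - ll:
        theta, ll = prop, ll_prop
    samples.append(theta)
print("posterior mean estimate:", np.mean(samples[1000:]))
```

    Each iteration touches only m of the n observations, which is the source of the computational savings the abstract describes.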