Random effects compound Poisson model to represent data with extra zeros
This paper describes a compound Poisson-based random effects structure for
modeling zero-inflated data. Data with a large proportion of zeros arise in
many fields of applied statistics, for example in ecology when trying to model
and predict species counts (discrete data) or abundance distributions
(continuous data). Standard methods for modeling such data include mixture and
two-part conditional models. In contrast to these methods, the stochastic models
proposed here behave coherently under a change of scale, since they
mimic the harvesting of a marked Poisson process in the modeling steps. Random
effects are used to account for inhomogeneity. In this paper, model design and
inference both rely on conditional thinking to understand the links between
various layers of quantities: parameters, latent variables including random
effects and zero-inflated observations. The potential of these parsimonious
hierarchical models for zero-inflated data is exemplified using two marine
macroinvertebrate abundance datasets from a large-scale scientific bottom-trawl
survey. The EM algorithm with a Monte Carlo step based on importance sampling
is checked for this model structure on a simulated dataset: it proves to work
well for parameter estimation, but parameter values matter when re-assessing the
actual coverage level of the confidence regions far from the asymptotic
conditions.
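To make the zero-inflation mechanism concrete, here is a minimal simulation sketch of a compound Poisson model with a lognormal random effect; it is an illustration of the model class, not the authors' code, and all parameter values are arbitrary. Exact zeros occur whenever the latent Poisson count is zero.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_site(lam, shape, scale, sigma_u, n_hauls, rng):
    """Simulate abundances for one site sharing a lognormal random effect."""
    u = rng.lognormal(mean=0.0, sigma=sigma_u)   # site-level random effect
    counts = rng.poisson(lam * u, size=n_hauls)  # latent Poisson counts
    # Each abundance is the sum of `count` gamma-distributed marks,
    # so a zero count yields an exact zero observation.
    return np.array([rng.gamma(shape, scale, size=c).sum() for c in counts])

data = np.concatenate([simulate_site(0.8, 2.0, 1.5, 0.5, 20, rng)
                       for _ in range(50)])
print(f"proportion of exact zeros: {(data == 0).mean():.2f}")
```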
Fuzzy Supernova Templates I: Classification
Modern supernova (SN) surveys are now uncovering stellar explosions at rates
that far surpass what the world's spectroscopic resources can handle. In order
to make full use of these SN datasets, it is necessary to use analysis methods
that depend only on the survey photometry. This paper presents two methods for
utilizing a set of SN light curve templates to classify SN objects. In the
first case we present an updated version of the Bayesian Adaptive Template
Matching program (BATM). To address some shortcomings of that strictly Bayesian
approach, we introduce a method for Supernova Ontology with Fuzzy Templates
(SOFT), which utilizes Fuzzy Set Theory for the definition and combination of
SN light curve models. For well-sampled light curves with a modest
signal-to-noise ratio (S/N > 10), the SOFT method can correctly separate
thermonuclear (Type Ia) SNe from core-collapse SNe with 98% accuracy. In
addition, the SOFT
method has the potential to classify supernovae into sub-types, providing
photometric identification of very rare or peculiar explosions. The accuracy
and precision of the SOFT method are verified using Monte Carlo simulations as
well as real SN light curves from the Sloan Digital Sky Survey and the
SuperNova Legacy Survey. In a subsequent paper the SOFT method is extended to
address the problem of parameter estimation, providing estimates of redshift,
distance, and host galaxy extinction without any spectroscopy.
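As a rough illustration of the fuzzy-template idea, the sketch below maps each template's reduced chi-square misfit to a membership grade and normalizes across the template set. The Gaussian membership function, its width, and the toy templates are assumptions for illustration; the published SOFT definitions may differ.

```python
import numpy as np

def fuzzy_memberships(flux, flux_err, templates):
    """Normalized membership grades of one light curve in each template."""
    grades = []
    for tmpl in templates:
        chi2_dof = np.mean(((flux - tmpl) / flux_err) ** 2)
        grades.append(np.exp(-0.5 * chi2_dof))  # map misfit into [0, 1]
    grades = np.array(grades)
    return grades / grades.sum()                # combine across the set

# Toy example: a noisy "Ia-like" curve scored against two crude templates.
t = np.linspace(0.0, 50.0, 25)
ia_tmpl = np.exp(-0.5 * ((t - 18.0) / 8.0) ** 2)
cc_tmpl = np.exp(-0.5 * ((t - 25.0) / 15.0) ** 2)
obs = ia_tmpl + np.random.default_rng(1).normal(0.0, 0.05, t.size)
print(fuzzy_memberships(obs, 0.05, [ia_tmpl, cc_tmpl]))
```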
Methods for Bayesian power spectrum inference with galaxy surveys
We derive and implement a full Bayesian large-scale structure inference
method aiming at precision recovery of the cosmological power spectrum from
galaxy redshift surveys. Our approach improves over previous Bayesian methods
by performing a joint inference of the three-dimensional density field, the
cosmological power spectrum, luminosity-dependent galaxy biases and
corresponding normalizations. We account for all joint and correlated
uncertainties between all inferred quantities. Classes of galaxies with
different biases are treated as separate subsamples. The method therefore also
allows the combined analysis of more than one galaxy survey.
In particular, it solves the problem of inferring the power spectrum from
galaxy surveys with non-trivial survey geometries by exploring the joint
posterior distribution with efficient implementations of multiple block Markov
chain and Hybrid Monte Carlo methods. Our Markov sampler achieves high
statistical efficiency in low signal-to-noise regimes by using a deterministic
reversible jump algorithm. We test our method on an artificial mock galaxy
survey, emulating characteristic features of the Sloan Digital Sky Survey data
release 7, such as its survey geometry and luminosity-dependent biases. These
tests demonstrate the numerical feasibility of our large-scale Bayesian
inference framework when the parameter space has millions of dimensions.
The method reveals and correctly treats the anti-correlation between bias
amplitudes and the power spectrum, which is not taken into account in current
approaches to power spectrum estimation, a 20 percent effect across large
ranges in k-space. In addition, the method results in constrained realizations
of density fields obtained without assuming the power spectrum or bias
parameters in advance.
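The block-sampling structure can be illustrated with a deliberately tiny one-dimensional analogue (an assumption for illustration, far simpler than the method above): alternately draw a Gaussian signal field conditional on the current power, then the per-mode power conditional on the field, given noisy data.

```python
import numpy as np

rng = np.random.default_rng(2)
n_modes, noise_var = 64, 1.0
true_power = 5.0 / (1.0 + np.arange(n_modes))      # assumed toy spectrum
signal = rng.normal(0.0, np.sqrt(true_power))
data = signal + rng.normal(0.0, np.sqrt(noise_var), size=n_modes)

power = np.ones(n_modes)
for _ in range(2000):
    # Block 1: draw signal | power, data (the Wiener posterior, per mode).
    post_var = 1.0 / (1.0 / power + 1.0 / noise_var)
    post_mean = post_var * data / noise_var
    s = rng.normal(post_mean, np.sqrt(post_var))
    # Block 2: draw power | signal (inverse-gamma under a Jeffreys prior).
    power = 0.5 * s ** 2 / rng.gamma(0.5, 1.0, size=n_modes)

print(f"first modes of the final power draw: {np.round(power[:3], 2)}")
```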
Shrinkage Estimation of the Power Spectrum Covariance Matrix
We seek to improve estimates of the power spectrum covariance matrix from a
limited number of simulations by employing a novel statistical technique known
as shrinkage estimation. The shrinkage technique optimally combines an
empirical estimate of the covariance with a model (the target) to minimize the
total mean squared error compared to the true underlying covariance. We test
this technique on N-body simulations and evaluate its performance by estimating
cosmological parameters. Using a simple diagonal target, we show that the
shrinkage estimator significantly outperforms both the empirical covariance and
the target individually when using a small number of simulations. We find that
reducing noise in the covariance estimate is essential for properly estimating
the values of cosmological parameters as well as their confidence intervals. We
extend our method to the jackknife covariance estimator and again find
significant improvement, though simulation-based estimates still give better
results. Even for
thousands of simulations we still find evidence that our method improves
estimation of the covariance matrix. Because our method is simple, requires
negligible additional numerical effort, and produces superior results, we
advocate shrinkage estimation for the covariance of the power spectrum
and other large-scale structure measurements whenever purely theoretical
modeling of the covariance is insufficient.
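A minimal sketch of the core idea follows: linearly combine the empirical covariance with a diagonal target, with the intensity chosen to minimize the estimated mean squared error. The closed-form intensity below is the Schäfer-Strimmer form for a diagonal target, an assumption for illustration rather than necessarily the paper's exact estimator.

```python
import numpy as np

def shrink_covariance(samples):
    """samples: (n_sims, n_bins) array of power spectrum measurements."""
    n, p = samples.shape
    xc = samples - samples.mean(axis=0)
    w = xc[:, :, None] * xc[:, None, :]   # per-simulation outer products
    s = w.sum(axis=0) / (n - 1)           # empirical covariance
    # Estimated variance of each covariance entry, then the optimal
    # intensity for shrinking the off-diagonal entries toward zero.
    var_s = n / (n - 1) ** 3 * ((w - w.mean(axis=0)) ** 2).sum(axis=0)
    off = ~np.eye(p, dtype=bool)
    lam = np.clip(var_s[off].sum() / (s[off] ** 2).sum(), 0.0, 1.0)
    shrunk = s.copy()
    shrunk[off] *= 1.0 - lam
    return shrunk, lam

sims = np.random.default_rng(3).multivariate_normal(
    np.zeros(5), np.eye(5) + 0.3, size=20)
cov, lam = shrink_covariance(sims)
print(f"shrinkage intensity toward the diagonal target: {lam:.2f}")
```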
Evaluation of advanced optimisation methods for estimating Mixed Logit models
The performance of different simulation-based estimation techniques for mixed logit modeling is evaluated. A quasi-Monte Carlo method (modified Latin hypercube sampling) is compared with a Monte Carlo algorithm with dynamic accuracy. The classic line-search approach of the Broyden-Fletcher-Goldfarb-Shanno (BFGS) optimization algorithm is also compared with trust region methods, which have proved extremely powerful in nonlinear programming. Numerical tests are performed on two real data sets: stated preference data for parking type collected in the United Kingdom, and revealed preference data for mode choice collected as part of a German travel diary survey. Several criteria are used to evaluate the approximation quality of the log-likelihood function, the accuracy of the results, and the associated estimation runtime. Results suggest that the trust region approach outperforms the BFGS approach and that Monte Carlo methods remain competitive with quasi-Monte Carlo methods in high-dimensional problems, especially when an adaptive optimization algorithm is used.
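To show what the quasi-Monte Carlo draws look like in practice, here is a small sketch comparing Modified Latin Hypercube Sampling with plain pseudo-random draws when simulating a mixed logit choice probability. The three-alternative setup, utility values, and the single normal random coefficient are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def mlhs(n_draws, n_dims, rng):
    """One evenly spaced, randomly shifted, shuffled sequence per dimension."""
    base = np.arange(n_draws) / n_draws
    cols = [rng.permutation(base + rng.uniform(0.0, 1.0 / n_draws))
            for _ in range(n_dims)]
    return np.column_stack(cols)

rng = np.random.default_rng(4)
R = 200
v = np.array([0.5, -0.2, 0.1])             # systematic utilities (toy values)
for name, u in [("MLHS", mlhs(R, 1, rng)),
                ("pseudo-random", rng.uniform(size=(R, 1)))]:
    beta = norm.ppf(u)                     # standard normal random coefficient
    util = v[None, :] * (1.0 + beta)       # utilities with random taste scaling
    p = np.exp(util[:, 0]) / np.exp(util).sum(axis=1)
    print(f"{name}: simulated probability of alternative 1 = {p.mean():.4f}")
```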
Subsampling MCMC - An introduction for the survey statistician
The rapid development of computing power and efficient Markov Chain Monte
Carlo (MCMC) simulation algorithms have revolutionized Bayesian statistics,
making it a highly practical inference method in applied work. However, MCMC
algorithms tend to be computationally demanding, and are particularly slow for
large datasets. Data subsampling has recently been suggested as a way to make
MCMC methods scalable to very large datasets, utilizing efficient sampling
schemes and estimators from the survey sampling literature. These developments
tend to be unknown to many survey statisticians, who traditionally work with
non-Bayesian methods and rarely use MCMC. Our article explains the idea of
data subsampling in MCMC by reviewing one strand of work, Subsampling MCMC, a
so-called pseudo-marginal MCMC approach to speeding up MCMC through data
subsampling. The review is written for a survey statistician without previous
knowledge of MCMC methods, since our aim is to motivate survey sampling experts
to contribute to the growing Subsampling MCMC literature.
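As a deliberately simplified sketch of the key idea, the toy below replaces the full-data log-likelihood sum inside Metropolis-Hastings with a survey-style expansion estimator from a with-replacement subsample. Real Subsampling MCMC adds control variates and corrects the bias introduced by exponentiating a noisy log-likelihood estimate; this toy omits both, so it only approximately targets the posterior. The Gaussian toy model and all tuning constants are assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
data = rng.normal(1.0, 1.0, size=100_000)      # toy data with unknown mean
n, m = data.size, 1_000                        # population and subsample sizes

def loglik_hat(theta):
    """Hansen-Hurwitz-style expansion of the log-likelihood from a subsample."""
    idx = rng.integers(0, n, size=m)           # with-replacement sample
    return (n / m) * np.sum(-0.5 * (data[idx] - theta) ** 2)

theta, ll = 0.0, loglik_hat(0.0)
chain = []
for _ in range(5_000):
    prop = theta + rng.normal(0.0, 0.02)       # random-walk proposal
    ll_prop = loglik_hat(prop)
    if np.log(rng.uniform()) < ll_prop - ll:   # Metropolis accept/reject
        theta, ll = prop, ll_prop
    chain.append(theta)

print(f"posterior mean estimate: {np.mean(chain[1000:]):.3f}")
```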