Composite Likelihood Inference by Nonparametric Saddlepoint Tests
The class of composite likelihood functions provides a flexible and powerful
toolkit to carry out approximate inference for complex statistical models when
the full likelihood is either impossible to specify or unfeasible to compute.
However, the strength of the composite likelihood approach is dimmed when
considering hypothesis testing about a multidimensional parameter because the
finite sample behavior of likelihood ratio, Wald, and score-type test
statistics is tied to the Godambe information matrix. Consequently, inaccurate
estimates of the Godambe information translate into inaccurate p-values. In this
paper it is shown how accurate inference can be obtained by using a fully
nonparametric saddlepoint test statistic derived from the composite score
functions. The proposed statistic is asymptotically chi-square distributed up
to a relative error of second order and does not depend on the Godambe
information. The validity of the method is demonstrated through simulation
studies.
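
To make the role of the Godambe information concrete, here is a minimal sketch of the usual sandwich estimate G = H J^{-1} H, where H is the sensitivity matrix and J the variability matrix of the composite score. It is not the saddlepoint statistic proposed in the paper; the function name `godambe_information` and its inputs (per-observation scores and Hessians evaluated at the estimate) are assumptions made for illustration.

```python
import numpy as np

def godambe_information(scores, hessians):
    """Sandwich estimate G = H J^{-1} H (illustrative helper, not from the paper).

    scores:   (n, p) array, composite score of each observation at the estimate
    hessians: (n, p, p) array, Hessian of each observation's composite
              log-likelihood at the estimate
    """
    scores = np.asarray(scores, dtype=float)
    hessians = np.asarray(hessians, dtype=float)
    n = scores.shape[0]
    # Sensitivity matrix: minus the average Hessian.
    H = -hessians.mean(axis=0)
    # Variability matrix: average outer product of the scores
    # (assumes the scores average to zero at the estimate).
    J = scores.T @ scores / n
    # Solve J X = H instead of forming the inverse of J explicitly.
    return H @ np.linalg.solve(J, H)
```

With few observations, J is estimated noisily, and that noise carries over to Wald- and score-type statistics built on G, which is the difficulty the abstract describes.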
The Topology of Large Scale Structure in the 1.2 Jy IRAS Redshift Survey
We measure the topology (genus) of isodensity contour surfaces in volume
limited subsets of the 1.2 Jy IRAS redshift survey, for smoothing scales
$\lambda = 4\,h^{-1}\,\mathrm{Mpc}$, $7\,h^{-1}\,\mathrm{Mpc}$, and $12\,h^{-1}\,\mathrm{Mpc}$. At $12\,h^{-1}\,\mathrm{Mpc}$, the observed genus
curve has a symmetric form similar to that predicted for a Gaussian random
field. At the shorter smoothing lengths, the observed genus curve shows a
modest shift in the direction of an isolated cluster or ``meatball'' topology.
We use mock catalogs drawn from cosmological N-body simulations to investigate
the systematic biases that affect topology measurements in samples of this size
and to determine the full covariance matrix of the expected random errors. We
incorporate the error correlations into our evaluations of theoretical models,
obtaining both frequentist assessments of absolute goodness-of-fit and Bayesian
assessments of models' relative likelihoods. We compare the observed topology
of the 1.2 Jy survey to the predictions of dynamically evolved, unbiased,
gravitational instability models that have Gaussian initial conditions. The
model with an , power-law initial power spectrum achieves the best
overall agreement with the data, though models with a low-density cold dark
matter power spectrum and an power-law spectrum are also consistent. The
observed topology is inconsistent with an initially Gaussian model that has
, and it is strongly inconsistent with a Voronoi foam model, which has a
non-Gaussian, bubble topology. Comment: ApJ submitted, 39 pages, LaTeX (aasms4), 12 figures, 1 Table
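
For reference, the symmetric genus curve of a Gaussian random field has the well-known shape g(ν) ∝ (1 − ν²) exp(−ν²/2), where ν is the density threshold in units of the standard deviation. The sketch below evaluates only this shape; the function name and the free `amplitude` parameter are assumptions for illustration, and in practice the amplitude depends on the smoothed power spectrum.

```python
import numpy as np

def gaussian_genus_curve(nu, amplitude=1.0):
    """Shape of the genus curve for a Gaussian random field.

    nu:        threshold(s) in units of the standard deviation
    amplitude: overall normalisation (hypothetical free parameter here)
    """
    nu = np.asarray(nu, dtype=float)
    # Positive genus for |nu| < 1, negative for |nu| > 1, symmetric about nu = 0.
    return amplitude * (1.0 - nu**2) * np.exp(-nu**2 / 2.0)

# Evaluate on a grid of thresholds, e.g. for comparison with a measured curve.
thresholds = np.linspace(-3.0, 3.0, 25)
genus = gaussian_genus_curve(thresholds)
```

A measured curve that is shifted or asymmetric relative to this form signals a departure from the Gaussian random-phase topology.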
Nonparametric Regression using the Concept of Minimum Energy
It has recently been shown that an unbinned distance-based statistic, the
energy, can be used to construct an extremely powerful nonparametric
multivariate two sample goodness-of-fit test. An extension to this method that
makes it possible to perform nonparametric regression using multiple
multivariate data sets is presented in this paper. The technique, which is
based on the concept of minimizing the energy of the system, permits
determination of parameters of interest without the need for parametric
expressions of the parent distributions of the data sets. The application and
performance of this new method are discussed in the context of some simple
example analyses. Comment: 10 pages, 4 figures
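
As a rough illustration of a distance-based two-sample "energy", the sketch below computes the energy distance between two samples using Euclidean distances. This is an assumption: the paper's statistic may weight distances differently, and the regression step (minimizing the energy over model parameters) is not shown. The function name `energy_distance` is hypothetical.

```python
import numpy as np

def energy_distance(x, y):
    """Energy distance between samples x (n, d) and y (m, d).

    Uses Euclidean distance between points (an assumption for this sketch).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 2 or y.ndim != 2 or x.shape[1] != y.shape[1]:
        raise ValueError("x and y must be 2-D arrays with the same number of columns")

    def mean_pairwise(a, b):
        # Mean Euclidean distance over all pairs (a_i, b_j).
        diff = a[:, None, :] - b[None, :, :]
        return np.linalg.norm(diff, axis=-1).mean()

    # Cross-sample term minus the two within-sample terms.
    return 2.0 * mean_pairwise(x, y) - mean_pairwise(x, x) - mean_pairwise(y, y)
```

The value is zero when both samples are identical and grows as the two point clouds separate, so a fit could in principle adjust parameters until a simulated sample minimizes this quantity against the data.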
Sparse Linear Identifiable Multivariate Modeling
In this paper we consider sparse and identifiable linear latent variable
(factor) and linear Bayesian network models for parsimonious analysis of
multivariate data. We propose a computationally efficient method for joint
parameter and model inference, and model comparison. It consists of a fully
Bayesian hierarchy for sparse models using slab and spike priors (two-component
delta-function and continuous mixtures), non-Gaussian latent factors and a
stochastic search over the ordering of the variables. The framework, which we
call SLIM (Sparse Linear Identifiable Multivariate modeling), is validated and
benchmarked on artificial and real biological data sets. SLIM is closest in
spirit to LiNGAM (Shimizu et al., 2006), but differs substantially in
inference, Bayesian network structure learning and model comparison.
Experimentally, SLIM performs as well as or better than LiNGAM with
comparable computational complexity. We attribute this mainly to the stochastic
search strategy used, and to parsimony (sparsity and identifiability), which is
an explicit part of the model. We propose two extensions to the basic i.i.d.
linear framework: non-linear dependence on observed variables, called SNIM
(Sparse Non-linear Identifiable Multivariate modeling) and allowing for
correlations between latent variables, called CSLIM (Correlated SLIM), for
temporal and/or spatial data. The source code and scripts are available from
http://cogsys.imm.dtu.dk/slim/. Comment: 45 pages, 17 figures
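
The two-component delta-function mixture mentioned in the abstract can be illustrated by drawing from a generic spike-and-slab prior: each coefficient is exactly zero with some probability and otherwise drawn from a continuous slab. This sketch is not the SLIM code; the Gaussian slab, the function name, and the parameters `inclusion_prob` and `slab_scale` are assumptions chosen for illustration.

```python
import numpy as np

def sample_spike_and_slab(shape, inclusion_prob, slab_scale=1.0, rng=None):
    """Draw coefficients from a delta-spike plus Gaussian-slab mixture.

    shape:          shape of the coefficient array, e.g. (n_variables, n_factors)
    inclusion_prob: prior probability that a coefficient is non-zero
    slab_scale:     standard deviation of the Gaussian slab (assumed form)
    """
    rng = np.random.default_rng() if rng is None else rng
    # Bernoulli indicators select which coefficients are "active".
    active = rng.random(shape) < inclusion_prob
    # Slab draws for every position; inactive ones are set exactly to zero.
    slab = rng.normal(0.0, slab_scale, size=shape)
    return np.where(active, slab, 0.0)

# Example: a sparse 5 x 3 loading matrix with about 30% non-zero entries.
loadings = sample_spike_and_slab((5, 3), inclusion_prob=0.3,
                                 rng=np.random.default_rng(0))
```

Exact zeros are what make the resulting models sparse, in contrast to shrinkage priors that only pull coefficients towards zero.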