
    Composite Likelihood Inference by Nonparametric Saddlepoint Tests

    The class of composite likelihood functions provides a flexible and powerful toolkit to carry out approximate inference for complex statistical models when the full likelihood is either impossible to specify or infeasible to compute. However, the strength of the composite likelihood approach is dimmed when considering hypothesis testing about a multidimensional parameter, because the finite-sample behavior of likelihood ratio, Wald, and score-type test statistics is tied to the Godambe information matrix. Consequently, inaccurate estimates of the Godambe information translate into inaccurate p-values. In this paper it is shown how accurate inference can be obtained by using a fully nonparametric saddlepoint test statistic derived from the composite score functions. The proposed statistic is asymptotically chi-square distributed up to a relative error of second order and does not depend on the Godambe information. The validity of the method is demonstrated through simulation studies.

    The Topology of Large Scale Structure in the 1.2 Jy IRAS Redshift Survey

    We measure the topology (genus) of isodensity contour surfaces in volume limited subsets of the 1.2 Jy IRAS redshift survey, for smoothing scales $\lambda = 4$, $7$, and $12\,h^{-1}\,\mathrm{Mpc}$. At $12\,h^{-1}\,\mathrm{Mpc}$, the observed genus curve has a symmetric form similar to that predicted for a Gaussian random field. At the shorter smoothing lengths, the observed genus curve shows a modest shift in the direction of an isolated cluster or "meatball" topology. We use mock catalogs drawn from cosmological N-body simulations to investigate the systematic biases that affect topology measurements in samples of this size and to determine the full covariance matrix of the expected random errors. We incorporate the error correlations into our evaluations of theoretical models, obtaining both frequentist assessments of absolute goodness-of-fit and Bayesian assessments of models' relative likelihoods. We compare the observed topology of the 1.2 Jy survey to the predictions of dynamically evolved, unbiased, gravitational instability models that have Gaussian initial conditions. The model with an $n=-1$ power-law initial power spectrum achieves the best overall agreement with the data, though models with a low-density cold dark matter power spectrum and an $n=0$ power-law spectrum are also consistent. The observed topology is inconsistent with an initially Gaussian model that has $n=-2$, and it is strongly inconsistent with a Voronoi foam model, which has a non-Gaussian, bubble topology. Comment: ApJ submitted, 39 pages, LaTeX (aasms4), 12 figures, 1 Table
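
    For orientation, the symmetric genus curve of a Gaussian random field mentioned in this abstract has a standard analytic form. The symbols below are not taken from the abstract and are assumptions for illustration: $\nu$ is the density threshold in units of the standard deviation of the smoothed field, and $A$ is an amplitude set by the power spectrum and the smoothing length.

    ```latex
    g(\nu) = A \,\bigl(1 - \nu^{2}\bigr)\, e^{-\nu^{2}/2}
    ```

    The curve is even in $\nu$, positive for $|\nu| < 1$ (sponge-like contours) and negative for $|\nu| > 1$ (isolated clusters or voids). A "meatball" shift is a departure from this symmetry toward isolated high-density regions.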

    Nonparametric Regression using the Concept of Minimum Energy

    It has recently been shown that an unbinned distance-based statistic, the energy, can be used to construct an extremely powerful nonparametric multivariate two-sample goodness-of-fit test. An extension to this method that makes it possible to perform nonparametric regression using multiple multivariate data sets is presented in this paper. The technique, which is based on the concept of minimizing the energy of the system, permits determination of parameters of interest without the need for parametric expressions of the parent distributions of the data sets. The application and performance of this new method are discussed in the context of some simple example analyses. Comment: 10 pages, 4 figures
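
    As a rough illustration of the distance-based two-sample idea this abstract refers to, the sketch below computes an energy-type statistic and a permutation p-value. The abstract does not give the paper's definition of the energy, so this uses the Euclidean energy distance, 2 E|X-Y| - E|X-X'| - E|Y-Y'|, as a stand-in. The function names `mean_pairwise_distance`, `energy_distance`, and `permutation_p_value` are hypothetical and chosen only for this example.

    ```python
import math
import random
from itertools import product


def mean_pairwise_distance(a, b):
    """Average Euclidean distance over all pairs (x, y), x in a, y in b."""
    total = sum(math.dist(x, y) for x, y in product(a, b))
    return total / (len(a) * len(b))


def energy_distance(a, b):
    """Energy distance between two samples of equal-dimension points.

    Assumption: Euclidean distance is used as the weighting function;
    the paper summarized above may use a different one.
    """
    return (2.0 * mean_pairwise_distance(a, b)
            - mean_pairwise_distance(a, a)
            - mean_pairwise_distance(b, b))


def permutation_p_value(a, b, n_perm=200, seed=0):
    """Estimate a p-value by reshuffling the pooled sample labels."""
    rng = random.Random(seed)
    observed = energy_distance(a, b)
    pooled = list(a) + list(b)
    n_a = len(a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if energy_distance(pooled[:n_a], pooled[n_a:]) >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    x = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(30)]
    y = [(rng.gauss(1, 1), rng.gauss(1, 1)) for _ in range(30)]
    print(energy_distance(x, y), permutation_p_value(x, y))
    ```

    The statistic is zero when both samples are identical and grows as the samples separate. No binning is needed, which is what makes this kind of test usable in several dimensions.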

    Sparse Linear Identifiable Multivariate Modeling

    In this paper we consider sparse and identifiable linear latent variable (factor) and linear Bayesian network models for parsimonious analysis of multivariate data. We propose a computationally efficient method for joint parameter and model inference, and model comparison. It consists of a fully Bayesian hierarchy for sparse models using slab and spike priors (two-component delta-function and continuous mixtures), non-Gaussian latent factors and a stochastic search over the ordering of the variables. The framework, which we call SLIM (Sparse Linear Identifiable Multivariate modeling), is validated and benchmarked on artificial and real biological data sets. SLIM is closest in spirit to LiNGAM (Shimizu et al., 2006), but differs substantially in inference, Bayesian network structure learning and model comparison. Experimentally, SLIM performs as well as or better than LiNGAM with comparable computational complexity. We attribute this mainly to the stochastic search strategy used, and to parsimony (sparsity and identifiability), which is an explicit part of the model. We propose two extensions to the basic i.i.d. linear framework: non-linear dependence on observed variables, called SNIM (Sparse Non-linear Identifiable Multivariate modeling), and allowing for correlations between latent variables, called CSLIM (Correlated SLIM), for temporal and/or spatial data. The source code and scripts are available from http://cogsys.imm.dtu.dk/slim/. Comment: 45 pages, 17 figures