
    Why your model parameter confidences might be too optimistic -- unbiased estimation of the inverse covariance matrix

    AIMS. The maximum-likelihood method is the standard approach to obtain model fits to observational data and the corresponding confidence regions. We investigate possible sources of bias in the log-likelihood function and its subsequent analysis, focusing on estimators of the inverse covariance matrix. Furthermore, we study under which circumstances the estimated covariance matrix is invertible. METHODS. We perform Monte-Carlo simulations to investigate the behaviour of estimators for the inverse covariance matrix, depending on the number of independent data sets and the number of variables of the data vectors. RESULTS. We find that the inverse of the maximum-likelihood estimator of the covariance is biased, the amount of bias depending on the ratio of the number of bins (data vector variables), P, to the number of data sets, N. This bias inevitably leads to an -- in extreme cases catastrophic -- underestimation of the size of confidence regions. We report on a method to remove this bias for the idealised case of Gaussian noise and statistically independent data vectors. Moreover, we demonstrate that marginalisation over parameters introduces a bias into the marginalised log-likelihood function. Measures of the sizes of confidence regions suffer from the same problem. Furthermore, we give an analytic proof for the fact that the estimated covariance matrix is singular if P > N. Comment: 6 pages, 3 figures, A&A, in press, shortened version
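    A minimal Monte-Carlo sketch of this effect (illustrative parameters, not the paper's code): for Gaussian, statistically independent data vectors, the inverted sample covariance overestimates the true inverse by the Wishart factor (N-1)/(N-P-2), which the correction below undoes.

```python
# Sketch: bias of the inverted sample covariance and the (N-P-2)/(N-1) correction,
# assuming Gaussian noise and statistically independent data vectors.
import numpy as np

rng = np.random.default_rng(0)
P, N, trials = 10, 50, 2000           # bins per data vector, data sets, MC repetitions
true_cov = np.eye(P)                  # true covariance; its inverse is also the identity

raw, corrected = [], []
for _ in range(trials):
    x = rng.multivariate_normal(np.zeros(P), true_cov, size=N)
    S = np.cov(x, rowvar=False)       # sample covariance (divides by N - 1)
    inv_S = np.linalg.inv(S)          # biased estimator of the inverse covariance
    raw.append(np.trace(inv_S) / P)
    corrected.append((N - P - 2) / (N - 1) * np.trace(inv_S) / P)

print(f"mean diagonal of inv(S):         {np.mean(raw):.3f}  (true value 1)")
print(f"after (N-P-2)/(N-1) correction:  {np.mean(corrected):.3f}")
```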

    Dimensionality reduction of clustered data sets

    We present a novel probabilistic latent variable model to perform linear dimensionality reduction on data sets which contain clusters. We prove that the maximum likelihood solution of the model is an unsupervised generalisation of linear discriminant analysis. This provides a completely new approach to one of the most established and widely used classification algorithms. The performance of the model is then demonstrated on a number of real and artificial data sets
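    For context, a short sketch of classical linear discriminant analysis used as a linear dimensionality reducer on clustered data, i.e. the algorithm whose unsupervised generalisation the paper derives (this uses scikit-learn's standard LDA, not the paper's probabilistic latent variable model):

```python
# Classical LDA as a dimensionality reducer on synthetic clustered data.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0, 0.0], [4.0, 0.0, 0.0], [0.0, 4.0, 0.0]])
X = np.vstack([c + rng.normal(size=(100, 3)) for c in centers])  # 3 clusters in 3-D
y = np.repeat([0, 1, 2], 100)                                    # cluster labels

lda = LinearDiscriminantAnalysis(n_components=2)
Z = lda.fit_transform(X, y)          # project onto the 2 discriminant directions
print(Z.shape)                       # (300, 2)
```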

    Some Aspects of Measurement Error in Linear Regression of Astronomical Data

    I describe a Bayesian method to account for measurement errors in linear regression of astronomical data. The method allows for heteroscedastic and possibly correlated measurement errors, and intrinsic scatter in the regression relationship. The method is based on deriving a likelihood function for the measured data, and I focus on the case when the intrinsic distribution of the independent variables can be approximated using a mixture of Gaussians. I generalize the method to incorporate multiple independent variables, non-detections, and selection effects (e.g., Malmquist bias). A Gibbs sampler is described for simulating random draws from the probability distribution of the parameters, given the observed data. I use simulation to compare the method with other common estimators. The simulations illustrate that the Gaussian mixture model outperforms other common estimators and can effectively give constraints on the regression parameters, even when the measurement errors dominate the observed scatter, source detection fraction is low, or the intrinsic distribution of the independent variables is not a mixture of Gaussians. I conclude by using this method to fit the X-ray spectral slope as a function of Eddington ratio using a sample of 39 z < 0.8 radio-quiet quasars. I confirm the correlation seen by other authors between the radio-quiet quasar X-ray spectral slope and the Eddington ratio, where the X-ray spectral slope softens as the Eddington ratio increases. Comment: 39 pages, 11 figures, 1 table, accepted by ApJ. IDL routines (linmix_err.pro) for performing the Markov Chain Monte Carlo are available at the IDL astronomy user's library, http://idlastro.gsfc.nasa.gov/homepage.htm
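    A maximum-likelihood sketch of the single-Gaussian special case (one mixture component, no non-detections or selection effects): after marginalising over the true values, each observed pair is bivariate normal, so the likelihood has a closed form. All names and starting values here are illustrative; this is not the linmix_err Gibbs sampler.

```python
# Regression with heteroscedastic measurement errors in both variables and
# intrinsic scatter, assuming a single Gaussian for the independent variable.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n = 200
alpha, beta, mu, tau, sig = 1.0, 2.0, 0.0, 1.0, 0.3   # true parameter values
x = rng.normal(mu, tau, n)                            # true independent variable
y = alpha + beta * x + rng.normal(0.0, sig, n)        # true relation + intrinsic scatter
sx = rng.uniform(0.2, 0.6, n)                         # known measurement-error sigmas
sy = rng.uniform(0.2, 0.6, n)
xo, yo = x + rng.normal(0.0, sx), y + rng.normal(0.0, sy)

def nll(theta):
    a, b, m, log_tau, log_sig = theta
    t2, s2 = np.exp(2 * log_tau), np.exp(2 * log_sig)
    # 2x2 covariance of each observed (x, y) pair after marginalising the truth
    vxx = t2 + sx**2
    vyy = b**2 * t2 + s2 + sy**2
    vxy = b * t2
    det = vxx * vyy - vxy**2
    dx, dy = xo - m, yo - (a + b * m)
    quad = (vyy * dx**2 - 2 * vxy * dx * dy + vxx * dy**2) / det
    return 0.5 * np.sum(np.log(det) + quad)

fit = minimize(nll, x0=[0.0, 1.0, 0.0, 0.0, -1.0], method="Nelder-Mead")
print(fit.x[:2])   # estimates of (alpha, beta)
```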

    Binary Models for Marginal Independence

    Log-linear models are a classical tool for the analysis of contingency tables. In particular, the subclass of graphical log-linear models provides a general framework for modelling conditional independences. However, with the exception of special structures, marginal independence hypotheses cannot be accommodated by these traditional models. Focusing on binary variables, we present a model class that provides a framework for modelling marginal independences in contingency tables. The approach taken is graphical and draws on analogies to multivariate Gaussian models for marginal independence. For the graphical model representation we use bi-directed graphs, which are in the tradition of path diagrams. We show how the models can be parameterized in a simple fashion, and how maximum likelihood estimation can be performed using a version of the Iterated Conditional Fitting algorithm. Finally we consider combining these models with symmetry restrictions
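    A toy numeric illustration of the hypothesis class (not the Iterated Conditional Fitting algorithm): two binary variables can be independent in the collapsed margin while remaining dependent within each level of a third variable, which is exactly the kind of constraint a bi-directed graph encodes and a graphical log-linear model cannot.

```python
# Marginal vs. conditional independence in a 2x2x2 contingency table.
import numpy as np

counts = np.array([[[20, 5], [10, 15]],
                   [[10, 15], [20, 5]]], dtype=float)   # counts[i, j, k] for X1, X2, X3

p12 = counts.sum(axis=2) / counts.sum()                 # marginal (X1, X2) table
indep = np.outer(p12.sum(axis=1), p12.sum(axis=0))
print(np.allclose(p12, indep))                          # True: marginally independent

p_k0 = counts[:, :, 0] / counts[:, :, 0].sum()          # conditional table given X3 = 0
print(np.allclose(p_k0, np.outer(p_k0.sum(axis=1), p_k0.sum(axis=0))))  # False: dependent
```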

    High-Dimensional Bayesian Geostatistics

    With the growing capabilities of Geographic Information Systems (GIS) and user-friendly software, statisticians today routinely encounter geographically referenced data containing observations from a large number of spatial locations and time points. Over the last decade, hierarchical spatiotemporal process models have become widely deployed statistical tools for researchers to better understand the complex nature of spatial and temporal variability. However, fitting hierarchical spatiotemporal models often involves expensive matrix computations whose complexity grows as the cube of the number of spatial locations and temporal points, which renders such models infeasible for large data sets. This article offers a focused review of two methods for constructing well-defined, highly scalable spatiotemporal stochastic processes. Both these processes can be used as "priors" for spatiotemporal random fields. The first approach constructs a low-rank process operating on a lower-dimensional subspace. The second approach constructs a Nearest-Neighbor Gaussian Process (NNGP) that ensures sparse precision matrices for its finite realizations. Both processes can be exploited as a scalable prior embedded within a rich hierarchical modeling framework to deliver full Bayesian inference. These approaches can be described as model-based solutions for big spatiotemporal datasets. The models ensure that the algorithmic complexity is $\sim n$ floating point operations (flops) per iteration, where $n$ is the number of spatial locations. We compare these methods and provide some insight into their methodological underpinnings
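    A sketch of the Vecchia-type construction behind the NNGP: each location conditions on at most m nearest previously ordered neighbours, giving a sparse triangular factor and hence a sparse precision matrix for the finite realization. The exponential covariance, the coordinate-based ordering, and all parameter values are assumptions made for illustration.

```python
# Build a sparse NNGP-style precision matrix Q = (I - A)' D^{-1} (I - A).
import numpy as np
from scipy.sparse import csr_matrix, diags, identity

rng = np.random.default_rng(3)
n, m = 500, 10                              # spatial locations, neighbours per location
s = rng.uniform(size=(n, 2))
s = s[np.argsort(s[:, 0])]                  # simple coordinate-based ordering

def cov(a, b, phi=10.0):                    # exponential covariance kernel (assumed)
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return np.exp(-phi * d)

rows, cols, vals, dvals = [], [], [], []
for i in range(n):
    nb = np.argsort(np.linalg.norm(s[:i] - s[i], axis=1))[:m]   # nearest earlier points
    if nb.size:
        Cnn = cov(s[nb], s[nb])
        cin = cov(s[i:i + 1], s[nb]).ravel()
        a = np.linalg.solve(Cnn, cin)       # kriging weights on the neighbour set
        rows += [i] * nb.size
        cols += nb.tolist()
        vals += a.tolist()
        dvals.append(1.0 - cin @ a)         # conditional variance given neighbours
    else:
        dvals.append(1.0)                   # the first location has no neighbours

A = csr_matrix((vals, (rows, cols)), shape=(n, n))
IA = identity(n, format="csr") - A
Q = IA.T @ diags(1.0 / np.array(dvals)) @ IA
print(f"nonzeros in Q: {Q.nnz} of {n * n}")
```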

    Simultaneous likelihood-based bootstrap confidence sets for a large number of models

    The paper studies a problem of constructing simultaneous likelihood-based confidence sets. We consider a simultaneous multiplier bootstrap procedure for estimating the quantiles of the joint distribution of the likelihood ratio statistics, and for adjusting the confidence level for multiplicity. Theoretical results state the bootstrap validity in the following setting: the sample size $n$ is fixed, and the maximal parameter dimension $p_{\max}$ and the number of considered parametric models $K$ are such that $(\log K)^{12} p_{\max}^{3}/n$ is small. We also consider the situation when the parametric models are misspecified. If the models' misspecification is significant, then the bootstrap critical values exceed the true ones and the simultaneous bootstrap confidence set becomes conservative. Numerical experiments for local constant and local quadratic regressions illustrate the theoretical results
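    A toy version of the procedure for K Gaussian-mean models (square-root likelihood-ratio statistics, Gaussian multipliers); all sizes are illustrative. The simultaneous critical value is the (1 - alpha) quantile of the bootstrap maximum across models and exceeds the familiar per-model value of 1.96.

```python
# Simultaneous multiplier bootstrap for K Gaussian-mean models.
import numpy as np

rng = np.random.default_rng(4)
K, n, B, alpha = 50, 100, 1000, 0.05
X = rng.normal(size=(K, n))                        # data for the K models
mean, sd = X.mean(axis=1), X.std(axis=1, ddof=1)
T = np.sqrt(n) * np.abs(mean) / sd                 # root-LR statistics for theta_k = 0

U = rng.normal(size=(B, 1, n))                     # Gaussian multiplier weights
Tb = np.sqrt(n) * np.abs((U * (X - mean[:, None])).mean(axis=2)) / sd   # (B, K)
crit = np.quantile(Tb.max(axis=1), 1 - alpha)      # simultaneous critical value
print(f"simultaneous critical value {crit:.2f} vs per-model 1.96")
```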

    Bootstrap confidence sets under model misspecification

    A multiplier bootstrap procedure for the construction of likelihood-based confidence sets is considered for finite samples and a possible model misspecification. Theoretical results justify the bootstrap validity for a small or moderate sample size and allow one to control the impact of the parameter dimension $p$: the bootstrap approximation works if $p^3/n$ is small. The main result about bootstrap validity continues to apply even if the underlying parametric model is misspecified, under the so-called small modelling bias condition. In the case when the true model deviates significantly from the considered parametric family, the bootstrap procedure is still applicable but it becomes somewhat conservative: the size of the constructed confidence sets is increased by the modelling bias. We illustrate the results with numerical examples for misspecified linear and logistic regressions. Comment: Published at http://dx.doi.org/10.1214/15-AOS1355 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
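    A one-dimensional toy illustration of the effect (assumed setup: a N(theta, 1) quasi-likelihood fitted to heavy-tailed data, multiplier weights with unit mean): the bootstrap critical value lands well above the nominal chi-square value, so the resulting confidence set is wider than the naive parametric one.

```python
# Multiplier bootstrap for a misspecified Gaussian-mean model.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(5)
n, B = 200, 5000
x = rng.standard_t(df=3, size=n)       # heavy-tailed truth; fitted family is N(theta, 1)

theta = x.mean()                       # quasi-MLE of the misspecified model
u = 1.0 + rng.normal(size=(B, n))      # multiplier weights with unit mean
tb = (u * x).sum(axis=1) / u.sum(axis=1)        # weighted quasi-MLEs
Tb = u.sum(axis=1) * (tb - theta) ** 2          # 2*(sup_t Lb(t) - Lb(theta)), closed form
print(f"bootstrap 95% critical value:   {np.quantile(Tb, 0.95):.2f}")
print(f"nominal chi2(1) critical value: {chi2.ppf(0.95, 1):.2f}")
```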