
    Covariance Estimation: The GLM and Regularization Perspectives

    Finding an unconstrained and statistically interpretable reparameterization of a covariance matrix is still an open problem in statistics. Its solution is of central importance in covariance estimation, particularly in the recent high-dimensional data environment where enforcing the positive-definiteness constraint could be computationally expensive. We provide a survey of the progress made in modeling covariance matrices from two relatively complementary perspectives: (1) generalized linear models (GLM), or parsimony and the use of covariates, in low dimensions, and (2) regularization, or sparsity, for high-dimensional data. An emerging, unifying and powerful trend in both perspectives is that of reducing a covariance estimation problem to that of estimating a sequence of regressions. We point out several instances of this regression-based formulation; a notable case is the sparse estimation of a precision matrix, or Gaussian graphical model, which leads to the fast graphical LASSO algorithm. Some advantages and limitations of the regression-based Cholesky decomposition relative to the classical spectral (eigenvalue) and variance-correlation decompositions are highlighted. The former provides an unconstrained and statistically interpretable reparameterization and guarantees the positive-definiteness of the estimated covariance matrix; it reduces the unintuitive task of covariance estimation to that of modeling a sequence of regressions, at the cost of imposing an a priori order among the variables. Elementwise regularization of the sample covariance matrix, such as banding, tapering and thresholding, has desirable asymptotic properties, and the resulting sparse covariance estimate is positive definite with probability tending to one as the sample size and dimension grow.
    Comment: Published in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/11-STS358
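    The regression-based Cholesky reparameterization described above is easy to state concretely. A minimal sketch, assuming centered data and an a priori variable order (an illustration, not the survey's own code): regress each variable on its predecessors, collect the negated coefficients in a unit lower-triangular matrix T and the residual variances in a diagonal D, and recover a covariance estimate that is positive definite by construction.

        import numpy as np

        def cholesky_regression_cov(X):
            """Covariance estimate via the modified Cholesky decomposition.

            If y_j = phi_j' y_{<j} + e_j, then T y has diagonal covariance D,
            so Sigma = T^{-1} D T^{-T} is positive definite by construction.
            """
            n, p = X.shape
            Xc = X - X.mean(axis=0)
            T = np.eye(p)                    # unit lower-triangular factor
            d = np.empty(p)                  # innovation (residual) variances
            d[0] = Xc[:, 0].var()
            for j in range(1, p):
                Z = Xc[:, :j]                # predecessors of variable j
                phi, *_ = np.linalg.lstsq(Z, Xc[:, j], rcond=None)
                T[j, :j] = -phi              # negated regression coefficients
                d[j] = (Xc[:, j] - Z @ phi).var()
            Tinv = np.linalg.inv(T)
            return Tinv @ np.diag(d) @ Tinv.T

    Replacing each least-squares fit with a penalized regression (a lasso, say) yields a sparse Cholesky factor, which is one route to the regularization perspective; the a priori ordering of the variables is the price noted above.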

    Detecting single-trial EEG evoked potential using a wavelet domain linear mixed model: application to error potentials classification

    Objective. The main goal of this work is to develop a model for multi-sensor signals, such as MEG or EEG signals, that accounts for inter-trial variability and is suitable for the corresponding binary classification problems. An important constraint is that the model be simple enough to handle the small and unbalanced datasets often encountered in BCI-type experiments. Approach. The method combines a linear mixed-effects statistical model, the wavelet transform and spatial filtering, and aims at characterizing localized discriminant features in multi-sensor signals. After a discrete wavelet transform and spatial filtering, a projection onto the relevant wavelet and spatial-channel subspaces is used for dimension reduction. The projected signals are then decomposed as the sum of a signal of interest (i.e. discriminant) and background noise, using a very simple Gaussian linear mixed model. Main results. Thanks to the simplicity of the model, the corresponding parameter estimation problem is simplified. Robust estimates of class-covariance matrices are obtained from small sample sizes, and an effective Bayes plug-in classifier is derived. The approach is applied to the detection of error potentials in multichannel EEG data, in a very unbalanced situation (detection of rare events). Classification results prove the relevance of the proposed approach in such a context. Significance. The combination of a linear mixed model, the wavelet transform and spatial filtering for EEG classification is, to the best of our knowledge, an original approach, which proves to be effective. This paper improves on earlier results on similar problems, and the three main ingredients all play an important role.
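    To make the pipeline concrete, here is a minimal sketch of the feature-extraction and plug-in classification steps under simplifying assumptions: PyWavelets supplies the discrete wavelet transform, a variance ranking stands in for the paper's wavelet/spatial subspace projection, and Ledoit-Wolf shrinkage stands in for the mixed-model covariance estimates (all three substitutions are illustrative, not the authors' method).

        import numpy as np
        import pywt                                  # PyWavelets, assumed installed
        from sklearn.covariance import LedoitWolf    # shrinkage for small samples

        def wavelet_features(trials, wavelet="db4", level=3, keep=32):
            """DWT each single-channel trial, keep the highest-variance
            coefficients as a crude discriminant subspace (illustrative)."""
            C = np.asarray([np.concatenate(pywt.wavedec(t, wavelet, level=level))
                            for t in trials])
            order = np.argsort(C.var(axis=0))[::-1][:keep]
            return C[:, order]

        def fit_plugin_classifier(F0, F1):
            """Gaussian Bayes plug-in rule with shrunk class covariances."""
            params = []
            for F in (F0, F1):
                mu = F.mean(axis=0)
                cov = LedoitWolf().fit(F).covariance_
                params.append((mu, np.linalg.inv(cov),
                               np.linalg.slogdet(cov)[1]))

            def predict(f):
                scores = [-0.5 * (f - mu) @ P @ (f - mu) - 0.5 * ld
                          for mu, P, ld in params]
                return int(np.argmax(scores))

            return predict

    In the rare-event setting of the paper, unequal class priors would add log-prior terms to the two scores.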

    Distinguishing niche and neutral processes: issues in variation partitioning statistical methods and further perspectives

    Variation partitioning methods, which are built upon multivariate statistics, have been widely applied across taxa and habitats in community ecology. Here, I review the development and application of these methods, then discuss the limitations of the available methods and the difficulties involved in sampling schemes. The central goal of the work is to propose some practical methods that might help to overcome the issues of traditional least-squares-based regression modeling. A variety of regression models is considered for comparison. In initial simulations, I found that the generalized additive model (GAM) has the highest accuracy in predicting variation components. I therefore argue that advanced regression techniques, including the GAM and related models, could be utilized in variation partitioning to better quantify the aggregation scenarios of species distributions.
    Comment: 19 pages; 4 figures
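    As a concrete illustration, variation partitioning needs only three goodness-of-fit values: environment alone, space alone, and both together. A minimal sketch, assuming a spline-basis regression from scikit-learn as a GAM-like stand-in (the paper evaluates proper GAMs; the knot count and names here are illustrative):

        import numpy as np
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import SplineTransformer
        from sklearn.linear_model import LinearRegression

        def r2(model, X, y):
            return model.fit(X, y).score(X, y)

        def variation_partition(E, S, y):
            """[a]/[b]/[c]/[d] fractions from three R^2 fits."""
            model = make_pipeline(SplineTransformer(n_knots=5, degree=3),
                                  LinearRegression())
            r_e, r_s = r2(model, E, y), r2(model, S, y)
            r_es = r2(model, np.hstack([E, S]), y)
            return {"env": r_es - r_s,            # pure niche fraction [a]
                    "shared": r_e + r_s - r_es,   # joint fraction [b]
                    "space": r_es - r_e,          # pure spatial fraction [c]
                    "resid": 1.0 - r_es}          # unexplained [d]

    In practice adjusted R^2 is used in place of raw R^2 so that the fractions are not inflated by the number of predictors.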

    Differential Privacy Applications to Bayesian and Linear Mixed Model Estimation

    We consider a particular maximum likelihood estimator (MLE) and a computationally intensive Bayesian method for differentially private estimation of the linear mixed-effects model (LMM) with normal random errors. The LMM is important because it is used in small-area estimation and detailed industry tabulations that present significant challenges for confidentiality protection of the underlying data. The differentially private MLE performs well compared to the regular MLE, and deteriorates as the protection increases, for a problem in which the small-area variation is at the county level. More dimensions of random effects are needed to adequately represent the time dimension of the data, and for these cases the differentially private MLE cannot be computed. The direct Bayesian approach for the same model uses an informative, but reasonably diffuse, prior to compute the posterior predictive distribution for the random effects. The differential privacy of this approach is estimated by direct computation of the relevant odds ratios after deleting influential observations according to various criteria.
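    The protection/accuracy trade-off at work can be illustrated independently of the LMM. A minimal sketch of the Laplace mechanism (a generic differential-privacy building block, not the paper's estimator), showing how a released statistic deteriorates as the privacy parameter epsilon shrinks; the clipping bounds and sizes are illustrative:

        import numpy as np

        rng = np.random.default_rng(0)

        def laplace_mechanism(stat, sensitivity, epsilon):
            """Release stat with epsilon-differential privacy by adding
            Laplace noise scaled to the statistic's global sensitivity."""
            return stat + rng.laplace(scale=sensitivity / epsilon)

        # Toy example: a county-level mean of values clipped to [0, 1],
        # whose global sensitivity is 1/n. Smaller epsilon means stronger
        # protection and a noisier release.
        x = np.clip(rng.normal(0.5, 0.2, size=500), 0.0, 1.0)
        for eps in (10.0, 1.0, 0.1):
            print(eps, laplace_mechanism(x.mean(), 1.0 / len(x), eps))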