    Structured penalties for functional linear models---partially empirical eigenvectors for regression

    One of the challenges with functional data is incorporating spatial structure, or local correlation, into the analysis. This structure is inherent in the output from an increasing number of biomedical technologies, and a functional linear model is often used to estimate the relationship between the predictor functions and scalar responses. Common approaches to the ill-posed problem of estimating a coefficient function typically involve two stages: regularization and estimation. Regularization is usually done via dimension reduction, projecting onto a predefined span of basis functions or a reduced set of eigenvectors (principal components). In contrast, we present a unified approach that directly incorporates spatial structure into the estimation process by exploiting the joint eigenproperties of the predictors and a linear penalty operator. In this sense, the components in the regression are 'partially empirical' and the framework is provided by the generalized singular value decomposition (GSVD). The GSVD clarifies the penalized estimation process and informs the choice of penalty by making explicit the joint influence of the penalty and predictors on the bias, variance, and performance of the estimated coefficient function. Laboratory spectroscopy data and simulations are used to illustrate the concepts. Comment: 29 pages, 3 figures, 5 tables; typo/notational errors edited and intro revised per journal review process
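
The penalized estimation described in this abstract can be illustrated with a small sketch. It is not the authors' GSVD-based procedure: it only solves the normal equations of a generalized ridge problem, minimizing ||y - Xb||^2 + lam * ||Lb||^2, where the second-difference penalty operator `L`, the toy predictor matrix `X`, and all function names are assumptions made for illustration. It uses only the Python standard library.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def second_difference(p):
    """(p-2) x p second-difference operator, a discrete roughness penalty (assumed choice)."""
    L = [[0.0] * p for _ in range(p - 2)]
    for i in range(p - 2):
        L[i][i], L[i][i + 1], L[i][i + 2] = 1.0, -2.0, 1.0
    return L

def penalized_fit(X, y, L, lam):
    """Minimize ||y - X b||^2 + lam * ||L b||^2 via (X'X + lam L'L) b = X'y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    LtL = matmul(transpose(L), L)
    p = len(XtX)
    A = [[XtX[i][j] + lam * LtL[i][j] for j in range(p)] for i in range(p)]
    Xty = [sum(x * yi for x, yi in zip(row, y)) for row in Xt]
    return solve(A, Xty)

def roughness(L, b):
    """Squared norm of L b, the quantity the penalty controls."""
    return sum(sum(l * bj for l, bj in zip(row, b)) ** 2 for row in L)

# Toy data: locally correlated predictors on a 5-point grid (hypothetical).
p = 5
X = [[0.5 ** abs(i - j) for j in range(p)] for i in range(p)] + [[1.0] * p]
beta_true = [0.0, 1.0, 0.0, 1.0, 0.0]
y = [sum(x * b for x, b in zip(row, beta_true)) for row in X]
L = second_difference(p)
rough_fit = penalized_fit(X, y, L, 1e-3)    # light penalty, close to least squares
smooth_fit = penalized_fit(X, y, L, 100.0)  # heavy penalty, pulled toward a straight line
```

Increasing `lam` trades fidelity to the data for smoothness of the estimated coefficient vector, which is the bias-variance trade-off that the abstract says the GSVD makes explicit.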

    Validation of nonlinear PCA

    Linear principal component analysis (PCA) can be extended to a nonlinear PCA by using artificial neural networks. However, benefiting from curved components requires careful control of the model complexity. Moreover, standard techniques for model selection, including cross-validation and more generally the use of an independent test set, fail when applied to nonlinear PCA because of its inherent unsupervised characteristics. This paper presents a new approach for validating the complexity of nonlinear PCA models by using the error in missing data estimation as a criterion for model selection. It is motivated by the idea that only the model of optimal complexity is able to predict missing values with the highest accuracy. While standard test set validation usually favours over-fitted nonlinear PCA models, the proposed model validation approach correctly selects the optimal model complexity. Comment: 12 pages, 5 figures

    Covariance Estimation: The GLM and Regularization Perspectives

    Finding an unconstrained and statistically interpretable reparameterization of a covariance matrix is still an open problem in statistics. Its solution is of central importance in covariance estimation, particularly in the recent high-dimensional data environment where enforcing the positive-definiteness constraint could be computationally expensive. We provide a survey of the progress made in modeling covariance matrices from two relatively complementary perspectives: (1) generalized linear models (GLM) or parsimony and use of covariates in low dimensions, and (2) regularization or sparsity for high-dimensional data. An emerging, unifying and powerful trend in both perspectives is that of reducing a covariance estimation problem to that of estimating a sequence of regression problems. We point out several instances of the regression-based formulation. A notable case is in sparse estimation of a precision matrix or a Gaussian graphical model, leading to the fast graphical LASSO algorithm. Some advantages and limitations of the regression-based Cholesky decomposition relative to the classical spectral (eigenvalue) and variance-correlation decompositions are highlighted. The former provides an unconstrained and statistically interpretable reparameterization, and guarantees the positive-definiteness of the estimated covariance matrix. It reduces the unintuitive task of covariance estimation to that of modeling a sequence of regressions, at the cost of imposing an a priori order among the variables. Elementwise regularization of the sample covariance matrix, such as banding, tapering and thresholding, has desirable asymptotic properties, and the sparse estimated covariance matrix is positive definite with probability tending to one for large samples and dimensions. Comment: Published at http://dx.doi.org/10.1214/11-STS358 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)
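
The regression-based Cholesky decomposition mentioned in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not code from the survey: each variable is regressed on its predecessors (using a known covariance matrix rather than data), the negated coefficients fill a unit lower-triangular matrix `T`, and the residual variances `d` satisfy T Σ Tᵀ = diag(d). The example matrix `sigma` and all function names are hypothetical.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def modified_cholesky(sigma):
    """Return (T, d) with T unit lower triangular and T sigma T^T = diag(d)."""
    p = len(sigma)
    T = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    d = [sigma[0][0]] + [0.0] * (p - 1)
    for j in range(1, p):
        # Population regression of variable j on variables 0..j-1.
        A = [sigma[i][:j] for i in range(j)]
        b = [sigma[i][j] for i in range(j)]
        phi = solve(A, b)
        for k in range(j):
            T[j][k] = -phi[k]  # negated regression coefficients
        # Residual (innovation) variance of that regression.
        d[j] = sigma[j][j] - sum(phi[k] * sigma[k][j] for k in range(j))
    return T, d

# Hypothetical 3 x 3 positive-definite covariance matrix.
sigma = [[4.0, 2.0, 1.0],
         [2.0, 3.0, 0.5],
         [1.0, 0.5, 2.0]]
T, d = modified_cholesky(sigma)
diagonalized = matmul(matmul(T, sigma), transpose(T))
```

The entries of `T` below the diagonal are unconstrained real numbers and the entries of `d` only need to be positive, which is why this parameterization guarantees a positive-definite estimate. The result depends on the chosen variable order, the cost noted in the abstract.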

    Neural Connectivity with Hidden Gaussian Graphical State-Model

    Noninvasive procedures for estimating neural connectivity are under question. Theoretical models hold that the electromagnetic field registered at external sensors is elicited by currents in neural space. Nevertheless, what we observe in sensor space is a superposition of projected fields from the whole gray matter. This is the reason for a major pitfall of noninvasive electrophysiology methods: distorted reconstruction of neural activity and its connectivity, or leakage. It has been proven that current methods produce incorrect connectomes. Somewhat related to the incorrect connectivity modelling, they disregard Systems Theory, Bayesian Information Theory, or both. We introduce a new formalism that accounts for this, the Hidden Gaussian Graphical State-Model (HIGGS): a neural Gaussian Graphical Model (GGM) hidden by the observation equation of magneto/electroencephalographic (MEEG) signals. HIGGS is equivalent to a frequency-domain Linear State Space Model (LSSM) but with a sparse connectivity prior. The mathematical contribution here is the theory for high-dimensional and frequency-domain HIGGS solvers. We demonstrate that HIGGS can attenuate the leakage effect in the most critical case: the distortion of the EEG signal due to head volume conduction heterogeneities. Its application in EEG is illustrated with connectivity patterns retrieved from human Steady State Visual Evoked Potentials (SSVEP). We provide, for the first time, confirmatory evidence for noninvasive procedures of neural connectivity: concurrent EEG and electrocorticography (ECoG) monkey recordings. Open source packages are freely available online to reproduce the results presented in this paper and to analyze external MEEG databases.

    Iteratively regularized Newton-type methods for general data misfit functionals and applications to Poisson data

    We study Newton-type methods for inverse problems described by nonlinear operator equations $F(u)=g$ in Banach spaces, where the Newton equations $F'(u_n; u_{n+1}-u_n) = g - F(u_n)$ are regularized variationally using a general data misfit functional and a convex regularization term. This generalizes the well-known iteratively regularized Gauss-Newton method (IRGNM). We prove convergence and convergence rates as the noise level tends to 0, both for an a priori stopping rule and for a Lepskiĭ-type a posteriori stopping rule. Our analysis includes previous order-optimal convergence rate results for the IRGNM as special cases. The main focus of this paper is on inverse problems with Poisson data, where the natural data misfit functional is given by the Kullback-Leibler divergence. Two examples of such problems are discussed in detail: an inverse obstacle scattering problem with amplitude data of the far-field pattern and a phase retrieval problem. The performance of the proposed method for these problems is illustrated in numerical examples.
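
For orientation, the classical IRGNM that this abstract generalizes can be sketched for a toy problem. The sketch uses a quadratic data misfit and a quadratic penalty, not the Kullback-Leibler misfit studied in the paper. Each step minimizes ||F'(u_n) h - (g - F(u_n))||^2 + alpha_n (u_n + h - u_0)^2 over the update h, with alpha_n decreasing geometrically and a fixed iteration count standing in for an a priori stopping rule. The forward map (an exponential decay with a single unknown rate), the parameter values, and all names are assumptions made for illustration.

```python
import math

def forward(u, times):
    """Toy nonlinear forward operator F(u): exponential decay sampled at `times`."""
    return [math.exp(-u * t) for t in times]

def jacobian(u, times):
    """Derivative of F with respect to the scalar unknown u."""
    return [-t * math.exp(-u * t) for t in times]

def irgnm(g, times, u0, alpha0=1.0, q=0.5, n_iter=30):
    """Classical IRGNM for a scalar unknown with quadratic misfit and penalty.

    The update h solves (J.J + alpha_n) h = J.(g - F(u_n)) + alpha_n (u0 - u_n),
    which is the normal equation of the regularized linearized problem.
    """
    u = u0
    alpha = alpha0
    for _ in range(n_iter):
        residual = [gi - fi for gi, fi in zip(g, forward(u, times))]
        J = jacobian(u, times)
        JtJ = sum(j * j for j in J)
        Jtr = sum(j * r for j, r in zip(J, residual))
        u += (Jtr + alpha * (u0 - u)) / (JtJ + alpha)
        alpha *= q  # geometric decay of the regularization parameter
    return u

# Noise-free toy data generated from a hypothetical true decay rate.
times = [0.5, 1.0, 1.5, 2.0]
u_true = 1.5
g = forward(u_true, times)
u_hat = irgnm(g, times, u0=0.5)
```

With noisy data the iteration would have to stop early, since the regularization weakens at every step; the choice of stopping index is what the a priori and Lepskiĭ-type rules in the abstract address.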

    Inferring Multiple Graphical Structures

    Gaussian Graphical Models provide a convenient framework for representing dependencies between variables. Recently, this tool has attracted considerable interest for the discovery of biological networks. The literature focuses on the case where a single network is inferred from a set of measurements, but, as wet-lab data is typically scarce, several assays, where the experimental conditions affect interactions, are usually merged to infer a single network. In this paper, we propose two approaches for estimating multiple related graphs, by rendering the closeness assumption into an empirical prior or group penalties. We provide quantitative results demonstrating the benefits of the proposed approaches. The methods presented in this paper are embedded in the R package 'simone' from version 1.0-0 onward.