
    Time series estimation in a spiked signal regime

    Working with missing or incomplete data is a universal problem in all sciences. In meteorology, temperature data streams can contain missing values due to sensor malfunctions. In geophysical remote sensing, missing data may be attributed to irregular global sampling by an orbiting spacecraft. In a collaborative filtering application, such as the Netflix Challenge, data is incomplete since it is not possible for all users to provide a recommendation on all items. Though we do not have access to complete data, it is still quite possible to forecast weather and to recommend good movies on Netflix. The development of estimation algorithms that properly handle missing data makes data imputation and forecasting possible. The design of any estimation algorithm depends on the assumptions one can make about a given set of data. This thesis addresses the problem of estimating a noisy, incomplete time series of a dynamical system with unknown state evolution. The technique presented is TSCC (Transformed Spiked Covariance Completion), a matrix completion algorithm for signal estimation that leverages the spiked signal model, an assumption that holds true for many high-dimensional datasets. The TSCC technique exploits this assumption to develop an estimator that is resilient to noise and accurately fills in missing data. This thesis first addresses the specific estimation problem and the signal model that it follows. It then presents a survey of both standard and state-of-the-art techniques, in addition to an analysis of TSCC. These methods are used to solve the problem of estimating the state of a dynamical system from partial, noisy observations. Standard textbook techniques are not reliable for state estimation due to their inability to handle missing data and to generalize across dynamical models. TSCC is an algorithm that addresses this estimation problem and accounts for these deficiencies.
Concluding this thesis, several numerical experiments on both synthetic and real data demonstrate that TSCC outperforms these other techniques by forming a time-lagged embedding and estimating the dynamical modes of the system. TSCC has an advantage over other techniques in that it does not require knowledge of the state dynamics and leverages the asymptotic behavior of noisy, low-rank matrices to perform imputation and denoising. The TSCC technique assumes that a system can be represented by several dynamical modes, which is analogous to a matrix having low rank. Overall, TSCC is a state estimation algorithm that performs estimation on noisy and incomplete data without prior model assumptions. Numerical experiments show that TSCC is an enhancement of the current, accepted techniques which address the same estimation problem.
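The core idea summarized above, forming a time-lagged (Hankel) embedding of the series and keeping only a few dominant "spiked" modes, can be sketched as follows. This is an illustrative low-rank denoiser under the spiked signal model, not the TSCC algorithm itself; the window length `L`, the rank, and the toy sine signal are all assumptions for the demo.

```python
import numpy as np

def hankel_embed(x, L):
    """Stack length-L lagged windows of the series x into a Hankel matrix."""
    N = len(x)
    return np.column_stack([x[i:i + L] for i in range(N - L + 1)])

def dehankel(H):
    """Invert the embedding by averaging each anti-diagonal (same time index)."""
    L, K = H.shape
    N = L + K - 1
    out = np.zeros(N)
    counts = np.zeros(N)
    for i in range(L):
        for j in range(K):
            out[i + j] += H[i, j]
            counts[i + j] += 1
    return out / counts

def lowrank_denoise(x, L=40, rank=2):
    """Truncate the Hankel matrix's SVD to the leading 'spiked' modes."""
    H = hankel_embed(x, L)
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    H_hat = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    return dehankel(H_hat)

rng = np.random.default_rng(0)
t = np.linspace(0, 4 * np.pi, 200)
clean = np.sin(t)
noisy = clean + 0.3 * rng.standard_normal(t.size)
# a pure sinusoid has a rank-2 Hankel embedding, so rank=2 captures the signal
denoised = lowrank_denoise(noisy, L=40, rank=2)
```

A sinusoid is the simplest case of a system with two dynamical modes; the same truncation applies whenever the embedded signal is approximately low-rank.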

    Structural Variability from Noisy Tomographic Projections

    In cryo-electron microscopy, the 3D electric potentials of an ensemble of molecules are projected along arbitrary viewing directions to yield noisy 2D images. The volume maps representing these potentials typically exhibit a great deal of structural variability, which is described by their 3D covariance matrix. Typically, this covariance matrix is approximately low-rank and can be used to cluster the volumes or estimate the intrinsic geometry of the conformation space. We formulate the estimation of this covariance matrix as a linear inverse problem, yielding a consistent least-squares estimator. For $n$ images of size $N$-by-$N$ pixels, we propose an algorithm for calculating this covariance estimator with computational complexity $\mathcal{O}(nN^4 + \sqrt{\kappa} N^6 \log N)$, where the condition number $\kappa$ is empirically in the range 10--200. Its efficiency relies on the observation that the normal equations are equivalent to a deconvolution problem in 6D. This is then solved by the conjugate gradient method with an appropriate circulant preconditioner. The result is the first computationally efficient algorithm for consistent estimation of 3D covariance from noisy projections. It also compares favorably in runtime with respect to previously proposed non-consistent estimators. Motivated by the recent success of eigenvalue shrinkage procedures for high-dimensional covariance matrices, we introduce a shrinkage procedure that improves accuracy at lower signal-to-noise ratios. We evaluate our methods on simulated datasets and achieve classification results comparable to state-of-the-art methods in shorter running time. We also present results on clustering volumes in an experimental dataset, illustrating the power of the proposed algorithm for practical determination of structural variability. Comment: 52 pages, 11 figures
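The eigenvalue-shrinkage idea invoked in this abstract can be illustrated on a toy spiked covariance. The Marchenko-Pastur bulk-edge threshold and the median-based noise estimate below are illustrative choices for the sketch, not the paper's actual shrinkage procedure.

```python
import numpy as np

def shrink_eigenvalues(X):
    """Hard-threshold sample-covariance eigenvalues at the Marchenko-Pastur bulk edge."""
    n, p = X.shape
    S = X.T @ X / n                                  # sample covariance (zero-mean data)
    evals, evecs = np.linalg.eigh(S)
    sigma2 = np.median(evals)                        # rough noise-variance estimate
    edge = sigma2 * (1.0 + np.sqrt(p / n)) ** 2      # upper edge of the noise bulk
    shrunk = np.where(evals > edge, evals, sigma2)   # keep spikes, flatten the bulk
    return (evecs * shrunk) @ evecs.T, S

rng = np.random.default_rng(1)
n, p = 400, 100
u = np.ones(p) / np.sqrt(p)                          # spike direction
f = rng.standard_normal(n)                           # spike factor scores
X = np.sqrt(10.0) * np.outer(f, u) + rng.standard_normal((n, p))
Sigma = np.eye(p) + 10.0 * np.outer(u, u)            # true (rank-1 spiked) covariance

S_shrunk, S_raw = shrink_eigenvalues(X)
err_raw = np.linalg.norm(S_raw - Sigma)
err_shrunk = np.linalg.norm(S_shrunk - Sigma)
```

Flattening the noise bulk while preserving eigenvalues above the edge is what improves accuracy at low signal-to-noise ratios in spiked models.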

    The phenome-wide distribution of genetic variance

    A general observation emerging from estimates of additive genetic variance in sets of functionally or developmentally related traits is that much of the genetic variance is restricted to few trait combinations as a consequence of genetic covariance among traits. While this biased distribution of genetic variance among functionally related traits is now well documented, how it translates to the broader phenome and therefore any trait combination under selection in a given environment is unknown. We show that 8,750 gene expression traits measured in adult male Drosophila serrata exhibit widespread genetic covariance among random sets of five traits, implying that pleiotropy is common. Ultimately, understanding the phenome-wide distribution of genetic variance requires estimating very large additive genetic variance-covariance matrices (G). We draw upon recent advances in matrix theory for completing high-dimensional matrices to estimate the 8,750-trait G and show that large numbers of gene expression traits genetically covary as a consequence of a single genetic factor. Using gene ontology term enrichment analysis, we show that the major axis of genetic variance among expression traits successfully identified genetic covariance among genes involved in multiple modes of transcriptional regulation. Our approach provides a practical empirical framework for the genetic analysis of high-dimensional phenome-wide trait sets and for the investigation of the extent of high-dimensional genetic constraint.
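The matrix-completion machinery this abstract draws on can be sketched with a SoftImpute-style iteration (repeated soft-thresholded SVDs). The penalty `lam`, iteration count, and toy low-rank matrix are illustrative assumptions; this is not the estimator actually used for the 8,750-trait G.

```python
import numpy as np

def soft_impute(M, mask, lam=1.0, n_iter=300):
    """Fill missing entries of M by iterating soft-thresholded SVDs (SoftImpute-style)."""
    Z = np.zeros_like(M)
    for _ in range(n_iter):
        X = np.where(mask, M, Z)                 # observed entries fixed, rest imputed
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        Z = (U * np.maximum(s - lam, 0.0)) @ Vt  # shrink singular values toward low rank
    return np.where(mask, M, Z)

rng = np.random.default_rng(2)
A = rng.standard_normal((30, 3)) @ rng.standard_normal((3, 20))  # rank-3 ground truth
mask = rng.random(A.shape) < 0.7                                 # ~70% of entries observed
completed = soft_impute(A, mask)
rel_err = np.linalg.norm(completed - A) / np.linalg.norm(A)
```

Because the ground-truth matrix is low-rank, the shrunken SVD recovers the unobserved entries from the observed ones, which is the same structural assumption that makes a huge but low-rank G estimable.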

    Nonparanormal Graph Quilting with Applications to Calcium Imaging

    Probabilistic graphical models have become an important unsupervised learning tool for detecting network structures for a variety of problems, including the estimation of functional neuronal connectivity from two-photon calcium imaging data. However, in the context of calcium imaging, technological limitations only allow for partially overlapping layers of neurons in a brain region of interest to be jointly recorded. In this case, graph estimation for the full data requires inference for edge selection when many pairs of neurons have no simultaneous observations. This leads to the Graph Quilting problem, which seeks to estimate a graph in the presence of block-missingness in the empirical covariance matrix. Solutions for the Graph Quilting problem have previously been studied for Gaussian graphical models; however, neural activity data from calcium imaging are often non-Gaussian, thereby requiring a more flexible modeling approach. Thus, in our work, we study two approaches for nonparanormal Graph Quilting based on the Gaussian copula graphical model, namely a maximum likelihood procedure and a low-rank based framework. We provide theoretical guarantees on edge recovery for the former approach under similar conditions to those previously developed for the Gaussian setting, and we investigate the empirical performance of both methods using simulations as well as real calcium imaging data. Our approaches yield more scientifically meaningful functional connectivity estimates compared to existing Gaussian graph quilting methods for this calcium imaging data set.
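The rank-based step underlying nonparanormal (Gaussian copula) estimation can be sketched as follows: estimate the latent correlation from Kendall's tau, which is invariant to monotone marginal transforms. The transform and sample size below are illustrative, and the subsequent graph-selection step (and the block-missingness handling specific to Graph Quilting) is omitted.

```python
import numpy as np
from scipy.stats import kendalltau

def nonparanormal_corr(X):
    """Estimate the latent copula correlation via R_jk = sin(pi * tau_jk / 2)."""
    p = X.shape[1]
    R = np.eye(p)
    for j in range(p):
        for k in range(j + 1, p):
            tau, _ = kendalltau(X[:, j], X[:, k])
            R[j, k] = R[k, j] = np.sin(0.5 * np.pi * tau)
    return R

rng = np.random.default_rng(3)
cov = np.array([[1.0, 0.6], [0.6, 1.0]])
Z = rng.multivariate_normal(np.zeros(2), cov, size=800)   # latent Gaussian, corr 0.6
X = np.column_stack([np.exp(Z[:, 0]), Z[:, 1]])           # monotone transform breaks Gaussianity
R = nonparanormal_corr(X)                                 # recovers latent correlation
```

A sparse graph estimator (e.g. a graphical lasso) would then be fit to `R` in place of the Pearson correlation matrix; the rank-based estimate is what makes the pipeline robust to non-Gaussian marginals like calcium fluorescence traces.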