307 research outputs found

    Structured Matrix Completion with Applications to Genomic Data Integration

    Get PDF
    Matrix completion has attracted significant recent attention in many fields including statistics, applied mathematics and electrical engineering. Current literature on matrix completion focuses primarily on independent sampling models under which the individual observed entries are sampled independently. Motivated by applications in genomic data integration, we propose a new framework of structured matrix completion (SMC) to treat structured missingness by design. Specifically, our proposed method aims at efficient matrix recovery when a subset of the rows and columns of an approximately low-rank matrix are observed. We provide theoretical justification for the proposed SMC method and derive lower bound for the estimation errors, which together establish the optimal rate of recovery over certain classes of approximately low-rank matrices. Simulation studies show that the method performs well in finite sample under a variety of configurations. The method is applied to integrate several ovarian cancer genomic studies with different extent of genomic measurements, which enables us to construct more accurate prediction rules for ovarian cancer survival.Comment: Accepted for publication in Journal of the American Statistical Associatio

    Nonconcave penalized composite conditional likelihood estimation of sparse Ising models

    Full text link
    The Ising model is a useful tool for studying complex interactions within a system. The estimation of such a model, however, is rather challenging, especially in the presence of high-dimensional parameters. In this work, we propose efficient procedures for learning a sparse Ising model based on a penalized composite conditional likelihood with nonconcave penalties. Nonconcave penalized likelihood estimation has received a lot of attention in recent years. However, such an approach is computationally prohibitive under high-dimensional Ising models. To overcome such difficulties, we extend the methodology and theory of nonconcave penalized likelihood to penalized composite conditional likelihood estimation. The proposed method can be efficiently implemented by taking advantage of coordinate-ascent and minorization--maximization principles. Asymptotic oracle properties of the proposed method are established with NP-dimensionality. Optimality of the computed local solution is discussed. We demonstrate its finite sample performance via simulation studies and further illustrate our proposal by studying the Human Immunodeficiency Virus type 1 protease structure based on data from the Stanford HIV drug resistance database. Our statistical learning results match the known biological findings very well, although no prior biological information is used in the data analysis procedure.Comment: Published in at http://dx.doi.org/10.1214/12-AOS1017 the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org
    • …
    corecore