Structured Matrix Completion with Applications to Genomic Data Integration
Matrix completion has attracted significant recent attention in many fields
including statistics, applied mathematics and electrical engineering. Current
literature on matrix completion focuses primarily on independent sampling
models under which the individual observed entries are sampled independently.
Motivated by applications in genomic data integration, we propose a new
framework of structured matrix completion (SMC) to treat structured missingness
by design. Specifically, our proposed method aims at efficient matrix recovery
when a subset of the rows and columns of an approximately low-rank matrix are
observed. We provide theoretical justification for the proposed SMC method and
derive a lower bound for the estimation errors, which together establish the
optimal rate of recovery over certain classes of approximately low-rank
matrices. Simulation studies show that the method performs well in finite
samples under a variety of configurations. The method is applied to integrate
several ovarian cancer genomic studies with different extents of genomic
measurements, which enables us to construct more accurate prediction rules for
ovarian cancer survival.
Comment: Accepted for publication in the Journal of the American Statistical
Association
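As a rough illustration of the recovery setting described in the abstract above, the sketch below treats the simplest case: an exactly low-rank matrix partitioned into blocks [[A11, A12], [A21, A22]], where the first rows and first columns are observed and A22 is missing. It relies on the standard linear-algebra identity A22 = A21 A11^+ A12, which holds when A11 has the same rank as the full matrix. The function name, block sizes, and the assumption that the rank is known are all hypothetical choices made for this example; this is not the authors' SMC estimator, which targets approximately low-rank matrices.

```python
import numpy as np

def recover_missing_block(a11, a12, a21, rank):
    """Estimate the unobserved block A22 of [[A11, A12], [A21, A22]].

    Assumes the full matrix has the given rank and that A11 shares that
    rank (an illustrative simplification, not the published method).
    """
    # Rank-truncated pseudoinverse of the fully observed corner block A11.
    u, s, vt = np.linalg.svd(a11, full_matrices=False)
    u_r, s_r, vt_r = u[:, :rank], s[:rank], vt[:rank, :]
    a11_pinv = vt_r.T @ np.diag(1.0 / s_r) @ u_r.T
    # Identity for exactly low-rank matrices: A22 = A21 A11^+ A12.
    return a21 @ a11_pinv @ a12

# Synthetic rank-2 matrix; sizes are hypothetical.
rng = np.random.default_rng(0)
rank = 2
full = rng.standard_normal((6, rank)) @ rng.standard_normal((rank, 5))

m1, m2 = 3, 3  # number of observed rows and observed columns
a11, a12 = full[:m1, :m2], full[:m1, m2:]
a21, a22 = full[m1:, :m2], full[m1:, m2:]

a22_hat = recover_missing_block(a11, a12, a21, rank)
max_error = float(np.max(np.abs(a22_hat - a22)))
```

In this noiseless toy case `max_error` should be at the level of floating-point round-off. With noisy or only approximately low-rank data, the choice of truncation rank becomes the delicate part, and the sketch above does not address it.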
Nonconcave penalized composite conditional likelihood estimation of sparse Ising models
The Ising model is a useful tool for studying complex interactions within a
system. The estimation of such a model, however, is rather challenging,
especially in the presence of high-dimensional parameters. In this work, we
propose efficient procedures for learning a sparse Ising model based on a
penalized composite conditional likelihood with nonconcave penalties.
Nonconcave penalized likelihood estimation has received a lot of attention in
recent years. However, such an approach is computationally prohibitive under
high-dimensional Ising models. To overcome such difficulties, we extend the
methodology and theory of nonconcave penalized likelihood to penalized
composite conditional likelihood estimation. The proposed method can be
efficiently implemented by taking advantage of coordinate-ascent and
minorization--maximization principles. Asymptotic oracle properties of the
proposed method are established with NP-dimensionality. Optimality of the
computed local solution is discussed. We demonstrate its finite sample
performance via simulation studies and further illustrate our proposal by
studying the Human Immunodeficiency Virus type 1 protease structure based on
data from the Stanford HIV drug resistance database. Our statistical learning
results match the known biological findings very well, although no prior
biological information is used in the data analysis procedure.
Comment: Published at http://dx.doi.org/10.1214/12-AOS1017 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
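To make the objective in the abstract above more concrete, the sketch below evaluates a penalized composite conditional log-likelihood for a small Ising model. Several choices here are assumptions made for the example, not details taken from the abstract: spins are coded as -1/+1, the model has pairwise interactions only (no main effects), the nonconcave penalty is SCAD with the conventional a = 3.7, and all function names are hypothetical. Under this parametrization, the conditional log-probability of x_j given the other spins is x_j s_j - log(exp(s_j) + exp(-s_j)), with s_j the sum of theta_jk x_k over k != j. The code only evaluates the objective; it does not implement the coordinate-ascent or minorization-maximization steps.

```python
import numpy as np

def composite_conditional_loglik(theta, x):
    """Sum of log P(x_ij | other spins in row i) over all samples i and nodes j.

    theta: symmetric (p, p) interaction matrix with zero diagonal.
    x: (n, p) array of spins in {-1, +1}.
    """
    # s[i, j] = sum over k != j of theta[j, k] * x[i, k]
    # (the zero diagonal removes the k == j term).
    s = x @ theta
    # logaddexp(s, -s) = log(exp(s) + exp(-s)), computed stably.
    return float(np.sum(x * s - np.logaddexp(s, -s)))

def scad_penalty(t, lam, a=3.7):
    """SCAD penalty, a common nonconcave penalty, applied elementwise."""
    t = np.abs(t)
    linear = lam * t
    quadratic = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))
    constant = lam**2 * (a + 1) / 2
    return np.where(t <= lam, linear, np.where(t <= a * lam, quadratic, constant))

def penalized_objective(theta, x, lam):
    """Average composite conditional log-likelihood minus the SCAD penalty
    on each distinct interaction (upper triangle of theta)."""
    n = x.shape[0]
    upper = theta[np.triu_indices_from(theta, k=1)]
    return composite_conditional_loglik(theta, x) / n - float(np.sum(scad_penalty(upper, lam)))

# Toy data: 4 samples of 3 spins, evaluated at the empty graph (theta = 0).
x = np.array([[1, -1, 1], [-1, -1, 1], [1, 1, 1], [-1, 1, -1]], dtype=float)
theta0 = np.zeros((3, 3))
value_at_zero = penalized_objective(theta0, x, lam=0.1)
```

At theta = 0 every conditional probability is 1/2, so the objective reduces to -p log 2, which gives a quick sanity check on the implementation.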