Structured Matrix Completion with Applications to Genomic Data Integration
Matrix completion has attracted significant recent attention in many fields
including statistics, applied mathematics and electrical engineering. Current
literature on matrix completion focuses primarily on independent sampling
models under which the individual observed entries are sampled independently.
Motivated by applications in genomic data integration, we propose a new
framework of structured matrix completion (SMC) to treat structured missingness
by design. Specifically, our proposed method aims at efficient matrix recovery
when a subset of the rows and columns of an approximately low-rank matrix are
observed. We provide theoretical justification for the proposed SMC method and
derive lower bounds for the estimation errors, which together establish the
optimal rate of recovery over certain classes of approximately low-rank
matrices. Simulation studies show that the method performs well in finite
samples under a variety of configurations. The method is applied to integrate
several ovarian cancer genomic studies with different extents of genomic
measurements, which enables us to construct more accurate prediction rules for
ovarian cancer survival.
Comment: Accepted for publication in the Journal of the American Statistical
Association
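The abstract's setting, in which only a subset of rows and columns is observed, can be illustrated in the idealized exactly low-rank, noise-free case. If the matrix is partitioned into blocks A11, A12, A21 (observed) and A22 (missing), and A11 has the same rank as the whole matrix, then A22 = A21 A11^+ A12. The sketch below checks this identity numerically. It is not the paper's SMC estimator, which targets approximately low-rank matrices; the function name, block sizes, and the assumption that the rank is known are all hypothetical choices made for illustration.

```python
import numpy as np


def complete_block(a11, a12, a21, rank):
    """Fill in the missing block of an exactly low-rank matrix.

    Uses A22 = A21 @ pinv_r(A11) @ A12, where pinv_r is the pseudo-inverse
    of A11 truncated to its leading `rank` singular values.
    Assumes rank(A11) equals the rank of the full matrix.
    """
    u, s, vt = np.linalg.svd(a11, full_matrices=False)
    a11_pinv = vt[:rank].T @ np.diag(1.0 / s[:rank]) @ u[:, :rank].T
    return a21 @ a11_pinv @ a12


# Build a synthetic rank-2 matrix (illustrative sizes only).
rng = np.random.default_rng(0)
r, m, n = 2, 8, 7
full = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

# Pretend only the first 3 rows and first 3 columns were observed.
k_rows, k_cols = 3, 3
a11 = full[:k_rows, :k_cols]
a12 = full[:k_rows, k_cols:]
a21 = full[k_rows:, :k_cols]

a22_hat = complete_block(a11, a12, a21, rank=r)
max_err = float(np.max(np.abs(a22_hat - full[k_rows:, k_cols:])))
print("max abs recovery error:", max_err)
```

In the noisy or approximately low-rank case the rank is unknown and the truncation level has to be chosen from the data, which is the harder problem the abstract addresses.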
Nonconcave penalized composite conditional likelihood estimation of sparse Ising models
The Ising model is a useful tool for studying complex interactions within a
system. The estimation of such a model, however, is rather challenging,
especially in the presence of high-dimensional parameters. In this work, we
propose efficient procedures for learning a sparse Ising model based on a
penalized composite conditional likelihood with nonconcave penalties.
Nonconcave penalized likelihood estimation has received a lot of attention in
recent years. However, such an approach is computationally prohibitive under
high-dimensional Ising models. To overcome such difficulties, we extend the
methodology and theory of nonconcave penalized likelihood to penalized
composite conditional likelihood estimation. The proposed method can be
efficiently implemented by taking advantage of coordinate-ascent and
minorization-maximization principles. Asymptotic oracle properties of the
proposed method are established with NP-dimensionality. Optimality of the
computed local solution is discussed. We demonstrate its finite sample
performance via simulation studies and further illustrate our proposal by
studying the Human Immunodeficiency Virus type 1 protease structure based on
data from the Stanford HIV drug resistance database. Our statistical learning
results match the known biological findings very well, although no prior
biological information is used in the data analysis procedure.
Comment: Published at http://dx.doi.org/10.1214/12-AOS1017 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
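A penalized composite conditional likelihood of the kind the abstract describes can be sketched as follows. The sketch assumes spins coded as -1/+1, for which the conditional probability of one spin given the others is 1 / (1 + exp(-2 x_j eta_j)) with eta_j = theta_jj + sum over k != j of theta_jk x_k. It uses the SCAD penalty as one common example of a nonconcave penalty. The coding, the choice of SCAD, the scaling of the objective, and all function names are assumptions made for illustration and are not taken from the paper; no optimizer is included.

```python
import numpy as np


def composite_cond_loglik(theta, x):
    """Average composite conditional log-likelihood of an Ising model.

    theta: symmetric (p, p) array; the diagonal holds main effects and the
           off-diagonal entries hold pairwise interactions.
    x:     (n, p) array of spins coded as -1 or +1 (an assumed coding).
    """
    main = np.diag(theta)
    off = theta - np.diag(main)
    eta = x @ off + main  # eta[i, j] = theta_jj + sum_{k != j} theta_jk * x[i, k]
    # log P(x_ij | rest) = -log(1 + exp(-2 * x_ij * eta_ij))
    return -np.sum(np.logaddexp(0.0, -2.0 * x * eta)) / x.shape[0]


def scad_penalty(t, lam, a=3.7):
    """SCAD penalty evaluated at |t| (a = 3.7 is a conventional default)."""
    t = np.abs(np.asarray(t, dtype=float))
    linear = lam * t
    quad = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))
    const = lam**2 * (a + 1) / 2
    return np.where(t <= lam, linear, np.where(t <= a * lam, quad, const))


def penalized_objective(theta, x, lam):
    """Objective to maximize: log-likelihood minus SCAD on the interactions."""
    rows, cols = np.triu_indices(theta.shape[0], k=1)
    return composite_cond_loglik(theta, x) - np.sum(scad_penalty(theta[rows, cols], lam))


# Tiny usage example with made-up data.
x_demo = np.array([[1, -1, 1], [-1, -1, 1]])
theta_demo = np.zeros((3, 3))
print(penalized_objective(theta_demo, x_demo, lam=0.5))
```

Because each conditional term involves only one row of theta, the objective lends itself to the coordinate-wise updates the abstract mentions.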
Model Checking for ROC Regression Analysis
The Receiver Operating Characteristic (ROC) curve is a prominent tool for characterizing the accuracy of a continuous diagnostic test. To account for factors that might influence the test accuracy, various ROC regression methods have been proposed. However, as in any regression analysis, when the assumed models do not fit the data well, these methods may yield invalid and misleading results. To date, practical model checking techniques suitable for validating existing ROC regression models are not yet available. In this paper, we develop cumulative-residual-based procedures to graphically and numerically assess the goodness-of-fit for some commonly used ROC regression models, and show how specific components of these models can be examined within this framework. We derive asymptotic null distributions for the residual process and discuss resampling procedures to approximate these distributions in practice. We illustrate our methods with a dataset from the Cystic Fibrosis registry.