Matrix completion has attracted significant recent attention in many fields
including statistics, applied mathematics and electrical engineering. Current
literature on matrix completion focuses primarily on independent sampling
models under which the individual observed entries are sampled independently.
Motivated by applications in genomic data integration, we propose a new
framework of structured matrix completion (SMC) to treat structured missingness
by design. Specifically, our proposed method aims at efficient matrix recovery
when a subset of the rows and columns of an approximately low-rank matrix are
observed. We provide theoretical justification for the proposed SMC method and
derive a lower bound for the estimation error, which together establish the
optimal rate of recovery over certain classes of approximately low-rank
matrices. Simulation studies show that the method performs well in finite
samples under a variety of configurations. The method is applied to integrate
several ovarian cancer genomic studies with differing extents of genomic
measurements, which enables us to construct more accurate prediction rules for
ovarian cancer survival.

Comment: Accepted for publication in Journal of the American Statistical
Association
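To make the structured-missingness setting concrete, the following is a minimal sketch of the idealized case only: an exactly low-rank matrix whose first rows and first columns are observed. It uses the standard identity that the unobserved block equals A21 A11^+ A12 when the observed corner A11 has the same rank as the full matrix. It is not the SMC estimator itself, which targets approximately low-rank matrices. The function name, block names, dimensions, and the pseudoinverse cutoff are illustrative assumptions.

```python
import numpy as np

def complete_from_blocks(a11, a12, a21, rcond=1e-10):
    """Fill in the unobserved block of an exactly low-rank matrix.

    Assumes rank(a11) equals the rank of the full matrix, so that
    A22 = A21 @ pinv(A11) @ A12. The name and the rcond cutoff are
    illustrative choices, not taken from the paper.
    """
    return a21 @ np.linalg.pinv(a11, rcond=rcond) @ a12

# Synthetic rank-2 matrix (hypothetical sizes, for illustration only).
rng = np.random.default_rng(0)
rank = 2
u = rng.standard_normal((8, rank))
v = rng.standard_normal((6, rank))
a = u @ v.T

# Observe the first m1 rows and first m2 columns; the remaining
# bottom-right block a22 is treated as missing by design.
m1, m2 = 4, 3
a11, a12 = a[:m1, :m2], a[:m1, m2:]
a21, a22 = a[m1:, :m2], a[m1:, m2:]

a22_hat = complete_from_blocks(a11, a12, a21)
max_err = np.max(np.abs(a22_hat - a22))
print(f"max abs recovery error: {max_err:.2e}")
```

With noise or only approximate low rank, the small singular values of the observed corner make the plain pseudoinverse unstable, which is why the recovery in that setting needs a more careful treatment than this sketch.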