Generalized Matrix Decomposition Regression: Estimation and Inference for Two-way Structured Data

Abstract

Analysis of two-way structured data, i.e., data with structures among both variables and samples, is becoming increasingly common in ecology, biology and neuroscience. Classical dimension-reduction tools, such as the singular value decomposition (SVD), may perform poorly for two-way structured data. The generalized matrix decomposition (GMD, Allen et al., 2014) extends the SVD to two-way structured data and thus constructs singular vectors that account for both structures. While the GMD is a useful dimension-reduction tool for exploratory analysis of two-way structured data, it is unsupervised and cannot be used to assess the association between such data and an outcome of interest. In this article, we first propose GMD regression (GMDR) as an estimation/prediction tool that seamlessly incorporates two-way structures into high-dimensional linear models. The proposed GMDR directly regresses the outcome on a set of GMD components, selected by a novel procedure that guarantees the best prediction performance. We then propose the GMD inference (GMDI) framework to identify variables that are associated with the outcome for any model in a large family of regression models that includes GMDR. Unlike most existing tools for high-dimensional inference, GMDI efficiently accounts for pre-specified two-way structures and can provide asymptotically valid inference even for non-sparse coefficient vectors. We study the theoretical properties of GMDI in terms of both the type-I error rate and power. We demonstrate the effectiveness of GMDR and GMDI on simulated data and an application to microbiome data.
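The abstract does not spell out the estimator, so the following Python/NumPy sketch only illustrates the underlying idea under stated assumptions. It computes a rank-k GMD in the sense of Allen et al. (2014), X ≈ U D V^T with U^T H U = I and V^T Q V = I, through the ordinary SVD of H^{1/2} X Q^{1/2}, assuming the sample-structure matrix H (n × n) and the variable-structure matrix Q (p × p) are positive definite. It then fits ordinary least squares of the outcome on the first k GMD scores, in the spirit of principal component regression. The function names `gmd` and `gmd_component_regression`, the fixed k, and the least-squares step are illustrative assumptions; this is not the authors' GMDR estimator, their component-selection procedure, or GMDI.

```python
import numpy as np


def _sym_power(A, power, tol=1e-10):
    """Matrix power of a symmetric PSD matrix via its eigendecomposition.

    Eigenvalues at or below `tol` are treated as zero (assumption: A is PSD).
    """
    w, vecs = np.linalg.eigh(A)
    w = np.clip(w, 0.0, None)
    w_pow = np.zeros_like(w)
    keep = w > tol
    w_pow[keep] = w[keep] ** power
    return (vecs * w_pow) @ vecs.T  # vecs @ diag(w_pow) @ vecs.T


def gmd(X, H, Q, k):
    """Rank-k GMD: X ~ U diag(d) V^T with U^T H U = I_k and V^T Q V = I_k.

    Assumes H (n x n) and Q (p x p) are positive definite, so that the
    transformed-SVD route below is exact.
    """
    H_half, Q_half = _sym_power(H, 0.5), _sym_power(Q, 0.5)
    H_ihalf, Q_ihalf = _sym_power(H, -0.5), _sym_power(Q, -0.5)
    # Ordinary SVD of the structure-weighted data matrix.
    U_t, d, Vt_t = np.linalg.svd(H_half @ X @ Q_half, full_matrices=False)
    # Map the orthonormal singular vectors back to H- and Q-orthonormal ones.
    U = H_ihalf @ U_t[:, :k]
    V = Q_ihalf @ Vt_t[:k, :].T
    return U, d[:k], V


def gmd_component_regression(X, y, H, Q, k):
    """Hypothetical PCR-style fit of y on the first k GMD scores.

    Returns beta such that X @ beta equals the least-squares fitted values
    of y on the scores Z = U diag(d) (which equal X @ Q @ V).
    """
    U, d, V = gmd(X, H, Q, k)
    Z = U * d  # GMD scores, one column per component
    gamma, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return Q @ V @ gamma  # X @ Q @ V @ gamma == Z @ gamma


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, p = 40, 8
    X = rng.standard_normal((n, p))
    H = np.eye(n)  # placeholder structures; real ones come from the application
    Q = np.eye(p)
    y = X[:, 0] + 0.1 * rng.standard_normal(n)
    print(gmd_component_regression(X, y, H, Q, k=3))
```

Setting H and Q to identity matrices, as in the usage example, reduces the decomposition to the ordinary SVD and the regression to standard principal component regression, which is a convenient sanity check before plugging in actual sample and variable structures.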
