Optimal classification in sparse Gaussian graphic model
Consider a two-class classification problem where the number of features is
much larger than the sample size. The features are masked by Gaussian noise
with mean zero and covariance matrix Σ, where the precision matrix Ω = Σ^{-1}
is unknown but is presumably sparse. The useful features,
also unknown, are sparse and each contributes weakly (i.e., rare and weak) to
the classification decision. By obtaining a reasonably good estimate of
Ω, we formulate the setting as a linear regression model. We propose a
two-stage classification method where we first select features by the method of
Innovated Thresholding (IT), and then use the retained features and Fisher's
LDA for classification. In this approach, a crucial problem is how to set the
threshold of IT. We approach this problem by adapting the recent innovation of
Higher Criticism Thresholding (HCT). We find that when useful features are rare
and weak, the limiting behavior of HCT is essentially just as good as the
limiting behavior of ideal threshold, the threshold one would choose if the
underlying distribution of the signals is known (if only). Somewhat
surprisingly, when Ω is sufficiently sparse, its off-diagonal
coordinates usually do not have a major influence over the classification
decision. Compared to recent work in the case where Ω is the identity
matrix [Proc. Natl. Acad. Sci. USA 105 (2008) 14790-14795; Philos. Trans. R.
Soc. Lond. Ser. A Math. Phys. Eng. Sci. 367 (2009) 4449-4470], the current
setting is much more general, requiring a new approach and much more
sophisticated analysis. One key component of the analysis is the intimate
relationship between HCT and Fisher's separation. Another key component is the
tight large-deviation bounds for empirical processes for data with
unconventional correlation structures, where graph theory on vertex coloring
plays an important role.
Comment: Published at http://dx.doi.org/10.1214/13-AOS1163 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
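The two-stage procedure described above (threshold feature z-scores by Higher Criticism, then run Fisher's LDA on the survivors) can be sketched as follows. This is an illustrative simplification, not the paper's code: it assumes unit noise covariance (Ω = I), so the Innovated Thresholding step of premultiplying by an estimate of Ω is omitted, and the function names are ours.

```python
import math
import numpy as np

def hc_threshold(z, alpha0=0.10):
    """Higher Criticism threshold (a sketch of the Donoho-Jin recipe).

    Sort the two-sided p-values of the z-scores, maximize the HC objective
    over the smallest alpha0-fraction, and return the |z| cutoff attained
    at the maximizer.
    """
    p = len(z)
    abs_z = np.sort(np.abs(z))[::-1]                  # descending |z|
    # two-sided p-value of |z| = v is 2*(1 - Phi(v)) = erfc(v / sqrt(2))
    pvals = np.array([math.erfc(v / math.sqrt(2.0)) for v in abs_z])
    k = np.arange(1, p + 1)
    frac = k / p
    hc = math.sqrt(p) * (frac - pvals) / np.sqrt(frac * (1.0 - frac) + 1e-12)
    k_star = int(np.argmax(hc[: max(1, int(alpha0 * p))]))
    return abs_z[k_star]

def two_stage_classify(X_train, y_train, X_test):
    """Stage 1: keep features whose z-score passes the HC threshold.
    Stage 2: Fisher's LDA restricted to the retained coordinates."""
    mu1 = X_train[y_train == 1].mean(axis=0)
    mu0 = X_train[y_train == 0].mean(axis=0)
    n1 = int((y_train == 1).sum())
    n0 = int((y_train == 0).sum())
    # per-feature z-scores of the mean difference under unit noise
    z = (mu1 - mu0) / math.sqrt(1.0 / n1 + 1.0 / n0)
    t = hc_threshold(z)
    w = np.where(np.abs(z) >= t, z, 0.0)              # LDA direction on survivors
    scores = (X_test - (mu1 + mu0) / 2.0) @ w
    return (scores > 0).astype(int)
```

In a rare-and-weak simulation (a few nonzero mean shifts among many null features), the HC threshold adapts to the unknown signal strength without any tuning beyond the search fraction alpha0.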
Covariate assisted screening and estimation
Consider a linear model Y = Xβ + σz, where X = X_{n,p} and z ~ N(0, I_n).
The vector β is unknown but is sparse in the sense that most of its
coordinates are 0. The main interest is to separate its nonzero coordinates
from the zero ones (i.e., variable selection). Motivated by examples in
long-memory time series (Fan and Yao [Nonlinear Time Series: Nonparametric and
Parametric Methods (2003) Springer]) and the change-point problem (Bhattacharya
[In Change-Point Problems (South Hadley, MA, 1992) (1994) 28-56 IMS]), we are
primarily interested in the case where the Gram matrix G = X'X is nonsparse but
sparsifiable by a finite order linear filter. We focus on the regime where
signals are both rare and weak so that successful variable selection is very
challenging but is still possible. We approach this problem by a new procedure
called the covariate assisted screening and estimation (CASE). CASE first uses
linear filtering to reduce the original setting to a new regression model
where the corresponding Gram (covariance) matrix is sparse. The new covariance
matrix induces a sparse graph, which guides us to conduct multivariate
screening without visiting all the submodels. By interacting with the signal
sparsity, the graph enables us to decompose the original problem into many
separate small-size subproblems (if only we knew where they were!). Linear
filtering also induces a so-called problem of information leakage, which can be
overcome by the newly introduced patching technique. Together, these give rise
to CASE, which is a two-stage screen and clean [Fan and Song Ann. Statist. 38
(2010) 3567-3604; Wasserman and Roeder Ann. Statist. 37 (2009) 2178-2201]
procedure, where we first identify candidates of these submodels by patching
and screening, and then re-examine each candidate to remove false positives.
Comment: Published at http://dx.doi.org/10.1214/14-AOS1243 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
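The sparsification step above can be illustrated on a toy change-point design, where the Gram matrix is dense but a first-order difference filter (the simplest finite-order linear filter) renders it sparse, in this toy case exactly the identity. This is our illustration under those assumptions, not the paper's code; the patching and graph-guided multivariate screening stages are omitted.

```python
import numpy as np

# Toy change-point design: column j of X is the step function 1{i >= j},
# so the Gram matrix G = X'X is dense, with G[j, k] = n - max(j, k).
n = p = 40
X = (np.arange(n)[:, None] >= np.arange(p)[None, :]).astype(float)
G = X.T @ X

# First-order difference filter: row j of D is e_j - e_{j+1} (last row e_{p-1}).
# Filtering turns column j of X into the indicator of {i = j}, so the
# filtered Gram matrix D @ G @ D.T collapses to the identity.
D = np.eye(p) - np.eye(p, k=1)
S = D @ G @ D.T
```

The dense graph induced by G would force screening to visit huge submodels; the sparse graph induced by S is what lets CASE break the problem into small pieces.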