Comparing large covariance matrices under weak conditions on the dependence structure and its application to gene clustering
Comparing large covariance matrices has important applications in modern
genomics, where scientists are often interested in understanding whether
relationships (e.g., dependencies or co-regulations) among a large number of
genes vary between different biological states. We propose a computationally
fast procedure for testing the equality of two large covariance matrices when
the dimensions of the covariance matrices are much larger than the sample
sizes. A distinguishing feature of the new procedure is that it imposes no
structural assumptions on the unknown covariance matrices. Hence the test is
robust with respect to various complex dependence structures that frequently
arise in genomics. We prove that the proposed procedure is asymptotically valid
under weak moment conditions. As an interesting application, we derive a new
gene clustering algorithm which shares the same desirable property of avoiding
restrictive structural assumptions for high-dimensional genomics data. Using an
asthma gene expression dataset, we illustrate how the new test helps compare
the covariance matrices of the genes across different gene sets/pathways
between the disease group and the control group, and how the gene clustering
algorithm provides new insights on the way gene clustering patterns differ
between the two groups. The proposed methods have been implemented in the
R package HDtest, which is available on CRAN.
Comment: The original title, dating back to May 2015, was "Bootstrap Tests on High
Dimensional Covariance Matrices with Applications to Understanding Gene
Clustering".
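The idea of testing equality of two covariance matrices without structural assumptions can be illustrated with a generic permutation test on the maximum entry-wise difference of the two sample covariance matrices. This is a minimal sketch of the general approach, not the computationally fast bootstrap procedure the paper proposes; all function names are illustrative.

```python
import numpy as np

def max_cov_diff(X, Y):
    """Maximum absolute entry-wise difference of the two sample covariances."""
    return np.max(np.abs(np.cov(X, rowvar=False) - np.cov(Y, rowvar=False)))

def perm_cov_test(X, Y, n_perm=200, seed=0):
    """Permutation p-value for H0: Cov(X) == Cov(Y).

    X: (n1, p) and Y: (n2, p) data matrices; group labels are permuted
    to approximate the null distribution of the max-type statistic.
    """
    rng = np.random.default_rng(seed)
    n1 = X.shape[0]
    Z = np.vstack([X, Y])
    stat = max_cov_diff(X, Y)
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(Z.shape[0])
        if max_cov_diff(Z[idx[:n1]], Z[idx[n1:]]) >= stat:
            count += 1
    return (count + 1) / (n_perm + 1)
```

A max-type statistic is a common choice when the dimension exceeds the sample size, since it avoids inverting (necessarily singular) sample covariance matrices.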
MATS: Inference for potentially Singular and Heteroscedastic MANOVA
In many experiments in the life sciences, several endpoints are recorded per
subject. The analysis of such multivariate data is usually based on MANOVA
models assuming multivariate normality and covariance homogeneity. These
assumptions, however, are often not met in practice. Furthermore, test
statistics should be invariant under scale transformations of the data, since
the endpoints may be measured on different scales. In the context of
high-dimensional data, Srivastava and Kubokawa (2013) proposed such a test
statistic for a specific one-way model, which, however, relies on the
assumption of a common non-singular covariance matrix. We modify and extend
this test statistic to factorial MANOVA designs, incorporating general
heteroscedastic models. In particular, our only distributional assumption is
the existence of the group-wise covariance matrices, which may even be
singular. We base inference on quantiles of resampling distributions, and
derive confidence regions and ellipsoids based on these quantiles. In a
simulation study, we extensively analyze the behavior of these procedures.
Finally, the methods are applied to a data set containing information on the
2016 presidential elections in the USA with unequal and singular empirical
covariance matrices.
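A statistic of the kind described, invariant under scale transformations and well-defined even when the group covariances are singular, can be sketched for a two-group design by standardizing each endpoint with only the diagonal variance entries and calibrating with a group-wise centered bootstrap. This is an illustrative stand-in, not the exact MATS statistic or the authors' resampling scheme.

```python
import numpy as np

def diag_scaled_stat(X, Y):
    """Scale-invariant statistic using only variance diagonals, so singular
    group covariance matrices cause no trouble."""
    n1, n2 = X.shape[0], Y.shape[0]
    d = X.mean(0) - Y.mean(0)
    v = X.var(0, ddof=1) / n1 + Y.var(0, ddof=1) / n2
    return np.sum(d**2 / v)

def bootstrap_pvalue(X, Y, n_boot=300, seed=0):
    """p-value from a group-wise centered bootstrap: resampling the centered
    data mimics the null of equal mean vectors while keeping each group's
    own (possibly unequal) covariance structure."""
    rng = np.random.default_rng(seed)
    stat = diag_scaled_stat(X, Y)
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    count = 0
    for _ in range(n_boot):
        Xb = Xc[rng.integers(0, len(Xc), len(Xc))]
        Yb = Yc[rng.integers(0, len(Yc), len(Yc))]
        if diag_scaled_stat(Xb, Yb) >= stat:
            count += 1
    return (count + 1) / (n_boot + 1)
```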
User-Friendly Covariance Estimation for Heavy-Tailed Distributions
We offer a survey of recent results on covariance estimation for heavy-tailed
distributions. By unifying ideas scattered in the literature, we propose
user-friendly methods that facilitate practical implementation. Specifically,
we introduce element-wise and spectrum-wise truncation operators, as well as
their M-estimator counterparts, to robustify the sample covariance matrix.
In contrast to the classical notion of robustness, which is characterized by the
breakdown property, we focus on tail robustness, which is evidenced by the
connection between the nonasymptotic deviation and the confidence level. The key
observation is that the estimators need to adapt to the sample size, the
dimensionality of the data, and the noise level to achieve an optimal tradeoff
between bias and robustness. Furthermore, to facilitate their practical use, we
propose data-driven procedures that automatically calibrate the tuning
parameters. We demonstrate their applications to a series of structured models
in high dimensions, including the bandable and low-rank covariance matrices and
sparse precision matrices. Numerical studies lend strong support to the
proposed methods.
Comment: 56 pages, 2 figures
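The element-wise truncation operator mentioned above is easy to illustrate: each cross-product term is shrunk toward zero at a level tau before averaging, which bounds the influence of heavy-tailed observations. The sketch below is a simplified version with a user-supplied tau and median centering; the paper's data-driven calibration of tau and its exact centering are not reproduced here.

```python
import numpy as np

def truncated_covariance(X, tau):
    """Element-wise truncated covariance estimator.

    Applies psi_tau(x) = sign(x) * min(|x|, tau) to every cross-product
    term before averaging, so no single observation can dominate an entry.
    In practice tau would be calibrated from the sample size, dimension,
    and desired confidence level; here it is passed in directly.
    """
    Xc = X - np.median(X, axis=0)          # robust centering (illustrative)
    n, p = Xc.shape
    S = np.zeros((p, p))
    for k in range(n):
        prod = np.outer(Xc[k], Xc[k])      # rank-one cross-product term
        S += np.clip(prod, -tau, tau)      # element-wise truncation
    return S / n
```

By construction every entry of the estimate is bounded by tau in absolute value, which is the source of the nonasymptotic deviation guarantees surveyed in the paper.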
Statistical eigen-inference from large Wishart matrices
We consider settings where the observations are drawn from a zero-mean
multivariate (real or complex) normal distribution with the population
covariance matrix having eigenvalues of arbitrary multiplicity. We assume that
the eigenvectors of the population covariance matrix are unknown and focus on
inferential procedures that are based on the sample eigenvalues alone (i.e.,
"eigen-inference"). Results found in the literature establish the asymptotic
normality of the fluctuation in the trace of powers of the sample covariance
matrix. We develop concrete algorithms for analytically computing the limiting
quantities and the covariance of the fluctuations. We exploit the asymptotic
normality of the trace of powers of the sample covariance matrix to develop
eigenvalue-based procedures for testing and estimation. Specifically, we
formulate a simple test of hypotheses for the population eigenvalues and a
technique for estimating the population eigenvalues in settings where the
cumulative distribution function of the (nonrandom) population eigenvalues has
a staircase structure. Monte Carlo simulations are used to demonstrate the
superiority of the proposed methodologies over classical techniques and the
robustness of the proposed techniques in high-dimensional, (relatively) small
sample size settings. The improved performance results from the fact that the
proposed inference procedures are "global" (in a sense that we describe) and
exploit "global" information thereby overcoming the inherent biases that
cripple classical inference procedures which are "local" and rely on "local"
information.
Comment: Published at http://dx.doi.org/10.1214/07-AOS583 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
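The basic ingredients of eigen-inference, traces of powers of the sample covariance matrix computed from the sample eigenvalues alone, can be sketched as follows. This only computes the raw statistics; the paper's contribution lies in the limiting distribution of their fluctuations, which is not reproduced here.

```python
import numpy as np

def trace_powers(X, ks=(1, 2, 3)):
    """Traces of powers of the sample covariance matrix, computed from the
    sample eigenvalues alone (no eigenvectors are used).

    X: (n, p) data matrix; returns {k: tr(S^k)} for the requested powers.
    """
    n = X.shape[0]
    Xc = X - X.mean(0)
    S = Xc.T @ Xc / n                      # sample covariance (biased norm.)
    lam = np.linalg.eigvalsh(S)            # sample eigenvalues
    return {k: float(np.sum(lam**k)) for k in ks}
```

Since tr(S^k) equals the sum of the k-th powers of the eigenvalues, these statistics discard all eigenvector information, which is exactly the setting the paper assumes.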
Relaxed 2-D Principal Component Analysis by Norm for Face Recognition
A relaxed two-dimensional principal component analysis (R2DPCA) approach is
proposed for face recognition. Unlike 2DPCA, 2DPCA-, and G2DPCA,
R2DPCA utilizes the label information (if known) of the training samples to
calculate a relaxation vector and assigns a weight to each subset of the
training data. A new relaxed scatter matrix is defined, and the computed
projection axes increase the accuracy of face recognition. The optimal norms
are selected within a reasonable range. Numerical experiments on real-world face
databases indicate that R2DPCA has high generalization ability and achieves
a higher recognition rate than state-of-the-art methods.
Comment: 19 pages, 11 figures
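For context, the classical (unrelaxed) 2DPCA baseline that R2DPCA builds on can be sketched in a few lines: it works directly on image matrices, forming an image scatter matrix and projecting image rows onto its leading eigenvectors. The relaxation vector and label-dependent weights of R2DPCA are not included in this sketch.

```python
import numpy as np

def two_d_pca(images, k):
    """Classical 2DPCA: top-k projection axes of the image scatter matrix.

    images: (n, h, w) stack of training images;
    returns a (w, k) projection matrix with orthonormal columns.
    """
    mean = images.mean(axis=0)
    w = images.shape[2]
    G = np.zeros((w, w))
    for A in images:
        D = A - mean
        G += D.T @ D                       # accumulate image scatter matrix
    G /= len(images)
    vals, vecs = np.linalg.eigh(G)         # eigenvalues in ascending order
    return vecs[:, ::-1][:, :k]            # keep the top-k eigenvectors

def project(images, W):
    """Feature matrices: each (h, w) image becomes an (h, k) projection."""
    return images @ W
```

Because 2DPCA never vectorizes the images, the scatter matrix is only w-by-w, which is what makes the family computationally attractive for face data.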
Finite Sample Properties of Tests Based on Prewhitened Nonparametric Covariance Estimators
We analytically investigate size and power properties of a popular family of
procedures for testing linear restrictions on the coefficient vector in a
linear regression model with temporally dependent errors. The tests considered
are autocorrelation-corrected F-type tests based on prewhitened nonparametric
covariance estimators that possibly incorporate a data-dependent bandwidth
parameter, e.g., estimators as considered in Andrews and Monahan (1992), Newey
and West (1994), or Rho and Shao (2013). For design matrices that are generic
in a measure theoretic sense we prove that these tests either suffer from
extreme size distortions or from strong power deficiencies. Despite this
negative result we demonstrate that a simple adjustment procedure based on
artificial regressors can often resolve this problem.
Comment: Some material added
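A prewhitened nonparametric covariance estimator of the kind the tests are built on can be sketched as follows: fit a VAR(1) to the series, apply a Bartlett-kernel long-run variance estimator to the residuals, then "recolor". This follows the spirit of Andrews and Monahan (1992) with a fixed bandwidth; the data-dependent bandwidth choices analyzed in the paper are not reproduced.

```python
import numpy as np

def bartlett_hac(u, bandwidth):
    """Bartlett-kernel long-run variance of a (T, k) series (Newey-West
    weights, which guarantee positive semidefiniteness)."""
    T = u.shape[0]
    u = u - u.mean(0)
    S = u.T @ u / T
    for j in range(1, int(bandwidth) + 1):
        w = 1.0 - j / (bandwidth + 1.0)    # Bartlett weight
        G = u[j:].T @ u[:-j] / T           # lag-j autocovariance
        S += w * (G + G.T)
    return S

def prewhitened_hac(u, bandwidth):
    """VAR(1)-prewhitened HAC estimator: whiten, estimate, recolor."""
    A, *_ = np.linalg.lstsq(u[:-1], u[1:], rcond=None)  # u_t ~ A' u_{t-1}
    e = u[1:] - u[:-1] @ A                 # prewhitened residuals
    S_e = bartlett_hac(e, bandwidth)
    k = u.shape[1]
    C = np.linalg.inv(np.eye(k) - A.T)     # recoloring matrix
    return C @ S_e @ C.T
```

Prewhitening reduces the bias of the kernel estimator when the errors are strongly autocorrelated, which is why this family of estimators is popular in autocorrelation-corrected F-type tests.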
Detecting single-trial EEG evoked potential using a wavelet domain linear mixed model: application to error potentials classification
Objective. The main goal of this work is to develop a model for multi-sensor
signals such as MEG or EEG signals, that accounts for the inter-trial
variability, suitable for corresponding binary classification problems. An
important constraint is that the model be simple enough to handle small and
unbalanced datasets, as are often encountered in BCI-type experiments.
Approach. The method combines a linear mixed-effects statistical model, the
wavelet transform, and spatial filtering, and aims at the characterization of localized
discriminant features in multi-sensor signals. After discrete wavelet transform
and spatial filtering, a projection onto the relevant wavelet and spatial
channels subspaces is used for dimension reduction. The projected signals are
then decomposed as the sum of a signal of interest (i.e. discriminant) and
background noise, using a very simple Gaussian linear mixed model. Main
results. Thanks to the simplicity of the model, the corresponding parameter
estimation problem is simplified. Robust estimates of class-covariance matrices
are obtained from small sample sizes and an effective Bayes plug-in classifier
is derived. The approach is applied to the detection of error potentials in
multichannel EEG data, in a very unbalanced situation (detection of rare
events). Classification results prove the relevance of the proposed approach in
such a context. Significance. The combination of linear mixed model, wavelet
transform and spatial filtering for EEG classification is, to the best of our
knowledge, an original approach, which is proven to be effective. This paper
improves on earlier results on similar problems, and the three main ingredients
all play an important role.
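The overall pipeline, wavelet transform for dimension reduction followed by a Bayes plug-in classifier built on a regularized class-covariance estimate, can be sketched as below. The one-level Haar transform and the shrinkage covariance are illustrative stand-ins for the paper's wavelet/spatial-filter subspace projection and mixed-model covariance estimates.

```python
import numpy as np

def haar_dwt(x):
    """One-level Haar wavelet transform along the last axis (even length):
    returns approximation and detail coefficients."""
    a = (x[..., ::2] + x[..., 1::2]) / np.sqrt(2)
    d = (x[..., ::2] - x[..., 1::2]) / np.sqrt(2)
    return a, d

class PlugInClassifier:
    """Bayes plug-in linear classifier with a shrunk pooled covariance,
    a simple stand-in for a mixed-model covariance estimate."""

    def fit(self, X, y, shrink=0.1):
        self.classes = np.unique(y)
        self.means = {c: X[y == c].mean(0) for c in self.classes}
        self.priors = {c: np.mean(y == c) for c in self.classes}
        Xc = np.vstack([X[y == c] - self.means[c] for c in self.classes])
        S = np.cov(Xc, rowvar=False)       # pooled within-class covariance
        p = X.shape[1]
        target = np.trace(S) / p * np.eye(p)
        self.Sinv = np.linalg.inv((1 - shrink) * S + shrink * target)
        return self

    def predict(self, X):
        scores = np.column_stack([
            X @ self.Sinv @ self.means[c]
            - 0.5 * self.means[c] @ self.Sinv @ self.means[c]
            + np.log(self.priors[c])
            for c in self.classes])
        return self.classes[np.argmax(scores, axis=1)]
```

Shrinking the pooled covariance toward a scaled identity keeps the plug-in rule well-conditioned in exactly the small-sample regimes the paper targets.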