Kernel Multivariate Analysis Framework for Supervised Subspace Learning: A Tutorial on Linear and Kernel Multivariate Methods
Feature extraction and dimensionality reduction are important tasks in many
fields of science dealing with signal processing and analysis. The relevance of
these techniques is increasing as current sensory devices are developed with
ever higher resolution, and problems involving multimodal data sources become
more common. A plethora of feature extraction methods are available in the
literature, collectively grouped under the field of Multivariate Analysis (MVA).
This paper provides a uniform treatment of several methods: Principal Component
Analysis (PCA), Partial Least Squares (PLS), Canonical Correlation Analysis
(CCA) and Orthonormalized PLS (OPLS), as well as their non-linear extensions
derived by means of the theory of reproducing kernel Hilbert spaces. We also
review their connections to other methods for classification and statistical
dependence estimation, and introduce some recent developments to deal with the
extreme cases of large-scale and low-sized problems. To illustrate the wide
applicability of these methods in both classification and regression problems,
we analyze their performance in a benchmark of publicly available data sets,
and pay special attention to specific real applications involving audio
processing for music genre prediction and hyperspectral satellite images for
Earth and climate monitoring.
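PCA is the simplest of the methods listed, and its kernel extension shows the general pattern: replace inner products with a kernel, center the Gram matrix in feature space, and eigendecompose it. The following is a minimal sketch assuming NumPy. The function names and the RBF kernel choice are illustrative assumptions, not code from the paper, and the supervised methods (PLS, CCA, OPLS) are not covered here.

```python
import numpy as np

def linear_pca_scores(X, k):
    """Project centered data onto its top-k principal directions (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian (RBF) kernel matrix: k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * sq)

def kernel_pca_scores(K, k):
    """Kernel PCA scores of the training points from an n x n Gram matrix K."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    Kc = H @ K @ H                           # Gram matrix of centered feature vectors
    vals, vecs = np.linalg.eigh(Kc)          # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:k]         # keep the k largest
    vals, vecs = vals[idx], vecs[:, idx]
    # Projections onto the unit-norm feature-space directions: v_j * sqrt(lambda_j)
    return vecs * np.sqrt(np.clip(vals, 0.0, None))
```

With the linear kernel `X @ X.T` the two functions agree up to the sign of each component, which is a useful sanity check; passing `rbf_kernel(X, X)` instead yields non-linear features.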
Sparse permutation invariant covariance estimation
The paper proposes a method for constructing a sparse estimator for the
inverse covariance (concentration) matrix in high-dimensional settings. The
estimator uses a penalized normal likelihood approach and forces sparsity by
using a lasso-type penalty. We establish a rate of convergence in the Frobenius
norm as both data dimension and sample size are allowed to grow, and
show that the rate depends explicitly on how sparse the true concentration
matrix is. We also show that a correlation-based version of the method exhibits
better rates in the operator norm. In addition, we derive a fast iterative algorithm
for computing the estimator, which relies on the popular Cholesky decomposition
of the inverse but produces a permutation-invariant estimator. The method is
compared to other estimators on simulated data and on a real data example of
tumor tissue classification using gene expression data.
Comment: Published in the Electronic Journal of Statistics
(http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics
(http://www.imstat.org), http://dx.doi.org/10.1214/08-EJS176
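For reference, a penalized normal likelihood with a lasso-type penalty on the concentration matrix $\Omega$ is commonly written as below, with $\hat{\Sigma}$ the sample covariance and $\lambda \ge 0$ the tuning parameter. This is a sketch of the generic form only: penalizing just the off-diagonal entries is a common convention, and the exact objective and notation used in the paper may differ.

```latex
\hat{\Omega}_{\lambda}
  = \arg\min_{\Omega \succ 0}
    \Big\{ \operatorname{tr}\big(\Omega \hat{\Sigma}\big)
           - \log\det \Omega
           + \lambda \sum_{i \neq j} \lvert \omega_{ij} \rvert \Big\}
```

The first two terms are the negative Gaussian log-likelihood up to constants and scaling; the $\ell_1$ term drives individual entries $\omega_{ij}$ exactly to zero as $\lambda$ grows.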
Sparse cointegration
Cointegration analysis is used to estimate the long-run equilibrium relations
between several time series. The coefficients of these long-run equilibrium
relations are the cointegrating vectors. In this paper, we provide a sparse
estimator of the cointegrating vectors. The estimation technique is sparse in
the sense that some elements of the cointegrating vectors will be estimated as
zero. For this purpose, we combine a penalized estimation procedure for vector
autoregressive models with sparse reduced rank regression. The sparse
cointegration procedure achieves a higher estimation accuracy than the
traditional Johansen cointegration approach in settings where the true
cointegrating vectors have a sparse structure, and/or when the sample size is
low compared to the number of time series. We also discuss a criterion for
determining the cointegration rank and illustrate its good performance in
several simulation settings. In the first empirical application we investigate
whether the expectations hypothesis of the term structure of interest rates,
which implies sparse cointegrating vectors, holds in practice. In the second
empirical application we show that forecast performance in high-dimensional
systems can be improved by sparsely estimating the cointegration relations.
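Cointegration is usually formalized through the vector error-correction model (VECM), the setting of the Johansen approach mentioned above. A standard form for a $k$-dimensional series $y_t$ with $p$ lags is shown below; the notation is generic and not taken from the paper. The $r$ columns of the $k \times r$ matrix $\beta$ are the cointegrating vectors and $r$ is the cointegration rank, so a sparse estimator forces some entries of $\beta$ to zero. Under the expectations hypothesis, for instance, yield spreads are stationary, so cointegrating vectors take the form $(1, -1, 0, \dots, 0)^\top$ up to reordering, which is sparse.

```latex
\Delta y_t = \Pi\, y_{t-1} + \sum_{i=1}^{p-1} \Gamma_i\, \Delta y_{t-i} + \varepsilon_t,
\qquad \Pi = \alpha \beta^{\top},
\qquad \alpha, \beta \in \mathbb{R}^{k \times r}
```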
Analysis of Basis Pursuit Via Capacity Sets
Finding the sparsest solution for an under-determined linear system
of equations is of interest in many applications. This problem is
known to be NP-hard. Recent work studied conditions on the support size of
the solution that allow its recovery using L1-minimization, via the Basis
Pursuit algorithm. These conditions often rely on a scalar property of the
system matrix called the mutual coherence. In this work we introduce an
alternative set of features of an arbitrarily given matrix, called the
"capacity sets". We show how those could be used to analyze the performance of
basis pursuit, leading to improved bounds and predictions of performance. Both
theoretical and numerical methods are presented, all using the capacity
values, and are shown to lead to improved assessments of the success of basis
pursuit in finding the sparsest solution of the system.
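The mutual-coherence baseline that the capacity sets are meant to improve on is easy to compute: it is the largest absolute normalized inner product between two distinct columns of the matrix. The classical guarantee states that basis pursuit recovers any solution whose support size is strictly below $(1 + 1/\mu)/2$. The following is a minimal pure-Python sketch of that classical bound only, not of the capacity-set analysis proposed in the paper; the function names are hypothetical.

```python
import math

def mutual_coherence(columns):
    """Largest |<a_i, a_j>| / (||a_i|| ||a_j||) over distinct columns i != j."""
    if len(columns) < 2:
        raise ValueError("need at least two columns")
    normed = []
    for c in columns:
        norm = math.sqrt(sum(v * v for v in c))
        if norm == 0.0:
            raise ValueError("zero column has undefined coherence")
        normed.append([v / norm for v in c])
    mu = 0.0
    for i in range(len(normed)):
        for j in range(i + 1, len(normed)):
            inner = abs(sum(a * b for a, b in zip(normed[i], normed[j])))
            mu = max(mu, inner)
    return mu

def coherence_sparsity_bound(mu):
    """Support sizes strictly below this value are guaranteed recoverable by BP."""
    return 0.5 * (1.0 + 1.0 / mu)

# Identity columns plus two orthogonal diagonal columns in R^2.
cols = [[1, 0], [0, 1], [1, 1], [1, -1]]
mu = mutual_coherence(cols)
bound = coherence_sparsity_bound(mu)
```

Here $\mu = 1/\sqrt{2}$, so the bound is about $1.21$ and only 1-sparse solutions are certified. Bounds of this kind are often pessimistic, which motivates finer matrix features such as the capacity sets.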
- …