
    Kernel Multivariate Analysis Framework for Supervised Subspace Learning: A Tutorial on Linear and Kernel Multivariate Methods

    Feature extraction and dimensionality reduction are important tasks in many fields of science dealing with signal processing and analysis. The relevance of these techniques is increasing as current sensory devices are developed with ever higher resolution, and problems involving multimodal data sources become more common. A plethora of feature extraction methods are available in the literature, collectively grouped under the field of Multivariate Analysis (MVA). This paper provides a uniform treatment of several methods: Principal Component Analysis (PCA), Partial Least Squares (PLS), Canonical Correlation Analysis (CCA) and Orthonormalized PLS (OPLS), as well as their non-linear extensions derived by means of the theory of reproducing kernel Hilbert spaces. We also review their connections to other methods for classification and statistical dependence estimation, and introduce some recent developments to deal with the extreme cases of large-scale and small-sample problems. To illustrate the wide applicability of these methods in both classification and regression problems, we analyze their performance on a benchmark of publicly available data sets, and pay special attention to specific real applications involving audio processing for music genre prediction and hyperspectral satellite images for Earth and climate monitoring.
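    The kernel variants surveyed in this tutorial all follow the same recipe: build a kernel matrix over the samples, center it in feature space, and solve an eigenproblem for sample-space weights. Below is a minimal kernel PCA sketch in NumPy, offered only as an illustration of that recipe; the RBF kernel, its gamma parameter, and the toy data are assumptions, not choices taken from the paper.

```python
# Minimal kernel PCA sketch (illustration only; RBF kernel and gamma are assumptions).
import numpy as np

def rbf_kernel(X, gamma=1.0):
    # Gaussian (RBF) kernel matrix from pairwise squared Euclidean distances.
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

def kernel_pca(X, n_components=2, gamma=1.0):
    n = X.shape[0]
    K = rbf_kernel(X, gamma)
    # Center the kernel matrix in feature space: Kc = (I - 1/n) K (I - 1/n).
    one_n = np.ones((n, n)) / n
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(Kc)
    idx = np.argsort(eigvals)[::-1][:n_components]
    alphas, lambdas = eigvecs[:, idx], eigvals[idx]
    # Projections of the training samples onto the leading kernel components.
    return Kc @ alphas / np.sqrt(lambdas)

X = np.random.RandomState(0).randn(100, 5)
Z = kernel_pca(X, n_components=2, gamma=0.5)
print(Z.shape)  # (100, 2)
```

    Setting gamma large makes the kernel nearly diagonal (every sample its own direction), while gamma near zero recovers essentially linear PCA; the supervised methods in the paper differ mainly in which (cross-)kernel matrices enter the eigenproblem.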

    Sparse permutation invariant covariance estimation

    The paper proposes a method for constructing a sparse estimator of the inverse covariance (concentration) matrix in high-dimensional settings. The estimator uses a penalized normal likelihood approach and forces sparsity through a lasso-type penalty. We establish a rate of convergence in the Frobenius norm as both the data dimension p and the sample size n are allowed to grow, and show that the rate depends explicitly on how sparse the true concentration matrix is. We also show that a correlation-based version of the method exhibits better rates in the operator norm. We also derive a fast iterative algorithm for computing the estimator, which relies on the popular Cholesky decomposition of the inverse but produces a permutation-invariant estimator. The method is compared to other estimators on simulated data and on a real data example of tumor tissue classification using gene expression data. Comment: Published at http://dx.doi.org/10.1214/08-EJS176 in the Electronic Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics (http://www.imstat.org).
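    To see the sparsity induced by a lasso-type penalty on the concentration matrix, the sketch below uses scikit-learn's GraphicalLasso, a related penalized normal likelihood estimator; it is not the Cholesky-based algorithm of the paper, and the simulated tridiagonal concentration matrix and penalty weight are assumptions for illustration.

```python
# Lasso-penalized precision (concentration) matrix estimation with scikit-learn.
# Illustrates the induced sparsity; NOT the paper's Cholesky-based algorithm.
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.RandomState(0)
p, n = 10, 200
# True concentration matrix: tridiagonal, hence sparse (and positive definite).
Theta = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
Sigma = np.linalg.inv(Theta)
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)

est = GraphicalLasso(alpha=0.1).fit(X)  # alpha is the lasso penalty weight
# Off-diagonal zeros in the estimated precision matrix reflect the penalty.
print(np.round(est.precision_, 2))
print("nonzero off-diagonals:", int((np.abs(est.precision_) > 1e-4).sum()) - p)
```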

    Sparse cointegration

    Cointegration analysis is used to estimate the long-run equilibrium relations between several time series. The coefficients of these long-run equilibrium relations are the cointegrating vectors. In this paper, we provide a sparse estimator of the cointegrating vectors. The estimation technique is sparse in the sense that some elements of the cointegrating vectors will be estimated as zero. For this purpose, we combine a penalized estimation procedure for vector autoregressive models with sparse reduced rank regression. The sparse cointegration procedure achieves higher estimation accuracy than the traditional Johansen cointegration approach in settings where the true cointegrating vectors have a sparse structure, and/or when the sample size is small relative to the number of time series. We also discuss a criterion to determine the cointegration rank and illustrate its good performance in several simulation settings. In a first empirical application, we investigate whether the expectations hypothesis of the term structure of interest rates, which implies sparse cointegrating vectors, holds in practice. In a second empirical application, we show that forecast performance in high-dimensional systems can be improved by sparsely estimating the cointegration relations.
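    As a toy illustration of a sparse cointegrating vector, the sketch below fits a lasso-penalized Engle-Granger-style levels regression on simulated series sharing one stochastic trend. This is not the paper's combination of penalized VAR estimation and sparse reduced rank regression; the data-generating process and penalty weight are assumptions.

```python
# Toy sparse cointegrating vector via a lasso-penalized levels regression.
# Illustration only; NOT the paper's penalized reduced-rank procedure.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.RandomState(1)
T, k = 500, 6
trend = np.cumsum(rng.randn(T))  # shared stochastic trend (random walk)
X = np.column_stack([trend + 0.3 * rng.randn(T) for _ in range(k)])
# y cointegrates with X[:, 0] and X[:, 1] only, so the true vector is sparse.
y = X[:, 0] - X[:, 1] + 0.2 * rng.randn(T)

beta = Lasso(alpha=0.05, max_iter=10_000).fit(X, y).coef_
print(np.round(beta, 2))  # roughly (1, -1, 0, 0, 0, 0) up to lasso shrinkage
```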

    Analysis of Basis Pursuit Via Capacity Sets

    Finding the sparsest solution α of an under-determined linear system of equations Dα = s is of interest in many applications. This problem is known to be NP-hard. Recent work studied conditions on the support size of α that allow its recovery by L1-minimization, via the Basis Pursuit algorithm. These conditions often rely on a scalar property of D called the mutual coherence. In this work we introduce an alternative set of features of an arbitrarily given D, called the "capacity sets". We show how these can be used to analyze the performance of basis pursuit, leading to improved bounds and predictions of performance. Both theoretical and numerical methods are presented, all using the capacity values, and shown to lead to improved assessments of the success of basis pursuit in finding the sparsest solution of Dα = s.
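    Basis pursuit itself is a linear program: splitting α = u − v with u, v ≥ 0 turns min ‖α‖₁ subject to Dα = s into a standard-form LP. Below is a minimal SciPy sketch, with a random Gaussian dictionary and problem sizes chosen only for illustration; it also prints the mutual coherence of D that the abstract refers to.

```python
# Basis pursuit (min ||alpha||_1 s.t. D @ alpha = s) solved as a linear program.
# Random Gaussian dictionary and sizes are assumptions for illustration.
import numpy as np
from scipy.optimize import linprog

rng = np.random.RandomState(0)
m, n, k = 20, 50, 3                    # under-determined: m < n, k-sparse target
D = rng.randn(m, n)
D /= np.linalg.norm(D, axis=0)         # unit-norm columns (atoms)

alpha_true = np.zeros(n)
alpha_true[rng.choice(n, k, replace=False)] = rng.randn(k)
s = D @ alpha_true

# Split alpha = u - v with u, v >= 0 and minimize sum(u) + sum(v) = ||alpha||_1.
c = np.ones(2 * n)
res = linprog(c, A_eq=np.hstack([D, -D]), b_eq=s, bounds=[(0, None)] * (2 * n))
alpha_hat = res.x[:n] - res.x[n:]

# Mutual coherence: largest |inner product| between distinct unit-norm atoms.
mu = (np.abs(D.T @ D) - np.eye(n)).max()
print("mutual coherence:", round(mu, 3))
print("max recovery error:", np.abs(alpha_hat - alpha_true).max())
```

    For a support this small relative to the coherence, L1-minimization typically recovers α exactly; the paper's capacity sets aim to predict such success more tightly than coherence-based bounds.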