37 research outputs found

    Kernel Multivariate Analysis Framework for Supervised Subspace Learning: A Tutorial on Linear and Kernel Multivariate Methods

    Full text link
    Feature extraction and dimensionality reduction are important tasks in many fields of science dealing with signal processing and analysis. The relevance of these techniques is increasing as current sensory devices are developed with ever higher resolution, and problems involving multimodal data sources become more common. A plethora of feature extraction methods are available in the literature collectively grouped under the field of Multivariate Analysis (MVA). This paper provides a uniform treatment of several methods: Principal Component Analysis (PCA), Partial Least Squares (PLS), Canonical Correlation Analysis (CCA) and Orthonormalized PLS (OPLS), as well as their non-linear extensions derived by means of the theory of reproducing kernel Hilbert spaces. We also review their connections to other methods for classification and statistical dependence estimation, and introduce some recent developments to deal with the extreme cases of large-scale and low-sized problems. To illustrate the wide applicability of these methods in both classification and regression problems, we analyze their performance in a benchmark of publicly available data sets, and pay special attention to specific real applications involving audio processing for music genre prediction and hyperspectral satellite images for Earth and climate monitoring

    Sparse kernel orthonormalized PLS for feature extraction in large datasets

    Get PDF
    In this paper we are presenting a novel multivariate analysis method. Our scheme is based on a novel kernel orthonormalized partial least squares (PLS) variant for feature extraction, imposing sparsity constrains in the solution to improve scalability. The algorithm is tested on a benchmark of UCI data sets, and on the analysis of integrated short-time music features for genre prediction. The upshot is that the method has strong expressive power even with rather few features, is clearly outperforming the ordinary kernel PLS, and therefore is an appealing method for feature extraction of labelled data

    Sparse and kernel OPLS feature extraction based on eigenvalue problem solving

    Get PDF
    Orthonormalized partial least squares (OPLS) is a popular multivariate analysis method to perform supervised feature extraction. Usually, in machine learning papers OPLS projections are obtained by solving a generalized eigenvalue problem. However, in statistical papers the method is typically formulated in terms of a reduced-rank regression problem, leading to a formulation based on a standard eigenvalue decomposition. A first contribution of this paper is to derive explicit expressions for matching the OPLS solutions derived under both approaches and discuss that the standard eigenvalue formulation is also normally more convenient for feature extraction in machine learning. More importantly, since optimization with respect to the projection vectors is carried out without constraints via a minimization problem, inclusion of penalty terms that favor sparsity is straightforward. In the paper, we exploit this fact to propose modified versions of OPLS. In particular, relying on the ℓ1 norm, we propose a sparse version of linear OPLS, as well as a non-linear kernel OPLS with pattern selection. We also incorporate a group-lasso penalty to derive an OPLS method with true feature selection. The discriminative power of the proposed methods is analyzed on a benchmark of classification problems. Furthermore, we compare the degree of sparsity achieved by our methods and compare them with other state-of-the-art methods for sparse feature extraction.This work was partly supported by MINECO projects TEC2011-22480 and PRIPIBIN-2011-1266.Publicad

    Multi-Label Dimensionality Reduction

    Get PDF
    abstract: Multi-label learning, which deals with data associated with multiple labels simultaneously, is ubiquitous in real-world applications. To overcome the curse of dimensionality in multi-label learning, in this thesis I study multi-label dimensionality reduction, which extracts a small number of features by removing the irrelevant, redundant, and noisy information while considering the correlation among different labels in multi-label learning. Specifically, I propose Hypergraph Spectral Learning (HSL) to perform dimensionality reduction for multi-label data by exploiting correlations among different labels using a hypergraph. The regularization effect on the classical dimensionality reduction algorithm known as Canonical Correlation Analysis (CCA) is elucidated in this thesis. The relationship between CCA and Orthonormalized Partial Least Squares (OPLS) is also investigated. To perform dimensionality reduction efficiently for large-scale problems, two efficient implementations are proposed for a class of dimensionality reduction algorithms, including canonical correlation analysis, orthonormalized partial least squares, linear discriminant analysis, and hypergraph spectral learning. The first approach is a direct least squares approach which allows the use of different regularization penalties, but is applicable under a certain assumption; the second one is a two-stage approach which can be applied in the regularization setting without any assumption. Furthermore, an online implementation for the same class of dimensionality reduction algorithms is proposed when the data comes sequentially. A Matlab toolbox for multi-label dimensionality reduction has been developed and released. The proposed algorithms have been applied successfully in the Drosophila gene expression pattern image annotation. The experimental results on some benchmark data sets in multi-label learning also demonstrate the effectiveness and efficiency of the proposed algorithms.Dissertation/ThesisPh.D. Computer Science 201

    Multivariate KPI for energy management of cooling system in food industry

    Get PDF
    Within EU, the food industry is currently ranked among the energy-intensive sectors, mainly as a consequence of the cooling system shareover the total energy demand. As such, the definition of appropriate key performance indicators (KPI) for ammonia chillers can play a strategic role for the efficient monitoring of the energy performance of the cooling systems. The goal of this paper is to develop an appropriate management approach, to account for energy inefficiency of the single compressors, and to identify the specific variables driving the performance outliers. To this end, a new KPI is proposed which correlates the energy consumption and the different process variables. The construction of the new indicator was carried out by means of multivariate statistical analysis, in particular using Kernel Partial Least Square (KPLS).This method is able to evaluate the maximum correlation between dataset and energy consumption employing nonlinear regression techniques. The validity of the new KPI is discussed on a case study relevant to the cooling system of a frozen ready meals industry. The assessment of the proposed metric is one against Specific Energy Consumption (SEC) like indicator, typically used in the context of the Energy Management Systems

    Towards Effective Codebookless Model for Image Classification

    Full text link
    The bag-of-features (BoF) model for image classification has been thoroughly studied over the last decade. Different from the widely used BoF methods which modeled images with a pre-trained codebook, the alternative codebook free image modeling method, which we call Codebookless Model (CLM), attracted little attention. In this paper, we present an effective CLM that represents an image with a single Gaussian for classification. By embedding Gaussian manifold into a vector space, we show that the simple incorporation of our CLM into a linear classifier achieves very competitive accuracy compared with state-of-the-art BoF methods (e.g., Fisher Vector). Since our CLM lies in a high dimensional Riemannian manifold, we further propose a joint learning method of low-rank transformation with support vector machine (SVM) classifier on the Gaussian manifold, in order to reduce computational and storage cost. To study and alleviate the side effect of background clutter on our CLM, we also present a simple yet effective partial background removal method based on saliency detection. Experiments are extensively conducted on eight widely used databases to demonstrate the effectiveness and efficiency of our CLM method

    Regularized multivariate analysis framework for interpretable high-dimensional variable selection

    Get PDF
    Multivariate Analysis (MVA) comprises a family of well-known methods for feature extraction which exploit correlations among input variables representing the data. One important property that is enjoyed by most such methods is uncorrelation among the extracted features. Recently, regularized versions of MVA methods have appeared in the literature, mainly with the goal to gain interpretability of the solution. In these cases, the solutions can no longer be obtained in a closed manner, and more complex optimization methods that rely on the iteration of two steps are frequently used. This paper recurs to an alternative approach to solve efficiently this iterative problem. The main novelty of this approach lies in preserving several properties of the original methods, most notably the uncorrelation of the extracted features. Under this framework, we propose a novel method that takes advantage of the,2,1 norm to perform variable selection during the feature extraction process. Experimental results over different problems corroborate the advantages of the proposed formulation in comparison to state of the art formulations.This work has been partly supported by MINECO projects TEC2013-48439-C4-1-R, TEC2014-52289-R and TEC2016-75161-C2-2-R, and Comunidad de Madrid projects PRICAM P2013/ICE-2933 and S2013/ICE-2933

    Nonnegative OPLS for supervised design of filter banks: application to image and audio feature extraction

    Get PDF
    Audio or visual data analysis tasks usually have to deal with high-dimensional and nonnegative signals. However, most data analysis methods suffer from overfitting and numerical problems when data have more than a few dimensions needing a dimensionality reduction preprocessing. Moreover, interpretability about how and why filters work for audio or visual applications is a desired property, especially when energy or spectral signals are involved. In these cases, due to the nature of these signals, the nonnegativity of the filter weights is a desired property to better understand its working. Because of these two necessities, we propose different methods to reduce the dimensionality of data while the nonnegativity and interpretability of the solution are assured. In particular, we propose a generalized methodology to design filter banks in a supervised way for applications dealing with nonnegative data, and we explore different ways of solving the proposed objective function consisting of a nonnegative version of the orthonormalized partial least-squares method. We analyze the discriminative power of the features obtained with the proposed methods for two different and widely studied applications: texture and music genre classification. Furthermore, we compare the filter banks achieved by our methods with other state-of-the-art methods specifically designed for feature extraction.This work was supported in parts by the MINECO projects TEC2013-48439-C4-1-R, TEC2014-52289-R, TEC2016-75161-C2-1-R, TEC2016-75161-C2-2-R, TEC2016-81900-REDT/AEI, and PRICAM (S2013/ICE-2933)
    corecore