12 research outputs found

    Unsupervised Feature Extraction Using Singular Value Decomposition

    Though modern data sets provide a massive amount of information, much of it can be redundant or useless (noise). It is therefore important to identify the most informative features of the data. Doing so aids analysis by removing the consequences of high dimensionality, in addition to the other advantages of lower-dimensional data, such as lower computational cost and a less complex model. Modern data are high-dimensional, sparse, and correlated, besides being unstructured, distorted, corrupt, deformed, and massive. Feature extraction has always been a major tool in machine learning applications. Because of these characteristics of modern data, feature extraction and feature reduction models and techniques carry even more significance for analyzing and understanding the data.
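
    The reduction described above can be sketched generically: center the data, take the SVD, and project onto the leading right singular vectors. A minimal illustration on synthetic data (NumPy assumed; not the paper's own procedure):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: 100 samples in 10 dimensions, but the signal lives in 2 directions.
X = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 10)) \
    + 0.01 * rng.normal(size=(100, 10))
Xc = X - X.mean(axis=0)                 # center before decomposing

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Fraction of total variance carried by each singular direction
var_ratio = s**2 / np.sum(s**2)

k = 2                                   # keep the two dominant directions
X_reduced = Xc @ Vt[:k].T               # 100 x 2 low-dimensional representation
```

    Inspecting `var_ratio` before choosing `k` is what makes this a feature-extraction step: directions with a negligible share of the variance are the redundancy and noise the abstract refers to.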

    Varimax rotation based on gradient projection needs between 10 and more than 500 random start loading matrices for optimal performance

    Gradient projection rotation (GPR) is a promising method for rotating factor or component loadings toward different criteria. Since the conditions for optimal performance of GPR-Varimax are largely unknown, this simulation study investigates GPR toward the Varimax criterion in principal component analysis. The conditions of the simulation study comprise two sample sizes (n = 100, n = 300), orthogonal simple-structure population models based on four numbers of components (3, 6, 9, 12), with and without Kaiser normalization, and six numbers of random start loading matrices for GPR-Varimax rotation (1, 10, 50, 100, 500, 1,000). GPR-Varimax always performed better when at least 10 random matrices were used as start loadings instead of the identity matrix. GPR-Varimax worked better for small numbers of components, for larger (n = 300) rather than smaller (n = 100) samples, and when loadings were Kaiser-normalized before rotation. To ensure optimal (stationary) performance of GPR-Varimax in recovering orthogonal simple structure, we recommend using at least 10 random start loading matrices for the rotation of up to three components and 50 for up to six components. For up to nine components, rotation should be based on a sample size of at least 300 cases, Kaiser normalization, and more than 50 different start loading matrices. For more than nine components, GPR-Varimax rotation should be based on at least 300 cases, Kaiser normalization, and at least 500 different start loading matrices. Comment: 19 pages, 8 figures, 2 tables, 4 figures in the Supplement.
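
    The multi-start strategy the study evaluates can be illustrated independently of gradient projection: run a varimax rotation from the identity and from several random orthogonal start matrices, and keep the solution with the highest criterion value. The sketch below uses the classic SVD-based varimax update rather than GPR itself (NumPy assumed; `n_starts` and the tolerance are illustrative choices):

```python
import numpy as np

def varimax(Lambda, T0, n_iter=100, tol=1e-8):
    """Rotate loadings toward the varimax criterion from the orthogonal
    start T0; returns rotated loadings and the attained criterion value."""
    p, _ = Lambda.shape
    T, d = T0.copy(), 0.0
    for _ in range(n_iter):
        L = Lambda @ T
        G = Lambda.T @ (L**3 - L @ np.diag((L**2).sum(axis=0) / p))
        U, s, Vt = np.linalg.svd(G)
        T, d_new = U @ Vt, s.sum()
        if d_new < d * (1 + tol):       # criterion stopped improving
            break
        d = d_new
    return Lambda @ T, d

def multi_start_varimax(Lambda, n_starts=10, seed=0):
    """Varimax from the identity plus random orthogonal starts;
    keep the rotation with the highest criterion value."""
    rng = np.random.default_rng(seed)
    k = Lambda.shape[1]
    starts = [np.eye(k)] + [
        np.linalg.qr(rng.normal(size=(k, k)))[0] for _ in range(n_starts)
    ]
    return max((varimax(Lambda, T0) for T0 in starts), key=lambda r: r[1])[0]
```

    With more components, more random starts are needed before the best-of-all-starts solution stabilizes, which is the pattern the simulation study quantifies.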

    Robust sparse principal component analysis.

    A method for principal component analysis is proposed that is sparse and robust at the same time. The sparsity delivers principal components that have loadings on a small number of variables, making them easier to interpret. The robustness makes the analysis resistant to outlying observations. The principal components correspond to directions that maximize a robust measure of variance, with an additional penalty term to account for sparseness. We propose an algorithm to compute the sparse and robust principal components. The method is applied to several real data examples, and diagnostic plots for detecting outliers and for selecting the degree of sparsity are provided. A simulation experiment studies the loss in statistical efficiency incurred by requiring both robustness and sparsity. Keywords: dispersion measure; projection pursuit; outliers; variable selection.
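
    A toy version of the projection-pursuit idea conveys the flavor (a simplified sketch, not the authors' estimator): score candidate directions by a robust spread measure of the projected data, here the MAD, then soft-threshold the winning direction to induce sparsity. NumPy assumed; `lam` is a hypothetical tuning parameter.

```python
import numpy as np

def mad(z):
    # median absolute deviation, scaled for consistency at the normal
    return 1.4826 * np.median(np.abs(z - np.median(z)))

def robust_sparse_direction(X, lam=0.1):
    """One sparse, robust PC direction via projection pursuit:
    candidate directions are the (median-centered) observations
    themselves, scored by the MAD of the projected data; the winner
    is then soft-thresholded to induce sparsity."""
    Xc = X - np.median(X, axis=0)
    best, best_score = None, -np.inf
    for x in Xc:
        nrm = np.linalg.norm(x)
        if nrm == 0:
            continue
        a = x / nrm
        score = mad(Xc @ a)             # robust variance proxy
        if score > best_score:
            best, best_score = a, score
    # soft-threshold small loadings, then renormalize
    a = np.sign(best) * np.maximum(np.abs(best) - lam, 0.0)
    return a / np.linalg.norm(a)
```

    Because the MAD, unlike the ordinary standard deviation, is barely affected by a few outlying observations, the selected direction tracks the bulk of the data rather than the outliers.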

    A sparse PLS for variable selection when integrating omics data

    Recent biotechnology advances allow for multiple types of omics data, such as transcriptomic, proteomic or metabolomic data sets, to be integrated. The problem of feature selection has been addressed several times in the context of classification, but needs to be handled in a specific manner when integrating data. In this study, we focus on the integration of two-block data sets that are measured on the same samples. Our goal is to combine integration and simultaneous variable selection of the two data sets in a one-step procedure using a Partial Least Squares regression (PLS) variant to facilitate the biologists' interpretation. A novel computational methodology called "sparse PLS" is introduced for predictive analysis to deal with these newly arisen problems. The sparsity of our approach is achieved with a Lasso penalization of the PLS loading vectors when computing the Singular Value Decomposition. Sparse PLS is shown to be effective and biologically meaningful. Comparisons with classical PLS are performed on a simulated data set and on real data sets. On one data set, a thorough biological interpretation of the obtained results is provided. We show that sparse PLS provides a valuable variable selection tool for high-dimensional data sets. Copyright ©2008 The Berkeley Electronic Press. All rights reserved.
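
    The Lasso-penalized SVD at the heart of sparse PLS can be sketched as alternating soft-thresholded power iterations on M = X'Y. The sketch below (NumPy assumed) thresholds relative to the largest gradient entry, a scaling convenience rather than the authors' exact penalty:

```python
import numpy as np

def soft(z, lam):
    """Soft-thresholding operator (the Lasso proximal step)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def sparse_pls_component(X, Y, lam_u=0.5, lam_v=0.5, n_iter=50):
    """First sparse PLS loading pair via a penalized SVD of M = X'Y.
    Entries of the loading vectors below a fraction of the largest
    current entry are thresholded to exactly zero."""
    M = X.T @ Y
    v = np.linalg.svd(M, full_matrices=False)[2][0]  # leading right sing. vec.
    u = np.zeros(M.shape[0])
    for _ in range(n_iter):
        z = M @ v
        u = soft(z, lam_u * np.abs(z).max())
        u /= np.linalg.norm(u)
        z = M.T @ u
        v = soft(z, lam_v * np.abs(z).max())
        v /= np.linalg.norm(v)
    return u, v
```

    The support of `u` and `v` identifies which variables of each omics block drive the shared signal, which is the variable-selection output the biologists interpret.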

    Conditional Gradient Algorithms for Rank-One Matrix Approximations with a Sparsity Constraint

    The sparsity constrained rank-one matrix approximation problem is a difficult mathematical optimization problem which arises in a wide array of useful applications in engineering, machine learning and statistics, and the design of algorithms for this problem has attracted intensive research activities. We introduce an algorithmic framework, called ConGradU, that unifies a variety of seemingly different algorithms that have been derived from disparate approaches, and allows for deriving new schemes. Building on the old and well-known conditional gradient algorithm, ConGradU is a simplified version with unit step size and yields a generic algorithm which either is given by an analytic formula or requires a very low computational complexity. Mathematical properties are systematically developed and numerical experiments are given. Comment: Minor changes. Final version. To appear in SIAM Review.
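
    A concrete instance of such a scheme is sparse PCA with a cardinality constraint, where the unit-step conditional gradient update reduces to a truncated power iteration: multiply by A, keep the s largest-magnitude entries, renormalize. A sketch assuming a symmetric matrix A (NumPy assumed; not the paper's full framework):

```python
import numpy as np

def truncated_power_iteration(A, s, n_iter=200, seed=0):
    """Conditional-gradient-style update with unit step for
    max x'Ax  s.t.  ||x||_2 = 1, ||x||_0 <= s:
    x <- P(A x), where P zeroes all but the s largest-magnitude
    entries and renormalizes."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    x = rng.normal(size=n)
    x /= np.linalg.norm(x)
    for _ in range(n_iter):
        g = A @ x
        idx = np.argsort(np.abs(g))[:-s]   # entries to zero out
        g[idx] = 0.0
        nrm = np.linalg.norm(g)
        if nrm == 0:
            break
        x = g / nrm
    return x
```

    Each iteration costs one matrix-vector product plus a partial sort, illustrating the "very low computational complexity" the abstract claims for the generic scheme.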

    Sparse Principal Component Analysis: Formulation, Algorithms, and Implications for Data Analysis

    Master's thesis (Trabajo de Fin de Máster) in Advanced Analysis of Multivariate Data, academic year 2014-2015. Principal Component Analysis is one of the techniques most widely implemented in the preprocessing and dimension-reduction stages of data analysis. Its main function is to project the input data onto new directions, known as principal components (PCs), that absorb as much information as possible, so that the variables contributing the least variability can be discarded. However, the PCs are difficult to interpret, since each results from a linear combination of all the original variables. Several approaches address this problem, such as the well-known rotation methods. This work presents Sparse Principal Component Analysis (SPCA) as another way to overcome the difficulty. It is a method for selecting characteristic variables that drives a large share of the loadings defining the PCs to zero (sparse loadings). Based on a review of the relevant literature, the state of the art of SPCA is surveyed, integrating the variance-maximization and error-minimization formulations of SPCA. In depth, the technique is approached through the reformulation of PCA as an error-minimization problem, exploiting developments in linear regression models and incorporating their typical constraints, such as the Elastic Net penalty, to improve data analysis. The work then covers the formulation of SPCA, its algorithms, and its implications for data analysis, comparing classical principal components, rotated solutions, and sparse solutions.
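
    The error-minimization view described above can be sketched as follows: compute the classical first PC score, then recover a sparse loading by an elastic-net regression of that score on the data. This is a simplified one-shot variant of the alternating SPCA algorithm (NumPy assumed; the penalty values are hypothetical):

```python
import numpy as np

def soft(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def elastic_net_cd(X, y, lam1, lam2, n_iter=100):
    """Coordinate descent for 0.5||y - Xb||^2 + lam1*||b||_1 + 0.5*lam2*||b||^2."""
    p = X.shape[1]
    b = np.zeros(p)
    col_sq = (X**2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ b + X[:, j] * b[j]       # partial residual without j
            b[j] = soft(X[:, j] @ r, lam1) / (col_sq[j] + lam2)
    return b

def sparse_loading(X, lam1=20.0, lam2=10.0):
    """Sparse approximation to the first PC loading via elastic net."""
    Xc = X - X.mean(axis=0)
    v1 = np.linalg.svd(Xc, full_matrices=False)[2][0]  # classical PC1 loading
    b = elastic_net_cd(Xc, Xc @ v1, lam1, lam2)
    return b / np.linalg.norm(b)
```

    The Lasso part of the penalty zeroes loadings outright, while the ridge part keeps correlated variables grouped, which is why the Elastic Net rather than the plain Lasso appears in the SPCA formulation.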

    Model Based Principal Component Analysis with Application to Functional Magnetic Resonance Imaging.

    Functional Magnetic Resonance Imaging (fMRI) has allowed better understanding of human brain organization and function by making it possible to record either autonomous or stimulus-induced brain activity. After appropriate preprocessing, fMRI produces a large spatio-temporal data set, which requires sophisticated signal processing. The aim of the signal processing is usually to produce spatial maps of statistics that capture the effects of interest, e.g., brain activation, time delay between stimulation and activation, or connectivity between brain regions. Two broad signal processing approaches have been pursued: univoxel methods and multivoxel methods. This thesis focuses on multivoxel methods, reviews Principal Component Analysis (PCA) and other closely related methods, and describes their advantages and disadvantages in fMRI research. These existing multivoxel methods have in common that they are exploratory, i.e., they are not based on a statistical model. A crucial observation, central to this thesis, is that there is in fact an underlying model behind PCA, which we call noisy PCA (nPCA). In the main part of this thesis, we use nPCA to develop methods that solve three important problems in fMRI. 1) We introduce a novel nPCA-based spatio-temporal model that combines the standard univoxel regression model with nPCA and automatically recognizes the temporal smoothness of the fMRI data. Furthermore, unlike standard univoxel methods, it can handle non-stationary noise. 2) We introduce a novel sparse variable PCA (svPCA) method that automatically excludes whole voxel time series and yields sparse eigenimages. This is achieved by optimizing a novel nonlinear penalized likelihood function; an iterative estimation algorithm that makes use of geodesic descent methods is proposed. 3) We introduce a novel method based on Stein's Unbiased Risk Estimator (SURE) and Random Matrix Theory (RMT) to select the number of principal components for the increasingly important case where the number of observations is of the same order as the number of variables.
    Ph.D., Electrical Engineering: Systems, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/57638/2/mulfarss_1.pd
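
    The random-matrix side of problem 3) can be illustrated with a simple rule (a generic RMT heuristic, not the thesis's SURE-based estimator): keep the eigenvalues of the sample covariance that exceed the Marchenko-Pastur bulk edge expected under pure noise. A synthetic sketch, NumPy assumed:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma = 500, 100, 1.0     # observations, variables, noise level
k_true = 3                       # number of planted components

# Low-rank signal plus white noise
U = rng.normal(size=(n, k_true))
V = rng.normal(size=(k_true, p))
X = 8.0 * U @ V / np.sqrt(p) + sigma * rng.normal(size=(n, p))

# Eigenvalues of the sample covariance matrix
evals = np.linalg.eigvalsh(X.T @ X / n)

# Marchenko-Pastur upper bulk edge for pure noise when p/n is not small
edge = sigma**2 * (1 + np.sqrt(p / n))**2

# Estimated number of components: eigenvalues sticking out of the bulk
k_est = int(np.sum(evals > edge))
```

    When p is of the same order as n, even pure-noise eigenvalues spread well above sigma^2, so a threshold that ignores the bulk edge would systematically overestimate the number of components.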