Unsupervised Feature Extraction Using Singular Value Decomposition
Abstract: Though modern data often provide a massive amount of information, much of the insight may be redundant or useless (noise). It is therefore important to identify the most informative features of the data. Doing so aids the analysis by mitigating the consequences of high dimensionality, in addition to the other advantages of lower-dimensional data, such as lower computational cost and a less complex model. Modern data are high-dimensional, sparse, and correlated, besides often being unstructured, distorted, corrupt, deformed, and massive. Feature extraction has always been a major tool in machine learning applications. Because of these characteristics of modern data, feature extraction and feature reduction models and techniques are all the more significant for analyzing and understanding the data.
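The SVD-based reduction the abstract describes can be sketched in a few lines. This is a generic illustration, not the paper's method; the function and variable names are ours:

```python
import numpy as np

def svd_reduce(X, k):
    """Project centered data onto its top-k right singular directions.

    A minimal sketch of SVD-based dimensionality reduction: the leading
    singular vectors capture the directions of largest variance, so
    projecting onto them retains the most informative part of the data.
    """
    Xc = X - X.mean(axis=0)                          # center each feature
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                             # n x k reduced data

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
Z = svd_reduce(X, 2)
print(Z.shape)   # (100, 2)
```

The retained columns carry exactly the variance of the two largest singular values, which is what makes the truncation the best rank-k approximation in the least-squares sense.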
Varimax rotation based on gradient projection needs between 10 and more than 500 random start loading matrices for optimal performance
Gradient projection rotation (GPR) is a promising method to rotate factor or
component loadings by different criteria. Since the conditions for optimal
performance of GPR-Varimax are widely unknown, this simulation study
investigates GPR towards the Varimax criterion in principal component analysis.
The conditions of the simulation study comprise two sample sizes (n = 100, n =
300), with orthogonal simple structure population models based on four numbers
of components (3, 6, 9, 12), with and without Kaiser normalization, and six
numbers of random start loading matrices for GPR-Varimax rotation (1, 10, 50,
100, 500, 1,000). GPR-Varimax rotation always performed better when at least 10
random matrices were used for start loadings instead of the identity matrix.
GPR-Varimax worked better for a small number of components, larger (n = 300) as
compared to smaller (n = 100) samples, and when loadings were Kaiser-normalized
before rotation. To ensure optimal (stationary) performance of GPR-Varimax in
recovering orthogonal simple structure, we recommend using at least 10
random start loading matrices for the rotation of up to three components
and 50 for up to six components. For up to nine components, rotation
should be based on a sample size of at least 300 cases, Kaiser-normalization,
and more than 50 different start loading matrices. For more than nine
components, GPR-Varimax rotation should be based on at least 300 cases,
Kaiser-normalization, and at least 500 different start loading matrices.
Comment: 19 pages, 8 figures, 2 tables, 4 figures in the Supplement
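The multi-start strategy the study evaluates can be sketched as follows. This is a hedged illustration using Kaiser's classical SVD-based Varimax iteration, not the paper's gradient projection algorithm; all function names are ours:

```python
import numpy as np

def varimax_criterion(L):
    """Varimax target: sum over components of the variance of squared loadings."""
    return np.sum(np.var(L**2, axis=0))

def varimax_rotate(L, T0, max_iter=500, tol=1e-10):
    """Kaiser's SVD-based Varimax iteration, started from orthogonal matrix T0."""
    p, _ = L.shape
    T, d = T0.copy(), 0.0
    for _ in range(max_iter):
        R = L @ T                                      # rotated loadings
        U, s, Vt = np.linalg.svd(
            L.T @ (R**3 - R @ np.diag((R**2).sum(axis=0)) / p))
        T = U @ Vt
        if s.sum() < d * (1 + tol):                    # converged
            break
        d = s.sum()
    return L @ T

def multi_start_varimax(L, n_starts=10, seed=0):
    """Rotate from the identity plus random orthogonal starts; keep the best."""
    rng = np.random.default_rng(seed)
    k = L.shape[1]
    starts = [np.eye(k)] + [np.linalg.qr(rng.normal(size=(k, k)))[0]
                            for _ in range(n_starts - 1)]
    return max((varimax_rotate(L, T0) for T0 in starts), key=varimax_criterion)
```

For a perfect simple-structure loading matrix scrambled by a random rotation, the multi-start search recovers a solution with the original criterion value, which mirrors the study's finding that random starts outperform a single identity start.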
Robust sparse principal component analysis.
A method for principal component analysis is proposed that is sparse and robust at the same time. The sparsity delivers principal components that have loadings on a small number of variables, making them easier to interpret. The robustness makes the analysis resistant to outlying observations. The principal components correspond to directions that maximize a robust measure of the variance, with an additional penalty term to take sparseness into account. We propose an algorithm to compute the sparse and robust principal components. The method is applied to several real data examples, and diagnostic plots for detecting outliers and for selecting the degree of sparsity are provided. A simulation experiment studies the loss in statistical efficiency incurred by requiring both robustness and sparsity.
Keywords: Dispersion measure; Projection pursuit; Outliers; Variable selection
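A crude sketch of the penalized projection-pursuit idea behind such methods (not the authors' algorithm): search data-driven candidate directions for the one maximizing a robust dispersion measure (here the MAD) minus an L1 penalty. The candidates, the relative threshold, and all names are illustrative assumptions:

```python
import numpy as np

def mad(x):
    """Median absolute deviation, scaled for consistency at the normal."""
    return 1.4826 * np.median(np.abs(x - np.median(x)))

def soft(a, lam):
    """Soft-thresholding operator (induces sparsity)."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def robust_sparse_direction(X, lam=0.1):
    """Best penalized direction among sparsified, normalized observations.

    Candidates are the median-centered observations, soft-thresholded and
    renormalized -- the classic projection-pursuit trick of searching over
    data-driven directions instead of the whole unit sphere.
    """
    Xc = X - np.median(X, axis=0)
    best, best_val = None, -np.inf
    for row in Xc:
        a = soft(row, lam * np.max(np.abs(row)))
        n = np.linalg.norm(a)
        if n == 0:
            continue
        a = a / n
        val = mad(Xc @ a) - lam * np.abs(a).sum()   # robust scale minus penalty
        if val > best_val:
            best, best_val = a, val
    return best
```

Because the MAD ignores a few gross outliers, a direction pointing at contaminated variables scores poorly, while classical variance maximization would be drawn straight to them.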
A sparse PLS for variable selection when integrating omics data
Recent biotechnology advances allow multiple types of omics data, such as transcriptomic, proteomic or metabolomic data sets, to be integrated. The problem of feature selection has been addressed several times in the context of classification, but needs to be handled in a specific manner when integrating data. In this study, we focus on the integration of two-block data measured on the same samples. Our goal is to combine integration and simultaneous variable selection of the two data sets in a one-step procedure using a Partial Least Squares regression (PLS) variant, to facilitate the biologists' interpretation. A novel computational methodology called "sparse PLS" is introduced for predictive analysis to deal with these newly arising problems. The sparsity of our approach is achieved with a Lasso penalization of the PLS loading vectors when computing the Singular Value Decomposition. Sparse PLS is shown to be effective and biologically meaningful. Comparisons with classical PLS are performed on a simulated data set and on real data sets. On one data set, a thorough biological interpretation of the obtained results is provided. We show that sparse PLS provides a valuable variable selection tool for high-dimensional data sets.
Copyright ©2008 The Berkeley Electronic Press. All rights reserved
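The Lasso-penalized SVD step can be sketched as a power iteration on the cross-product matrix with soft-thresholding of the X-loadings. This is a simplified single-component illustration, not the paper's full algorithm; the relative threshold `lam_u` and all names are our assumptions:

```python
import numpy as np

def soft_threshold(a, lam):
    """Lasso soft-thresholding operator."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def sparse_pls_component(X, Y, lam_u=0.2, n_iter=50):
    """First sparse PLS loading pair via power iteration on M = X'Y.

    The X-loading vector u is soft-thresholded at a fraction lam_u of its
    largest entry each iteration, zeroing variables weakly related to Y.
    """
    M = X.T @ Y
    v = np.linalg.svd(M, full_matrices=False)[2][0]   # leading right singular vector
    for _ in range(n_iter):
        u = M @ v
        u = soft_threshold(u, lam_u * np.max(np.abs(u)))
        u /= np.linalg.norm(u)
        v = M.T @ u
        v /= np.linalg.norm(v)
    return u, v
```

On data where only a few X-variables carry the signal linking the two blocks, the thresholding zeroes the loadings of the remaining variables, which is the variable selection effect described above.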
Conditional Gradient Algorithms for Rank-One Matrix Approximations with a Sparsity Constraint
The sparsity-constrained rank-one matrix approximation problem is a difficult
mathematical optimization problem that arises in a wide array of
applications in engineering, machine learning and statistics, and the design of
algorithms for this problem has attracted intensive research activity. We
introduce an algorithmic framework, called ConGradU, that unifies a variety of
seemingly different algorithms that have been derived from disparate
approaches, and allows for deriving new schemes. Building on the old and
well-known conditional gradient algorithm, ConGradU is a simplified version
with unit step size and yields a generic algorithm which either is given by an
analytic formula or has very low computational complexity. Mathematical
properties are systematically developed and numerical experiments are given.
Comment: Minor changes. Final version. To appear in SIAM Review
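For the cardinality-constrained sparse-PCA instance of the problem, the unit-step conditional gradient update has exactly the kind of analytic form the abstract mentions: each step jumps to the feasible point maximizing the linearized objective, which is the normalized sparse truncation of the gradient. A hedged sketch (the initialization heuristic and names are ours):

```python
import numpy as np

def truncate(z, s):
    """Keep the s largest-magnitude entries of z and zero out the rest."""
    out = np.zeros_like(z)
    idx = np.argsort(-np.abs(z))[:s]
    out[idx] = z[idx]
    return out

def congradu_sparse_pc(A, s, n_iter=100):
    """Unit-step conditional gradient for max z'Az s.t. ||z|| = 1, ||z||_0 <= s.

    With A positive semidefinite, maximizing the linearization <A z_t, z>
    over the feasible set has the closed form: normalize the s-sparse
    truncation of the gradient direction A @ z.
    """
    z = truncate(np.diag(A).copy(), s)   # init from the largest diagonal entries
    z /= np.linalg.norm(z)
    for _ in range(n_iter):
        z = truncate(A @ z, s)
        z /= np.linalg.norm(z)
    return z
```

On a covariance matrix with a planted sparse leading eigenvector, the iteration recovers the planted support; each step costs one matrix-vector product plus a partial sort, illustrating the low per-iteration complexity claimed.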
Sparse principal component analysis: formulation, algorithms, and implications for data analysis
Master's thesis (Trabajo de Fin de Máster) in Análisis Avanzado de Datos Multivariantes, academic year 2014-2015. Principal component analysis is one of the most widely used techniques in the preprocessing and dimension-reduction stages of data-matrix analysis. Its main purpose is to project the input data onto new directions, known as principal components (PCs), that absorb as much of the information as possible, so that the variables contributing the least variability can be discarded. However, the PCs are hard to interpret, since each is a linear combination of all the original variables. Different ways of tackling this problem have therefore emerged, such as the well-known rotation methods.
In this work, sparse principal component analysis (SPCA) is presented as another way of overcoming this difficulty. It is a variable selection method that aims to make a large share of the loadings defining the PCs equal to zero (sparse loadings). Drawing on the relevant literature, the state of the art of SPCA is surveyed, integrating the variance-maximization and error-minimization views of the problem. In particular, the technique is approached through the reformulation of PCA as an error-minimization problem, exploiting developments in linear regression models and incorporating their typical constraints, such as the elastic net penalty, to improve the analysis. The work then presents the formulation of SPCA, its algorithms, and its implications for data analysis, comparing the classical principal components, the rotated solutions, and the sparse solutions.
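The error-minimization reformulation described above can be sketched as a simple alternating scheme in the style of Zou and Hastie's regression approach, with the elastic-net step replaced by its closed-form soft-thresholding limit (large ridge weight). This is a sketch under those assumptions, not the thesis' algorithm; names and the penalty level are illustrative:

```python
import numpy as np

def spca_regression_sketch(X, k, lam1, n_iter=100):
    """Alternating SPCA in the error-minimization (regression) formulation.

    Given an orthonormal matrix A, the sparse loadings step has the closed
    form B_j = soft(X'X a_j, lam1/2); A is then updated by a Procrustes
    step from the SVD of X'X B.
    """
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc
    A = np.linalg.svd(Xc, full_matrices=False)[2][:k].T   # start at PCA loadings
    for _ in range(n_iter):
        G = S @ A
        B = np.sign(G) * np.maximum(np.abs(G) - lam1 / 2.0, 0.0)  # sparse step
        U, _, Vt = np.linalg.svd(S @ B, full_matrices=False)
        A = U @ Vt                                                # Procrustes step
    norms = np.linalg.norm(B, axis=0)
    return B / np.where(norms > 0, norms, 1.0)
```

Unlike classical PCs, which load on every variable, the thresholded loadings vanish on variables that carry no structure, which is exactly the interpretability gain the thesis pursues.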
Model Based Principal Component Analysis with Application to Functional Magnetic Resonance Imaging.
Functional Magnetic Resonance Imaging (fMRI) has allowed better understanding
of human brain organization and function by making it possible to record either
autonomous or stimulus induced brain activity. After appropriate preprocessing
fMRI produces a large spatio-temporal data set, which requires sophisticated signal
processing. The aim of the signal processing is usually to produce spatial maps
of statistics that capture the effects of interest, e.g., brain activation, time delay
between stimulation and activation, or connectivity between brain regions.
Two broad signal processing approaches have been pursued: univoxel methods
and multivoxel methods. This proposal will focus on multivoxel methods and review
Principal Component Analysis (PCA), and other closely related methods, and
describe their advantages and disadvantages in fMRI research. These existing multivoxel
methods have in common that they are exploratory, i.e., they are not based on a statistical model.
A crucial observation, which is central to this thesis, is that there is in fact an
underlying model behind PCA, which we call noisy PCA (nPCA). In the main part
of this thesis, we use nPCA to develop methods that solve three important problems
in fMRI. 1) We introduce a novel nPCA based spatio-temporal model that combines
the standard univoxel regression model with nPCA and automatically recognizes
the temporal smoothness of the fMRI data. Furthermore, unlike standard univoxel
methods, it can handle non-stationary noise. 2) We introduce a novel sparse variable
PCA (svPCA) method that automatically excludes whole voxel timeseries, and
yields sparse eigenimages. This is achieved by optimizing a novel nonlinear
penalized likelihood function; an iterative estimation algorithm that makes use
of geodesic descent methods is proposed. 3) We introduce a novel method based
on Stein’s Unbiased Risk Estimator (SURE) and Random Matrix Theory (RMT) to
select the number of principal components for the increasingly important case where
the number of observations is of the same order as the number of variables.
Ph.D. thesis, Electrical Engineering: Systems, University of Michigan, Horace H. Rackham School of Graduate Studies.
http://deepblue.lib.umich.edu/bitstream/2027.42/57638/2/mulfarss_1.pd
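In the spirit of the random-matrix-theory route mentioned in point 3 (not the thesis' SURE estimator), a minimal component-count heuristic is to count sample eigenvalues above the Marchenko-Pastur bulk edge, with the noise level crudely estimated by the median eigenvalue. All choices here are illustrative assumptions:

```python
import numpy as np

def n_components_mp(X):
    """Count sample covariance eigenvalues above the Marchenko-Pastur edge.

    Under pure noise with variance sigma^2 and aspect ratio c = p/n, the
    sample eigenvalues concentrate below sigma^2 (1 + sqrt(c))^2, so
    eigenvalues above that edge indicate genuine components. sigma^2 is
    estimated by the median eigenvalue, assumed to sit inside the bulk.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    evals = np.linalg.eigvalsh(Xc.T @ Xc / (n - 1))
    sigma2 = np.median(evals)
    edge = sigma2 * (1.0 + np.sqrt(p / n)) ** 2
    return int(np.sum(evals > edge))

# three strong latent factors buried in unit-variance noise, p of the same
# order as n -- the regime the thesis is concerned with
rng = np.random.default_rng(5)
F = rng.normal(size=(500, 3))
W = rng.normal(size=(3, 50))
X = F @ W + rng.normal(size=(500, 50))
k_hat = n_components_mp(X)
```

Near-edge bulk eigenvalues can cause slight overcounting with this crude noise estimate, which is precisely the finite-sample issue the thesis' SURE-based selector is designed to handle more carefully.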
Projected gradient approach to the numerical solution of the SCoTLASS
The SCoTLASS problem (principal component analysis modified so that the components satisfy the Least Absolute Shrinkage and Selection Operator (LASSO) constraint) is reformulated as a dynamical system on the unit sphere. The LASSO inequality constraint is tackled by an exterior penalty function. A globally convergent algorithm is developed based on the projected gradient approach. The algorithm is illustrated numerically and discussed on a well-known data set. (c) 2004 Elsevier B.V. All rights reserved.
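The combination of projected gradient ascent and an exterior penalty can be sketched as follows. This is a simple discrete-time illustration of the idea, not the paper's exact scheme; the penalty weight, step size, and names are our assumptions:

```python
import numpy as np

def scotlass_pg(S, t, mu=20.0, eta=0.005, n_iter=4000, seed=0):
    """Projected gradient ascent for SCoTLASS: maximize a'Sa on the unit
    sphere subject to ||a||_1 <= t.

    The L1 constraint is handled by an exterior quadratic penalty
    mu * max(0, ||a||_1 - t)^2, active only outside the feasible set;
    each ascent step is projected back onto the unit sphere.
    """
    rng = np.random.default_rng(seed)
    a = rng.normal(size=S.shape[0])
    a /= np.linalg.norm(a)
    for _ in range(n_iter):
        excess = max(np.abs(a).sum() - t, 0.0)
        grad = 2.0 * S @ a - 2.0 * mu * excess * np.sign(a)
        a = a + eta * grad
        a /= np.linalg.norm(a)          # project back onto the unit sphere
    return a

# covariance whose leading eigenvector is fully dense: the L1 bound forces
# a sparser compromise direction
u = np.ones(5) / np.sqrt(5)
S = 4.0 * np.outer(u, u) + 0.5 * np.eye(5)
a = scotlass_pg(S, t=1.5)
```

Since the unconstrained leading eigenvector here has L1 norm sqrt(5) ≈ 2.24, a solution with markedly smaller L1 norm shows the exterior penalty doing its job; being exterior, it allows a small residual violation of the bound.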