Interpretable and fast dimension reduction of multivariate data

Abstract

The main objective of this thesis is to propose new techniques that simplify the interpretation of newly formed 'variables', or components, while reducing the dimensionality of multivariate data. Most attention is given to the interpretation of principal components, although one chapter is devoted to that of factors in factor analysis. Sparse principal components are proposed, in which some of the component loadings are made exactly zero. One approach makes use of the idea of correlation biplots: an orthogonal matrix of sparse loadings is obtained by computing the biplot factors of the product of the principal component loading matrix and functions of the component variances. Other approaches involve clustering of variables as a pre-processing step, so that sparse components are computed from the data or correlation matrix of each cluster. New clustering techniques are proposed for this purpose. In addition, a penalized varimax approach is proposed for simplifying the interpretation of factors in factor analysis, especially for factor solutions with considerably different sums of squares. This is done by adding a penalty term to the ordinary varimax criterion. Data sets of varying sizes, both synthetic and real, are used to illustrate the proposed methods, and the results are compared with those of existing methods. In the case of principal component analysis, the resulting sparse components are found to be more interpretable (sparser) and to explain a higher cumulative percentage of adjusted variance than their counterparts from other techniques. The penalized varimax approach helps to find factor solutions with simple structure that the standard varimax solution does not reveal. The proposed methods are very simple to understand and involve fast algorithms compared to some of the existing methods. They contribute much to the interpretation of components in a reduced dimension when reducing the dimensionality of multivariate data.
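The cluster-then-decompose idea mentioned above can be illustrated with a minimal sketch. It is not the thesis's algorithm: the abstract does not specify the proposed clustering techniques, so a simple greedy correlation-threshold grouping is swapped in here purely for illustration, and the function names `cluster_variables` and `sparse_components`, the `threshold` parameter, and the synthetic data are all assumptions. The sketch only shows why computing one component per cluster of variables yields loadings that are exactly zero outside that cluster.

```python
import numpy as np


def cluster_variables(R, threshold=0.5):
    """Hypothetical greedy grouping (NOT the thesis's clustering method):
    a variable joins the first cluster whose seed variable it correlates
    with at least `threshold` in absolute value; otherwise it seeds a new one."""
    p = R.shape[0]
    labels = -np.ones(p, dtype=int)
    seeds = []
    for j in range(p):
        for k, s in enumerate(seeds):
            if abs(R[j, s]) >= threshold:
                labels[j] = k
                break
        else:
            seeds.append(j)
            labels[j] = len(seeds) - 1
    return labels


def sparse_components(X, labels):
    """Return a p x k loading matrix with one column per cluster: the leading
    eigenvector of that cluster's correlation matrix, and exact zeros for
    variables outside the cluster."""
    R = np.corrcoef(X, rowvar=False)
    p = R.shape[0]
    clusters = np.unique(labels)
    A = np.zeros((p, clusters.size))
    for c, k in enumerate(clusters):
        idx = np.flatnonzero(labels == k)
        # eigh returns eigenvalues in ascending order, so the last
        # eigenvector belongs to the largest eigenvalue.
        _, eigvecs = np.linalg.eigh(R[np.ix_(idx, idx)])
        A[idx, c] = eigvecs[:, -1]
    return A


# Synthetic data (an assumption for illustration): two latent factors,
# each driving three noisy observed variables.
rng = np.random.default_rng(0)
n = 200
f1 = rng.normal(size=n)
f2 = rng.normal(size=n)
X = np.column_stack(
    [f1 + 0.3 * rng.normal(size=n) for _ in range(3)]
    + [f2 + 0.3 * rng.normal(size=n) for _ in range(3)]
)

labels = cluster_variables(np.corrcoef(X, rowvar=False))
A = sparse_components(X, labels)
print(labels)
print(np.round(A, 3))
```

Because the clusters are disjoint, the columns of `A` have non-overlapping support and are therefore orthogonal by construction, which is one reason a clustering pre-processing step is a natural route to sparse, orthogonal loadings.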
