Introduction to Principal Components Analysis
Understanding the inverse equivalent width - luminosity relationship (Baldwin
Effect), the topic of this meeting, requires extracting information on
continuum and emission line parameters from samples of AGN. We wish to discover
whether, and how, different subsets of measured parameters may correlate with
each other. This general problem is the domain of Principal Components Analysis
(PCA). We discuss the purpose, principles, and the interpretation of PCA, using
some examples from QSO spectroscopy. The hope is that identification of
relationships among subsets of correlated variables may lead to new physical
insight.
Comment: Invited review to appear in "Quasars and Cosmology", A.S.P. Conference Series 1999, eds. G. J. Ferland and J. A. Baldwin (San Francisco: ASP). 10 pages, 2 figures.
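As an illustration of the basic machinery this review discusses, the following is a minimal sketch of PCA applied to a matrix of measured parameters (one row per object, one column per continuum or emission-line quantity). It is not taken from the review itself: the eigendecomposition-of-the-correlation-matrix route, the variable names, and the simulated input are all assumptions for illustration.

```python
import numpy as np

def pca(X):
    """Basic PCA via eigendecomposition of the correlation matrix.

    X is an (n_objects, n_parameters) array, e.g. hypothetical continuum
    and emission-line measurements for a sample of AGN. Returns the
    eigenvalues (variance carried by each PC) and eigenvectors (PC
    loadings), sorted by decreasing eigenvalue.
    """
    # Standardize each parameter to zero mean and unit variance so that
    # quantities measured on different scales contribute comparably.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Correlation matrix of the standardized parameters.
    R = np.corrcoef(Z, rowvar=False)
    # Eigenvectors of R are the principal-component loading vectors.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

# Toy example: 200 objects, 5 measured parameters (simulated data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
variances, loadings = pca(X)
print("fraction of variance per PC:", variances / variances.sum())
```

Inspecting which parameters load heavily on the same leading component is the usual way such an analysis suggests correlated subsets of variables.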
Integrating Data Transformation in Principal Components Analysis
Principal component analysis (PCA) is a popular dimension-reduction method to reduce the complexity and obtain the informative aspects of high-dimensional datasets. When the data distribution is skewed, data transformation is commonly used prior to applying PCA. Such a transformation is usually obtained from previous studies, prior knowledge, or trial-and-error. In this work, we develop a model-based method that integrates data transformation in PCA and finds an appropriate data transformation using the maximum profile likelihood. Extensions of the method to handle functional data and missing values are also developed. Several numerical algorithms are provided for efficient computation. The proposed method is illustrated using simulated and real-world data examples. Supplementary materials for this article are available online.
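The abstract contrasts its model-based approach with the conventional practice of transforming skewed data before PCA. The sketch below shows only that conventional two-step route (a per-variable Box-Cox transform with lambda chosen by maximum likelihood, then PCA on the standardized transformed data); it is not the authors' profile-likelihood method, and the data are simulated.

```python
import numpy as np
from scipy import stats
from sklearn.decomposition import PCA

# Hypothetical skewed, positive-valued data: 300 samples, 4 variables.
rng = np.random.default_rng(1)
X = rng.lognormal(mean=0.0, sigma=1.0, size=(300, 4))

# Step 1: Box-Cox transform each variable (lambda fitted by maximum likelihood).
X_t = np.column_stack([stats.boxcox(X[:, j])[0] for j in range(X.shape[1])])

# Step 2: standardize and apply ordinary PCA to the transformed data.
Z = (X_t - X_t.mean(axis=0)) / X_t.std(axis=0, ddof=1)
pca = PCA()
scores = pca.fit_transform(Z)
print("explained variance ratio:", pca.explained_variance_ratio_)
```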
Properties of Design-Based Functional Principal Components Analysis
This work aims at performing Functional Principal Components Analysis (FPCA)
with Horvitz-Thompson estimators when the observations are curves collected
with survey sampling techniques. One important motivation for this study is
that FPCA is a dimension reduction tool which is the first step to develop
model assisted approaches that can take auxiliary information into account.
FPCA relies on the estimation of the eigenelements of the covariance operator
which can be seen as nonlinear functionals. Adapting to our functional context
the linearization technique based on the influence function developed by
Deville (1999), we prove that these estimators are asymptotically design
unbiased and consistent. Under mild assumptions, asymptotic variances are
derived for the FPCA estimators and consistent estimators of them are
proposed. Our approach is illustrated with a simulation study and we check the
good properties of the proposed estimators of the eigenelements as well as
their variance estimators obtained with the linearization approach.
Comment: Revised version for the Journal of Statistical Planning and Inference (January 2009).
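To make the design-based idea concrete, here is a minimal sketch of a Horvitz-Thompson-weighted FPCA on curves discretized to a common grid: design weights 1/pi enter the estimates of the mean curve and of the covariance, whose eigendecomposition yields the eigenelements. This is an illustrative discretization under assumed inputs, not the paper's estimator or its variance theory.

```python
import numpy as np

def design_based_fpca(curves, pi, n_components=2):
    """Sketch of Horvitz-Thompson-weighted FPCA on discretized curves.

    curves : (n_sampled, n_gridpoints) array of observed curves.
    pi     : (n_sampled,) first-order inclusion probabilities.
    Returns the estimated mean curve and the leading eigenvalues and
    eigenvectors of the weighted covariance estimate.
    """
    w = 1.0 / pi                                      # design weights
    N_hat = w.sum()                                   # estimated population size
    mu = (w[:, None] * curves).sum(axis=0) / N_hat    # weighted mean curve
    dev = curves - mu
    # Weighted estimate of the covariance operator on the grid.
    cov = (w[:, None] * dev).T @ dev / N_hat
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    return mu, eigvals[order], eigvecs[:, order]

# Simulated example: 100 sampled curves on a grid of 50 points,
# drawn with unequal inclusion probabilities.
rng = np.random.default_rng(2)
t = np.linspace(0, 1, 50)
curves = rng.normal(size=(100, 1)) * np.sin(2 * np.pi * t) \
         + rng.normal(scale=0.1, size=(100, 50))
pi = rng.uniform(0.2, 0.9, size=100)
mu, lam, phi = design_based_fpca(curves, pi)
print("leading eigenvalues:", lam)
```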
Sparse logistic principal components analysis for binary data
We develop a new principal components analysis (PCA) type dimension reduction
method for binary data. Different from the standard PCA which is defined on the
observed data, the proposed PCA is defined on the logit transform of the
success probabilities of the binary observations. Sparsity is introduced to the
principal component (PC) loading vectors for enhanced interpretability and more
stable extraction of the principal components. Our sparse PCA is formulated as
solving an optimization problem with a criterion function motivated from a
penalized Bernoulli likelihood. A Majorization--Minimization algorithm is
developed to efficiently solve the optimization problem. The effectiveness of
the proposed sparse logistic PCA method is illustrated by application to a
single nucleotide polymorphism data set and a simulation study.
Comment: Published at http://dx.doi.org/10.1214/10-AOAS327 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
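The following sketch illustrates the kind of Majorization-Minimization iteration the abstract describes, in a deliberately simplified one-component form: the negative Bernoulli log-likelihood is majorized by a quadratic (using the 1/4 curvature bound of the logistic function), giving a working matrix, and the working matrix is then fitted by a rank-one term with L1 soft-thresholded loadings. The model parametrization, penalty handling, and toy data are assumptions; this is not the published algorithm.

```python
import numpy as np

def soft(x, t):
    """Elementwise soft-thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sparse_logistic_pca_1pc(X, lam=2.0, n_iter=100):
    """Simplified one-component sparse logistic PCA via MM.

    X   : (n, p) binary data matrix.
    lam : L1 penalty controlling sparsity of the loading vector.
    """
    n, p = X.shape
    mu = np.zeros(p)          # per-variable intercepts on the logit scale
    u = np.zeros(n)           # principal-component scores
    v = np.zeros(p)           # sparse loading vector
    for _ in range(n_iter):
        theta = mu + np.outer(u, v)              # current logit-scale fit
        prob = 1.0 / (1.0 + np.exp(-theta))
        Z = theta + 4.0 * (X - prob)             # working matrix from the majorization
        mu = (Z - np.outer(u, v)).mean(axis=0)   # update intercepts
        Zc = Z - mu
        # Alternating rank-one updates: unit-norm scores, soft-thresholded loadings.
        s = Zc @ v
        norm_s = np.linalg.norm(s)
        u = s / norm_s if norm_s > 0 else np.ones(n) / np.sqrt(n)
        v = soft(Zc.T @ u, lam / 2.0)
    return mu, u, v

# Toy binary matrix (SNP-like, simulated purely for illustration).
rng = np.random.default_rng(3)
X = (rng.random((100, 20)) < 0.3).astype(float)
mu, scores, loadings = sparse_logistic_pca_1pc(X)
print("nonzero loadings:", np.count_nonzero(loadings))
```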
Principal Components and Factor Analysis. A Comparative Study.
A comparison between Principal Component Analysis (PCA) and Factor Analysis (FA) is performed both theoretically and empirically for a random matrix X: (n x p), where n is the number of observations and both dimensions may be very large. The comparison surveys the asymptotic properties of the factor scores, of the singular values, and of all other elements involved, as well as the characteristics of the methods utilized for detecting the true dimension of X. In particular, the norms of the FA scores, whatever their number, and the norms of their covariance matrix are shown to be always smaller and to decay faster as n goes to infinity. This causes the FA scores, when utilized as regressors and/or instruments, to produce more efficient slope estimators in instrumental variable estimation. Moreover, as compared to PCA, the FA scores and factors exhibit a higher degree of consistency because the difference between the estimated quantities and their true counterparts is smaller, and so is the corresponding variance. Finally, FA usually selects a much smaller number of scores than PCA, greatly facilitating the search for and identification of the common components of X.
Keywords: Principal Components, Factor Analysis, Matrix Norm
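As a purely empirical toy analogue of the norm comparison discussed above (not the paper's asymptotic analysis), one can fit PCA and FA to the same simulated factor-structured data and compare the magnitudes of the resulting score vectors. The simulated design and the use of scikit-learn's estimators are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

# Simulated data with a low-dimensional common-factor structure plus noise.
rng = np.random.default_rng(4)
n, p, k = 500, 20, 3
F = rng.normal(size=(n, k))                 # latent factors
L = rng.normal(size=(p, k))                 # loadings
X = F @ L.T + rng.normal(scale=0.5, size=(n, p))

pca_scores = PCA(n_components=k).fit_transform(X)
fa_scores = FactorAnalysis(n_components=k).fit_transform(X)

# Compare the average squared norm of the two sets of scores.
print("mean squared norm, PCA scores:", np.mean(np.sum(pca_scores**2, axis=1)))
print("mean squared norm, FA scores :", np.mean(np.sum(fa_scores**2, axis=1)))
```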
Principal Components Analysis of Employment in Eastern Europe
Over the last decade, the employment structure has been one of the fastest-changing aspects of Eastern Europe. This paper explores the best methodology for comparing the employment situations in the countries of this region. Multivariate statistical analyses are very reliable in portraying the full picture of the problem. Principal components analysis is one of the simplest multivariate methods. It can produce very useful information about Eastern European employment in a very easy and understandable way.
Keywords: Employment, Multivariate analysis, Principal components analysis
Functional principal components analysis via penalized rank one approximation
Two existing approaches to functional principal components analysis (FPCA)
are due to Rice and Silverman (1991) and Silverman (1996), both based on
maximizing variance but introducing penalization in different ways. In this
article we propose an alternative approach to FPCA using penalized rank one
approximation to the data matrix. Our contributions are four-fold: (1) by
considering invariance under scale transformation of the measurements, the new
formulation sheds light on how regularization should be performed for FPCA and
suggests an efficient power algorithm for computation; (2) it naturally
incorporates spline smoothing of discretized functional data; (3) the
connection with smoothing splines also facilitates construction of
cross-validation or generalized cross-validation criteria for smoothing
parameter selection that allows efficient computation; (4) different smoothing
parameters are permitted for different FPCs. The methodology is illustrated
with a real data example and a simulation.
Comment: Published at http://dx.doi.org/10.1214/08-EJS218 in the Electronic Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics (http://www.imstat.org).
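To convey the flavour of a penalized power algorithm for a rank-one approximation, here is a minimal sketch: the score vector and a roughness-penalized loading (eigenfunction) are updated alternately, with the loading update passed through a second-difference smoother. The fixed smoothing parameter, the penalty construction, and the toy data are assumptions; the paper itself selects smoothing parameters by cross-validation or GCV and treats the regularization more carefully.

```python
import numpy as np

def second_diff_penalty(p):
    """Roughness penalty matrix Omega = D'D built from second differences."""
    D = np.diff(np.eye(p), n=2, axis=0)
    return D.T @ D

def penalized_rank_one(X, lam=1.0, n_iter=200):
    """Sketch of a penalized power algorithm for one smooth FPC.

    X   : (n_curves, n_gridpoints) matrix of discretized curves.
    lam : smoothing parameter (fixed here for simplicity).
    """
    n, p = X.shape
    Omega = second_diff_penalty(p)
    # Smoother applied to the loading at each update.
    S = np.linalg.inv(np.eye(p) + lam * Omega)
    # Initialize with the leading right singular vector of X.
    v = np.linalg.svd(X, full_matrices=False)[2][0]
    for _ in range(n_iter):
        u = X @ v
        u /= np.linalg.norm(u)          # unit-norm score direction
        v = S @ (X.T @ u)               # penalized (smoothed) loading update
        v /= np.linalg.norm(v)
    return u, v

# Toy example: noisy sine curves on a grid of 60 points.
rng = np.random.default_rng(5)
t = np.linspace(0, 1, 60)
X = rng.normal(size=(80, 1)) * np.sin(2 * np.pi * t) \
    + rng.normal(scale=0.3, size=(80, 60))
scores, eigenfunction = penalized_rank_one(X, lam=5.0)
```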
