An Object-Oriented Framework for Robust Multivariate Analysis
Taking advantage of the S4 class system of the programming environment R, which facilitates the creation and maintenance of reusable and modular components, an object-oriented framework for robust multivariate analysis was developed. The framework resides in the packages robustbase and rrcov and includes an almost complete set of algorithms for computing robust multivariate location and scatter, various robust methods for principal component analysis, as well as robust linear and quadratic discriminant analysis. The design of these methods follows common patterns, which we call statistical design patterns in analogy to the design patterns widely used in software engineering. The application of the framework to data analysis, as well as possible extensions through the development of new methods, is demonstrated on examples which themselves are part of the package rrcov.
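The framework itself is written in R, but its core idea — plug a robust location/scatter estimate such as the MCD into a standard multivariate method — can be sketched in another language. The following is an illustrative analogy in Python, not the rrcov API; `MinCovDet` is scikit-learn's implementation of the Minimum Covariance Determinant estimator, and the robust "PCA" here is simply the eigendecomposition of the robust scatter matrix:

```python
import numpy as np
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(0)
# Clean bivariate data plus a 10% cluster of outliers.
X = rng.multivariate_normal([0, 0], [[3, 1], [1, 1]], size=200)
X[:20] = rng.multivariate_normal([8, -8], np.eye(2), size=20)

# Robust location and scatter via the MCD estimator.
mcd = MinCovDet(random_state=0).fit(X)

# Robust "PCA": principal axes of the robust scatter matrix,
# largely unaffected by the contamination.
eigvals, eigvecs = np.linalg.eigh(mcd.covariance_)
order = np.argsort(eigvals)[::-1]
first_axis = eigvecs[:, order[0]]
```

The classical sample covariance would be pulled toward the outlying cluster; the MCD-based scatter recovers the shape of the clean majority.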
Properties of principal component methods for functional and longitudinal data analysis
The use of principal component methods to analyze functional data is
appropriate in a wide range of different settings. In studies of "functional
data analysis," it has often been assumed that a sample of random functions is
observed precisely, in the continuum and without noise. While this has been the
traditional setting for functional data analysis, in the context of
longitudinal data analysis a random function typically represents a patient, or
subject, who is observed at only a small number of randomly distributed points,
with nonnegligible measurement error. Nevertheless, essentially the same
methods can be used in both these cases, as well as in the vast number of
settings that lie between them. How is performance affected by the sampling
plan? In this paper we answer that question. We show that if there is a sample
of functions, or subjects, then estimation of eigenvalues is a
semiparametric problem, with root-n consistent estimators, even if only a few
observations are made of each function, and if each observation is encumbered
by noise. However, estimation of eigenfunctions becomes a nonparametric problem
when observations are sparse. The optimal convergence rates in this case are
those which pertain to more familiar function-estimation settings. We also
describe the effects of sampling at regularly spaced points, as opposed to
random points. In particular, it is shown that there are often advantages in
sampling randomly. However, even in the case of noisy data there is a threshold
sampling rate (depending on the number of functions treated) above which the
rate of sampling (either randomly or regularly) has negligible impact on
estimator performance, no matter whether eigenfunctions or eigenvalues are
being estimated.

Comment: Published at http://dx.doi.org/10.1214/009053606000000272 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
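The dense, regularly sampled end of the spectrum discussed above can be sketched numerically: simulate curves from two known eigenfunctions, add measurement noise, and recover the eigenvalues from the eigendecomposition of the sample covariance matrix. This is a minimal illustration of the setting, not the paper's estimators; the eigenfunctions, grid, and noise level are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 101)              # dense, regularly spaced grid
dt = t[1] - t[0]
phi1 = np.sqrt(2) * np.sin(np.pi * t)   # orthonormal "true" eigenfunctions
phi2 = np.sqrt(2) * np.sin(2 * np.pi * t)

n = 300
# Random scores with variances 4 and 0.25 (the true eigenvalues).
scores = rng.normal(0, [2.0, 0.5], size=(n, 2))
Y = scores @ np.vstack([phi1, phi2]) + rng.normal(0, 0.1, (n, len(t)))

# Eigenvalues of the sample covariance, rescaled to approximate the
# integral-operator eigenvalues.
C = np.cov(Y, rowvar=False)
vals = np.linalg.eigvalsh(C)[::-1]
est_eigvals = vals * dt
```

With many functions observed on a dense grid, the leading estimated eigenvalues land close to the true values 4 and 0.25, consistent with the root-n rate for eigenvalue estimation.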
Principal arc analysis on direct product manifolds
We propose a new approach to analyze data that naturally lie on manifolds. We
focus on a special class of manifolds, called direct product manifolds, whose
intrinsic dimension could be very high. Our method finds a low-dimensional
representation of the manifold that can be used to find and visualize the
principal modes of variation of the data, as Principal Component Analysis (PCA)
does in linear spaces. The proposed method improves upon earlier manifold
extensions of PCA by more concisely capturing important nonlinear modes. For
the special case of data on a sphere, variation following nongeodesic arcs is
captured in a single mode, compared to the two modes needed by previous
methods. Several computational and statistical challenges are resolved. The
development on spheres forms the basis of principal arc analysis on more
complicated manifolds. The benefits of the method are illustrated by a data
example using medial representations in image analysis.

Comment: Published at http://dx.doi.org/10.1214/10-AOAS370 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
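For the sphere case, a nongeodesic arc lies on a small circle, which can be fit by minimizing squared angular residuals between the data and a candidate axis/radius pair. The following is a hedged numerical sketch of that one mode, not the paper's algorithm; the loss, starting point, and optimizer are assumptions chosen for illustration:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)

# Simulate noisy points along a small circle on S^2: points at a fixed
# angular distance r0 = 0.5 from the axis c0 = (0, 0, 1).
r0 = 0.5                                    # r0 != pi/2, so not a geodesic
theta = rng.uniform(0, 2 * np.pi, 100)
X = np.column_stack([np.sin(r0) * np.cos(theta),
                     np.sin(r0) * np.sin(theta),
                     np.full_like(theta, np.cos(r0))])
X += rng.normal(0, 0.01, X.shape)
X /= np.linalg.norm(X, axis=1, keepdims=True)

def loss(p):
    c = p[:3] / np.linalg.norm(p[:3])       # candidate axis on the sphere
    r = p[3]                                # candidate angular radius
    ang = np.arccos(np.clip(X @ c, -1, 1))  # angular distance to the axis
    return np.sum((ang - r) ** 2)           # residuals to the small circle

res = minimize(loss, x0=np.array([0.1, 0.1, 1.0, 1.0]), method="Nelder-Mead")
c_hat = res.x[:3] / np.linalg.norm(res.x[:3])
r_hat = res.x[3]
```

A single fitted arc captures this circular variation in one mode, whereas a geodesic-based method would need two.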
Robust functional principal components: A projection-pursuit approach
In many situations, data are recorded over a period of time and may be
regarded as realizations of a stochastic process. In this paper, robust
estimators for the principal components are considered by adapting the
projection pursuit approach to the functional data setting. Our approach
combines robust projection-pursuit with different smoothing methods.
Consistency of the estimators is shown under mild assumptions. The performance
of the classical and robust procedures is compared in a simulation study under
different contamination schemes.

Comment: Published at http://dx.doi.org/10.1214/11-AOS923 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
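The projection-pursuit idea — replace the variance in the classical PCA criterion with a robust scale — can be sketched in a finite-dimensional setting: maximize the MAD of the projected data over candidate unit directions. This is a simplified illustration, not the paper's functional estimators; the random-direction search and the contamination scheme are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(3)

# Elliptical data; 10% of points shifted along the second axis.
X = rng.multivariate_normal([0, 0, 0], np.diag([9.0, 1.0, 0.2]), size=300)
X[:30] += np.array([0.0, 15.0, 0.0])

def mad(z):
    # Median absolute deviation, scaled for consistency at the normal.
    return 1.4826 * np.median(np.abs(z - np.median(z)))

# Projection pursuit: robust scale maximized over random unit directions.
dirs = rng.normal(size=(5000, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
robust_scales = np.array([mad(X @ d) for d in dirs])
a_robust = dirs[np.argmax(robust_scales)]   # first robust direction

# Classical first PC for contrast: pulled toward the contaminated axis.
_, V = np.linalg.eigh(np.cov(X, rowvar=False))
a_classical = V[:, -1]
```

The robust criterion recovers the dominant clean axis (the first coordinate), while the classical first principal component is dragged toward the contamination.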
Covariance Estimation: The GLM and Regularization Perspectives
Finding an unconstrained and statistically interpretable reparameterization
of a covariance matrix is still an open problem in statistics. Its solution is
of central importance in covariance estimation, particularly in the recent
high-dimensional data environment where enforcing the positive-definiteness
constraint could be computationally expensive. We provide a survey of the
progress made in modeling covariance matrices from two relatively complementary
perspectives: (1) generalized linear models (GLM) or parsimony and use of
covariates in low dimensions, and (2) regularization or sparsity for
high-dimensional data. An emerging, unifying and powerful trend in both
perspectives is that of reducing a covariance estimation problem to that of
estimating a sequence of regression problems. We point out several instances of
the regression-based formulation. A notable case is in sparse estimation of a
precision matrix or a Gaussian graphical model leading to the fast graphical
LASSO algorithm. Some advantages and limitations of the regression-based
Cholesky decomposition relative to the classical spectral (eigenvalue) and
variance-correlation decompositions are highlighted. The former provides an
unconstrained and statistically interpretable reparameterization, and
guarantees the positive-definiteness of the estimated covariance matrix. It
reduces the unintuitive task of covariance estimation to that of modeling a
sequence of regressions at the cost of imposing an a priori order among the
variables. Elementwise regularization of the sample covariance matrix such as
banding, tapering and thresholding has desirable asymptotic properties and the
sparse estimated covariance matrix is positive definite with probability
tending to one for large samples and dimensions.

Comment: Published at http://dx.doi.org/10.1214/11-STS358 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org).
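The regression-based Cholesky decomposition highlighted above can be made concrete: regressing each variable on its predecessors yields a unit lower-triangular matrix T (holding the negatives of the regression coefficients) and a diagonal D of prediction-error variances satisfying T Σ Tᵀ = D. Because any coefficients together with any positive variances reconstruct a valid Σ, the parameterization is unconstrained and automatically positive definite. A small numerical check (an illustrative sketch, not code from the survey):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(5, 5))
Sigma = A @ A.T + 5 * np.eye(5)     # a positive-definite covariance matrix

p = Sigma.shape[0]
T = np.eye(p)                       # unit lower triangular
D = np.zeros(p)                     # innovation (prediction-error) variances
D[0] = Sigma[0, 0]
for j in range(1, p):
    # Coefficients of the regression of variable j on variables 0..j-1.
    phi = np.linalg.solve(Sigma[:j, :j], Sigma[:j, j])
    T[j, :j] = -phi
    D[j] = Sigma[j, j] - Sigma[:j, j] @ phi

# T Sigma T' = D (diagonal), hence Sigma = T^{-1} D T^{-T}.
Tinv = np.linalg.inv(T)
Sigma_rec = Tinv @ np.diag(D) @ Tinv.T
```

The price of this parameterization, as the survey notes, is that it imposes an a priori ordering on the variables: permuting them changes T and D.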