Principal Nested Spheres for Time Warped Functional Data Analysis
There are often two important types of variation in functional data: the
horizontal (or phase) variation and the vertical (or amplitude) variation.
These two types of variation have been appropriately separated and modeled
through a domain warping method (or curve registration) based on the Fisher-Rao
metric. This paper focuses on the analysis of the horizontal variation,
captured by the domain warping functions. The square-root velocity function
representation transforms the manifold of the warping functions to a Hilbert
sphere. Motivated by recent results on manifold analogs of principal component
analysis, we propose to analyze the horizontal variation via a Principal Nested
Spheres approach. Compared with earlier approaches, such as approximating
tangent plane principal component analysis, this is seen to be the most
efficient and interpretable approach to decomposing the horizontal variation in
some examples.
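As a concrete illustration of the Hilbert sphere geometry mentioned above, here is a minimal numerical sketch (not code from the paper): the square-root velocity function of a warping function gamma on [0, 1] is psi = sqrt(gamma'), and since gamma' integrates to 1, psi has unit L2 norm and therefore lies on the unit Hilbert sphere.

import numpy as np

# Minimal sketch: map a time-warping function gamma on [0, 1] to the unit
# Hilbert sphere via its square-root velocity function psi = sqrt(gamma').
t = np.linspace(0.0, 1.0, 1001)
gamma = t + 0.1 * np.sin(2 * np.pi * t)   # example warp: increasing, gamma(0) = 0, gamma(1) = 1
psi = np.sqrt(np.gradient(gamma, t))      # square-root velocity function of the warp
dt = t[1] - t[0]
print(np.sum(psi ** 2) * dt)              # ~1.0: psi lies on the unit Hilbert sphere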
A scale-based approach to finding effective dimensionality in manifold learning
The discovery of low-dimensional manifolds in high-dimensional data is one
of the main goals in manifold learning. We propose a new approach to identify
the effective dimension (intrinsic dimension) of low-dimensional manifolds. The
scale space viewpoint is the key to our approach enabling us to meet the
challenge of noisy data. Our approach finds the effective dimensionality of the
data over all scales without any prior knowledge. It performs better than
other methods, especially in the presence of relatively large noise, and is
computationally efficient.
Comment: Published at http://dx.doi.org/10.1214/07-EJS137 in the Electronic
Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
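To make the role of scale concrete, here is a schematic sketch only (this is not the estimator proposed in the paper, and the radii and variance threshold are arbitrary choices): local PCA in neighborhoods of increasing radius yields an effective dimension that can change with the scale at which a noisy manifold is examined.

import numpy as np

# Schematic: scale-dependent effective dimension via local PCA on a noisy
# 1-d circle embedded in R^3, counting eigenvalues explaining 95% of variance.
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 500)
X = np.column_stack([np.cos(theta), np.sin(theta), np.zeros(500)])
X += 0.02 * rng.standard_normal(X.shape)          # small ambient noise

center = X[0]
for radius in (0.1, 0.5, 2.5):                    # small to large scale
    nbhd = X[np.linalg.norm(X - center, axis=1) < radius]
    eigvals = np.sort(np.linalg.eigvalsh(np.cov(nbhd, rowvar=False)))[::-1]
    dim = np.searchsorted(np.cumsum(eigvals) / eigvals.sum(), 0.95) + 1
    print(radius, dim)                            # effective dimension at this scale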
PCA consistency in high dimension, low sample size context
Principal Component Analysis (PCA) is an important tool of dimension
reduction, especially when the dimension (or the number of variables) is very
high. Asymptotic studies where the sample size is fixed and the dimension
grows [i.e., High Dimension, Low Sample Size (HDLSS)] are becoming increasingly
relevant. We investigate the asymptotic behavior of the Principal Component
(PC) directions. HDLSS asymptotics are used to study consistency, strong
inconsistency and subspace consistency. We show that if the first few
eigenvalues of a population covariance matrix are large enough compared to the
others, then the corresponding estimated PC directions are consistent or
converge to the appropriate subspace (subspace consistency) and most other PC
directions are strongly inconsistent. Broad sets of sufficient conditions for
each of these cases are specified and the main theorem gives a catalogue of
possible combinations. In preparation for these results, we show that the
geometric representation of HDLSS data holds under general conditions, which
includes a ρ-mixing condition and a broad range of sphericity measures of
the covariance matrix.
Comment: Published at http://dx.doi.org/10.1214/09-AOS709 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
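A toy simulation in the spirit of these consistency results (illustrative only; the spike size, dimension, and sample size below are arbitrary choices): when the leading population eigenvalue grows fast enough with the dimension, the first sample PC direction stays close to the true direction even with a fixed, small sample size, while later PC directions become nearly orthogonal to their population counterparts.

import numpy as np

# Toy HDLSS example: d >> n with one spiked population eigenvalue.
rng = np.random.default_rng(0)
d, n = 2000, 20
sd = np.ones(d)
sd[0] = d                                    # first population variance d^2, the rest 1
X = rng.standard_normal((n, d)) * sd         # rows are observations, diagonal covariance
_, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
angle1 = np.degrees(np.arccos(abs(Vt[0, 0])))   # angle between first sample PC and e_1
angle2 = np.degrees(np.arccos(abs(Vt[1, 1])))   # angle between second sample PC and e_2
print(angle1, angle2)   # angle1 is small (consistency); angle2 is near 90 (strong inconsistency)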
SiZer for time series: A new approach to the analysis of trends
Smoothing methods and SiZer are useful statistical tools for discovering
statistically significant structure in data. Based on scale space ideas
originally developed in the computer vision literature, SiZer (SIgnificant ZERo
crossing of the derivatives) is a graphical device to assess which observed
features are `really there' and which are just spurious sampling artifacts. In
this paper, we develop SiZer-like ideas in time series analysis to address the
important issue of significance of trends. This is not a straightforward
extension, since one data set does not contain the information needed to
distinguish `trend' from `dependence'. A new visualization is proposed, which
shows the statistician the range of trade-offs that are available. Simulation
and real data results illustrate the effectiveness of the method.
Comment: Published at http://dx.doi.org/10.1214/07-EJS006 in the Electronic
Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
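For readers new to SiZer, the following bare-bones sketch shows the significance-of-slope idea for independent noise only; it deliberately ignores the trend-versus-dependence trade-off that is the subject of the paper, and the kernel, grid, and threshold are arbitrary choices.

import numpy as np

# SiZer-style scan (independent-noise version): at each bandwidth and location,
# fit a kernel-weighted local line and flag slopes with |t-ratio| > 2.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 200)
y = np.sin(3 * np.pi * x) + 0.3 * rng.standard_normal(x.size)

for h in (0.02, 0.1, 0.3):                          # small to large smoothing scale
    flags = []
    for x0 in np.linspace(0.1, 0.9, 9):
        w = np.exp(-0.5 * ((x - x0) / h) ** 2)      # Gaussian kernel weights
        D = np.column_stack([np.ones_like(x), x - x0])
        A = np.linalg.inv(D.T @ (w[:, None] * D))
        beta = A @ (D.T @ (w * y))                  # local-linear fit at x0
        sigma2 = np.sum(w * (y - D @ beta) ** 2) / np.sum(w)
        cov = sigma2 * A @ (D.T @ (w[:, None] ** 2 * D)) @ A
        t_ratio = beta[1] / np.sqrt(cov[1, 1])
        flags.append('+' if t_ratio > 2 else '-' if t_ratio < -2 else '0')
    print(h, ''.join(flags))                        # one SiZer-style row per bandwidth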
Principal arc analysis on direct product manifolds
We propose a new approach to analyze data that naturally lie on manifolds. We
focus on a special class of manifolds, called direct product manifolds, whose
intrinsic dimension could be very high. Our method finds a low-dimensional
representation of the manifold that can be used to find and visualize the
principal modes of variation of the data, as Principal Component Analysis (PCA)
does in linear spaces. The proposed method improves upon earlier manifold
extensions of PCA by more concisely capturing important nonlinear modes. For
the special case of data on a sphere, variation following nongeodesic arcs is
captured in a single mode, compared to the two modes needed by previous
methods. Several computational and statistical challenges are resolved. The
development on spheres forms the basis of principal arc analysis on more
complicated manifolds. The benefits of the method are illustrated by a data
example using medial representations in image analysis.
Comment: Published at http://dx.doi.org/10.1214/10-AOAS370 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
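To give a feel for what a single nongeodesic (small-circle) mode on the sphere looks like, here is a rough least-squares sketch, not the paper's algorithm: fit an axis c and a radius r so that the geodesic distances from the data to the circle {x : arccos(c·x) = r} are as small as possible.

import numpy as np
from scipy.optimize import minimize

# Rough sketch: fit a small circle on the unit sphere by minimizing squared
# geodesic residuals sum_i (arccos(c . x_i) - r)^2 over the axis c.
rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, 100)
r_true = 0.4                                            # small circle (a great circle has r = pi/2)
pts = np.column_stack([np.sin(r_true) * np.cos(angles),
                       np.sin(r_true) * np.sin(angles),
                       np.full(100, np.cos(r_true))])
pts += 0.02 * rng.standard_normal(pts.shape)
pts /= np.linalg.norm(pts, axis=1, keepdims=True)       # push noisy points back onto the sphere

def loss(params):                                       # axis in spherical coordinates
    th, ph = params
    c = np.array([np.sin(th) * np.cos(ph), np.sin(th) * np.sin(ph), np.cos(th)])
    dist = np.arccos(np.clip(pts @ c, -1.0, 1.0))       # geodesic distances to the axis
    return np.sum((dist - dist.mean()) ** 2)            # optimal radius is the mean distance

res = minimize(loss, x0=[0.3, 0.3], method='Nelder-Mead')
th, ph = res.x
print(np.sin(th) * np.cos(ph), np.sin(th) * np.sin(ph), np.cos(th))  # fitted axis, near (0, 0, 1)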
Significance Analysis for Pairwise Variable Selection in Classification
The goal of this article is to select important variables that can
distinguish one class of data from another. A marginal variable selection
method ranks individual variables by their marginal effects on classification,
and is a useful and efficient approach for variable selection. Our focus here
is to consider the bivariate effect, in addition to the marginal effect. In
particular, we are interested in those pairs of variables that can lead to
accurate classification predictions when they are viewed jointly. To accomplish
this, we propose a permutation test called Significance test of Joint Effect
(SigJEff). In the absence of joint effects in the data, SigJEff is similar or
equivalent to many marginal methods. However, when joint effects exist, our
method can significantly boost the performance of variable selection. Such
joint effects can help to provide additional, and sometimes dominating,
advantage for classification. We illustrate and validate our approach using
both a simulated example and a real glioblastoma multiforme data set, which
provide promising results.
Comment: 28 pages, 7 figures
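A hedged sketch of the pairwise permutation idea (the score function and the nearest-centroid classifier below are placeholder choices, not the authors' SigJEff statistic): score a pair of variables by how well a simple classifier separates the classes using only that pair, then calibrate the score against randomly permuted class labels.

import numpy as np

# Placeholder pairwise permutation test: nearest-centroid accuracy on one
# variable pair, compared with its null distribution under permuted labels.
rng = np.random.default_rng(0)
n = 100
y = np.repeat([0, 1], n // 2)
X = rng.standard_normal((n, 2))
X[y == 1] += [0.8, 0.8]                     # the pair is jointly informative for the classes

def pair_score(X, y):
    m0, m1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    pred = (np.linalg.norm(X - m1, axis=1) < np.linalg.norm(X - m0, axis=1)).astype(int)
    return np.mean(pred == y)               # training accuracy of a nearest-centroid rule

observed = pair_score(X, y)
null = np.array([pair_score(X, rng.permutation(y)) for _ in range(999)])
p_value = (1 + np.sum(null >= observed)) / (1 + null.size)
print(observed, p_value)                    # a small p-value flags the pair as jointly useful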