    Principal Nested Spheres for Time Warped Functional Data Analysis

    There are often two important types of variation in functional data: horizontal (or phase) variation and vertical (or amplitude) variation. These two types of variation have been appropriately separated and modeled through a domain warping method (or curve registration) based on the Fisher-Rao metric. This paper focuses on the analysis of the horizontal variation, captured by the domain warping functions. The square-root velocity function representation transforms the manifold of the warping functions to a Hilbert sphere. Motivated by recent results on manifold analogs of principal component analysis, we propose to analyze the horizontal variation via a Principal Nested Spheres approach. Compared with earlier approaches, such as approximating tangent plane principal component analysis, this is seen to be the most efficient and interpretable approach to decompose the horizontal variation in some examples.
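
    The mapping of warping functions to a sphere mentioned in this abstract can be illustrated numerically. The following is a minimal sketch, not the authors' code: it assumes a warping function gamma on [0, 1] with gamma(0) = 0 and gamma(1) = 1, sampled on a grid, and takes the square root of its slope. The function names, the grid size and the example warping t^2 are all hypothetical choices made for illustration.

```python
import numpy as np

def sqrt_slope(gamma, t):
    """Square-root slope of a sampled warping function gamma on grid t."""
    slope = np.gradient(gamma, t)               # finite-difference derivative
    return np.sqrt(np.clip(slope, 0.0, None))   # guard against tiny negative slopes

def l2_norm(f, t):
    """L2 norm on the grid t, computed with the trapezoidal rule."""
    sq = f ** 2
    return float(np.sqrt(np.sum(0.5 * (sq[1:] + sq[:-1]) * np.diff(t))))

t = np.linspace(0.0, 1.0, 201)
gamma = t ** 2                 # hypothetical warping: gamma(0) = 0, gamma(1) = 1
psi = sqrt_slope(gamma, t)
psi_norm = l2_norm(psi, t)     # close to 1: the integral of gamma' is gamma(1) - gamma(0)
```

    Because the squared norm of the transformed function equals gamma(1) - gamma(0) = 1, every valid warping lands on the unit sphere of L2, which is what makes sphere-based decompositions applicable.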

    A scale-based approach to finding effective dimensionality in manifold learning

    Discovering low-dimensional manifolds in high-dimensional data is one of the main goals of manifold learning. We propose a new approach to identify the effective dimension (intrinsic dimension) of low-dimensional manifolds. The scale space viewpoint is the key to our approach, enabling us to meet the challenge of noisy data. Our approach finds the effective dimensionality of the data over all scales without any prior knowledge. It performs better than other methods, especially in the presence of relatively large noise, and is computationally efficient. Comment: Published at http://dx.doi.org/10.1214/07-EJS137 in the Electronic Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics (http://www.imstat.org).

    PCA consistency in high dimension, low sample size context

    Principal Component Analysis (PCA) is an important tool for dimension reduction, especially when the dimension (or the number of variables) is very high. Asymptotic studies where the sample size is fixed and the dimension grows [i.e., High Dimension, Low Sample Size (HDLSS)] are becoming increasingly relevant. We investigate the asymptotic behavior of the Principal Component (PC) directions. HDLSS asymptotics are used to study consistency, strong inconsistency and subspace consistency. We show that if the first few eigenvalues of a population covariance matrix are large enough compared to the others, then the corresponding estimated PC directions are consistent or converge to the appropriate subspace (subspace consistency) and most other PC directions are strongly inconsistent. Broad sets of sufficient conditions for each of these cases are specified and the main theorem gives a catalogue of possible combinations. In preparation for these results, we show that the geometric representation of HDLSS data holds under general conditions, which include a ρ-mixing condition and a broad range of sphericity measures of the covariance matrix. Comment: Published at http://dx.doi.org/10.1214/09-AOS709 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
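
    The consistency claim can be explored with a small simulation. This is a rough sketch under stated assumptions, not the paper's experiment: it uses Gaussian data with a single spiked eigenvalue of size d^alpha along the first coordinate axis (an illustrative choice that does not come from the abstract), keeps the sample size fixed at n = 20, and reports the absolute cosine between the first sample PC direction and the true spike direction. The function name and all parameter values are hypothetical.

```python
import numpy as np

def first_pc_cosine(d, alpha, n=20, seed=0):
    """Absolute cosine between the first sample PC and the true spike direction."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, d))
    x[:, 0] *= np.sqrt(d ** alpha)      # population eigenvalue d**alpha on axis 0
    x -= x.mean(axis=0)                 # center each variable
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return float(abs(vt[0, 0]))         # first PC direction, first coordinate

# Strong spike (assumed alpha = 1.5): sample size fixed, dimension growing.
cosines = {d: first_pc_cosine(d, alpha=1.5) for d in (50, 500, 2000)}

# Weak spike (assumed alpha = 0.2) for comparison, in the largest dimension.
weak = first_pc_cosine(2000, alpha=0.2)
```

    A cosine near 1 means the estimated direction nearly coincides with the population direction. Comparing `cosines` with `weak` gives a feel for how the size of the leading eigenvalue relative to the dimension drives the behavior.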

    SiZer for time series: A new approach to the analysis of trends

    Smoothing methods and SiZer are useful statistical tools for discovering statistically significant structure in data. Based on scale space ideas originally developed in the computer vision literature, SiZer (SIgnificant ZERo crossing of the derivatives) is a graphical device to assess which observed features are 'really there' and which are just spurious sampling artifacts. In this paper, we develop SiZer-like ideas in time series analysis to address the important issue of significance of trends. This is not a straightforward extension, since one data set does not contain the information needed to distinguish 'trend' from 'dependence'. A new visualization is proposed, which shows the statistician the range of trade-offs that are available. Simulation and real data results illustrate the effectiveness of the method. Comment: Published at http://dx.doi.org/10.1214/07-EJS006 in the Electronic Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics (http://www.imstat.org).

    Principal arc analysis on direct product manifolds

    We propose a new approach to analyze data that naturally lie on manifolds. We focus on a special class of manifolds, called direct product manifolds, whose intrinsic dimension could be very high. Our method finds a low-dimensional representation of the manifold that can be used to find and visualize the principal modes of variation of the data, as Principal Component Analysis (PCA) does in linear spaces. The proposed method improves upon earlier manifold extensions of PCA by more concisely capturing important nonlinear modes. For the special case of data on a sphere, variation following nongeodesic arcs is captured in a single mode, compared to the two modes needed by previous methods. Several computational and statistical challenges are resolved. The development on spheres forms the basis of principal arc analysis on more complicated manifolds. The benefits of the method are illustrated by a data example using medial representations in image analysis. Comment: Published at http://dx.doi.org/10.1214/10-AOAS370 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).

    Significance Analysis for Pairwise Variable Selection in Classification

    The goal of this article is to select important variables that can distinguish one class of data from another. A marginal variable selection method ranks the marginal effects of individual variables for classification, and is a useful and efficient approach to variable selection. Our focus here is to consider the bivariate effect in addition to the marginal effect. In particular, we are interested in those pairs of variables that can lead to accurate classification predictions when they are viewed jointly. To accomplish this, we propose a permutation test called Significance test of Joint Effect (SigJEff). In the absence of joint effects in the data, SigJEff is similar or equivalent to many marginal methods. However, when joint effects exist, our method can significantly boost the performance of variable selection. Such joint effects can provide an additional, and sometimes dominating, advantage for classification. We illustrate and validate our approach using both a simulated example and a real glioblastoma multiforme data set, both of which yield promising results. Comment: 28 pages, 7 figures
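
    The idea of testing a pair of variables jointly by permutation can be sketched as follows. This is not SigJEff itself, whose test statistic is not given in the abstract. As a stand-in, the sketch uses a pooled-covariance squared distance between the two class means on the pair (a Hotelling-type statistic) and compares it with its distribution under random relabeling. All function names, the simulated data and the number of permutations are hypothetical.

```python
import numpy as np

def pair_statistic(x, y):
    """Squared distance between class means of a variable pair, scaled by pooled covariance."""
    a, b = x[y == 0], x[y == 1]
    diff = a.mean(axis=0) - b.mean(axis=0)
    pooled = ((len(a) - 1) * np.cov(a, rowvar=False)
              + (len(b) - 1) * np.cov(b, rowvar=False)) / (len(x) - 2)
    return float(diff @ np.linalg.solve(pooled, diff))

def permutation_pvalue(x, y, n_perm=999, seed=0):
    """Permutation p-value: share of relabelings with a statistic at least as large."""
    rng = np.random.default_rng(seed)
    observed = pair_statistic(x, y)
    exceed = sum(pair_statistic(x, rng.permutation(y)) >= observed
                 for _ in range(n_perm))
    return (exceed + 1) / (n_perm + 1)

# Simulated pair: strongly correlated variables whose classes differ only
# along the low-variance direction, so each marginal effect is weak.
rng = np.random.default_rng(1)
n = 40
z = rng.standard_normal(2 * n)
x = np.column_stack([z + 0.1 * rng.standard_normal(2 * n),
                     z + 0.1 * rng.standard_normal(2 * n)])
y = np.repeat([0, 1], n)
x[y == 1] += np.array([0.3, -0.3])   # class shift across the correlation axis

p_value = permutation_pvalue(x, y)
```

    In this simulated example each variable on its own separates the classes poorly, while the pair separates them well, which is the kind of joint effect the abstract describes.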