Principal Boundary on Riemannian Manifolds
We consider the classification problem and focus on nonlinear methods for classification on manifolds. For multivariate datasets lying on an embedded nonlinear Riemannian manifold within a higher-dimensional ambient space, we aim to obtain a classification boundary for the labeled classes, using the intrinsic metric on the manifold. Motivated by the search for an optimal boundary between two classes, we introduce a novel approach: the principal boundary. From the classification perspective, the principal boundary is defined as an optimal curve that moves between the principal flows traced out from the two classes of data and, at every point, maximizes the margin between the two classes. We estimate the boundary, together with its direction, supervised by the two principal flows. We show that the principal boundary yields the usual decision boundary found by the support vector machine, in the sense that the two boundaries coincide locally. Some optimality and convergence properties of the random principal boundary and its population counterpart are also established. We illustrate how to find, use, and interpret the principal boundary with an application to real data.

Comment: 31 pages, 10 figures
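As a toy flat-space sketch of the margin idea above (the paper works with the intrinsic manifold metric, so this Euclidean midpoint is only an analogue): given matched sample points on the two principal flows, the boundary point for each pair is the midpoint of the connecting segment, which maximizes the smaller of the two distances along that segment.

```python
import numpy as np

# Toy Euclidean sketch, not the paper's manifold algorithm: two "principal
# flows" sampled as point sequences; the boundary point between a matched
# pair is the midpoint of the connecting segment, and the margin is half the
# distance between the pair.
def boundary_between_flows(flow_a, flow_b):
    flow_a = np.asarray(flow_a, dtype=float)
    flow_b = np.asarray(flow_b, dtype=float)
    midpoints = (flow_a + flow_b) / 2.0
    margins = np.linalg.norm(flow_a - flow_b, axis=1) / 2.0
    return midpoints, margins

flow_a = np.array([[0.0, 0.0], [1.0, 0.1], [2.0, 0.0]])   # class-1 flow samples
flow_b = np.array([[0.0, 2.0], [1.0, 2.1], [2.0, 2.2]])   # class-2 flow samples
mid, marg = boundary_between_flows(flow_a, flow_b)        # boundary curve, margins
```

On a curved manifold the midpoint would be replaced by the geodesic midpoint and the norm by the geodesic distance, which is where the intrinsic metric enters.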
Fixed Boundary Flows
We consider the fixed boundary flow, which carries the canonical interpretability of principal components extended to nonlinear Riemannian manifolds. For multivariate datasets lying on an embedded nonlinear Riemannian manifold, we aim to find a flow with fixed starting and ending points, both given in advance, differing from the principal flow, which starts from the center of the data cloud; the intrinsic metric on the manifold is used throughout. From the geometric perspective, the fixed boundary flow is defined as an optimal curve that moves within the data cloud: at any point on the flow, it maximizes the inner product between a locally computed vector field and the tangent vector of the flow. The rigorous definition is given by means of an Euler-Lagrange problem, whose solution reduces to that of a differential-algebraic equation (DAE). A high-level algorithm is devised to compute the fixed boundary flow numerically. We show that the fixed boundary flow is a concatenation of three segments, one of which coincides with the usual principal flow when the manifold reduces to Euclidean space. We illustrate how the fixed boundary flow can be used and interpreted, with an application to real data.
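The variational idea above can be sketched numerically in flat space (this is not the paper's DAE solver, and the vector field below is a hypothetical choice for illustration): discretize a path with fixed endpoints and run gradient ascent on the sum of inner products between the field and the path's unit tangents, moving only interior points.

```python
import numpy as np

# Euclidean toy sketch of the objective "maximize <vector field, tangent>"
# subject to fixed endpoints; interior vertices move by numerical gradient
# ascent, the two endpoints stay put.
def vector_field(x):
    # hypothetical smooth field for illustration only
    return np.array([1.0, np.cos(x[0])])

def objective(path):
    tangents = np.diff(path, axis=0)
    tangents = tangents / np.linalg.norm(tangents, axis=1, keepdims=True)
    mids = (path[:-1] + path[1:]) / 2.0
    return sum(vector_field(m) @ t for m, t in zip(mids, tangents))

def ascend(path, steps=200, lr=1e-2, eps=1e-5):
    path = np.array(path, dtype=float)
    for _ in range(steps):
        grad = np.zeros_like(path)
        for i in range(1, len(path) - 1):          # endpoints stay fixed
            for j in range(path.shape[1]):
                up = path.copy(); up[i, j] += eps
                dn = path.copy(); dn[i, j] -= eps
                grad[i, j] = (objective(up) - objective(dn)) / (2 * eps)
        path[1:-1] += lr * grad[1:-1]
    return path

start = np.linspace([0.0, 0.0], [4.0, 0.0], 9)     # straight initial path
flow = ascend(start)                               # bends to follow the field
```

On a manifold the same ascent would be performed in local tangent spaces with a retraction back to the manifold, which is what leads to the DAE formulation in the paper.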
Recent advances in directional statistics
Mainstream statistical methodology is generally applicable to data observed
in Euclidean space. There are, however, numerous contexts of considerable
scientific interest in which the natural supports for the data under
consideration are Riemannian manifolds like the unit circle, torus, sphere and
their extensions. Typically, such data can be represented using one or more
directions, and directional statistics is the branch of statistics that deals
with their analysis. In this paper we provide a review of the many recent
developments in the field since the publication of Mardia and Jupp (1999),
still the most comprehensive text on directional statistics. Many of those
developments have been stimulated by interesting applications in fields as
diverse as astronomy, medicine, genetics, neurology, aeronautics, acoustics,
image analysis, text mining, environmetrics, and machine learning. We begin by
considering developments for the exploratory analysis of directional data
before progressing to distributional models, general approaches to inference,
hypothesis testing, regression, nonparametric curve estimation, methods for
dimension reduction, classification and clustering, and the modelling of time
series, spatial and spatio-temporal data. An overview of currently available
software for analysing directional data is also provided, and potential future
developments are discussed.

Comment: 61 pages
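The most basic directional-statistics summary mentioned above can be shown in a few lines: the mean direction and mean resultant length of circular data, computed through the embedding theta -> exp(i*theta) rather than the raw arithmetic mean, which is meaningless for angles.

```python
import numpy as np

# Circular mean via the complex embedding: the angle of the average unit
# vector is the mean direction, and its modulus (in [0, 1]) measures how
# concentrated the angles are.
def circular_mean(theta):
    resultant = np.exp(1j * np.asarray(theta, dtype=float)).mean()
    return np.angle(resultant), np.abs(resultant)

angles = np.array([0.1, -0.1, 2 * np.pi - 0.05, 0.05])  # all cluster near 0
mu, rbar = circular_mean(angles)
# mu is ~0 and rbar is ~1 (tight cluster); the arithmetic mean of the raw
# angles is ~pi/2, far from every observation
```

This wrap-around failure of the ordinary mean is exactly why the supports described above (circle, torus, sphere) need their own methodology.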
Doctor of Philosophy in Computing
An important area of medical imaging research is the study of anatomical diffeomorphic shape changes and the detection of their relationship to disease processes. For example, neurodegenerative disorders change the shape of the brain, so identifying differences between healthy control subjects and patients affected by these diseases can help with understanding the disease processes. Previous research has proposed a variety of mathematical approaches for the statistical analysis of geometrical brain structure in three-dimensional (3D) medical imaging, including atlas building, brain variability quantification, and regression. The critical component of these statistical models is that the geometrical structure is represented by transformations rather than by the actual image data. Although such statistical models effectively provide a way to analyze shape variation, none of them has a truly probabilistic interpretation.

This dissertation contributes a novel Bayesian framework of statistical shape analysis for generic manifold data and applies it to shape variability and brain magnetic resonance imaging (MRI). After carefully defining distributions on manifolds, we build Bayesian models for analyzing the intrinsic variability of manifold data, covering the mean point, principal modes, and parameter estimation. Because there is no closed-form solution for Bayesian inference of these models on manifolds, we develop a Markov chain Monte Carlo method to sample the hidden variables from the distribution. The main advantage of these Bayesian approaches is that they provide parameter estimation and automatic dimensionality reduction for analyzing generic manifold-valued data, such as diffeomorphisms. Modeling the mean point of a group of images in a Bayesian manner allows the regularity parameter to be learned from the data directly rather than set manually, which eliminates the effort of cross-validation for parameter selection.
In population studies, our Bayesian model of principal modes analysis (1) automatically extracts low-dimensional, second-order statistics of manifold data variability and (2) gives a better geometric data fit than nonprobabilistic models. To make this Bayesian framework computationally more efficient for high-dimensional diffeomorphisms, this dissertation presents an algorithm, FLASH (finite-dimensional Lie algebras for shooting), that dramatically speeds up diffeomorphic image registration. Instead of formulating diffeomorphisms as a continuous variational problem, FLASH defines a completely new discrete reparameterization of diffeomorphisms in a low-dimensional bandlimited velocity space, which makes Bayesian inference via sampling on the space of diffeomorphisms feasible in practice. The entire Bayesian framework in this dissertation is applied to the statistical analysis of shape data and brain MRIs, and it has the potential to improve hypothesis testing, classification, and mixture models.
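The MCMC idea described above can be illustrated on the simplest manifold, the circle (a hedged toy version, not the dissertation's diffeomorphism sampler): random-walk Metropolis for the mean direction mu of a von Mises model, whose log-likelihood is sum of kappa*cos(theta - mu) up to constants, with a flat prior on mu.

```python
import numpy as np

# Random-walk Metropolis on the circle: sample the posterior of the mean
# direction mu for von Mises data with known concentration kappa.
rng = np.random.default_rng(0)
kappa = 4.0
data = rng.vonmises(mu=1.0, kappa=kappa, size=200)   # truth: mu = 1.0

def log_post(mu):
    # flat prior, so the log-posterior is the log-likelihood up to a constant
    return kappa * np.cos(data - mu).sum()

mu_cur, samples = 0.0, []
for _ in range(5000):
    mu_prop = mu_cur + rng.normal(scale=0.2)          # random-walk proposal
    if np.log(rng.uniform()) < log_post(mu_prop) - log_post(mu_cur):
        mu_cur = mu_prop                              # Metropolis accept
    samples.append(mu_cur)

post_mean = np.mean(samples[1000:])                   # close to the true mu = 1.0
```

The same accept/reject logic carries over to far richer manifold-valued latent variables; what changes is the cost of evaluating the likelihood, which is what FLASH addresses.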
Nonparametric Dimension Reduction Methods on Riemannian Manifolds
Thesis (Ph.D.) -- Graduate School of Seoul National University: Department of Statistics, College of Natural Sciences, 2022. 8. Advisor: Hee-Seok Oh.

Over the decades, parametric dimension reduction methods have been actively developed for non-Euclidean data analysis; examples include Fletcher et al. (2004), Huckemann et al. (2010), Jung et al. (2011), Jung et al. (2012), and Zhang et al. (2013). These methods, however, are sometimes not flexible enough to capture the structure of the data. This dissertation presents newly developed nonparametric dimension reduction methods for data observed on manifolds, resulting in more flexible fits. More precisely, the main focus is on generalizations of principal curves to Riemannian manifolds. The principal curve can be considered a nonlinear generalization of principal component analysis (PCA). The dissertation consists of four main parts, as follows.
First, the approach given in Chapter 3 follows the lines of Hastie (1984) and Hastie and Stuetzle (1989), which introduced the original definition of the principal curve in Euclidean space. The main contributions of this study can be summarized as follows: (a) we propose both extrinsic and intrinsic approaches to constructing principal curves on spheres; (b) we establish the stationarity of the proposed principal curves on spheres; (c) in extensive numerical studies, we show the usefulness of the proposed method on real seismological data and real human motion-capture data, as well as on simulated data on the 2-sphere and 4-sphere.
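The Hastie-Stuetzle iteration that Chapter 3 generalizes to spheres can be sketched in flat space (a toy analogue, not the dissertation's spherical algorithm): alternate a projection step that assigns each point to its nearest curve vertex and an expectation step that re-estimates each vertex as a local average of its assigned points.

```python
import numpy as np

# Flat-space toy of the principal-curve projection/expectation iteration,
# initialized along the first principal component.
def principal_curve(points, n_vertices=10, n_iter=20):
    pts = np.asarray(points, dtype=float)
    center = pts.mean(axis=0)
    direction = np.linalg.svd(pts - center)[2][0]   # first principal axis
    t = (pts - center) @ direction
    verts = center + np.linspace(t.min(), t.max(), n_vertices)[:, None] * direction
    for _ in range(n_iter):
        dist = np.linalg.norm(pts[:, None] - verts[None], axis=2)
        idx = dist.argmin(axis=1)                   # projection step
        for k in range(n_vertices):                 # expectation step
            near = pts[np.abs(idx - k) <= 1]        # neighborhood average
            if len(near):
                verts[k] = near.mean(axis=0)
    return verts

rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 200)
data = np.c_[x, x**2] + rng.normal(scale=0.05, size=(200, 2))
curve = principal_curve(data)                       # traces the noisy parabola
```

On a sphere, the averages and projections would be replaced by their intrinsic (geodesic) or extrinsic counterparts, which is precisely the distinction between the two approaches proposed in (a).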
Second, as a follow-up to the previous approach, a robust nonparametric dimension reduction method is proposed. To this end, the absolute loss and the Huber loss are used instead of the L2 loss. The contributions of Chapter 4 can be summarized as follows: (a) we study robust principal curves on spheres that are resistant to outliers; specifically, we propose absolute-type and Huber-type principal curves, which pass through the median of the data, to robustify principal curves for data sets that may contain outliers; (b) on the theoretical side, the stationarity of the robust principal curves is investigated; (c) we provide practical algorithms for implementing the proposed robust principal curves that are computationally feasible and convenient to implement.
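The motivation for swapping the L2 loss for a Huber loss can be shown on the simplest possible example, location estimation: a Huber location estimate, computed here by iteratively reweighted averaging, barely moves under a gross outlier, while the least-squares answer (the mean) is dragged far away.

```python
import numpy as np

# Huber M-estimate of location via iteratively reweighted averaging:
# residuals within delta get full weight, larger ones are downweighted.
def huber_location(x, delta=1.0, n_iter=50):
    x = np.asarray(x, dtype=float)
    m = np.median(x)                      # robust starting point
    for _ in range(n_iter):
        r = np.abs(x - m)
        w = np.minimum(1.0, delta / np.maximum(r, 1e-12))
        m = np.sum(w * x) / np.sum(w)     # weighted-mean update
    return m

clean = np.array([-0.2, -0.1, 0.0, 0.1, 0.2])
contaminated = np.append(clean, 50.0)     # one gross outlier
m_huber = huber_location(contaminated)    # stays near 0 (converges to 0.2)
m_mean = contaminated.mean()              # pulled to about 8.3
```

The robust principal curves of Chapter 4 apply the same reweighting idea along an entire curve rather than at a single location.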
Third, an R package, 'spherepc', that comprehensively provides dimension reduction methods on a sphere is introduced, with details supporting reproducible research. To the best of our knowledge, no other available R package offers dimension reduction and principal curve methods on a sphere; the existing R packages providing principal curves, such as 'princurve' and 'LPCM', work only in Euclidean space. In addition, most nonparametric dimension reduction methods on manifolds involve somewhat complex intrinsic optimizations. The proposed R package 'spherepc' provides the state-of-the-art principal curve technique on the sphere and comprehensively collects and implements the existing techniques.
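One building block of this kind of spherical toolkit can be sketched as follows (a hedged Python analogue, since the package itself is in R): an extrinsic principal-circle fit. The best great circle's plane normal is the direction of least variance of the unit vectors, and the data are projected onto that plane and renormalized back onto the sphere.

```python
import numpy as np

# Extrinsic great-circle ("principal circle") fit: the plane of the best
# great circle passes through the origin, with normal given by the
# least-variance axis of the unit vectors.
def fit_great_circle(X):
    X = np.asarray(X, dtype=float)
    normal = np.linalg.svd(X, full_matrices=False)[2][-1]   # least-variance axis
    proj = X - np.outer(X @ normal, normal)                 # drop normal component
    proj = proj / np.linalg.norm(proj, axis=1, keepdims=True)
    return normal, proj

# synthetic unit vectors near the equator; the fitted normal should be the z-axis
rng = np.random.default_rng(2)
phi = rng.uniform(0.0, 2.0 * np.pi, size=300)
X = np.c_[np.cos(phi), np.sin(phi), rng.normal(scale=0.05, size=300)]
X = X / np.linalg.norm(X, axis=1, keepdims=True)
normal, on_circle = fit_great_circle(X)
```

A principal circle of this kind is also a natural initialization for the more flexible spherical principal curves described above.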
Lastly, for an effective initial estimate for complex structured data on manifolds, local principal geodesics are first proposed, and the method is applied to various simulated and real seismological data sets. Next, for variance stabilization and theoretical investigation of the procedure, the focus is on generalizing Kégl (1999) and Kégl et al. (2000), which provided a new definition of the principal curve in Euclidean space, to generic Riemannian manifolds. Theoretical results, including consistency and the convergence rate of the procedure via the empirical-risk-minimization principle, are established on generic Riemannian manifolds. Results on real data analyses and simulation studies show the promising characteristics of the proposed approach.

Abstract in Korean (translated): To capture the variability of manifold-valued data more effectively, this dissertation proposes new nonparametric dimension reduction methods for such data. Specifically, the main research topic is the extension of principal curves to general manifolds; the principal curve is a nonlinear generalization of principal component analysis (PCA). The dissertation consists of four main topics.

First, the methods of Hastie (1984) and Hastie and Stuetzle (1989) are extended in a standard way to spheres of arbitrary dimension. The contributions of this topic are as follows: (a) we propose intrinsic and extrinsic principal curve methods on spheres of arbitrary dimension; (b) we establish the theoretical property (stationarity) of the proposed methods; (c) we demonstrate their usefulness by applying them to real data, such as seismological and human motion-capture data, and to simulated data on 2- and 4-dimensional spheres.

Second, as a follow-up to the first topic, we propose a robust nonparametric dimension reduction method for data with heavy-tailed distributions. To this end, the L1 and Huber loss functions are used instead of the L2 loss. The contributions are as follows: (a) we propose robust principal curves that are insensitive to outliers; specifically, new principal curves corresponding to the L1 and Huber losses that pass through the geometric median of the data; (b) on the theoretical side, the stationarity of the robust principal curves is established; (c) we propose computationally fast, practical algorithms for implementing them.

Third, we implement an R package providing both existing dimension reduction methods and the proposed methods, and introduce it with various examples and explanations. A strength of the proposed methods is that they can be implemented in an intuitive way without solving complex optimization equations on manifolds; their availability as an R package attests to this and makes the research in this dissertation reproducible.

Lastly, to estimate the structure of manifold-valued data with more complex structure, we first propose the local principal geodesics method and demonstrate its utility on real seismological data and various simulated data. Next, for variance stabilization of the estimates and theoretical justification, we extend the methods of Kégl (1999) and Kégl et al. (2000) to general Riemannian manifolds. Furthermore, using statistical learning theory, we establish asymptotic properties such as consistency and convergence rates, as well as a nonasymptotic concentration inequality.

1 Introduction 1
2 Preliminaries 8
2.1 Principal curves 8
2.2 Riemannian manifolds and centrality on manifolds 10
2.3 Principal curves on Riemannian manifolds 14
3 Spherical principal curves 15
3.1 Enhancement of principal circle for initialization 16
3.2 Proposed principal curves 25
3.3 Numerical experiments 34
3.4 Proofs 45
3.5 Concluding remarks 62
4 Robust spherical principal curves 64
4.1 The proposed robust principal curves 64
4.2 Stationarity of robust spherical principal curves 72
4.3 Numerical experiments 74
4.4 Summary and future work 80
5 spherepc: An R package for dimension reduction on a sphere 84
5.1 Existing methods 85
5.2 Spherical principal curves 91
5.3 Local principal geodesics 94
5.4 Application 99
5.5 Conclusion 101
6 Local principal curves on Riemannian manifolds 112
6.1 Preliminaries 116
6.2 Local principal geodesics 118
6.3 Local principal curves 125
6.4 Real data analysis 133
6.5 Further work 133
7 Conclusion 139
A. Appendix 141
A.1. Appendix for Chapter 3 141
A.2. Appendix for Chapter 4 145
A.3. Appendix for Chapter 6 152
Abstract in Korean 176
Acknowledgement in Korean 179
Nonparametric Uncertainty Quantification for Stochastic Gradient Flows
This paper presents a nonparametric statistical modeling method for
quantifying uncertainty in stochastic gradient systems with isotropic
diffusion. The central idea is to apply the diffusion maps algorithm to a
training data set to produce a stochastic matrix whose generator is a discrete
approximation to the backward Kolmogorov operator of the underlying dynamics.
The eigenvectors of this stochastic matrix, which we will refer to as the
diffusion coordinates, are discrete approximations to the eigenfunctions of the
Kolmogorov operator and form an orthonormal basis for functions defined on the
data set. Using this basis, we consider the projection of three uncertainty
quantification (UQ) problems (prediction, filtering, and response) into the
diffusion coordinates. In these coordinates, the nonlinear prediction and
response problems reduce to solving systems of infinite-dimensional linear
ordinary differential equations. Similarly, the continuous-time nonlinear
filtering problem reduces to solving a system of infinite-dimensional linear
stochastic differential equations. Solving the UQ problems then reduces to
solving the corresponding truncated linear systems in finitely many diffusion
coordinates. By solving these systems we give a model-free algorithm for UQ on
gradient flow systems with isotropic diffusion. We numerically verify these
algorithms on a 1-dimensional linear gradient flow system where the analytic
solutions of the UQ problems are known. We also apply the algorithm to a
chaotically forced nonlinear gradient flow system which is known to be well
approximated as a stochastically forced gradient flow.

Comment: Find the associated videos at: http://personal.psu.edu/thb11
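The first step described above can be sketched in a few lines (a basic fixed-bandwidth variant; the paper's construction is more refined): build the diffusion-maps row-stochastic matrix from data and take its leading eigenvectors as discrete "diffusion coordinates".

```python
import numpy as np

# Diffusion maps with the alpha = 1 normalization, which removes the
# sampling-density bias before forming the row-stochastic matrix.
def diffusion_coordinates(X, eps, k=4):
    X = np.asarray(X, dtype=float)
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-sq_dists / eps)                 # Gaussian kernel
    q = K.sum(axis=1)
    K = K / np.outer(q, q)                      # density normalization
    P = K / K.sum(axis=1, keepdims=True)        # row-stochastic matrix
    vals, vecs = np.linalg.eig(P)
    vals, vecs = vals.real, vecs.real           # P has a real spectrum here
    order = np.argsort(vals)[::-1]
    return vals[order][:k], vecs[:, order[:k]]

rng = np.random.default_rng(3)
X = rng.normal(size=(150, 1))                   # draws from a 1-d Gaussian,
                                                # the stationary density of a
                                                # linear gradient flow
vals, coords = diffusion_coordinates(X, eps=0.5)
# leading eigenvalue is 1 with a constant eigenvector (the stationary mode)
```

The remaining eigenvectors play the role of the basis in which the paper projects the prediction, filtering, and response problems.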