Principal Boundary on Riemannian Manifolds
We consider the classification problem and focus on nonlinear methods for classification on manifolds. For multivariate datasets lying on an embedded nonlinear Riemannian manifold within a higher-dimensional ambient space, we aim to obtain a classification boundary for the labeled classes, using the intrinsic metric on the manifold. Motivated by the search for an optimal boundary between two classes, we introduce a novel approach: the principal boundary. From the classification perspective, the principal boundary is defined as an optimal curve that moves between the principal flows traced out from the two classes of data and, at every point, maximizes the margin between the two classes. We estimate the boundary, together with its direction, supervised by the two principal flows. We show that the principal boundary yields the usual decision boundary found by the support vector machine, in the sense that the two boundaries coincide locally. Some optimality and convergence properties of the random principal boundary and its population counterpart are also established. We illustrate how to find, use, and interpret the principal boundary with an application to real data.

Comment: 31 pages, 10 figures
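As a toy flat-space sketch of the margin idea above (the paper works with the intrinsic manifold metric, so this Euclidean midpoint is only an analogue): given matched sample points on the two principal flows, the boundary point for each pair is the midpoint of the connecting segment, which maximizes the smaller of the two distances along that segment.

```python
import numpy as np

# Toy Euclidean sketch, not the paper's manifold algorithm: two "principal
# flows" sampled as point sequences; the boundary point between a matched
# pair is the midpoint of the connecting segment, and the margin is half the
# distance between the pair.
def boundary_between_flows(flow_a, flow_b):
    flow_a = np.asarray(flow_a, dtype=float)
    flow_b = np.asarray(flow_b, dtype=float)
    midpoints = (flow_a + flow_b) / 2.0
    margins = np.linalg.norm(flow_a - flow_b, axis=1) / 2.0
    return midpoints, margins

flow_a = np.array([[0.0, 0.0], [1.0, 0.1], [2.0, 0.0]])   # class-1 flow samples
flow_b = np.array([[0.0, 2.0], [1.0, 2.1], [2.0, 2.2]])   # class-2 flow samples
mid, marg = boundary_between_flows(flow_a, flow_b)        # boundary curve, margins
```

On a curved manifold the midpoint would be replaced by the geodesic midpoint and the norm by the geodesic distance, which is where the intrinsic metric enters.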
Fixed Boundary Flows
We consider the fixed boundary flow, which carries the canonical interpretability of principal components extended to nonlinear Riemannian manifolds. For multivariate datasets lying on an embedded nonlinear Riemannian manifold, we aim to find a flow with fixed starting and ending points, both given in advance, differing from the principal flow, which starts from the center of the data cloud; the intrinsic metric on the manifold is used throughout. From the geometric perspective, the fixed boundary flow is defined as an optimal curve that moves within the data cloud: at any point on the flow, it maximizes the inner product between a locally computed vector field and the tangent vector of the flow. The rigorous definition is given by means of an Euler-Lagrange problem, whose solution reduces to that of a differential-algebraic equation (DAE). A high-level algorithm is devised to compute the fixed boundary flow numerically. We show that the fixed boundary flow is a concatenation of three segments, one of which coincides with the usual principal flow when the manifold reduces to Euclidean space. We illustrate how the fixed boundary flow can be used and interpreted, with an application to real data.
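The variational idea above can be sketched numerically in flat space (this is not the paper's DAE solver, and the vector field below is a hypothetical choice for illustration): discretize a path with fixed endpoints and run gradient ascent on the sum of inner products between the field and the path's unit tangents, moving only interior points.

```python
import numpy as np

# Euclidean toy sketch of the objective "maximize <vector field, tangent>"
# subject to fixed endpoints; interior vertices move by numerical gradient
# ascent, the two endpoints stay put.
def vector_field(x):
    # hypothetical smooth field for illustration only
    return np.array([1.0, np.cos(x[0])])

def objective(path):
    tangents = np.diff(path, axis=0)
    tangents = tangents / np.linalg.norm(tangents, axis=1, keepdims=True)
    mids = (path[:-1] + path[1:]) / 2.0
    return sum(vector_field(m) @ t for m, t in zip(mids, tangents))

def ascend(path, steps=200, lr=1e-2, eps=1e-5):
    path = np.array(path, dtype=float)
    for _ in range(steps):
        grad = np.zeros_like(path)
        for i in range(1, len(path) - 1):          # endpoints stay fixed
            for j in range(path.shape[1]):
                up = path.copy(); up[i, j] += eps
                dn = path.copy(); dn[i, j] -= eps
                grad[i, j] = (objective(up) - objective(dn)) / (2 * eps)
        path[1:-1] += lr * grad[1:-1]
    return path

start = np.linspace([0.0, 0.0], [4.0, 0.0], 9)     # straight initial path
flow = ascend(start)                               # bends to follow the field
```

On a manifold the same ascent would be performed in local tangent spaces with a retraction back to the manifold, which is what leads to the DAE formulation in the paper.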
Recent advances in directional statistics
Mainstream statistical methodology is generally applicable to data observed
in Euclidean space. There are, however, numerous contexts of considerable
scientific interest in which the natural supports for the data under
consideration are Riemannian manifolds like the unit circle, torus, sphere and
their extensions. Typically, such data can be represented using one or more
directions, and directional statistics is the branch of statistics that deals
with their analysis. In this paper we provide a review of the many recent
developments in the field since the publication of Mardia and Jupp (1999),
still the most comprehensive text on directional statistics. Many of those
developments have been stimulated by interesting applications in fields as
diverse as astronomy, medicine, genetics, neurology, aeronautics, acoustics,
image analysis, text mining, environmetrics, and machine learning. We begin by
considering developments for the exploratory analysis of directional data
before progressing to distributional models, general approaches to inference,
hypothesis testing, regression, nonparametric curve estimation, methods for
dimension reduction, classification and clustering, and the modelling of time
series, spatial and spatio-temporal data. An overview of currently available
software for analysing directional data is also provided, and potential future
developments are discussed.

Comment: 61 pages
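The most basic directional-statistics summary mentioned above can be shown in a few lines: the mean direction and mean resultant length of circular data, computed through the embedding theta -> exp(i*theta) rather than the raw arithmetic mean, which is meaningless for angles.

```python
import numpy as np

# Circular mean via the complex embedding: the angle of the average unit
# vector is the mean direction, and its modulus (in [0, 1]) measures how
# concentrated the angles are.
def circular_mean(theta):
    resultant = np.exp(1j * np.asarray(theta, dtype=float)).mean()
    return np.angle(resultant), np.abs(resultant)

angles = np.array([0.1, -0.1, 2 * np.pi - 0.05, 0.05])  # all cluster near 0
mu, rbar = circular_mean(angles)
# mu is ~0 and rbar is ~1 (tight cluster); the arithmetic mean of the raw
# angles is ~pi/2, far from every observation
```

This wrap-around failure of the ordinary mean is exactly why the supports described above (circle, torus, sphere) need their own methodology.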
Doctor of Philosophy in Computing
An important area of medical imaging research is the study of anatomical diffeomorphic shape changes and the detection of their relationship to disease processes. For example, neurodegenerative disorders change the shape of the brain, so identifying differences between healthy control subjects and patients affected by these diseases can help with understanding the disease processes. Previous research has proposed a variety of mathematical approaches for the statistical analysis of geometrical brain structure in three-dimensional (3D) medical imaging, including atlas building, brain variability quantification, and regression. The critical component of these statistical models is that the geometrical structure is represented by transformations rather than by the actual image data. Although such statistical models effectively provide a way to analyze shape variation, none of them has a truly probabilistic interpretation.

This dissertation contributes a novel Bayesian framework of statistical shape analysis for generic manifold data and applies it to shape variability and brain magnetic resonance imaging (MRI). After carefully defining distributions on manifolds, we build Bayesian models for analyzing the intrinsic variability of manifold data, covering the mean point, principal modes, and parameter estimation. Because there is no closed-form solution for Bayesian inference of these models on manifolds, we develop a Markov chain Monte Carlo method to sample the hidden variables from the distribution. The main advantage of these Bayesian approaches is that they provide parameter estimation and automatic dimensionality reduction for analyzing generic manifold-valued data, such as diffeomorphisms. Modeling the mean point of a group of images in a Bayesian manner allows the regularity parameter to be learned from the data directly rather than set manually, which eliminates the effort of cross-validation for parameter selection.
In population studies, our Bayesian model of principal modes analysis (1) automatically extracts low-dimensional, second-order statistics of manifold data variability and (2) gives a better geometric data fit than nonprobabilistic models. To make this Bayesian framework computationally more efficient for high-dimensional diffeomorphisms, this dissertation presents an algorithm, FLASH (finite-dimensional Lie algebras for shooting), that dramatically speeds up diffeomorphic image registration. Instead of formulating diffeomorphisms as a continuous variational problem, FLASH defines a completely new discrete reparameterization of diffeomorphisms in a low-dimensional bandlimited velocity space, which makes Bayesian inference via sampling on the space of diffeomorphisms feasible in practice. The entire Bayesian framework in this dissertation is applied to the statistical analysis of shape data and brain MRIs, and it has the potential to improve hypothesis testing, classification, and mixture models.
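The MCMC idea described above can be illustrated on the simplest manifold, the circle (a hedged toy version, not the dissertation's diffeomorphism sampler): random-walk Metropolis for the mean direction mu of a von Mises model, whose log-likelihood is sum of kappa*cos(theta - mu) up to constants, with a flat prior on mu.

```python
import numpy as np

# Random-walk Metropolis on the circle: sample the posterior of the mean
# direction mu for von Mises data with known concentration kappa.
rng = np.random.default_rng(0)
kappa = 4.0
data = rng.vonmises(mu=1.0, kappa=kappa, size=200)   # truth: mu = 1.0

def log_post(mu):
    # flat prior, so the log-posterior is the log-likelihood up to a constant
    return kappa * np.cos(data - mu).sum()

mu_cur, samples = 0.0, []
for _ in range(5000):
    mu_prop = mu_cur + rng.normal(scale=0.2)          # random-walk proposal
    if np.log(rng.uniform()) < log_post(mu_prop) - log_post(mu_cur):
        mu_cur = mu_prop                              # Metropolis accept
    samples.append(mu_cur)

post_mean = np.mean(samples[1000:])                   # close to the true mu = 1.0
```

The same accept/reject logic carries over to far richer manifold-valued latent variables; what changes is the cost of evaluating the likelihood, which is what FLASH addresses.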
Nonparametric Dimension Reduction Methods on Riemannian Manifolds
Thesis (Ph.D.) -- Graduate School of Seoul National University: Department of Statistics, College of Natural Sciences, 2022. 8. Advisor: Hee-Seok Oh.

Over the decades, parametric dimension reduction methods have been actively developed for non-Euclidean data analysis; examples include Fletcher et al. (2004), Huckemann et al. (2010), Jung et al. (2011), Jung et al. (2012), and Zhang et al. (2013). These methods, however, are sometimes not flexible enough to capture the structure of the data. This dissertation presents newly developed nonparametric dimension reduction methods for data observed on manifolds, resulting in more flexible fits. More precisely, the main focus is on generalizations of principal curves to Riemannian manifolds. The principal curve can be considered a nonlinear generalization of principal component analysis (PCA). The dissertation consists of four main parts, as follows.
First, the approach given in Chapter 3 follows the lines of Hastie (1984) and Hastie and Stuetzle (1989), which introduced the original definition of the principal curve in Euclidean space. The main contributions of this study can be summarized as follows: (a) we propose both extrinsic and intrinsic approaches to constructing principal curves on spheres; (b) we establish the stationarity of the proposed principal curves on spheres; (c) in extensive numerical studies, we show the usefulness of the proposed method on real seismological data and real human motion-capture data, as well as on simulated data on the 2-sphere and 4-sphere.
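The Hastie-Stuetzle iteration that Chapter 3 generalizes to spheres can be sketched in flat space (a toy analogue, not the dissertation's spherical algorithm): alternate a projection step that assigns each point to its nearest curve vertex and an expectation step that re-estimates each vertex as a local average of its assigned points.

```python
import numpy as np

# Flat-space toy of the principal-curve projection/expectation iteration,
# initialized along the first principal component.
def principal_curve(points, n_vertices=10, n_iter=20):
    pts = np.asarray(points, dtype=float)
    center = pts.mean(axis=0)
    direction = np.linalg.svd(pts - center)[2][0]   # first principal axis
    t = (pts - center) @ direction
    verts = center + np.linspace(t.min(), t.max(), n_vertices)[:, None] * direction
    for _ in range(n_iter):
        dist = np.linalg.norm(pts[:, None] - verts[None], axis=2)
        idx = dist.argmin(axis=1)                   # projection step
        for k in range(n_vertices):                 # expectation step
            near = pts[np.abs(idx - k) <= 1]        # neighborhood average
            if len(near):
                verts[k] = near.mean(axis=0)
    return verts

rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 200)
data = np.c_[x, x**2] + rng.normal(scale=0.05, size=(200, 2))
curve = principal_curve(data)                       # traces the noisy parabola
```

On a sphere, the averages and projections would be replaced by their intrinsic (geodesic) or extrinsic counterparts, which is precisely the distinction between the two approaches proposed in (a).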
Second, as a follow-up to the previous approach, a robust nonparametric dimension reduction method is proposed. To this end, the absolute loss and the Huber loss are used instead of the L2 loss. The contributions of Chapter 4 can be summarized as follows: (a) we study robust principal curves on spheres that are resistant to outliers; specifically, we propose absolute-type and Huber-type principal curves, which pass through the median of the data, to robustify principal curves for data sets that may contain outliers; (b) on the theoretical side, the stationarity of the robust principal curves is investigated; (c) we provide practical algorithms for implementing the proposed robust principal curves that are computationally feasible and convenient to implement.
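The motivation for swapping the L2 loss for a Huber loss can be shown on the simplest possible example, location estimation: a Huber location estimate, computed here by iteratively reweighted averaging, barely moves under a gross outlier, while the least-squares answer (the mean) is dragged far away.

```python
import numpy as np

# Huber M-estimate of location via iteratively reweighted averaging:
# residuals within delta get full weight, larger ones are downweighted.
def huber_location(x, delta=1.0, n_iter=50):
    x = np.asarray(x, dtype=float)
    m = np.median(x)                      # robust starting point
    for _ in range(n_iter):
        r = np.abs(x - m)
        w = np.minimum(1.0, delta / np.maximum(r, 1e-12))
        m = np.sum(w * x) / np.sum(w)     # weighted-mean update
    return m

clean = np.array([-0.2, -0.1, 0.0, 0.1, 0.2])
contaminated = np.append(clean, 50.0)     # one gross outlier
m_huber = huber_location(contaminated)    # stays near 0 (converges to 0.2)
m_mean = contaminated.mean()              # pulled to about 8.3
```

The robust principal curves of Chapter 4 apply the same reweighting idea along an entire curve rather than at a single location.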
Third, an R package, 'spherepc', that comprehensively provides dimension reduction methods on a sphere is introduced, with details supporting reproducible research. To the best of our knowledge, no other available R package offers dimension reduction and principal curve methods on a sphere; the existing R packages providing principal curves, such as 'princurve' and 'LPCM', work only in Euclidean space. In addition, most nonparametric dimension reduction methods on manifolds involve somewhat complex intrinsic optimizations. The proposed R package 'spherepc' provides the state-of-the-art principal curve technique on the sphere and comprehensively collects and implements the existing techniques.
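One building block of this kind of spherical toolkit can be sketched as follows (a hedged Python analogue, since the package itself is in R): an extrinsic principal-circle fit. The best great circle's plane normal is the direction of least variance of the unit vectors, and the data are projected onto that plane and renormalized back onto the sphere.

```python
import numpy as np

# Extrinsic great-circle ("principal circle") fit: the plane of the best
# great circle passes through the origin, with normal given by the
# least-variance axis of the unit vectors.
def fit_great_circle(X):
    X = np.asarray(X, dtype=float)
    normal = np.linalg.svd(X, full_matrices=False)[2][-1]   # least-variance axis
    proj = X - np.outer(X @ normal, normal)                 # drop normal component
    proj = proj / np.linalg.norm(proj, axis=1, keepdims=True)
    return normal, proj

# synthetic unit vectors near the equator; the fitted normal should be the z-axis
rng = np.random.default_rng(2)
phi = rng.uniform(0.0, 2.0 * np.pi, size=300)
X = np.c_[np.cos(phi), np.sin(phi), rng.normal(scale=0.05, size=300)]
X = X / np.linalg.norm(X, axis=1, keepdims=True)
normal, on_circle = fit_great_circle(X)
```

A principal circle of this kind is also a natural initialization for the more flexible spherical principal curves described above.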
Lastly, for an effective initial estimate for complex structured data on manifolds, local principal geodesics are first proposed, and the method is applied to various simulated and real seismological data sets. Next, for variance stabilization and theoretical investigation of the procedure, the focus is on generalizing Kégl (1999) and Kégl et al. (2000), which provided a new definition of the principal curve in Euclidean space, to generic Riemannian manifolds. Theoretical results, including consistency and the convergence rate of the procedure via the empirical-risk-minimization principle, are established on generic Riemannian manifolds. Results on real data analyses and simulation studies show the promising characteristics of the proposed approach.

Abstract in Korean (translated): To capture the variability of manifold-valued data more effectively, this dissertation proposes new nonparametric dimension reduction methods for such data. Specifically, the main research topic is the extension of principal curves to general manifolds; the principal curve is a nonlinear generalization of principal component analysis (PCA). The dissertation consists of four main topics.

First, the methods of Hastie (1984) and Hastie and Stuetzle (1989) are extended in a standard way to spheres of arbitrary dimension. The contributions of this topic are as follows: (a) we propose intrinsic and extrinsic principal curve methods on spheres of arbitrary dimension; (b) we establish the theoretical property (stationarity) of the proposed methods; (c) we demonstrate their usefulness by applying them to real data, such as seismological and human motion-capture data, and to simulated data on 2- and 4-dimensional spheres.

Second, as a follow-up to the first topic, we propose a robust nonparametric dimension reduction method for data with heavy-tailed distributions. To this end, the L1 and Huber loss functions are used instead of the L2 loss. The contributions are as follows: (a) we propose robust principal curves that are insensitive to outliers; specifically, new principal curves corresponding to the L1 and Huber losses that pass through the geometric median of the data; (b) on the theoretical side, the stationarity of the robust principal curves is established; (c) we propose computationally fast, practical algorithms for implementing them.

Third, we implement an R package providing both existing dimension reduction methods and the proposed methods, and introduce it with various examples and explanations. A strength of the proposed methods is that they can be implemented in an intuitive way without solving complex optimization equations on manifolds; their availability as an R package attests to this and makes the research in this dissertation reproducible.

Lastly, to estimate the structure of manifold-valued data with more complex structure, we first propose the local principal geodesics method and demonstrate its utility on real seismological data and various simulated data. Next, for variance stabilization of the estimates and theoretical justification, we extend the methods of Kégl (1999) and Kégl et al. (2000) to general Riemannian manifolds. Furthermore, using statistical learning theory, we establish asymptotic properties such as consistency and convergence rates, as well as a nonasymptotic concentration inequality.

1 Introduction 1
2 Preliminaries 8
2.1 Principal curves 8
2.2 Riemannian manifolds and centrality on manifolds 10
2.3 Principal curves on Riemannian manifolds 14
3 Spherical principal curves 15
3.1 Enhancement of principal circle for initialization 16
3.2 Proposed principal curves 25
3.3 Numerical experiments 34
3.4 Proofs 45
3.5 Concluding remarks 62
4 Robust spherical principal curves 64
4.1 The proposed robust principal curves 64
4.2 Stationarity of robust spherical principal curves 72
4.3 Numerical experiments 74
4.4 Summary and future work 80
5 spherepc: An R package for dimension reduction on a sphere 84
5.1 Existing methods 85
5.2 Spherical principal curves 91
5.3 Local principal geodesics 94
5.4 Application 99
5.5 Conclusion 101
6 Local principal curves on Riemannian manifolds 112
6.1 Preliminaries 116
6.2 Local principal geodesics 118
6.3 Local principal curves 125
6.4 Real data analysis 133
6.5 Further work 133
7 Conclusion 139
A. Appendix 141
A.1. Appendix for Chapter 3 141
A.2. Appendix for Chapter 4 145
A.3. Appendix for Chapter 6 152
Abstract in Korean 176
Acknowledgement in Korean 179
Nonparametric Uncertainty Quantification for Stochastic Gradient Flows
This paper presents a nonparametric statistical modeling method for
quantifying uncertainty in stochastic gradient systems with isotropic
diffusion. The central idea is to apply the diffusion maps algorithm to a
training data set to produce a stochastic matrix whose generator is a discrete
approximation to the backward Kolmogorov operator of the underlying dynamics.
The eigenvectors of this stochastic matrix, which we will refer to as the
diffusion coordinates, are discrete approximations to the eigenfunctions of the
Kolmogorov operator and form an orthonormal basis for functions defined on the
data set. Using this basis, we consider the projection of three uncertainty
quantification (UQ) problems (prediction, filtering, and response) into the
diffusion coordinates. In these coordinates, the nonlinear prediction and
response problems reduce to solving systems of infinite-dimensional linear
ordinary differential equations. Similarly, the continuous-time nonlinear
filtering problem reduces to solving a system of infinite-dimensional linear
stochastic differential equations. Solving the UQ problems then reduces to
solving the corresponding truncated linear systems in finitely many diffusion
coordinates. By solving these systems we give a model-free algorithm for UQ on
gradient flow systems with isotropic diffusion. We numerically verify these
algorithms on a 1-dimensional linear gradient flow system where the analytic
solutions of the UQ problems are known. We also apply the algorithm to a
chaotically forced nonlinear gradient flow system which is known to be well
approximated as a stochastically forced gradient flow.

Comment: Find the associated videos at: http://personal.psu.edu/thb11
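The first step described above can be sketched in a few lines (a basic fixed-bandwidth variant; the paper's construction is more refined): build the diffusion-maps row-stochastic matrix from data and take its leading eigenvectors as discrete "diffusion coordinates".

```python
import numpy as np

# Diffusion maps with the alpha = 1 normalization, which removes the
# sampling-density bias before forming the row-stochastic matrix.
def diffusion_coordinates(X, eps, k=4):
    X = np.asarray(X, dtype=float)
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-sq_dists / eps)                 # Gaussian kernel
    q = K.sum(axis=1)
    K = K / np.outer(q, q)                      # density normalization
    P = K / K.sum(axis=1, keepdims=True)        # row-stochastic matrix
    vals, vecs = np.linalg.eig(P)
    vals, vecs = vals.real, vecs.real           # P has a real spectrum here
    order = np.argsort(vals)[::-1]
    return vals[order][:k], vecs[:, order[:k]]

rng = np.random.default_rng(3)
X = rng.normal(size=(150, 1))                   # draws from a 1-d Gaussian,
                                                # the stationary density of a
                                                # linear gradient flow
vals, coords = diffusion_coordinates(X, eps=0.5)
# leading eigenvalue is 1 with a constant eigenvector (the stationary mode)
```

The remaining eigenvectors play the role of the basis in which the paper projects the prediction, filtering, and response problems.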