Principal Boundary on Riemannian Manifolds
We consider the classification problem and focus on nonlinear methods for
classification on manifolds. For multivariate datasets lying on an embedded
nonlinear Riemannian manifold within the higher-dimensional ambient space, we
aim to acquire a classification boundary for the classes with labels, using the
intrinsic metric on the manifolds. Motivated by finding an optimal boundary
between the two classes, we introduce a novel approach -- the principal boundary.
From the perspective of classification, the principal boundary is defined as an
optimal curve that moves in between the principal flows traced out from two
classes of data, and at any point on the boundary, it maximizes the margin
between the two classes. We estimate the boundary, with its direction
supervised by the two principal flows. We show that the principal
boundary yields the usual decision boundary found by the support vector machine
in the sense that locally, the two boundaries coincide. Some optimality and
convergence properties of the random principal boundary and its population
counterpart are also shown. We illustrate how to find, use and interpret the
principal boundary with an application to real data.
Comment: 31 pages, 10 figures
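To make the margin-maximizing idea concrete, here is a minimal Python sketch in flat Euclidean space, where geodesics are straight lines: the boundary point equidistant from a pair of corresponding flow points is their midpoint, and the margin is half their distance. The function name and the discretization of the flows as point arrays are illustrative assumptions, not the paper's intrinsic Riemannian construction.

```python
import numpy as np

def principal_boundary_sketch(flow_a, flow_b):
    """Toy flat-space sketch of a margin-maximizing boundary between two
    discretized principal flows (each an (n, d) array of curve points).
    In Euclidean space the point equidistant from a pair of corresponding
    flow points along the connecting segment is their midpoint, and the
    margin is half the distance between them. Names are illustrative."""
    boundary = 0.5 * (flow_a + flow_b)                    # equidistant curve
    margins = 0.5 * np.linalg.norm(flow_a - flow_b, axis=1)
    return boundary, margins

# Usage: two roughly parallel flows in the plane.
t = np.linspace(0.0, 1.0, 50)
flow_a = np.stack([t, np.sin(2 * np.pi * t) + 1.0], axis=1)
flow_b = np.stack([t, np.sin(2 * np.pi * t) - 1.0], axis=1)
boundary, margins = principal_boundary_sketch(flow_a, flow_b)
print(boundary[0], margins[0])   # midpoint curve and pointwise margins
```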
Hypothesis Testing For Network Data in Functional Neuroimaging
In recent years, it has become common practice in neuroscience to use
networks to summarize relational information in a set of measurements,
typically assumed to be reflective of either functional or structural
relationships between regions of interest in the brain. One of the most basic
tasks of interest in the analysis of such data is the testing of hypotheses, in
answer to questions such as "Is there a difference between the networks of
these two groups of subjects?" In the classical setting, where the unit of
interest is a scalar or a vector, such questions are answered through the use
of familiar two-sample testing strategies. Networks, however, are not Euclidean
objects, and hence classical methods do not directly apply. We address this
challenge by drawing on concepts and techniques from geometry, and
high-dimensional statistical inference. Our work is based on a precise
geometric characterization of the space of graph Laplacian matrices and a
nonparametric notion of averaging due to Fr\'echet. We motivate and illustrate
our resulting methodologies for testing in the context of networks derived from
functional neuroimaging data on human subjects from the 1000 Functional
Connectomes Project. In particular, we show that this global test is more
statistically powerful than a mass-univariate approach. In addition, we
provide a method for visualizing the individual contribution of each edge
to the overall test statistic.
Comment: 34 pages, 5 figures
An Infinitesimal Probabilistic Model for Principal Component Analysis of Manifold Valued Data
We provide a probabilistic and infinitesimal view of how the principal
component analysis procedure (PCA) can be generalized to analysis of nonlinear
manifold valued data. Starting with the probabilistic PCA interpretation of the
Euclidean PCA procedure, we show how PCA can be generalized to manifolds in an
intrinsic way that does not resort to linearization of the data space. The
underlying probability model is constructed by mapping a Euclidean stochastic
process to the manifold using stochastic development of Euclidean
semimartingales. The construction uses a connection and bundles of covariant
tensors to allow global transport of principal eigenvectors, and the model is
thereby an example of how principal fiber bundles can be used to handle the
lack of a global coordinate system and orientations that characterizes manifold
valued statistics. We show how curvature implies non-integrability of the
equivalent of Euclidean principal subspaces, and how the stochastic flows
provide an alternative to explicit construction of such subspaces. We describe
estimation procedures for inference of parameters and prediction of principal
components, and we give examples of properties of the model on embedded
surfaces.
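The Euclidean starting point of the construction, probabilistic PCA, has a closed-form maximum-likelihood solution (Tipping and Bishop) that is worth keeping in mind before the manifold generalization. The sketch below implements only that flat baseline; the stochastic-development machinery on the manifold is not reproduced here, and the function name is an illustrative choice.

```python
import numpy as np

def ppca_ml(x, q):
    """Closed-form maximum-likelihood probabilistic PCA (Tipping & Bishop),
    the Euclidean baseline that the paper generalizes to manifolds via
    stochastic development. x: (n, d) data array, q: latent dimension."""
    mu = x.mean(axis=0)
    s = np.cov(x - mu, rowvar=False)
    evals, evecs = np.linalg.eigh(s)
    order = np.argsort(evals)[::-1]            # descending eigenvalues
    evals, evecs = evals[order], evecs[:, order]
    sigma2 = evals[q:].mean()                  # noise = mean discarded variance
    w = evecs[:, :q] * np.sqrt(evals[:q] - sigma2)
    return mu, w, sigma2

# Usage: 3-d data concentrated near a 2-d subspace.
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 2))
x = z @ rng.normal(size=(2, 3)) + 0.1 * rng.normal(size=(500, 3))
mu, w, sigma2 = ppca_ml(x, q=2)
print(w.shape, sigma2)
```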
Shape Dimension and Intrinsic Metric from Samples of Manifolds
We introduce the adaptive neighborhood graph as a data structure for modeling a smooth manifold M embedded in some Euclidean space R^d. We assume that M is known to us only through a finite sample P \subset M, as is often the case in applications. The adaptive neighborhood graph is a geometric graph on P. Its complexity is at most \min\{2^{O(k)}n, n^2\}, where n = |P| and k = \dim M, as opposed to the n^{\lceil d/2 \rceil} complexity of the Delaunay triangulation, which is often used to model manifolds. We prove that we can correctly infer the connected components and the dimension of M from the adaptive neighborhood graph provided a certain standard sampling condition is fulfilled. The running time of the dimension detection algorithm is d 2^{O(k^7 \log k)} for each connected component of M. If the dimension is considered constant, this is a constant-time operation, and the adaptive neighborhood graph is of linear size. Moreover, the exponential dependence of the constants is only on the intrinsic dimension k, not on the ambient dimension d. This is of particular interest if the co-dimension is high, i.e., if k is much smaller than d, as is the case in many applications. The adaptive neighborhood graph also allows us to approximate the geodesic distances between the points in P.
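As a rough illustration of the pipeline's endpoints, the Python sketch below builds a fixed-radius neighborhood graph on a sample, reads off connected components, and approximates geodesic distances by graph shortest paths. The single global radius and the SciPy routines are assumptions made for brevity; the paper's graph adapts the neighborhood per point, and its dimension-detection algorithm is substantially more involved.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components, shortest_path

def neighborhood_graph(points, radius):
    """Fixed-radius neighborhood graph on a sample P of a manifold,
    with edges weighted by Euclidean length. The paper's adaptive
    graph chooses neighborhoods per point; a global radius is used
    here only to keep the sketch short."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    w = np.where((d > 0) & (d <= radius), d, 0.0)
    return csr_matrix(w)

# Usage: sample a circle (intrinsic dimension k = 1) embedded in R^3.
rng = np.random.default_rng(0)
theta = np.sort(rng.uniform(0, 2 * np.pi, 200))
p = np.stack([np.cos(theta), np.sin(theta), np.zeros_like(theta)], axis=1)
g = neighborhood_graph(p, radius=0.3)

n_comp, _ = connected_components(g, directed=False)
geo = shortest_path(g, directed=False)   # approximate geodesic distances
print(n_comp, geo[0, 100])               # 1 component; roughly the arc length
```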
Barycentric Subspace Analysis on Manifolds
This paper investigates the generalization of Principal Component Analysis
(PCA) to Riemannian manifolds. We first propose a new and general type of
family of subspaces in manifolds that we call barycentric subspaces. They are
implicitly defined as the locus of points which are weighted means of
reference points. As this definition relies on points and not on tangent
vectors, it can also be extended to geodesic spaces which are not Riemannian.
For instance, in stratified spaces, it naturally allows principal subspaces
that span several strata, which is impossible in previous generalizations of
PCA. We show that barycentric subspaces locally define a submanifold of
dimension k which generalizes geodesic subspaces. Second, we rephrase PCA in
Euclidean spaces as an optimization on flags of linear subspaces (a hierarchy
of properly embedded linear subspaces of increasing dimension). We show that
the Euclidean PCA minimizes the Accumulated Unexplained Variances (AUV) over
all the subspaces of the flag. Barycentric subspaces are naturally nested,
allowing the construction of hierarchically nested subspaces. Optimizing the
AUV criterion to optimally approximate data points with flags of affine spans
in Riemannian manifolds leads to a particularly appealing generalization of PCA
on manifolds called Barycentric Subspace Analysis (BSA).
Comment: Annals of Statistics, Institute of Mathematical Statistics, to appear
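The defining object is easy to prototype: a point of a (Fr\'echet) barycentric subspace is a weighted mean of the reference points, computable on the sphere with the standard Karcher fixed-point iteration. The sketch below is such a toy on S^2, with illustrative function names; the paper's exponential and affine barycentric subspaces, which also admit non-positive weights, and the flag optimization are beyond this snippet.

```python
import numpy as np

def sphere_exp(p, v):
    """Exponential map on the unit sphere at base point p."""
    nv = np.linalg.norm(v)
    if nv < 1e-12:
        return p
    return np.cos(nv) * p + np.sin(nv) * v / nv

def sphere_log(p, q):
    """Log map on the unit sphere: tangent vector at p pointing to q."""
    cos_t = np.clip(p @ q, -1.0, 1.0)
    theta = np.arccos(cos_t)
    if theta < 1e-12:
        return np.zeros_like(p)
    u = q - cos_t * p
    return theta * u / np.linalg.norm(u)

def weighted_frechet_mean(refs, weights, iters=100):
    """Karcher fixed-point iteration for the weighted Frechet mean on the
    sphere. Points of a Frechet barycentric subspace are exactly such
    weighted means of the reference points, for varying weight vectors."""
    x = refs[0].copy()
    for _ in range(iters):
        v = sum(w * sphere_log(x, r) for w, r in zip(weights, refs))
        x = sphere_exp(x, v)
    return x

# Usage: three reference points on S^2 and one weight vector; sweeping the
# weights over the simplex traces out a 2-d barycentric subspace.
refs = [np.array([1.0, 0, 0]), np.array([0, 1.0, 0]), np.array([0, 0, 1.0])]
print(weighted_frechet_mean(refs, weights=[0.5, 0.3, 0.2]))
```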