19,622 research outputs found
Learning gradients on manifolds
A common belief in high-dimensional data analysis is that data are
concentrated on a low-dimensional manifold. This motivates simultaneous
dimension reduction and regression on manifolds. We provide an algorithm for
learning gradients on manifolds for dimension reduction for high-dimensional
data with few observations. We obtain generalization error bounds for the
gradient estimates and show that the convergence rate depends on the intrinsic
dimension of the manifold and not on the dimension of the ambient space. We
illustrate the efficacy of this approach empirically on simulated and real data
and compare the method to other dimension reduction procedures.
Comment: Published at http://dx.doi.org/10.3150/09-BEJ206 in the Bernoulli
(http://isi.cbs.nl/bernoulli/) by the International Statistical
Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm)
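The gradient-learning idea can be illustrated with a simplified estimator. The paper uses a regularized RKHS formulation; the sketch below instead estimates each gradient by a local least-squares fit over nearest neighbours, averages the gradient outer products, and reads the dimension-reduction directions off the leading eigenvectors. All names and the neighbourhood scheme are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

def gradient_outer_product(X, y, k=10):
    """Toy sketch of gradient-based dimension reduction: estimate the
    gradient at each sample by a local linear fit, then average the
    gradient outer products. (The paper's estimator is an RKHS
    regularized version; this local least-squares variant is only an
    illustration.)"""
    n, p = X.shape
    G = np.zeros((p, p))
    for i in range(n):
        # k nearest neighbours of x_i (excluding x_i itself)
        dist = np.linalg.norm(X - X[i], axis=1)
        idx = np.argsort(dist)[1:k + 1]
        # local linear model y ~ y_i + g . (x - x_i) gives a gradient estimate g
        A = X[idx] - X[i]
        b = y[idx] - y[i]
        g, *_ = np.linalg.lstsq(A, b, rcond=None)
        G += np.outer(g, g) / n
    # leading eigenvectors of the averaged outer product span the
    # predictive (effective dimension reduction) directions
    w, V = np.linalg.eigh(G)
    return V[:, ::-1], w[::-1]
```

For example, if y depends only on the first coordinate of a 5-dimensional input, the top eigenvector should align with the first axis.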
Efficient Rank Reduction of Correlation Matrices
Geometric optimisation algorithms are developed that efficiently find the
nearest low-rank correlation matrix. We show, in numerical tests, that our
methods compare favourably to the existing methods in the literature. The
connection with the Lagrange multiplier method is established, along with an
identification of whether a local minimum is a global minimum. An additional
benefit of the geometric approach is that any weighted norm can be applied. The
problem of finding the nearest low-rank correlation matrix occurs as part of
the calibration of multi-factor interest rate market models to correlation.
Comment: First version: 20 pages, 4 figures. Second version [changed content]:
21 pages, 6 figures
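The problem statement can be made concrete with a simple baseline. The sketch below seeks a rank-d correlation matrix near a given matrix C by alternating eigenvalue truncation with a row rescaling that restores the unit diagonal; the paper's geometric optimisation on a manifold is a different, more efficient approach, so treat this only as a naive reference point with assumed names.

```python
import numpy as np

def nearest_low_rank_correlation(C, d, iters=200):
    """Heuristic sketch: alternately (i) truncate to the d leading
    non-negative eigenpairs and (ii) rescale the factor rows so the
    implied matrix has unit diagonal. Not the paper's geometric
    algorithm; a simple baseline for the same problem."""
    Y = C.copy()
    for _ in range(iters):
        w, V = np.linalg.eigh(Y)
        w = np.clip(w, 0.0, None)               # drop negative eigenvalues
        order = np.argsort(w)[::-1][:d]         # keep d largest
        B = V[:, order] * np.sqrt(w[order])     # n x d factor, Y ~ B B^T
        # rescale rows so diag(B B^T) = 1, i.e. a valid correlation matrix
        norms = np.linalg.norm(B, axis=1, keepdims=True)
        norms[norms == 0] = 1.0
        B = B / norms
        Y = B @ B.T
    return Y
```

By construction the result is symmetric, has rank at most d, and has an exactly unit diagonal, since it is returned as B Bᵀ with unit-norm rows.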
Data-driven Efficient Solvers and Predictions of Conformational Transitions for Langevin Dynamics on Manifold in High Dimensions
We work on dynamic problems with collected data distributed on a manifold.
Through the diffusion map, we first learn the reaction coordinates, where the
underlying manifold is isometrically embedded into a Euclidean space. The
reaction coordinates enable us to obtain an efficient approximation for the
dynamics described by a Fokker-Planck equation on the manifold. By using the reaction
coordinates, we propose an implementable, unconditionally stable, data-driven
upwind scheme which automatically incorporates the manifold
structure. Furthermore, we provide a weighted convergence analysis of
the upwind scheme to the Fokker-Planck equation. The proposed upwind scheme
leads to a Markov chain with transition probability between the nearest
neighbor points. We can benefit from such property to directly conduct
manifold-related computations such as finding the optimal coarse-grained
network and the minimal energy path that represents chemical reactions or
conformational changes. To establish the Fokker-Planck equation, we need to
acquire information about the equilibrium potential of the physical system on
the manifold. Hence, we apply a Gaussian process regression algorithm to
generate the equilibrium potential for a new physical system with new parameters.
Combined with the proposed upwind scheme, we can calculate the trajectory of
the Fokker-Planck equation on the manifold based on the generated equilibrium
potential. Finally, we develop an algorithm to pull the trajectory back to the
original high-dimensional space as generative data for the new physical
system.
Comment: 59 pages, 16 figures
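The diffusion-map step that produces the reaction coordinates can be illustrated as follows. This is a minimal sketch (fixed bandwidth, no density normalisation, dense eigendecomposition), not the authors' pipeline, and all names are hypothetical.

```python
import numpy as np

def diffusion_map(X, eps, n_coords=2):
    """Minimal diffusion-map sketch: Gaussian kernel, row-normalised
    Markov matrix, eigendecomposition via the symmetric conjugate.
    Returns the leading non-trivial eigenvectors scaled by their
    eigenvalues as (reaction) coordinates."""
    # pairwise squared distances and Gaussian kernel
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-D2 / eps)
    d = K.sum(axis=1)
    # symmetric conjugate S of the Markov matrix P = D^{-1} K,
    # so eigenvectors can be computed with a symmetric solver
    S = K / np.sqrt(np.outer(d, d))
    w, U = np.linalg.eigh(S)
    w, U = w[::-1], U[:, ::-1]          # descending eigenvalues
    Phi = U / np.sqrt(d)[:, None]       # eigenvectors of P itself
    # skip the trivial constant eigenvector (eigenvalue 1)
    return Phi[:, 1:n_coords + 1] * w[1:n_coords + 1]
```

On a point cloud sampled from a circle, the two leading non-trivial diffusion coordinates recover the circular geometry: every point lands at (approximately) the same distance from the origin in diffusion space.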
Optimal projection of observations in a Bayesian setting
Optimal dimensionality reduction methods are proposed for the Bayesian
inference of a Gaussian linear model with additive noise in presence of
overabundant data. Three different optimal projections of the observations are
proposed based on information theory: the projection that minimizes the
Kullback-Leibler divergence between the posterior distributions of the original
and the projected models, the one that minimizes the expected Kullback-Leibler
divergence between the same distributions, and the one that maximizes the
mutual information between the parameter of interest and the projected
observations. The first two optimization problems are formulated as the
determination of an optimal subspace and therefore the solution is computed
using Riemannian optimization algorithms on the Grassmann manifold. Regarding
the maximization of the mutual information, it is shown that there exists an
optimal subspace that minimizes the entropy of the posterior distribution of
the reduced model; that a basis of the subspace can be computed as the solution
to a generalized eigenvalue problem; that an a priori error estimate on the
mutual information is available for this particular solution; and that the
dimensionality of the subspace needed to exactly conserve the mutual
information between the input and the output of the models is less than the
number of parameters to be inferred. Numerical applications to linear and
nonlinear models are used to assess the efficiency of the proposed approaches
and to highlight their advantages over standard approaches based on principal
component analysis of the observations.
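The generalized-eigenvalue construction mentioned above can be illustrated on a toy Gaussian linear model y = G x + e with x ~ N(0, Sx) and e ~ N(0, Se). The sketch below solves (G Sx Gᵀ) v = λ (G Sx Gᵀ + Se) v via a Cholesky reduction and keeps the leading directions; the paper's exact operators and normalisation may differ, so this is only an illustration of the construction, with assumed names.

```python
import numpy as np

def optimal_projection(G, Sx, Se, r):
    """Sketch: rank the observation space of the Gaussian linear model
    y = G x + e by solving the generalized symmetric eigenproblem
    (G Sx G^T) v = lam (G Sx G^T + Se) v and keeping the r leading
    generalized eigenvectors as projection directions."""
    Sig = G @ Sx @ G.T              # covariance of the signal G x
    Sy = Sig + Se                   # marginal covariance of y
    # reduce to an ordinary symmetric eigenproblem via Cholesky of Sy
    L = np.linalg.cholesky(Sy)
    Linv = np.linalg.inv(L)
    lam, W = np.linalg.eigh(Linv @ Sig @ Linv.T)
    lam, W = lam[::-1], W[:, ::-1]  # descending eigenvalues
    V = Linv.T @ W                  # generalized eigenvectors
    return V[:, :r], lam[:r]
```

For a single direction v, λ = vᵀ(G Sx Gᵀ)v / vᵀ(G Sx Gᵀ + Se)v is the fraction of the variance of vᵀy explained by the parameter, so for jointly Gaussian variables each retained direction carries mutual information -½ log(1-λ); directions with λ near zero can be discarded with little loss.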