
    Learning gradients on manifolds

    A common belief in high-dimensional data analysis is that data are concentrated on a low-dimensional manifold. This motivates simultaneous dimension reduction and regression on manifolds. We provide an algorithm for learning gradients on manifolds for dimension reduction of high-dimensional data with few observations. We obtain generalization error bounds for the gradient estimates and show that the convergence rate depends on the intrinsic dimension of the manifold and not on the dimension of the ambient space. We illustrate the efficacy of this approach empirically on simulated and real data and compare the method to other dimension reduction procedures. Comment: Published at http://dx.doi.org/10.3150/09-BEJ206 in the Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm).
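    As a rough illustration of the general idea behind gradient-based dimension reduction (not the authors' regularized manifold estimator), the sketch below estimates gradients at each sample by weighted local linear fits and takes the top eigenvectors of the averaged gradient outer-product matrix as the predictive directions. The bandwidth h, the ridge term, and the target dimension d are illustrative choices, not values from the paper.

    # Minimal sketch: gradient-outer-product dimension reduction.
    # Gradients are estimated by weighted local linear fits; the top
    # eigenvectors of the averaged outer-product matrix span the
    # estimated predictive subspace. Bandwidth h, ridge, and d are
    # illustrative choices only.
    import numpy as np

    def gradient_outer_product(X, y, h=1.0, ridge=1e-6):
        n, p = X.shape
        G = np.zeros((p, p))
        for i in range(n):
            dX = X - X[i]                      # displacements to x_i
            w = np.exp(-np.sum(dX**2, 1) / (2 * h**2))
            W = np.diag(w)
            # weighted local linear fit: y - y_i ~ dX @ grad_i
            A = dX.T @ W @ dX + ridge * np.eye(p)
            b = dX.T @ W @ (y - y[i])
            grad_i = np.linalg.solve(A, b)
            G += np.outer(grad_i, grad_i) / n
        return G

    def edr_directions(X, y, d=2, **kw):
        G = gradient_outer_product(X, y, **kw)
        vals, vecs = np.linalg.eigh(G)
        return vecs[:, ::-1][:, :d]            # top-d eigenvectors

    # Example: a 1-d signal hidden in 10 ambient dimensions.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))
    y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
    B = edr_directions(X, y, d=1, h=2.0)
    print(B[:, 0])                             # largest loading should be on coordinate 0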

    Efficient Rank Reduction of Correlation Matrices

    Geometric optimisation algorithms are developed that efficiently find the nearest low-rank correlation matrix. We show, in numerical tests, that our methods compare favourably to the existing methods in the literature. The connection with the Lagrange multiplier method is established, along with an identification of whether a local minimum is a global minimum. An additional benefit of the geometric approach is that any weighted norm can be applied. The problem of finding the nearest low-rank correlation matrix occurs as part of the calibration of multi-factor interest rate market models to correlation. Comment: First version: 20 pages, 4 figures. Second version [changed content]: 21 pages, 6 figures.
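    For context, a common baseline for this problem (not the geometric optimisation algorithms of the paper) is eigenvalue truncation followed by rescaling the factor rows to restore the unit diagonal. The sketch below implements that baseline; the target rank k is the only parameter, and the input matrix is an arbitrary illustrative example.

    # Minimal sketch: baseline "truncate and rescale" approximation to the
    # nearest rank-k correlation matrix. Keep the k largest eigenvalues,
    # then normalise the rows of the factor so the result has unit diagonal.
    import numpy as np

    def lowrank_correlation(C, k):
        vals, vecs = np.linalg.eigh(C)
        idx = np.argsort(vals)[::-1][:k]                 # k largest eigenvalues
        F = vecs[:, idx] * np.sqrt(np.clip(vals[idx], 0, None))
        F /= np.linalg.norm(F, axis=1, keepdims=True)    # restore unit diagonal
        return F @ F.T                                   # rank-k correlation matrix

    # Example on a small correlation matrix.
    C = np.array([[1.0, 0.9, 0.5],
                  [0.9, 1.0, 0.6],
                  [0.5, 0.6, 1.0]])
    C2 = lowrank_correlation(C, 2)
    print(np.diag(C2), np.linalg.matrix_rank(C2))        # ones on the diagonal, rank 2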

    Data-driven Efficient Solvers and Predictions of Conformational Transitions for Langevin Dynamics on Manifold in High Dimensions

    We work on dynamic problems with collected data $\{\mathsf{x}_i\}$ distributed on a manifold $\mathcal{M}\subset\mathbb{R}^p$. Through the diffusion map, we first learn the reaction coordinates $\{\mathsf{y}_i\}\subset\mathcal{N}$, where $\mathcal{N}$ is a manifold isometrically embedded in a Euclidean space $\mathbb{R}^\ell$ with $\ell \ll p$. The reaction coordinates enable us to obtain an efficient approximation of the dynamics described by a Fokker-Planck equation on the manifold $\mathcal{N}$. Using the reaction coordinates, we propose an implementable, unconditionally stable, data-driven upwind scheme that automatically incorporates the manifold structure of $\mathcal{N}$. Furthermore, we provide a weighted $L^2$ convergence analysis of the upwind scheme to the Fokker-Planck equation. The proposed upwind scheme leads to a Markov chain with transition probabilities between nearest-neighbor points. We benefit from this property to directly conduct manifold-related computations such as finding the optimal coarse-grained network and the minimal energy path that represents chemical reactions or conformational changes. To establish the Fokker-Planck equation, we need to acquire information about the equilibrium potential of the physical system on $\mathcal{N}$. Hence, we apply a Gaussian process regression algorithm to generate the equilibrium potential for a new physical system with new parameters. Combined with the proposed upwind scheme, we can calculate the trajectory of the Fokker-Planck equation on $\mathcal{N}$ based on the generated equilibrium potential. Finally, we develop an algorithm to pull back the trajectory to the original high-dimensional space as generative data for the new physical system. Comment: 59 pages, 16 figures.
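    The sketch below illustrates only the first step of such a pipeline: a plain diffusion map that produces low-dimensional coordinates from samples on a manifold. The kernel bandwidth eps and the number of retained coordinates ell are illustrative choices; the upwind scheme, the Gaussian process regression, and the pull-back step are not shown.

    # Minimal sketch of a diffusion map producing reaction coordinates y_i
    # from samples x_i on a manifold. Bandwidth eps and dimension ell are
    # illustrative only.
    import numpy as np

    def diffusion_map(X, ell=2, eps=1.0):
        D2 = np.sum((X[:, None, :] - X[None, :, :])**2, axis=-1)
        K = np.exp(-D2 / eps)
        q = K.sum(1)
        K = K / np.outer(q, q)                  # alpha = 1: remove density bias
        d = K.sum(1)
        P = K / d[:, None]                      # row-stochastic Markov matrix
        vals, vecs = np.linalg.eig(P)
        order = np.argsort(-vals.real)
        vals, vecs = vals.real[order], vecs.real[:, order]
        # drop the trivial constant eigenvector, keep the next `ell`
        return vecs[:, 1:ell + 1] * vals[1:ell + 1]

    # Example: noisy circle embedded in R^3; two diffusion coordinates
    # recover the circular structure.
    rng = np.random.default_rng(0)
    t = rng.uniform(0, 2 * np.pi, 300)
    X = np.c_[np.cos(t), np.sin(t), 0.1 * rng.normal(size=300)]
    Y = diffusion_map(X, ell=2, eps=0.3)
    print(Y.shape)                              # (300, 2)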

    Optimal projection of observations in a Bayesian setting

    Optimal dimensionality reduction methods are proposed for the Bayesian inference of a Gaussian linear model with additive noise in the presence of overabundant data. Three different optimal projections of the observations are proposed based on information theory: the projection that minimizes the Kullback-Leibler divergence between the posterior distributions of the original and the projected models, the one that minimizes the expected Kullback-Leibler divergence between the same distributions, and the one that maximizes the mutual information between the parameter of interest and the projected observations. The first two optimization problems are formulated as the determination of an optimal subspace, and therefore the solution is computed using Riemannian optimization algorithms on the Grassmann manifold. Regarding the maximization of the mutual information, it is shown that there exists an optimal subspace that minimizes the entropy of the posterior distribution of the reduced model; that a basis of the subspace can be computed as the solution to a generalized eigenvalue problem; that an a priori error estimate on the mutual information is available for this particular solution; and that the dimensionality of the subspace needed to exactly conserve the mutual information between the input and the output of the models is less than the number of parameters to be inferred. Numerical applications to linear and nonlinear models are used to assess the efficiency of the proposed approaches and to highlight their advantages compared to standard approaches based on the principal component analysis of the observations.
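    To make the first criterion concrete, the sketch below computes the posterior of a Gaussian linear model under a fixed orthonormal projection of the observations and evaluates the Kullback-Leibler divergence between the projected and full posteriors. The projection Phi, the model matrices, and the dimensions are illustrative placeholders, not the optimised subspace of the paper.

    # Minimal sketch: Gaussian linear model y = G x + e with x ~ N(0, S_pr),
    # e ~ N(0, S_noise). Compare the posterior given the full data y with the
    # posterior given a fixed projection Phi^T y, via their KL divergence.
    import numpy as np

    def gaussian_posterior(G, S_pr, S_noise, y):
        P = np.linalg.inv(S_pr) + G.T @ np.linalg.solve(S_noise, G)
        S_post = np.linalg.inv(P)
        m_post = S_post @ G.T @ np.linalg.solve(S_noise, y)
        return m_post, S_post

    def kl_gauss(m0, S0, m1, S1):
        # KL( N(m0, S0) || N(m1, S1) )
        k = len(m0)
        S1inv = np.linalg.inv(S1)
        return 0.5 * (np.trace(S1inv @ S0) + (m1 - m0) @ S1inv @ (m1 - m0)
                      - k + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

    rng = np.random.default_rng(0)
    n_obs, n_par, r = 50, 3, 5                  # overabundant data: n_obs >> n_par
    G = rng.normal(size=(n_obs, n_par))
    S_pr, S_noise = np.eye(n_par), 0.1 * np.eye(n_obs)
    x_true = rng.normal(size=n_par)
    y = G @ x_true + rng.multivariate_normal(np.zeros(n_obs), S_noise)

    Phi, _ = np.linalg.qr(rng.normal(size=(n_obs, r)))   # arbitrary r-dim projection
    m_full, S_full = gaussian_posterior(G, S_pr, S_noise, y)
    m_red, S_red = gaussian_posterior(Phi.T @ G, S_pr, Phi.T @ S_noise @ Phi, Phi.T @ y)
    print(kl_gauss(m_red, S_red, m_full, S_full))        # information lost by projecting y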