Kernel methods for detecting coherent structures in dynamical data
We illustrate relationships between classical kernel-based dimensionality
reduction techniques and eigendecompositions of empirical estimates of
reproducing kernel Hilbert space (RKHS) operators associated with dynamical
systems. In particular, we show that kernel canonical correlation analysis
(CCA) can be interpreted in terms of kernel transfer operators and that it can
be obtained by optimizing the variational approach for Markov processes (VAMP)
score. As a result, we show that coherent sets of particle trajectories can be
computed by kernel CCA. We demonstrate the efficiency of this approach with
several examples, namely the well-known Bickley jet, ocean drifter data, and a
molecular dynamics problem with a time-dependent potential. Finally, we propose
a straightforward generalization of dynamic mode decomposition (DMD) called
coherent mode decomposition (CMD). Our results provide a generic machine
learning approach to the computation of coherent sets with an objective score
that can be used for cross-validation and the comparison of different methods.
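The kernel CCA step can be illustrated with a minimal NumPy sketch of regularized kernel CCA on snapshot pairs. It assumes a Gaussian kernel, centered Gram matrices and a Tikhonov parameter `eps`. These choices, the function names and the two-blob toy data are illustrative assumptions, not the authors' code or experiments. The sketch solves the standard regularized eigenproblem (G_X + eps I)^{-1} G_Y (G_Y + eps I)^{-1} G_X v = rho^2 v, where G_X and G_Y are Gram matrices of the particle positions at the initial and final time. With two coherent sets, the sign of the leading eigenfunction separates them. With more sets, one would typically cluster several eigenfunctions, for example with k-means.

```python
import numpy as np


def gaussian_gram(A, B, sigma):
    """Gaussian kernel matrix with entries exp(-|a_i - b_j|^2 / (2 sigma^2))."""
    sq = (np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def center(G):
    """Center a Gram matrix in feature space: H G H with H = I - 11^T / n."""
    n = G.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ G @ H


def kernel_cca(X, Y, sigma=1.0, eps=0.1, n_modes=3):
    """Regularized kernel CCA between points X (time t) and Y (time t + tau).

    Solves (G_X + eps I)^{-1} G_Y (G_Y + eps I)^{-1} G_X v = rho^2 v and returns
    the n_modes largest rho^2 together with the eigenfunctions G_X v evaluated
    at the data points X (one column per mode).
    """
    n = X.shape[0]
    GX = center(gaussian_gram(X, X, sigma))
    GY = center(gaussian_gram(Y, Y, sigma))
    I = np.eye(n)
    M = np.linalg.solve(GX + eps * I, GY @ np.linalg.solve(GY + eps * I, GX))
    vals, vecs = np.linalg.eig(M)  # M is not symmetric, but its spectrum is real
    order = np.argsort(-vals.real)[:n_modes]
    return vals.real[order], GX @ vecs[:, order].real


# Hypothetical toy data: two well-separated blobs that are only jittered
# between t and t + tau, so each blob behaves as a coherent set.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([-3.0, 0.0], 0.3, size=(50, 2)),
               rng.normal([3.0, 0.0], 0.3, size=(50, 2))])
Y = X + rng.normal(0.0, 0.05, size=X.shape)
rho2, phi = kernel_cca(X, Y)
labels = phi[:, 0] > 0  # sign split of the leading eigenfunction
```

The nonzero spectrum of the matrix above coincides with that of a symmetric matrix with norm below one, so every rho^2 is real and lies in [0, 1). Taking real parts after `np.linalg.eig` only discards round-off.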
Spectral Dimensionality Reduction
In this paper, we study and put under a common framework a number of non-linear dimensionality reduction methods, such as Locally Linear Embedding, Isomap, Laplacian Eigenmaps and kernel PCA, which are based on performing an eigen-decomposition (hence the name 'spectral'). That framework also includes classical methods such as PCA and metric multidimensional scaling (MDS). It also includes the data transformation step used in spectral clustering. We show that in all of these cases the learning algorithm estimates the principal eigenfunctions of an operator that depends on the unknown data density and on a kernel that is not necessarily positive semi-definite. This helps to generalize some of these algorithms so as to predict an embedding for out-of-sample examples without having to retrain the model. It also makes more transparent what these algorithms minimize on the empirical data and gives a corresponding notion of generalization error.
Keywords: non-parametric models, non-linear dimensionality reduction, kernel models
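The out-of-sample idea can be sketched for kernel PCA, one member of the framework. Once the principal eigenvectors of the centered Gram matrix are known, a new point is embedded with the Nyström formula z_k(x) = sum_i v_ik K~(x, x_i) / sqrt(lambda_k), where K~ is the kernel centered with the training means. No refit is needed. This is a minimal sketch assuming a Gaussian kernel; the class name, the bandwidth and the random data are hypothetical. Other methods in the framework (LLE, Isomap, Laplacian Eigenmaps) would use a different data-dependent kernel normalization, which is not shown here.

```python
import numpy as np


def rbf(A, B, sigma=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


class KernelPCA:
    """Kernel PCA whose embedding extends to new points via the Nystrom formula."""

    def __init__(self, n_components=2, sigma=1.0):
        self.k = n_components
        self.sigma = sigma

    def fit(self, X):
        self.X = X
        K = rbf(X, X, self.sigma)
        self.col_mean = K.mean(axis=0)  # mean_j K(x_j, x_i) for each i
        self.all_mean = K.mean()
        n = K.shape[0]
        H = np.eye(n) - np.ones((n, n)) / n
        vals, vecs = np.linalg.eigh(H @ K @ H)  # eigenvalues in ascending order
        idx = np.argsort(vals)[::-1][: self.k]
        self.lam, self.V = vals[idx], vecs[:, idx]
        self.embedding_ = self.V * np.sqrt(self.lam)  # training coordinates
        return self

    def transform(self, Xnew):
        """Nystrom extension using the kernel centered with training means."""
        Kn = rbf(Xnew, self.X, self.sigma)
        Kc = (Kn - Kn.mean(axis=1, keepdims=True)
              - self.col_mean[None, :] + self.all_mean)
        return Kc @ self.V / np.sqrt(self.lam)


rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
kpca = KernelPCA(n_components=2, sigma=1.5).fit(X)
Z_new = kpca.transform(rng.normal(size=(5, 3)))  # embedded without retraining
```

A useful sanity check for the extension: applied to the training points themselves, `transform` reproduces `embedding_` up to round-off, because the centered kernel row of a training point is a row of the centered Gram matrix.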
Model Reduction and Neural Networks for Parametric PDEs
We develop a general framework for data-driven approximation of input-output maps between infinite-dimensional spaces. The proposed approach is motivated by the recent successes of neural networks and deep learning, in combination with ideas from model reduction. This combination results in a neural network approximation which, in principle, is defined on infinite-dimensional spaces and, in practice, is robust to the dimension of finite-dimensional approximations of these spaces required for computation. For a class of input-output maps, and suitably chosen probability measures on the inputs, we prove convergence of the proposed approximation methodology. Numerically we demonstrate the effectiveness of the method on a class of parametric elliptic PDE problems, showing convergence and robustness of the approximation scheme with respect to the size of the discretization, and compare our method with existing algorithms from the literature.
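One common way to combine model reduction with learning is to encode inputs and outputs with PCA and learn a map between the latent coefficients. The sketch below follows that pattern under stated assumptions. An affine least-squares fit stands in for the neural network, which is a deliberate simplification and not the paper's architecture. The 1D Poisson problem -u'' = a, the sine-mode input distribution and all function names are hypothetical. Because this toy solution operator is linear and the inputs live in a 5-dimensional space, the surrogate is essentially exact. The latent sizes `d_in` and `d_out` do not depend on the grid size `m`.

```python
import numpy as np


def poisson_solve_matrix(m):
    """Discrete solution operator for -u'' = a on (0, 1), u(0) = u(1) = 0."""
    h = 1.0 / (m + 1)
    L = (2 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)) / h**2
    return np.linalg.inv(L)


def pca_basis(U, d):
    """Mean and top-d principal directions (as columns) of the rows of U."""
    mean = U.mean(axis=0)
    _, _, Vt = np.linalg.svd(U - mean, full_matrices=False)
    return mean, Vt[:d].T


def fit_pca_map(A, U, d_in, d_out):
    """PCA-encode inputs and outputs, then fit an affine latent-to-latent map.

    Least squares is a stand-in for the neural network used in practice.
    """
    a_mean, Pa = pca_basis(A, d_in)
    u_mean, Pu = pca_basis(U, d_out)
    Za = (A - a_mean) @ Pa
    Zu = (U - u_mean) @ Pu
    W, *_ = np.linalg.lstsq(np.hstack([Za, np.ones((len(Za), 1))]), Zu, rcond=None)

    def predict(A_new):
        z = (A_new - a_mean) @ Pa
        return u_mean + (np.hstack([z, np.ones((len(z), 1))]) @ W) @ Pu.T

    return predict


m = 64
x = np.linspace(0.0, 1.0, m + 2)[1:-1]  # interior grid points
modes = np.array([np.sin((k + 1) * np.pi * x) for k in range(5)])
rng = np.random.default_rng(0)
S = poisson_solve_matrix(m)
A_train = rng.normal(size=(200, 5)) @ modes  # hypothetical input fields a
U_train = A_train @ S.T                       # corresponding solutions u
A_test = rng.normal(size=(50, 5)) @ modes
U_test = A_test @ S.T
predict = fit_pca_map(A_train, U_train, d_in=5, d_out=5)
rel_err = np.linalg.norm(predict(A_test) - U_test) / np.linalg.norm(U_test)
```

For a nonlinear parametric PDE, the least-squares map would be replaced by a trained network. The encoders stay the same.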
The Sample Complexity of Dictionary Learning
A large set of signals can sometimes be described sparsely using a
dictionary, that is, every element can be represented as a linear combination
of few elements from the dictionary. Algorithms for various signal processing
applications, including classification, denoising and signal separation, learn
a dictionary from a set of signals to be represented. Can we expect that the
representation found by such a dictionary for a previously unseen example from
the same source will have L_2 error of the same magnitude as those for the
given examples? We assume signals are generated from a fixed distribution, and
study this question from a statistical learning theory perspective.
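The quantity in question is the representation error of a signal x for a fixed dictionary D, min ||x - D a||_2 over coefficient vectors a with ||a||_1 <= lambda. The bounds compare its average on training signals with its expectation on unseen ones. The sketch below computes that error by projected gradient descent with a sort-based Euclidean projection onto the l_1 ball. Solver choice, iteration count and the random dictionary are assumptions made for illustration; learning D itself is not shown.

```python
import numpy as np


def project_l1_ball(v, radius):
    """Euclidean projection of v onto {a : ||a||_1 <= radius} (sort-based)."""
    if np.sum(np.abs(v)) <= radius:
        return v
    u = np.sort(np.abs(v))[::-1]
    css = np.cumsum(u)
    ks = np.arange(1, len(u) + 1)
    rho = np.nonzero(u * ks > (css - radius))[0][-1]
    theta = (css[rho] - radius) / (rho + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)


def representation_error(D, x, lam, n_iter=5000):
    """min_{||a||_1 <= lam} ||x - D a||_2 via projected gradient descent."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2  # 1 / Lipschitz const. of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        a = project_l1_ball(a - step * D.T @ (D @ a - x), lam)
    return np.linalg.norm(x - D @ a)


rng = np.random.default_rng(0)
n, p = 20, 10
D = rng.normal(size=(n, p))
D /= np.linalg.norm(D, axis=0)  # unit-norm atoms (columns)
a_true = np.zeros(p)
a_true[[1, 4]] = [0.6, -0.4]    # ||a_true||_1 = 1
x = D @ a_true
err = representation_error(D, x, lam=2.0)        # x is representable: ~0
err_tight = representation_error(D, x, lam=0.5)  # l_1 budget too small: > 0
```

Averaging `representation_error` over training signals and over fresh signals from the same distribution gives the empirical and expected errors whose gap the bounds below control.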
We develop generalization bounds on the quality of the learned dictionary for
two types of constraints on the coefficient selection, as measured by the
expected L_2 error in representation when the dictionary is used. For the case
of l_1 regularized coefficient selection we provide a generalization bound of
the order of O(sqrt(np log(m lambda)/m)), where n is the dimension, p is the
number of elements in the dictionary, lambda is a bound on the l_1 norm of the
coefficient vector and m is the number of samples, which complements existing
results. For the case of representing a new signal as a combination of at most
k dictionary elements, we provide a bound of the order O(sqrt(np log(m k)/m))
under an assumption on the level of orthogonality of the dictionary (low Babel
function). We further show that this assumption holds for most dictionaries in
high dimensions in a strong probabilistic sense. Our results further yield fast
rates of order 1/m as opposed to 1/sqrt(m) using localized Rademacher
complexity. We provide similar results in a general setting using kernels with
weak smoothness requirements.
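The k-sparse bound relies on a low Babel function. For a dictionary with unit-norm atoms d_1, ..., d_p, this function is commonly defined as mu_1(k) = max_j max over sets S with |S| = k and j not in S of sum_{i in S} |<d_i, d_j>|. The sketch below computes it directly from the Gram matrix. The helper name and the random-dictionary example are assumptions, and the precise condition on mu_1 required by the bound is stated in the paper and not reproduced here.

```python
import numpy as np


def babel(D, k):
    """Babel function mu_1(k) of a dictionary D with unit-norm columns (k <= p - 1)."""
    G = np.abs(D.T @ D)
    np.fill_diagonal(G, 0.0)        # exclude the atom itself (i == j)
    G_sorted = -np.sort(-G, axis=1)  # each row sorted in descending order
    return G_sorted[:, :k].sum(axis=1).max()


# A random dictionary in high dimension: atoms are nearly orthogonal.
rng = np.random.default_rng(0)
D_rand = rng.normal(size=(1000, 200))
D_rand /= np.linalg.norm(D_rand, axis=0)
mu5 = babel(D_rand, 5)
```

For an orthonormal basis mu_1(k) = 0, while two identical atoms already give mu_1(1) = 1. Random high-dimensional dictionaries such as `D_rand` sit near the small end, in line with the probabilistic statement above.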
On the Sample Complexity of Subspace Learning
A large number of algorithms in machine learning, from principal component
analysis (PCA), and its non-linear (kernel) extensions, to more recent spectral
embedding and support estimation methods, rely on estimating a linear subspace
from samples. In this paper we introduce a general formulation of this problem
and derive novel learning error estimates. Our results rely on natural
assumptions on the spectral properties of the covariance operator associated to
the data distribution, and hold for a wide class of metrics between
subspaces. As special cases, we discuss sharp error estimates for the
reconstruction properties of PCA and spectral support estimation. Key to our
analysis is an operator theoretic approach that has broad applicability to
spectral learning methods.
Comment: Extended version of conference paper
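The basic instance of this problem is PCA. It estimates the span of the top-k eigenvectors of the empirical second-moment (covariance) operator and measures the error with a distance between subspaces. The sketch below uses the operator-norm distance between orthogonal projections, which is one such metric. The spiked Gaussian model, the sample sizes and the helper names are assumptions for illustration. The data are zero-mean, so centering does not matter here.

```python
import numpy as np


def top_k_projection(X, k):
    """Projection onto the top-k eigenvectors of the empirical covariance X^T X / n."""
    C = X.T @ X / X.shape[0]
    _, vecs = np.linalg.eigh(C)  # eigenvalues in ascending order
    U = vecs[:, -k:]
    return U @ U.T


def projection_distance(P, Q):
    """Operator-norm distance ||P - Q||_2 between two orthogonal projections."""
    return np.linalg.norm(P - Q, 2)


rng = np.random.default_rng(0)
d, k = 20, 2
scales = np.ones(d)
scales[:k] = np.sqrt(10.0)  # variances 10, 10, 1, ..., 1
P_true = np.zeros((d, d))
P_true[:k, :k] = np.eye(k)  # true subspace: span(e_1, e_2)
errors = {}
for n in (50, 5000):
    X = rng.normal(size=(n, d)) * scales
    errors[n] = projection_distance(top_k_projection(X, k), P_true)
```

In this toy model the estimate tightens as n grows, which is the kind of behavior the learning error estimates quantify. Replacing the covariance by a kernel Gram matrix gives the kernel PCA setting.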