8 research outputs found
On the Statistical Approximation of Conditional Expectation Operators
This dissertation discusses the data-driven approximation of the so-called conditional expectation operator, which describes the expected value of a real-valued transformation of a random variable conditioned on a second random variable. It presents this classical numerical problem in a new theoretical context and examines it using various selected methods from modern statistical learning theory. Both a well-known parametric projection approach from the numerical domain and a nonparametric model based on a reproducing kernel Hilbert space are investigated.
The investigations of this work are motivated by the special case in which the conditional expectation operator describes the transition probabilities of a Markov process. In this context, the spectral properties of the resulting Markov transition operator are of great practical interest for the data-based study of complex dynamics. The presented estimators are used in practice in this scenario.
Various new convergence and approximation results are shown for both stochastically independent and dependent data. Concepts from the theories of inverse problems, weakly dependent stochastic processes, spectral perturbation, and concentration of measure serve as tools for these results. For the theoretical justification of the nonparametric model, the estimation of kernel autocovariance operators of stationary time series is investigated. This consideration can additionally be used in a variety of ways in other contexts, which is demonstrated in terms of new results on the consistency of kernel-based principal component analysis with weakly dependent data.
This dissertation is theoretical in nature and is not aimed at the direct implementation of new numerical methods. It does, however, establish a direct link between known approaches in this field and relevant statistical work from recent years, which will make both stronger theoretical results and more efficient practical estimators for this problem possible in the future.
Nonparametric approximation of conditional expectation operators
Given the joint distribution of two random variables X and Y on some second countable locally compact Hausdorff space, we investigate the statistical approximation of the L2-operator P defined by [Pf](x) := E[f(Y) | X = x] under minimal assumptions. By modifying its domain, we prove that P can be arbitrarily well approximated in operator norm by Hilbert-Schmidt operators acting on a reproducing kernel Hilbert space. This fact allows us to estimate P uniformly by finite-rank operators over a dense subspace even when P is not compact. In terms of modes of convergence, we thereby obtain the superiority of kernel-based techniques over classically used parametric projection approaches such as Galerkin methods. This also provides a novel perspective on which limiting object the nonparametric estimate of P converges to. As an application, we show that these results are particularly important for a large family of spectral analysis techniques for Markov transition operators. Our investigation also gives a new asymptotic perspective on the so-called kernel conditional mean embedding, which is the theoretical foundation of a wide variety of techniques in kernel-based nonparametric inference.
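In its simplest form, the finite-rank kernel estimate of the conditional expectation operator discussed above is a kernel ridge regression. The following is a minimal numpy sketch, not the paper's construction; the Gaussian kernel, bandwidth, and regularization values are ad hoc choices for illustration:

```python
import numpy as np

def gauss_kernel(a, b, sigma=1.0):
    # Gaussian kernel matrix k(a_i, b_j) for one-dimensional inputs.
    d = a[:, None] - b[None, :]
    return np.exp(-d**2 / (2 * sigma**2))

def conditional_expectation_estimate(x_train, y_train, f, x_eval, lam=1e-3, sigma=1.0):
    # Kernel ridge estimate of x -> E[f(Y) | X = x]:
    #   f_hat(x) = k(x, X)^T (K + n*lam*I)^{-1} f(Y).
    n = len(x_train)
    K = gauss_kernel(x_train, x_train, sigma)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), f(y_train))
    return gauss_kernel(x_eval, x_train, sigma) @ alpha

# Toy check: Y = X + small noise, so E[Y | X = x] should be close to x.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 500)
y = x + 0.05 * rng.normal(size=500)
est = conditional_expectation_estimate(x, y, lambda t: t, np.array([0.5]))
```

Evaluating at x = 0.5 should return a value near 0.5 on this toy problem.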
Kernel methods for detecting coherent structures in dynamical data
We illustrate relationships between classical kernel-based dimensionality
reduction techniques and eigendecompositions of empirical estimates of
reproducing kernel Hilbert space (RKHS) operators associated with dynamical
systems. In particular, we show that kernel canonical correlation analysis
(CCA) can be interpreted in terms of kernel transfer operators and that it can
be obtained by optimizing the variational approach for Markov processes (VAMP)
score. As a result, we show that coherent sets of particle trajectories can be
computed by kernel CCA. We demonstrate the efficiency of this approach with
several examples, namely the well-known Bickley jet, ocean drifter data, and a
molecular dynamics problem with a time-dependent potential. Finally, we propose
a straightforward generalization of dynamic mode decomposition (DMD) called
coherent mode decomposition (CMD). Our results provide a generic machine
learning approach to the computation of coherent sets with an objective score
that can be used for cross-validation and the comparison of different methods.
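For rough intuition on the CCA building block: canonical correlations are the singular values of the whitened cross-covariance, and kernel CCA replaces the raw coordinates with (implicit) kernel feature maps plus regularization. A minimal linear sketch, with data and the rotation map chosen purely for illustration:

```python
import numpy as np

def inv_sqrt(C):
    # Symmetric inverse square root via eigendecomposition.
    w, V = np.linalg.eigh(C)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

def top_canonical_correlation(X, Y, eps=1e-6):
    # Canonical correlations are the singular values of
    # Cxx^{-1/2} Cxy Cyy^{-1/2}; eps regularizes the inversions,
    # as in regularized (kernel) CCA.
    X = X - X.mean(0)
    Y = Y - Y.mean(0)
    n = X.shape[0]
    Cxx = X.T @ X / n + eps * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + eps * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n
    s = np.linalg.svd(inv_sqrt(Cxx) @ Cxy @ inv_sqrt(Cyy), compute_uv=False)
    return s[0]

# Toy data: Y is a rotation of X, so the top canonical correlation is ~1.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
rho = top_canonical_correlation(X, X @ R)
```

In the dynamical setting of the paper, X and Y would instead hold (kernel-embedded) snapshots of particle positions at two different times.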
Optimal Rates for Regularized Conditional Mean Embedding Learning
We address the consistency of a kernel ridge regression estimate of the conditional mean embedding (CME), which is an embedding of the conditional distribution of Y given X into a target reproducing kernel Hilbert space H_Y. The CME allows us to take conditional expectations of target RKHS functions, and has been employed in nonparametric causal and Bayesian inference. We address the misspecified setting, where the target CME is in the space of Hilbert-Schmidt operators acting from an input interpolation space between H_X and L2, to H_Y. This space of operators is shown to be isomorphic to a newly defined vector-valued interpolation space. Using this isomorphism, we derive a novel and adaptive statistical learning rate for the empirical CME estimator under the misspecified setting. Our analysis reveals that our rates match the optimal rates without assuming H_Y to be finite dimensional. We further establish a lower bound on the learning rate, which shows that the obtained upper bound is optimal.
Kernel autocovariance operators of stationary processes: Estimation and convergence
We consider autocovariance operators of a stationary stochastic process on a
Polish space that is embedded into a reproducing kernel Hilbert space. We
investigate how empirical estimates of these operators converge along
realizations of the process under various conditions. In particular, we examine
ergodic and strongly mixing processes and prove several asymptotic results as
well as finite sample error bounds with a detailed analysis for the Gaussian
kernel. We provide applications of our theory in terms of consistency results
for kernel PCA with dependent data and the conditional mean embedding of
transition probabilities. Finally, we use our approach to examine the
nonparametric estimation of Markov transition operators and highlight how our
theory can give a consistency analysis for a large family of spectral analysis
methods including kernel-based dynamic mode decomposition.
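For intuition, the empirical lag-tau autocovariance operator of an embedded process is an average of outer products of (centered) feature vectors along one realization. A minimal numpy sketch with an explicit one-dimensional feature map and an AR(1) toy chain; all concrete choices here are illustrative and not taken from the paper:

```python
import numpy as np

def empirical_autocovariance(traj, lag, feature):
    # Empirical lag-`lag` autocovariance operator of the feature-embedded
    # process: average of outer products phi(X_{t+lag}) phi(X_t)^T along
    # one trajectory, with mean-centered features.
    Phi = np.array([feature(x) for x in traj])
    Phi = Phi - Phi.mean(0)
    A, B = Phi[lag:], Phi[:-lag]
    return A.T @ B / len(A)

# AR(1) toy chain X_{t+1} = a X_t + noise: the lag-1 autocovariance of the
# identity feature should be close to a * Var(X) = a / (1 - a^2).
rng = np.random.default_rng(1)
a, n = 0.8, 20000
x = np.zeros(n)
for t in range(n - 1):
    x[t + 1] = a * x[t] + rng.normal()
C1 = empirical_autocovariance(x, 1, lambda s: np.array([s]))
```

With a richer (e.g. kernel-induced) feature map, the same averaging scheme yields the operators whose convergence along realizations the paper analyzes.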
Optimal Reaction Coordinates: Variational Characterization and Sparse Computation
Reaction Coordinates (RCs) are indicators of hidden, low-dimensional mechanisms that
govern the long-term behavior of high-dimensional stochastic processes. We present a novel
and general variational characterization of optimal RCs and provide conditions for their existence.
Optimal RCs are minimizers of a certain loss function and reduced models based
on them guarantee very good approximation of the long-term dynamics of the original high-dimensional
process. We show that, for slow-fast systems, metastable systems, and other
systems with known good RCs, the novel theory reproduces previous insight. Remarkably,
the numerical effort required to evaluate the loss function scales only with the complexity of
the underlying, low-dimensional mechanism, and not with that of the full system. The theory
provided lays the foundation for an efficient and data-sparse computation of RCs via modern
machine learning techniques.
Variational Characterization and Identification of Reaction Coordinates in Stochastic Systems
Reaction coordinates are indicators of hidden, low-dimensional mechanisms that govern
the long-term behavior of high-dimensional stochastic systems. We present a novel, very
general characterization of these coordinates and provide conditions for their existence. We
show that these conditions are fulfilled for slow-fast systems, metastable systems, and other
systems with known good reaction coordinates. Further, we formulate these conditions as
a variational principle, i.e., define a loss function whose minimizers are optimal reaction
coordinates. Remarkably, the numerical effort required to evaluate the loss function scales
only with the complexity of the underlying, low-dimensional mechanism, and not with that of
the full system. In summary, we provide the theoretical foundation for an efficient computation
of reaction coordinates via modern machine learning techniques.