
    On the Statistical Approximation of Conditional Expectation Operators

    This dissertation discusses the data-driven approximation of the so-called conditional expectation operator, which describes the expected value of a real-valued transformation of a random variable conditioned on a second random variable. It presents this classical numerical problem in a new theoretical context and examines it using selected methods from modern statistical learning theory. Both a well-known parametric projection approach from the numerical literature and a nonparametric model based on a reproducing kernel Hilbert space are investigated. The investigation is motivated by the special case in which the conditional expectation operator describes the transition probabilities of a Markov process. In this context, the spectral properties of the resulting Markov transition operator are of great practical interest for the data-driven study of complex dynamics, and the estimators presented above are used in practice in this scenario. Various new convergence and approximation results are shown for both stochastically independent and dependent data. Concepts from the theories of inverse problems, weakly dependent stochastic processes, spectral perturbation, and concentration of measure serve as tools for these results. For the theoretical justification of the nonparametric model, the estimation of kernel autocovariance operators of stationary time series is investigated. This analysis can additionally be used in a variety of other contexts, which is demonstrated by new results on the consistency of kernel-based principal component analysis with weakly dependent data. This dissertation is theoretical in nature and does not directly implement new numerical methods. It does, however, establish a direct link between known approaches in this field and relevant statistical work from recent years, which makes both stronger theoretical results and more efficient practical estimators for this problem possible in the future.
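    For concreteness, the central object of the thesis can be written as follows (a sketch in our own notation, following the description in the abstract):

```latex
% The conditional expectation operator: for random variables X, Y and a
% real-valued transformation f,
[Pf](x) \;:=\; \mathbb{E}\left[\, f(Y) \mid X = x \,\right].
% Markov case: choosing (X, Y) = (X_t, X_{t+\tau}) for a Markov process
% (X_t), P becomes the transition operator that propagates observables
% over the lag time \tau:
[P_\tau f](x) \;=\; \mathbb{E}\left[\, f(X_{t+\tau}) \mid X_t = x \,\right].
```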

    Nonparametric approximation of conditional expectation operators

    Given the joint distribution of two random variables $X, Y$ on some second countable locally compact Hausdorff space, we investigate the statistical approximation of the $L^2$-operator defined by $[Pf](x) := \mathbb{E}[f(Y) \mid X = x]$ under minimal assumptions. By modifying its domain, we prove that $P$ can be arbitrarily well approximated in operator norm by Hilbert-Schmidt operators acting on a reproducing kernel Hilbert space. This fact allows us to estimate $P$ uniformly by finite-rank operators over a dense subspace even when $P$ is not compact. In terms of modes of convergence, we thereby obtain the superiority of kernel-based techniques over classically used parametric projection approaches such as Galerkin methods. This also provides a novel perspective on which limiting object the nonparametric estimate of $P$ converges to. As an application, we show that these results are particularly important for a large family of spectral analysis techniques for Markov transition operators. Our investigation also gives a new asymptotic perspective on the so-called kernel conditional mean embedding, which is the theoretical foundation of a wide variety of techniques in kernel-based nonparametric inference.
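    A minimal numpy sketch of such a finite-rank kernel estimate, using the standard kernel ridge regression form of the conditional mean embedding; the Gaussian kernel, the regularization parameter lam, and all function names are illustrative choices, not the paper's:

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """Gaussian (RBF) Gram matrix between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * sigma**2))

def conditional_expectation(x_train, y_train, f, x_eval, sigma=1.0, lam=1e-3):
    """Finite-rank kernel estimate of [Pf](x) = E[f(Y) | X = x] in the
    standard kernel ridge regression / conditional mean embedding form:
    [Pf]^(x) = k_x^T (K + n*lam*I)^{-1} (f(y_1), ..., f(y_n))^T.
    """
    n = len(x_train)
    K = gaussian_kernel(x_train, x_train, sigma)      # Gram matrix of inputs
    alpha = np.linalg.solve(K + n * lam * np.eye(n), f(y_train))
    return gaussian_kernel(x_eval, x_train, sigma) @ alpha

# Toy sanity check: Y = X + small noise and f = id, so E[f(Y) | X = x] ~ x.
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=(200, 1))
y = x + 0.1 * rng.standard_normal((200, 1))
x_grid = np.linspace(-2, 2, 5).reshape(-1, 1)
print(conditional_expectation(x, y, lambda v: v.ravel(), x_grid))
```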

    Kernel methods for detecting coherent structures in dynamical data

    We illustrate relationships between classical kernel-based dimensionality reduction techniques and eigendecompositions of empirical estimates of reproducing kernel Hilbert space (RKHS) operators associated with dynamical systems. In particular, we show that kernel canonical correlation analysis (CCA) can be interpreted in terms of kernel transfer operators and that it can be obtained by optimizing the variational approach for Markov processes (VAMP) score. As a result, we show that coherent sets of particle trajectories can be computed by kernel CCA. We demonstrate the efficiency of this approach with several examples, namely the well-known Bickley jet, ocean drifter data, and a molecular dynamics problem with a time-dependent potential. Finally, we propose a straightforward generalization of dynamic mode decomposition (DMD) called coherent mode decomposition (CMD). Our results provide a generic machine learning approach to the computation of coherent sets with an objective score that can be used for cross-validation and the comparison of different methods.
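    A minimal numpy sketch of regularized kernel CCA in Gram-matrix form; the Bach-Jordan-style regularization and all names are our assumptions, and the paper's VAMP-based formulation may differ in details:

```python
import numpy as np

def center(G):
    """Center a Gram matrix (subtract the empirical feature-space mean)."""
    n = G.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ G @ H

def kernel_cca(Gx, Gy, eta=1e-2, r=3):
    """Regularized kernel CCA in Gram-matrix form.

    Gx, Gy: Gram matrices of the two views, e.g. particle positions at
    time t and at time t + tau when searching for coherent sets. Returns
    the top-r canonical correlations and the expansion coefficients of
    the canonical functions f = sum_i alpha_i k(x_i, .) and
    g = sum_i beta_i k(y_i, .).
    """
    n = Gx.shape[0]
    Gx, Gy = center(Gx), center(Gy)
    reg_x = Gx + n * eta * np.eye(n)
    reg_y = Gy + n * eta * np.eye(n)
    # Correlation matrix reg_x^{-1} Gx Gy reg_y^{-1}; note that
    # Gy reg_y^{-1} = (reg_y^{-1} Gy)^T, hence the transpose below.
    Rx = np.linalg.solve(reg_x, Gx)
    Ry = np.linalg.solve(reg_y, Gy)
    U, s, Vt = np.linalg.svd(Rx @ Ry.T)
    alpha = np.linalg.solve(reg_x, U[:, :r])
    beta = np.linalg.solve(reg_y, Vt[:r].T)
    return s[:r], alpha, beta
```

    In practice one would then cluster or threshold the leading canonical functions evaluated on the trajectory data to extract candidate coherent sets.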

    Optimal Rates for Regularized Conditional Mean Embedding Learning

    We address the consistency of a kernel ridge regression estimate of the conditional mean embedding (CME), which is an embedding of the conditional distribution of $Y$ given $X$ into a target reproducing kernel Hilbert space $\mathcal{H}_Y$. The CME allows us to take conditional expectations of target RKHS functions, and has been employed in nonparametric causal and Bayesian inference. We address the misspecified setting, where the target CME is in the space of Hilbert-Schmidt operators acting from an input interpolation space between $\mathcal{H}_X$ and $L_2$, to $\mathcal{H}_Y$. This space of operators is shown to be isomorphic to a newly defined vector-valued interpolation space. Using this isomorphism, we derive a novel and adaptive statistical learning rate for the empirical CME estimator under the misspecified setting. Our analysis reveals that our rates match the optimal $O(\log n / n)$ rates without assuming $\mathcal{H}_Y$ to be finite dimensional. We further establish a lower bound on the learning rate, which shows that the obtained upper bound is optimal.
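    In standard notation (ours, not necessarily the paper's), the kernel ridge regression estimate of the CME analyzed here takes the familiar form:

```latex
% Regularized empirical CME estimate from samples (x_i, y_i), i = 1..n:
\hat{\mu}_{Y \mid X = x} \;=\; \sum_{i=1}^{n} \beta_i(x)\, k_Y(y_i, \cdot),
\qquad
\beta(x) \;=\; \left( K_X + n \lambda I \right)^{-1} \mathbf{k}_X(x),
% where (K_X)_{ij} = k_X(x_i, x_j) and (\mathbf{k}_X(x))_i = k_X(x_i, x).
% The paper's matching upper and lower bounds show that, under its source
% conditions, this estimator attains the optimal O(log n / n) rate.
```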

    Kernel autocovariance operators of stationary processes: Estimation and convergence

    We consider autocovariance operators of a stationary stochastic process on a Polish space that is embedded into a reproducing kernel Hilbert space. We investigate how empirical estimates of these operators converge along realizations of the process under various conditions. In particular, we examine ergodic and strongly mixing processes and prove several asymptotic results as well as finite sample error bounds with a detailed analysis for the Gaussian kernel. We provide applications of our theory in terms of consistency results for kernel PCA with dependent data and the conditional mean embedding of transition probabilities. Finally, we use our approach to examine the nonparametric estimation of Markov transition operators and highlight how our theory can give a consistency analysis for a large family of spectral analysis methods including kernel-based dynamic mode decomposition.
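    A minimal numpy sketch of the Gram-matrix computation behind such empirical estimates; the Gaussian kernel and all names are illustrative, and centering is omitted for brevity:

```python
import numpy as np

def gaussian_gram(A, B, sigma=1.0):
    """Gaussian-kernel Gram matrix between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * sigma**2))

def autocov_singular_values(traj, tau, sigma=1.0, r=5):
    """Top-r singular values of the empirical lag-tau kernel autocovariance
    operator C_tau = (1/m) sum_t phi(x_{t+tau}) (x) phi(x_t), computed purely
    through Gram matrices (the operator itself is never assembled): the
    nonzero squared singular values of C_tau are the nonzero eigenvalues
    of (1/m^2) Gtt @ G00.
    """
    m = len(traj) - tau
    X0, Xt = traj[:m], traj[tau:tau + m]
    G00 = gaussian_gram(X0, X0, sigma)
    Gtt = gaussian_gram(Xt, Xt, sigma)
    ev = np.clip(np.linalg.eigvals(Gtt @ G00 / m**2).real, 0.0, None)
    return np.sqrt(np.sort(ev)[::-1][:r])

# tau = 0 recovers the kernel PCA spectrum, here on a dependent time series.
rng = np.random.default_rng(1)
x = np.cumsum(0.1 * rng.standard_normal((500, 1)), axis=0)
print(autocov_singular_values(x, tau=0), autocov_singular_values(x, tau=10))
```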

    Optimal Reaction Coordinates: Variational Characterization and Sparse Computation

    Reaction Coordinates (RCs) are indicators of hidden, low-dimensional mechanisms that govern the long-term behavior of high-dimensional stochastic processes. We present a novel and general variational characterization of optimal RCs and provide conditions for their existence. Optimal RCs are minimizers of a certain loss function, and reduced models based on them guarantee a very good approximation of the long-term dynamics of the original high-dimensional process. We show that, for slow-fast systems, metastable systems, and other systems with known good RCs, the novel theory reproduces previous insight. Remarkably, the numerical effort required to evaluate the loss function scales only with the complexity of the underlying, low-dimensional mechanism, and not with that of the full system. The theory provided lays the foundation for an efficient and data-sparse computation of RCs via modern machine learning techniques.

    Variational Characterization and Identification of Reaction Coordinates in Stochastic Systems

    Reaction coordinates are indicators of hidden, low-dimensional mechanisms that govern the long-term behavior of high-dimensional stochastic systems. We present a novel, very general characterization of these coordinates and provide conditions for their existence. We show that these conditions are fulfilled for slow-fast systems, metastable systems, and other systems with known good reaction coordinates. Further, we formulate these conditions as a variational principle, i.e., we define a loss function whose minimizers are optimal reaction coordinates. Remarkably, the numerical effort required to evaluate the loss function scales only with the complexity of the underlying, low-dimensional mechanism, and not with that of the full system. In summary, we provide the theoretical foundation for an efficient computation of reaction coordinates via modern machine learning techniques.