12 research outputs found

    Glosarium Matematika

    273 p.; 24 cm

    Stability and inference in discrete diffusion scale-spaces

    Taking averages of observations is the most basic method of making inferences in the presence of uncertainty. In the late 1980s, this simple idea was extended to the principle of successively averaging less where the change is faster, and applied to the problem of revealing a signal with jump discontinuities in additive noise. Successive averaging results in a family of signals with a progressively decreasing amount of detail, which is called the scale-space and is conveniently formalized by viewing it as the solution to a certain diffusion-inspired evolutionary partial differential equation (PDE). Such a model is known as the diffusion scale-space, and it poses two long-standing problems: (i) model analysis, which aims at establishing stability and guarantees that averaging does not distort important information, and (ii) model selection, such as identification of the optimal scale (diffusion stopping time) given an initial noisy signal and an incomplete model. This thesis studies both problems in discrete space and time, a setting strongly advocated by Lindeberg [1991] and Weickert [1996], among others. The model analysis part focuses on necessary and sufficient conditions which guarantee that a discrete diffusion possesses the scale-space property in the sense of sign variation diminishing. Connections with total variation diminishing and the open problem in the multivariate case are discussed as well. Concerning model selection, the thesis unifies two optimal diffusion stopping principles: (i) the time when the Shannon entropy-based Liapunov function of Sporring and Weickert [1999] reaches its steady state, and (ii) the time when the diffusion outcome has the least correlation with the noise estimate, contributed by Mrázek and Navara [2003]. Both ideas are shown to be particular cases of marginal likelihood inference. Moreover, the suggested formalism provides first principles behind such criteria and removes a variety of inconsistencies. It is suggested that the outcome of the diffusion should be interpreted as a certain expectation conditioned on the initial signal of observations, instead of being treated as a random sample or as probabilities. This removes the need to normalize signals in the approach of Sporring and Weickert [1999], and it better justifies the application of the correlation criterion of Mrázek and Navara [2003]. Throughout this work, the emphasis is on methods that reduce the problem to establishing the positivity of a quadratic form. The necessary and sufficient conditions can then be approached via the positivity of matrix minors. A supplementary appendix summarizes a novel method of evaluating matrix minors. Intuitive examples of difficulties with statistical inference conclude the thesis.
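    The scheme below is a minimal sketch of the kind of discrete diffusion described above: an explicit update that averages less where the local change is larger. The Perona-Malik-style conductivity 1 / (1 + (du/K)^2), the step size tau, and stopping after a fixed number of steps are illustrative assumptions, not the specific model or stopping criteria analysed in the thesis.

        import numpy as np

        def diffusion_scale_space(u0, steps=50, tau=0.2, K=0.1):
            # Explicit discrete nonlinear diffusion: average less where the change is faster.
            u = np.asarray(u0, dtype=float).copy()
            family = [u.copy()]
            n = len(u)
            for _ in range(steps):
                d = u[1:] - u[:-1]                    # differences across the n-1 cell interfaces
                g = 1.0 / (1.0 + (d / K) ** 2)        # low conductivity across large jumps
                flux = np.zeros(n + 1)
                flux[1:n] = g * d                     # zero flux at both ends (reflecting boundary)
                u = u + tau * (flux[1:] - flux[:-1])  # discrete divergence of the flux
                family.append(u.copy())
            return family                             # the discrete scale-space: coarser and coarser versions of u0

        # A noisy step edge: the fine noise is averaged out while the jump survives much longer.
        rng = np.random.default_rng(0)
        signal = np.concatenate([np.zeros(50), np.ones(50)]) + 0.05 * rng.standard_normal(100)
        scales = diffusion_scale_space(signal)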

    Adaptive Multiple Shooting for Boundary Value Problems and Constrained Parabolic Optimization Problems

    The subject of this thesis is the development of adaptive techniques for multiple shooting methods, with a focus on their application to optimal control problems governed by parabolic partial differential equations. In order to retain as much freedom as possible in the later choice of discretization schemes, the details of both direct and indirect multiple shooting variants are worked out on an abstract function-space level; in this formulation, shooting techniques do not constitute a way of discretizing the problem. A thorough examination of the connections between the approaches provides an overview of the different shooting formulations and enables their comparison for both linear and nonlinear problems. We extend current research by considering additional constraints on the control variable in the multiple shooting context, and develop an optimization problem that includes so-called box constraints. Several modern algorithms for treating control constraints are adapted to the requirements of shooting methods; the modified algorithms permit an extended comparison of the different shooting approaches. The efficiency of numerical methods can often be increased by grid adaptation techniques. While adaptive discretization schemes can be readily transferred to the multiple shooting context, questions of conditioning and stability make it difficult to develop adaptive strategies for distributing the shooting points in multiple shooting processes. We concentrate on the design and comparison of two different approaches to shooting grid adaptation in the framework of ordinary differential equations, and a residual-based adaptive algorithm is transferred to parabolic optimization problems with control constraints. The presented concepts and methods are verified by means of several examples, and the theoretical results are confirmed numerically. The test problems are chosen so that the simple shooting method becomes unstable and a genuine multiple shooting technique is therefore required.
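    As a concrete illustration of the direct multiple shooting idea, in a far simpler setting than the parabolic optimal control problems treated in the thesis, the sketch below solves a hypothetical stiff two-point boundary value problem u'' = 100 u, u(0) = 1, u(1) = 0: the interval is partitioned into subintervals, the ODE is integrated on each from an unknown starting state, and matching plus boundary conditions are enforced by a root finder. The test equation, number of shooting nodes, and tolerances are illustrative choices only.

        import numpy as np
        from scipy.integrate import solve_ivp
        from scipy.optimize import root

        def rhs(t, y):
            # u'' = 100 u written as a first-order system y = (u, u').
            return [y[1], 100.0 * y[0]]

        t_nodes = np.linspace(0.0, 1.0, 6)                 # 5 shooting subintervals

        def shooting_residual(s_flat):
            # Unknowns: the state (u, u') at the left end of every subinterval.
            s = s_flat.reshape(len(t_nodes) - 1, 2)
            ends = []
            for k in range(len(t_nodes) - 1):
                sol = solve_ivp(rhs, (t_nodes[k], t_nodes[k + 1]), s[k], rtol=1e-8, atol=1e-10)
                ends.append(sol.y[:, -1])
            res = [s[0, 0] - 1.0]                          # boundary condition u(0) = 1
            for k in range(len(t_nodes) - 2):
                res.extend(ends[k] - s[k + 1])             # continuity across the interior nodes
            res.append(ends[-1][0] - 0.0)                  # boundary condition u(1) = 0
            return np.array(res)

        guess = np.zeros(2 * (len(t_nodes) - 1))
        guess[0] = 1.0
        solution = root(shooting_residual, guess, method='hybr')

    On this test problem, single shooting over the whole interval amplifies perturbations by roughly e^10, whereas each subinterval only sees a factor of about e^2, which is the usual motivation for a genuine multiple shooting approach.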

    New Directions for Contact Integrators

    Contact integrators are a family of geometric numerical schemes which guarantee the conservation of the contact structure. In this work we review the construction of both the variational and Hamiltonian versions of these methods. We illustrate some of the advantages of geometric integration in the dissipative setting by focusing on models inspired by recent studies in celestial mechanics and cosmology. To appear as Chapter 24 in GSI 2021, Springer LNCS 1282
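    The following sketch shows one common way such a scheme can be built for a contact Hamiltonian of the form H(q, p, s) = p^2/2 + V(q) + alpha*s: split H into parts whose contact flows can be integrated exactly and compose them, so that every step is a contact transformation. The damped harmonic oscillator, the first-order composition, and the parameter values are illustrative assumptions and not necessarily the particular constructions reviewed in the work.

        import numpy as np

        def contact_splitting_step(q, p, s, h, V, dV, alpha):
            # One splitting step for H(q, p, s) = p**2/2 + V(q) + alpha*s.
            # Each sub-update below is the exact flow of one term, hence a contact map.
            q = q + h * p                 # flow of p**2/2
            s = s + h * p ** 2 / 2
            p = p - h * dV(q)             # flow of V(q)
            s = s - h * V(q)
            p = p * np.exp(-alpha * h)    # flow of alpha*s: the dissipative part
            s = s * np.exp(-alpha * h)
            return q, p, s

        # Damped harmonic oscillator: V(q) = q**2/2 with linear damping alpha.
        V = lambda x: x ** 2 / 2
        dV = lambda x: x
        alpha, h = 0.1, 0.01
        q, p, s = 1.0, 0.0, 0.0
        trajectory = [(q, p)]
        for _ in range(5000):
            q, p, s = contact_splitting_step(q, p, s, h, V, dV, alpha)
            trajectory.append((q, p))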

    Probabilistic Approaches to Stochastic Optimization

    Optimization is a cardinal concept in the sciences, and viable algorithms are of utmost importance as tools for finding the solution to an optimization problem. Empirical risk minimization is a major workhorse, in particular in machine learning applications, where an input-target relation is learned in a supervised manner. Empirical risks with high-dimensional inputs are mostly optimized by greedy, gradient-based, and possibly stochastic optimization routines, such as stochastic gradient descent. Though popular and practically successful, this setup has major downsides which often make it finicky to work with, or at least the bottleneck in a larger chain of learning procedures. Typical issues are:
    • Overfitting of a parametrized model to the data, which generally leads to poor generalization performance on unseen data.
    • Tuning of algorithmic parameters, such as learning rates, which is tedious, inefficient, and costly.
    • Stochastic losses and gradients, which occur due to sub-sampling of a large dataset; they yield only incomplete or corrupted information about the empirical risk and are thus difficult to handle from a decision-making point of view.
    This thesis consists of four conceptual parts. In the first one, we argue that conditional distributions of local full and mini-batch evaluations of losses and gradients can be well approximated by Gaussian distributions, since the losses themselves are sums of independently and identically distributed random variables. We then provide a way of estimating the corresponding sufficient statistics, i.e., variances and means, with low computational overhead. This yields an analytic likelihood for the loss and gradient at every point of the input space, which can subsequently be incorporated into active decision making at run time of the optimizer. The second part focuses on estimating generalization performance, not by monitoring a validation loss, but by assessing whether stochastic gradients can be fully explained by noise that arises from the finiteness of the training dataset rather than from an informative gradient direction of the expected loss (risk). This yields a criterion for early stopping where no validation set is needed and the full dataset can be used for training. The third part is concerned with fully automated learning-rate adaptation for stochastic gradient descent (SGD). Global learning rates are arguably the most exposed manual tuning parameters of stochastic optimization routines. We propose a cheap and self-contained sub-routine, called a 'probabilistic line search', that automatically adapts the learning rate in every step, based on a local probability of descent. The result is an entirely parameter-free stochastic optimizer that reaches comparable or better generalization performance than SGD with a carefully hand-tuned learning rate on the tested problems. The last part deals with noise-robust search directions. Inspired by classic first- and second-order methods, we model the unknown dynamics of the gradient or Hessian function along the optimization path. The approach has strong connections to classic filtering frameworks and can incorporate noise-corrupted evaluations of the gradient at successive locations. The benefits are twofold. Firstly, we gain valuable insight into less accessible or ad-hoc design choices of classic optimizers as special cases.
Secondly, we provide the basis for a flexible, self-contained, and easy-to-use class of stochastic optimizers that exhibit a higher degree of robustness and automation.
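    The snippet below is a minimal sketch of the sufficient-statistics estimation mentioned in the first part: the within-batch sample variance of the per-example gradients, divided by the batch size, estimates the variance of the mini-batch gradient, which together with its mean parametrizes a Gaussian likelihood for the gradient at the current point. The helper name and the least-squares example are hypothetical and not the thesis's implementation.

        import numpy as np

        def minibatch_gradient_statistics(per_example_grads):
            # per_example_grads: shape (batch_size, n_params), gradient of each example's loss.
            g = np.asarray(per_example_grads, dtype=float)
            b = g.shape[0]
            mean = g.mean(axis=0)                      # the usual stochastic gradient
            var_of_mean = g.var(axis=0, ddof=1) / b    # variance of that estimator
            return mean, var_of_mean                   # parameters of the Gaussian likelihood

        # Hypothetical usage with a linear least-squares loss 0.5 * (x @ w - y)**2 per example.
        rng = np.random.default_rng(1)
        X, y = rng.standard_normal((32, 5)), rng.standard_normal(32)
        w = np.zeros(5)
        per_example = (X @ w - y)[:, None] * X         # per-example gradients w.r.t. w
        grad_mean, grad_var = minibatch_gradient_statistics(per_example)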

    Inner-outer Iterative Methods for Eigenvalue Problems - Convergence and Preconditioning

    Many methods for computing eigenvalues of a large sparse matrix involve shift-invert transformations which require the solution of a shifted linear system at each step. This thesis deals with shift-invert iterative techniques for solving eigenvalue problems where the arising linear systems are solved inexactly using a second iterative technique. This approach leads to an inner-outer type algorithm. We provide convergence results for the outer iterative eigenvalue computation as well as techniques for efficient inner solves. In particular, eigenvalue computations using inexact inverse iteration, the Jacobi-Davidson method without subspace expansion, and the shift-invert Arnoldi method as a subspace method are investigated in detail. A general convergence result for inexact inverse iteration for the non-Hermitian generalised eigenvalue problem is given, using only minimal assumptions. This convergence result is obtained in two different ways: on the one hand, we use an equivalence result between inexact inverse iteration applied to the generalised eigenproblem and a modified Newton's method; on the other hand, a splitting method is used which generalises the idea of orthogonal decomposition. Both approaches also include an analysis of the convergence theory for a version of the inexact Jacobi-Davidson method, where equivalences between Newton's method, inverse iteration and the Jacobi-Davidson method are exploited. To improve the efficiency of the inner iterative solves we introduce a new tuning strategy which can be applied to any standard preconditioner. We give a detailed analysis of this new preconditioning idea and show how the number of iterations for the inner iterative method, and hence the total number of iterations, can be reduced significantly by the application of this tuning strategy. The analysis of the tuned preconditioner is carried out for both Hermitian and non-Hermitian eigenproblems. We show how the preconditioner can be implemented efficiently and illustrate its performance using various numerical examples. An equivalence result between the preconditioned simplified Jacobi-Davidson method and inexact inverse iteration with the tuned preconditioner is given. Finally, we discuss the shift-invert Arnoldi method in both its standard and restarted forms. First, existing relaxation strategies for the outer iterative solves are extended to the implicitly restarted Arnoldi method. Second, we apply the idea of tuning the preconditioner to the inner iterative solve. As for inexact inverse iteration, the tuned preconditioner for the inexact Arnoldi method is shown to provide significant savings in the number of inner solves. The theory in this thesis is supported by many numerical examples.
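    A minimal sketch of the inner-outer structure described above: inexact inverse iteration in which the shifted linear system of every outer step is solved only approximately by an inner Krylov method. The matrix, shift, and tolerances are illustrative, and the inner solve is left unpreconditioned; in the thesis a standard preconditioner with the proposed tuning would be applied there.

        import numpy as np
        from scipy.sparse import diags, identity
        from scipy.sparse.linalg import gmres

        def inexact_inverse_iteration(A, sigma, x0, outer_iters=20, inner_tol=1e-2):
            # Outer loop: inverse iteration. Inner loop: GMRES solves (A - sigma*I) y = x inexactly.
            x = x0 / np.linalg.norm(x0)
            M = A - sigma * identity(A.shape[0], format='csr')
            rho = sigma
            for _ in range(outer_iters):
                y, _ = gmres(M, x, rtol=inner_tol)     # loose inner tolerance (older SciPy uses tol=)
                x = y / np.linalg.norm(y)
                rho = x @ (A @ x)                      # Rayleigh quotient estimate of the eigenvalue
            return rho, x

        # Hypothetical example: eigenvalue of a 1-D discrete Laplacian nearest the shift 0.
        n = 200
        A = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format='csr')
        lam, v = inexact_inverse_iteration(A, sigma=0.0, x0=np.ones(n))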