Search CORE

11,232 research outputs found

A unifying view of optimism in episodic reinforcement learning

Author: Neu G
Pike-Burke C
Publication date: 03/07/2020

The principle of “optimism in the face of uncertainty” underpins many theoretically successful reinforcement learning algorithms. In this paper we provide a general framework for designing, analyzing and implementing such algorithms in the episodic reinforcement learning problem. This framework is built upon Lagrangian duality and demonstrates that every model-optimistic algorithm that constructs an optimistic MDP has an equivalent representation as a value-optimistic dynamic programming algorithm. These two classes of algorithms were previously thought to be distinct: model-optimistic algorithms were seen as benefiting from a cleaner probabilistic analysis, while value-optimistic algorithms were seen as easier to implement and thus more practical. With the framework developed in this paper, we show that it is possible to get the best of both worlds by providing a class of algorithms that have a computationally efficient dynamic-programming implementation and also a simple probabilistic analysis. Besides capturing many existing algorithms in the tabular setting, our framework can also address large-scale problems under realizable function approximation, where it enables a simple model-based analysis of some recently proposed methods.
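The value-optimistic side of this equivalence can be illustrated with a minimal Python sketch of optimistic backward induction in a tabular episodic MDP. It is a generic illustration and not the algorithm of the paper. It makes the following assumptions: known mean rewards in [0, 1], an empirical transition model `P_hat`, per-pair visit counts, and a Hoeffding-style exploration bonus whose constant is a hypothetical choice. The function and parameter names are also illustrative.

```python
import math

def optimistic_backward_induction(P_hat, R, counts, H, delta=0.1, bonus_scale=1.0):
    """Value-optimistic dynamic programming for a tabular episodic MDP (sketch).

    P_hat[s][a]  -- estimated next-state distribution (list of probabilities)
    R[s][a]      -- mean reward, assumed to lie in [0, 1]
    counts[s][a] -- number of visits to (s, a); used to size the bonus
    H            -- episode length
    Returns Q[h][s][a] for h < H and V[h][s] for h <= H, with V[H][s] = 0.
    """
    S, A = len(P_hat), len(P_hat[0])
    log_term = math.log(S * A * H / delta)  # hypothetical confidence term
    V = [[0.0] * S for _ in range(H + 1)]
    Q = [[[0.0] * A for _ in range(S)] for _ in range(H)]
    for h in range(H - 1, -1, -1):  # backward over stages
        for s in range(S):
            for a in range(A):
                n = max(1, counts[s][a])
                # Hoeffding-style bonus shrinking like 1/sqrt(n); the constant is an assumption
                bonus = bonus_scale * (H - h) * math.sqrt(log_term / (2 * n))
                expected_next = sum(p * V[h + 1][s2] for s2, p in enumerate(P_hat[s][a]))
                # Clip at H - h, the largest return achievable from stage h onward
                Q[h][s][a] = min(H - h, R[s][a] + expected_next + bonus)
            V[h][s] = max(Q[h][s])
    return Q, V
```

Setting `bonus_scale=0.0` recovers plain value iteration on the estimated model. Because the bonus is non-negative and the backup is monotone, the optimistic values are never below those plain values. This is the sense in which the bonus makes the dynamic program "optimistic".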

arXiv.org e-Print Archive

Spiral - Imperial College Digital Repository

Selfconsistent Approximations in Mori's Theory

Author: Fick
G. Sauermann
Götze
H. Turschner
Kawasaki
Lovesey
Mori
Neu
Perron
W. Just
Publication venue: 'Elsevier BV'
Publication date: 28/03/1997

The constitutive quantities in Mori's theory, the residual forces, are expanded in terms of time-dependent correlation functions and products of operators at $t=0$, where it is assumed that the time derivatives of the observables are given by products of them. As a first consequence, the Heisenberg dynamics of the observables are obtained as an expansion of the same type. The dynamic equations for correlation functions turn out to be self-consistent nonlinear equations of the type known from mode-mode coupling approximations. The approach yields a necessary condition for the validity of the presented equations. As a third consequence, the static correlations can be calculated from fluctuation-dissipation theorems if the observables obey a Lie algebra. For a simple spin model the convergence of the expansion is studied. As a further test, dynamic and static correlations are calculated for a Heisenberg ferromagnet at low temperatures, and the results are compared with those of a Holstein-Primakoff treatment.
Comment: 51 pages, Latex, 3 eps figures included, elsart and epsf style files included, also available at http://athene.fkp.physik.th-darmstadt.de/public/wolfram.html and ftp://athene.fkp.physik.th-darmstadt.de/pub/publications/wolfram
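For orientation, the residual forces $F(t)$ enter through Mori's generalized Langevin equation. Below it is written in one common textbook convention. Texts differ in the ordering of the matrix factors and the choice of scalar product (here the Kubo product $(\cdot,\cdot)$), and the paper's notation may differ. The symbols are the Liouville operator $L$ and the projector $P$ onto the set of observables $A$. Because $(F(t), A) = 0$, projecting onto $A$ gives the last line, an equation for the normalized correlation matrix $C(t)$.

```latex
\begin{aligned}
\frac{d}{dt}A(t) &= i\Omega\,A(t) - \int_0^t K(s)\,A(t-s)\,ds + F(t),\\
i\Omega &= (\dot A, A)\,(A, A)^{-1}, \qquad \dot A = iL\,A,\\
F(t) &= e^{(1-P)iLt}\,(1-P)\,iL\,A,\\
K(t) &= (F(t), F(0))\,(A, A)^{-1},\\
\dot C(t) &= i\Omega\,C(t) - \int_0^t K(s)\,C(t-s)\,ds, \qquad C(t) = (A(t), A)\,(A, A)^{-1}.
\end{aligned}
```

The last equation is linear in $C$ only as long as $K$ is treated as given. Once $K$ is expanded in products of correlation functions, as the abstract describes, the equation becomes a self-consistent nonlinear equation of mode-coupling type.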

arXiv.org e-Print Archive

Crossref

Tunneling dynamics of side chains and defects in proteins, polymer glasses, and OH-doped network glasses

Author: Andreas Heuer
Bai T. S.
Burin A. L.
Federle G.
Peter Neu
Yu C. C.
Publication venue: 'AIP Publishing'
Publication date: 01/01/1997

Simulations on a Lennard-Jones computer glass are performed to study effects arising from defects in glasses at low temperatures. The numerical analysis reveals that even a low concentration of defects may dramatically change the low-temperature properties by giving rise to extrinsic double-well potentials (DWP's). The main characteristics of these extrinsic DWP's are (i) high barrier heights, (ii) a high probability that a defect is indeed connected with an extrinsic DWP, (iii) highly localized dynamics around this defect, and (iv) a smaller deformation potential coupling to phonons. Designing an extension of the Standard Tunneling Model (STM) that parametrizes this picture and comparing it with ultrasound experiments on the wet network glass $a$-B$_2$O$_3$ shows that effects of OH-impurities are accurately accounted for. This model is then applied to organic polymer glasses and proteins. It is suggested that side groups may act like doped impurities insofar as they induce extrinsic DWP's, which possess a distribution of barriers peaked around a high barrier height. This contrasts with the structureless distribution of barrier heights of the intrinsic DWP's, which are associated with the backbone dynamics. It is shown that this picture is consistent with elastic measurements on polymers and can explain the anomalous nonlogarithmic line broadening recently observed in hole-burning experiments in PMMA.
Comment: 34 pages, Revtex, 9 eps-figures, accepted for publication in J. Chem. Phy
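The following minimal Python sketch makes the role of barrier heights in the STM concrete by computing the standard two-level-system quantities. It assumes the common WKB-type parametrization $\Delta_0 = \hbar\omega_0 e^{-\lambda}$ with $\lambda = (d/\hbar)\sqrt{2mV}$ (conventions for the well distance $d$ differ by factors of 2) and the level splitting $E = \sqrt{\Delta^2 + \Delta_0^2}$. The function names and the example parameters are hypothetical. The extended STM of the paper is not reproduced here.

```python
import math

HBAR = 1.054571817e-34  # reduced Planck constant in J*s (CODATA 2018)
EV = 1.602176634e-19    # one electronvolt in J

def tunnel_splitting(hbar_omega0, d, mass, barrier):
    """Tunnel splitting Delta_0 of a double-well potential (WKB-type estimate).

    hbar_omega0 -- zero-point energy of a single well, in J
    d           -- distance between the wells, in m (convention-dependent)
    mass        -- mass of the tunneling entity, in kg
    barrier     -- barrier height V, in J
    """
    tunneling_parameter = d / HBAR * math.sqrt(2.0 * mass * barrier)
    return hbar_omega0 * math.exp(-tunneling_parameter)

def tls_energy(asymmetry, delta0):
    """Energy splitting E = sqrt(Delta^2 + Delta_0^2) of the resulting two-level system."""
    return math.hypot(asymmetry, delta0)

if __name__ == "__main__":
    proton_mass = 1.67262192e-27  # kg; a hypothetical OH-like tunneling entity
    for barrier_ev in (0.01, 0.05, 0.2):
        d0 = tunnel_splitting(1e-3 * EV, 0.3e-10, proton_mass, barrier_ev * EV)
        print(f"V = {barrier_ev:.2f} eV -> Delta_0 = {d0 / EV:.3e} eV")
```

The tunneling parameter grows like $\sqrt{V}$ and enters an exponential. As a result, the high barriers attributed above to extrinsic DWP's translate into very small tunnel splittings.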

arXiv.org e-Print Archive

Crossref

MPG.PuRe

Algorithmic stability and hypothesis complexity

Author: Liu T
Lugosi G
Neu G
Tao D
Publication date: 01/01/2017

© 2017 by the author(s). We introduce a notion of algorithmic stability of learning algorithms, which we term argument stability, that captures the stability of the hypothesis output by the learning algorithm in the normed space of functions from which hypotheses are selected. The main result of the paper bounds the generalization error of any learning algorithm in terms of its argument stability. The bounds are based on martingale inequalities in the Banach space to which the hypotheses belong. We apply the general bounds to the performance of some learning algorithms based on empirical risk minimization and stochastic gradient descent.
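To make "stability of the hypothesis in a normed space" tangible, here is a minimal Python sketch that measures a replace-one sensitivity for one-dimensional ridge regression without intercept. The hypothesis is $f(x) = w x$, and its norm is taken to be $|w|$. The uniform replace-one form of the definition and all names here are assumptions for illustration. The sketch evaluates a single data set and a single replacement point, which gives only a lower bound on the worst case. It does not reproduce the paper's bounds.

```python
def ridge_1d(xs, ys, lam):
    """Closed-form minimiser of (1/n) * sum (w*x - y)^2 + lam * w^2 (no intercept)."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + n * lam)

def replace_one_sensitivity(xs, ys, lam, x_new, y_new):
    """Largest |w_S - w_{S^i}| over i, where S^i replaces example i by (x_new, y_new).

    This is an empirical lower bound, for this data set only, on a uniform
    replace-one argument-stability coefficient in the norm |w|.
    """
    w = ridge_1d(xs, ys, lam)
    worst = 0.0
    for i in range(len(xs)):
        xs_i = xs[:i] + [x_new] + xs[i + 1:]
        ys_i = ys[:i] + [y_new] + ys[i + 1:]
        worst = max(worst, abs(w - ridge_1d(xs_i, ys_i, lam)))
    return worst
```

On the toy data `xs = ys = [1.0, 2.0, 3.0, 4.0]` with replacement point `(1.0, -1.0)` and `lam = 0.1`, the measured sensitivity shrinks in two cases: when the sample is duplicated (n = 8) and when `lam` is raised to 100. The dependence on `lam` is not monotone in general, so this is an illustration rather than a rule.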

OPUS - University of Technology Sydney

ASDEX Upgrade's New Plasma Control Scheme

Author: Neu G.
Raupp G.
Treutterer W.
Zasche D.
Zehetbauer T.
Publication venue: 'Elsevier BV'
Publication date: 12/07/2005

ASDEX Upgrade is a medium-sized tokamak experiment investigating highly shaped plasmas and advanced scenarios to be extrapolated to ITER. Eleven independent magnetic coils allow for proper shaping and plasma current control. For plasma heating and current drive, eight NBI beam lines, two ICRH antenna pairs and four ECRH gyrotrons are available. Five channels for controlling gas valves and a pellet injector serve for fuelling. All actuators are driven by a digital discharge control system. One basic enhancement of the latest generation is a unified framework for all feedforward and feedback control tasks in a discharge. The framework consists of two layers. The core layer implements wind-up-safe feedback controllers with a collection of overlaid output limitations. Each controller can be dynamically switched in references, controlled variables, control law and control parameters via a control mode. The coordination layer implements intelligent discharge protection or optimisation algorithms, which can synchronously change control modes and dynamically generate reference waveforms adapted to the discharge's state and goal. The core layer comprises the backbone of plasma control. Current, shape, heating and fuel control all use a library of highly configurable single- and multivariable control laws. P, PI and PID controllers are standard components, but state-space and sliding-mode policies can easily be added, too. Likewise, a broad selection of output limiters is available in the library, ranging from constant values to rate limiters and multi-signal-dependent polynomial characteristics. The controller is aware of any output limitation and can take anti-wind-up measures. Furthermore, a feedforward policy allows the behaviour upon mode transitions to be tuned, for example smooth adaptation or freezing the last output. With the coordination layer, tasks like MARFE protection, power exhaust protection and soft pulse termination are accomplished.
These specialised algorithms are plugged into the framework using a common interface. The framework approach easily allows for further extensions and opens a door for future experimental investigations.
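The abstract names several building blocks: PID control laws, amplitude and rate limiters, and anti-wind-up measures. The following Python sketch shows one conventional way to combine them in a single discrete-time controller. It is not ASDEX Upgrade code. The class name, the conditional-integration anti-wind-up scheme, and the order in which the limiters are applied are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class LimitedPID:
    """Discrete PID controller with amplitude and rate limits plus anti-wind-up (sketch)."""
    kp: float
    ki: float
    kd: float
    out_min: float
    out_max: float
    max_rate: float  # largest allowed output change per unit time
    _integral: float = field(default=0.0, init=False)
    _prev_error: float = field(default=0.0, init=False)
    _prev_output: float = field(default=0.0, init=False)

    def step(self, reference: float, measurement: float, dt: float) -> float:
        error = reference - measurement
        # Derivative on the error; the first call sees a jump from the initial zero error.
        derivative = (error - self._prev_error) / dt
        self._prev_error = error
        candidate_integral = self._integral + error * dt
        raw = self.kp * error + self.ki * candidate_integral + self.kd * derivative

        # Overlaid output limitations: amplitude clamp first, then rate limiter.
        out = min(max(raw, self.out_min), self.out_max)
        max_step = self.max_rate * dt
        out = min(max(out, self._prev_output - max_step), self._prev_output + max_step)

        # Anti-wind-up by conditional integration: keep the new integral only
        # when no limiter was active, so the integrator cannot run away.
        if out == raw:
            self._integral = candidate_integral
        self._prev_output = out
        return out
```

For example, consider `LimitedPID(kp=1.0, ki=1.0, kd=0.0, out_min=-1.0, out_max=1.0, max_rate=100.0)` driven with a large constant error. It saturates at `out_max`, while its integral term stays at zero. Conditional integration is only one of several anti-wind-up schemes; back-calculation is a common alternative.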

MPG.PuRe

Front motion for phase transitions in systems with memory

Author: Aizicovici
Alexander I. Domoshnitsky
Alexander Nepomnyashchy
Binder
Caginalp
Cahn
Fife
Horacio G. Rotstein
Jäckle
Neu
Rotstein
Rubinstein
Publication venue: 'Elsevier BV'
Publication date: 22/02/2000

We consider the Allen-Cahn equations with memory (a partial integro-differential convolution equation). The prototype kernels are exponentially decreasing functions of time, and they reduce the integro-differential equation to a hyperbolic one, the damped Klein-Gordon equation. By means of a formal asymptotic analysis we show that, to leading order and under suitable assumptions on the kernels, the integro-differential equation behaves like the hyperbolic partial differential equation obtained by considering prototype kernels: the evolution of fronts is governed by the extended, damped Born-Infeld equation. We also apply our method to a system of partial integro-differential equations which generalize the classical phase field equations with a non-conserved order parameter and describe the process of phase transitions where memory effects are present.
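The reduction from the memory equation to a damped Klein-Gordon-type equation can be checked in a few lines. For illustration we assume a scalar order parameter $u$, a double-well derivative $f(u) = u^3 - u$, an interface parameter $\varepsilon$, and the normalized exponential kernel $k(t) = \tau^{-1} e^{-t/\tau}$. The paper's scaling may differ. The key step uses $k'(t) = -k(t)/\tau$.

```latex
\begin{aligned}
u_t(x,t) &= \int_0^t k(t-s)\,\bigl[\varepsilon^2 \Delta u - f(u)\bigr](x,s)\,ds,
\qquad k(t) = \tfrac{1}{\tau}\,e^{-t/\tau},\\
u_{tt} &= k(0)\,\bigl[\varepsilon^2 \Delta u - f(u)\bigr]
 + \int_0^t k'(t-s)\,\bigl[\varepsilon^2 \Delta u - f(u)\bigr](x,s)\,ds
 = \tfrac{1}{\tau}\bigl[\varepsilon^2 \Delta u - f(u)\bigr] - \tfrac{1}{\tau}\,u_t,\\
\Longrightarrow\quad \tau\,u_{tt} + u_t &= \varepsilon^2 \Delta u - f(u).
\end{aligned}
```

The term $\tau u_{tt}$ makes the equation hyperbolic. Formally, as $\tau \to 0$ the kernel concentrates at $s = t$, and the classical Allen-Cahn equation $u_t = \varepsilon^2 \Delta u - f(u)$ is recovered.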

arXiv.org e-Print Archive

Crossref

Configuration Environment for the ASDEX Upgrade Control System

Author: Mertens V
Neu G
Raupp G
Treutterer W
Zasche D
Zehetbauer T
Publication date: 01/01/1999

CERN Document Server

Event detection and exception handling strategies in the ASDEX Upgrade discharge control system

Author: Neu G.
Rapson C.
Raupp G.
Treutterer W.
Zasche D.
Zehetbauer T.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2013

MPG.PuRe