11,232 research outputs found
A unifying view of optimism in episodic reinforcement learning
The principle of “optimism in the face of uncertainty” underpins many theoretically successful reinforcement learning algorithms. In this paper we provide a general framework for designing, analyzing and implementing such algorithms in the episodic reinforcement learning problem. This framework is built upon Lagrangian duality, and demonstrates that every model-optimistic algorithm that constructs an optimistic MDP has an equivalent representation as a value-optimistic dynamic programming algorithm. Typically, it was thought that these two classes of algorithms were distinct, with model-optimistic algorithms benefiting from a cleaner probabilistic analysis while value-optimistic algorithms are easier to implement and thus more practical. With the framework developed in this paper, we show that it is possible to get the best of both worlds by providing a class of algorithms which have a computationally efficient dynamic-programming implementation and also a simple probabilistic analysis. Besides being able to capture many existing algorithms in the tabular setting, our framework can also address large-scale problems under realizable function approximation, where it enables a simple model-based analysis of some recently proposed methods.
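As a rough illustration of what a value-optimistic dynamic programming algorithm looks like in the tabular episodic setting, the following minimal sketch runs backward induction with an additive exploration bonus. It is a generic bonus-based backup, not the specific construction of the paper; the function name, the array layout, the assumption that rewards lie in [0, 1], and the form of `bonus` are all assumptions made for the example.

```python
import numpy as np


def optimistic_value_iteration(P_hat, r, bonus, H):
    """Backward induction with an additive exploration bonus (illustrative).

    P_hat: (S, A, S) estimated transition probabilities
    r:     (S, A) rewards, assumed to lie in [0, 1]
    bonus: (S, A) nonnegative exploration bonus (hypothetical choice)
    H:     episode length
    Returns Q with shape (H, S, A) and V with shape (H + 1, S).
    """
    S, A, _ = P_hat.shape
    Q = np.zeros((H, S, A))
    V = np.zeros((H + 1, S))  # V[H] = 0 is the terminal value
    for h in range(H - 1, -1, -1):
        # Optimistic Bellman backup, clipped at the largest achievable return.
        Q[h] = np.minimum(r + bonus + P_hat @ V[h + 1], H - h)
        V[h] = Q[h].max(axis=1)
    return Q, V
```

With a zero bonus this reduces to ordinary finite-horizon value iteration on the estimated model; a larger bonus can only raise the computed values, which is the sense in which the backup is optimistic.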
Selfconsistent Approximations in Mori's Theory
The constitutive quantities in Mori's theory, the residual forces, are
expanded in terms of time dependent correlation functions and products of
operators at , where it is assumed that the time derivatives of the
observables are given by products of them. As a first consequence the
Heisenberg dynamics of the observables are obtained as an expansion of the same
type. The dynamic equations for correlation functions turn out to be
selfconsistent nonlinear equations of the type known from mode-mode coupling
approximations. The approach yields a necessary condition for the validity of
the presented equations. As a third consequence the static correlations can be
calculated from fluctuation-dissipation theorems, if the observables obey a Lie
algebra. For a simple spin model the convergence of the expansion is studied.
As a further test, dynamic and static correlations are calculated for a
Heisenberg ferromagnet at low temperatures, where the results are compared to
those of a Holstein-Primakoff treatment.
Comment: 51 pages, LaTeX, 3 eps figures included, elsart and epsf style files
included, also available at
http://athene.fkp.physik.th-darmstadt.de/public/wolfram.html and
ftp://athene.fkp.physik.th-darmstadt.de/pub/publications/wolfram
Tunneling dynamics of side chains and defects in proteins, polymer glasses, and OH-doped network glasses
Simulations on a Lennard-Jones computer glass are performed to study effects
arising from defects in glasses at low temperatures. The numerical analysis
reveals that already a low concentration of defects may dramatically change the
low temperature properties by giving rise to extrinsic double-well potentials
(DWP's). The main characteristics of these extrinsic DWP's are (i) high barrier
heights, (ii) high probability that a defect is indeed connected with an
extrinsic DWP, (iii) highly localized dynamics around this defect, and (iv)
smaller deformation potential coupling to phonons. Designing an extension of
the Standard Tunneling Model (STM) which parametrizes this picture and
comparing with ultrasound experiments on the wet network glass -BO
shows that effects of OH-impurities are accurately accounted for. This model is
then applied to organic polymer glasses and proteins. It is suggested that side
groups may act similarly to doped impurities inasmuch as extrinsic DWP's are
induced, which possess a distribution of barriers peaked around a high barrier
height. This compares with the structurelessly distributed barrier heights of
the intrinsic DWP's, which are associated with the backbone dynamics. It is
shown that this picture is consistent with elastic measurements on polymers,
and can explain anomalous nonlogarithmic line broadening recently observed in
hole burning experiments in PMMA.
Comment: 34 pages, Revtex, 9 eps-figures, accepted for publication in J. Chem.
Phy
Algorithmic stability and hypothesis complexity
© 2017 by the author(s). We introduce a notion of algorithmic stability of learning algorithms, which we term argument stability, that captures stability of the hypothesis output by the learning algorithm in the normed space of functions from which hypotheses are selected. The main result of the paper bounds the generalization error of any learning algorithm in terms of its argument stability. The bounds are based on martingale inequalities in the Banach space to which the hypotheses belong. We apply the general bounds to bound the performance of some learning algorithms based on empirical risk minimization and stochastic gradient descent.
ASDEX Upgrade's New Plasma Control Scheme
ASDEX Upgrade is a medium sized tokamak experiment investigating highly shaped plasma and advanced scenarios to be extrapolated for ITER. Eleven independent magnetic coils allow for proper shaping and plasma current control. For plasma heating and current drive eight NBI beam lines, two ICRH antenna pairs and four ECRH gyrotrons are available. Five channels for controlling gas valves and a pellet injector serve for fuelling. All actuators are driven by a digital discharge control system. One basic enhancement of the latest generation is a unified framework for all feedforward and feedback control tasks in a discharge. The framework consists of two layers. The core layer implements wind-up safe feedback controllers with a collection of overlaid output limitations. Each controller is dynamically switchable in references, controlled variables, control law and control parameters via a control mode. The coordination layer implements intelligent discharge protection or optimisation algorithms which can synchronously change control modes and dynamically generate reference waveforms adapted to the discharge's state and goal. The core layer forms the backbone of plasma control. Current, shape, heating and fuel control all use a library of highly configurable single- and multivariable control laws. P, PI and PID controllers are standard components, but state space and sliding mode policies can easily be added, too. Likewise, a broad selection of output limiters is available in the library. It ranges from constant values to rate limiters and multi-signal dependent polynomial characteristics. The controller is aware of any output limitation and can take anti-wind-up measures. Furthermore, a feedforward policy allows tuning the behaviour upon mode transitions, like smooth adaptation or freezing the last output. With the coordination layer, tasks like MARFE protection, power exhaust protection and soft pulse termination are accomplished.
These specialised algorithms are plugged into the framework using a common interface. The framework approach easily allows for further extensions and opens a door for future experimental investigations.
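To make the idea of a wind-up safe controller with an output limitation concrete, here is a minimal sketch of a PI controller with a constant output clamp and anti-wind-up by conditional integration. The abstract does not describe how ASDEX Upgrade implements its limiters or anti-wind-up measures, so the class name, the parameters, the assumption `ki >= 0`, and the choice of conditional integration are all assumptions; this is one common textbook technique, not the system's actual code.

```python
from dataclasses import dataclass


@dataclass
class PIController:
    """PI controller with a constant output limit (illustrative, ki >= 0)."""
    kp: float
    ki: float
    out_min: float
    out_max: float
    integral: float = 0.0

    def step(self, reference: float, measurement: float, dt: float) -> float:
        error = reference - measurement
        candidate = self.integral + error * dt
        unclamped = self.kp * error + self.ki * candidate
        output = min(max(unclamped, self.out_min), self.out_max)
        # Anti-wind-up by conditional integration: keep the updated integral
        # only if the output is not saturated, or if the error pushes the
        # output back towards the allowed range.
        saturated_high = unclamped > output
        saturated_low = unclamped < output
        if (not saturated_high and not saturated_low) \
                or (saturated_high and error < 0) \
                or (saturated_low and error > 0):
            self.integral = candidate
        return output
```

While the output sits at a limit and the error keeps pushing further into saturation, the integral state is frozen, so the controller responds immediately once the error changes sign instead of first unwinding a large accumulated integral.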
Front motion for phase transitions in systems with memory
We consider the Allen-Cahn equations with memory (a partial
integro-differential convolution equation). The prototype kernels are
exponentially decreasing functions of time and they reduce the
integro-differential equation to a hyperbolic one, the damped Klein-Gordon
equation. By means of a formal asymptotic analysis we show that to the leading
order and under suitable assumptions on the kernels, the integro-differential
equation behaves like a hyperbolic partial differential equation obtained by
considering prototype kernels: the evolution of fronts is governed by the
extended, damped Born-Infeld equation. We also apply our method to a system of
partial integro-differential equations which generalize the classical phase
field equations with a non-conserved order parameter and describe the process
of phase transitions where memory effects are present.
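The reduction from an exponentially decaying kernel to a damped hyperbolic equation can be sketched in a few lines. The abstract does not state the equation explicitly, so the model form below, the symbols $a$, $f$, $\tau$, and the normalisation of the kernel are assumptions chosen for illustration; the paper's scaling may differ.

```latex
% Assumed model form (not taken from the abstract):
%   u_t(x,t) = \int_{-\infty}^{t} a(t-s)\,[\Delta u + f(u)](x,s)\,ds,
% with the prototype kernel a(t) = \tau^{-1} e^{-t/\tau}, \tau > 0.
% Differentiating in t and using a'(t) = -a(t)/\tau:
\begin{align*}
u_{tt} &= a(0)\,[\Delta u + f(u)](x,t)
          + \int_{-\infty}^{t} a'(t-s)\,[\Delta u + f(u)](x,s)\,ds \\
       &= \frac{1}{\tau}\,[\Delta u + f(u)] - \frac{1}{\tau}\,u_t .
\end{align*}
% Rearranging gives a damped nonlinear Klein-Gordon equation:
\[
\tau\,u_{tt} + u_t = \Delta u + f(u).
\]
```

Under these assumptions the memory integral disappears entirely, leaving a local equation with a second time derivative and a damping term.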