11,005 research outputs found

    A unifying view of optimism in episodic reinforcement learning

    The principle of “optimism in the face of uncertainty” underpins many theoretically successful reinforcement learning algorithms. In this paper we provide a general framework for designing, analyzing and implementing such algorithms in the episodic reinforcement learning problem. This framework is built upon Lagrangian duality, and demonstrates that every model-optimistic algorithm that constructs an optimistic MDP has an equivalent representation as a value-optimistic dynamic programming algorithm. Typically, it was thought that these two classes of algorithms were distinct, with model-optimistic algorithms benefiting from a cleaner probabilistic analysis while value-optimistic algorithms are easier to implement and thus more practical. With the framework developed in this paper, we show that it is possible to get the best of both worlds by providing a class of algorithms which have a computationally efficient dynamic-programming implementation and also a simple probabilistic analysis. Besides being able to capture many existing algorithms in the tabular setting, our framework can also address large-scale problems under realizable function approximation, where it enables a simple model-based analysis of some recently proposed methods.
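
As a rough illustration of what a value-optimistic dynamic programming algorithm looks like in the tabular episodic setting, the sketch below runs backward induction on an estimated model and adds an exploration bonus to each reward. It is not the algorithm of the paper: the bonus form `c / sqrt(n)`, the clipping at the remaining horizon (which assumes rewards in [0, 1]), and all names (`optimistic_values`, `P_hat`, `counts`) are assumptions chosen for the example.

```python
import math

def optimistic_values(P_hat, R, counts, H, c=1.0):
    """Backward induction with an additive exploration bonus.

    P_hat[s][a][s2]: estimated transition probabilities (hypothetical input)
    R[s][a]:         rewards, assumed to lie in [0, 1]
    counts[s][a]:    visit counts used to size the bonus
    H:               episode length
    c:               bonus scale (an assumed tuning constant)
    """
    S, A = len(R), len(R[0])
    V = [[0.0] * S for _ in range(H + 1)]          # V[H] is all zeros
    Q = [[[0.0] * A for _ in range(S)] for _ in range(H)]
    for h in range(H - 1, -1, -1):
        for s in range(S):
            for a in range(A):
                bonus = c / math.sqrt(max(counts[s][a], 1))
                nxt = sum(P_hat[s][a][s2] * V[h + 1][s2] for s2 in range(S))
                # Clip at the largest return achievable in the remaining steps.
                Q[h][s][a] = min(float(H - h), R[s][a] + bonus + nxt)
            V[h][s] = max(Q[h][s])
    return Q, V
```

Acting greedily with respect to `Q[h][s]` then gives an optimistic policy; as counts grow the bonus shrinks and the values approach those of the estimated model.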

    Selfconsistent Approximations in Mori's Theory

    The constitutive quantities in Mori's theory, the residual forces, are expanded in terms of time-dependent correlation functions and products of operators at $t=0$, where it is assumed that the time derivatives of the observables are given by products of them. As a first consequence the Heisenberg dynamics of the observables are obtained as an expansion of the same type. The dynamic equations for correlation functions turn out to be selfconsistent nonlinear equations of the type known from mode-mode coupling approximations. The approach yields a necessary condition for the validity of the presented equations. As a third consequence the static correlations can be calculated from fluctuation-dissipation theorems, if the observables obey a Lie algebra. For a simple spin model the convergence of the expansion is studied. As a further test, dynamic and static correlations are calculated for a Heisenberg ferromagnet at low temperatures, where the results are compared to those of a Holstein-Primakoff treatment. Comment: 51 pages, Latex, 3 eps figures included, elsart and epsf style files included, also available at http://athene.fkp.physik.th-darmstadt.de/public/wolfram.html and ftp://athene.fkp.physik.th-darmstadt.de/pub/publications/wolfram

    Tunneling dynamics of side chains and defects in proteins, polymer glasses, and OH-doped network glasses

    Simulations on a Lennard-Jones computer glass are performed to study effects arising from defects in glasses at low temperatures. The numerical analysis reveals that even a low concentration of defects may dramatically change the low-temperature properties by giving rise to extrinsic double-well potentials (DWP's). The main characteristics of these extrinsic DWP's are (i) high barrier heights, (ii) high probability that a defect is indeed connected with an extrinsic DWP, (iii) highly localized dynamics around this defect, and (iv) smaller deformation potential coupling to phonons. Designing an extension of the Standard Tunneling Model (STM) which parametrizes this picture and comparing with ultrasound experiments on the wet network glass a-B$_2$O$_3$ shows that effects of OH-impurities are accurately accounted for. This model is then applied to organic polymer glasses and proteins. It is suggested that side groups may act similarly to doped impurities inasmuch as extrinsic DWP's are induced, which possess a distribution of barriers peaked around a high barrier height. This contrasts with the structurelessly distributed barrier heights of the intrinsic DWP's, which are associated with the backbone dynamics. It is shown that this picture is consistent with elastic measurements on polymers, and can explain anomalous nonlogarithmic line broadening recently observed in hole burning experiments in PMMA. Comment: 34 pages, Revtex, 9 eps-figures, accepted for publication in J. Chem. Phy

    Algorithmic stability and hypothesis complexity

    © 2017 by the author(s). We introduce a notion of algorithmic stability of learning algorithms, which we term argument stability, that captures stability of the hypothesis output by the learning algorithm in the normed space of functions from which hypotheses are selected. The main result of the paper bounds the generalization error of any learning algorithm in terms of its argument stability. The bounds are based on martingale inequalities in the Banach space to which the hypotheses belong. We apply the general bounds to bound the performance of some learning algorithms based on empirical risk minimization and stochastic gradient descent.

    ASDEX Upgrade's New Plasma Control Scheme

    ASDEX Upgrade is a medium-sized tokamak experiment investigating highly shaped plasma and advanced scenarios to be extrapolated for ITER. Eleven independent magnetic coils allow for proper shaping and plasma current control. For plasma heating and current drive eight NBI beam lines, two ICRH antenna pairs and four ECRH gyrotrons are available. Five channels for controlling gas valves and a pellet injector serve for fuelling. All actuators are driven by a digital discharge control system. One basic enhancement of the latest generation is a unified framework for all feedforward and feedback control tasks in a discharge. The framework consists of two layers. The core layer implements wind-up safe feedback controllers with a collection of overlaid output limitations. Each controller is dynamically switchable in references, controlled variables, control law and control parameters via a control mode. The coordination layer implements intelligent discharge protection or optimisation algorithms which can synchronously change control modes and dynamically generate reference waveforms adapted to the discharge's state and goal. The core layer forms the backbone of plasma control. Current, shape, heating and fuel control all use a library of highly configurable single- and multivariable control laws. P, PI and PID controllers are standard components, but state space and sliding mode policies can easily be added, too. Likewise, a broad selection of output limiters is available in the library. It ranges from constant values to rate limiters and multi-signal dependent polynomial characteristics. The controller is aware of any output limitation and can take anti-wind-up measures. Furthermore, a feedforward policy allows the behaviour upon mode transitions to be tuned, for example smooth adaptation or freezing the last output. With the coordination layer, tasks like MARFE protection, power exhaust protection and soft pulse termination are accomplished. These specialised algorithms are plugged into the framework using a common interface. The framework approach easily allows for further extensions and opens a door for future experimental investigations.
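
The idea of a controller that is aware of its output limitation and takes anti-wind-up measures can be sketched with a simple PI loop. The example below is a generic textbook construction (conditional integration with a constant output clamp), not the ASDEX Upgrade implementation; the class name, parameters and limits are hypothetical.

```python
class PIController:
    """PI controller with a constant output limiter and anti-wind-up.

    Anti-wind-up here is conditional integration: while the output is
    saturated, the integrator is only updated if the error would drive
    the output back toward the allowed range.
    """

    def __init__(self, kp, ki, out_min, out_max):
        self.kp = kp
        self.ki = ki
        self.out_min = out_min
        self.out_max = out_max
        self.integral = 0.0

    def step(self, reference, measurement, dt):
        error = reference - measurement
        candidate = self.integral + error * dt       # tentative integrator state
        raw = self.kp * error + self.ki * candidate  # unlimited control output
        out = min(max(raw, self.out_min), self.out_max)
        saturated_high = raw > self.out_max
        saturated_low = raw < self.out_min
        # Accept the integrator update unless it would deepen the saturation.
        if (not saturated_high and not saturated_low) \
                or (saturated_high and error < 0) \
                or (saturated_low and error > 0):
            self.integral = candidate
        return out
```

With a large, persistent error the output stays pinned at the limit while `integral` stops growing, so the controller recovers immediately once the error changes sign instead of first having to unwind an accumulated integral.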

    Front motion for phase transitions in systems with memory

    We consider the Allen-Cahn equations with memory (a partial integro-differential convolution equation). The prototype kernels are exponentially decreasing functions of time and they reduce the integro-differential equation to a hyperbolic one, the damped Klein-Gordon equation. By means of a formal asymptotic analysis we show that to the leading order and under suitable assumptions on the kernels, the integro-differential equation behaves like a hyperbolic partial differential equation obtained by considering prototype kernels: the evolution of fronts is governed by the extended, damped Born-Infeld equation. We also apply our method to a system of partial integro-differential equations which generalize the classical phase field equations with a non-conserved order parameter and describe the process of phase transitions where memory effects are present.

    Reply to Comment on "Cosmic rays, carbon dioxide, and climate"

    In our analysis [Rahmstorf et al., 2004], we arrived at two main conclusions: the data of Shaviv and Veizer [2003] do not show a significant correlation of cosmic ray flux (CRF) and climate, and the authors' estimate of climate sensitivity to CO2 based on a simple regression analysis is questionable. After careful consideration of Shaviv and Veizer's comment, we want to uphold and reaffirm these conclusions. Concerning the question of correlation, we pointed out that a correlation arose only after several adjustments to the data, including shifting one of the four CRF peaks and stretching the time scale. To calculate statistical significance, we first need to compute the number of independent data points in the CRF and temperature curves being correlated, accounting for their autocorrelation. A standard estimate [Quenouille, 1952] of the number of effective data points is NEFF = N (1 − r1 r2)/(1 + r1 r2), where N is the total number of data points and r1, r2 are the autocorrelations of the two series. For the curves of Shaviv and Veizer [2003], the result is NEFF = 4.8. This is consistent with the fact that these are smooth curves with four humps, and with the fact that for CRF the position of the four peaks is determined by four spiral arm crossings or four meteorite clusters, respectively; that is, by four independent data points. The number of points that enter the calculation of statistical significance of a linear correlation is (NEFF − 2), since any curves based on only two points show perfect correlation; at least three independent points are needed for a meaningful result.
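
The effective-sample-size correction described above can be sketched in a few lines. The snippet assumes the commonly quoted form of the estimate, NEFF = N (1 − r1 r2)/(1 + r1 r2), with r1 and r2 taken as lag-1 autocorrelations; the function names are illustrative and the inputs are not the data of either paper.

```python
def lag1_autocorr(x):
    """Lag-1 autocorrelation of a sequence (assumed estimator for r1, r2)."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    num = sum(a * b for a, b in zip(d[:-1], d[1:]))
    den = sum(a * a for a in d)
    return num / den

def effective_sample_size(n, r1, r2):
    """Effective number of independent points for two autocorrelated series.

    Assumes the form N * (1 - r1*r2) / (1 + r1*r2).
    """
    return n * (1.0 - r1 * r2) / (1.0 + r1 * r2)

def degrees_of_freedom(n_eff):
    """Points entering the significance test of a linear correlation."""
    return n_eff - 2.0
```

For uncorrelated series (r1 = r2 = 0) the estimate returns N unchanged, while two strongly autocorrelated series, such as smooth curves with only a few humps, yield a much smaller value.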