Hierarchical Decomposition of Nonlinear Dynamics and Control for System Identification and Policy Distillation

Authors: Abdulsamad Hany; Peters Jan
Publication date: 01/01/2020

The control of nonlinear dynamical systems remains a major challenge for autonomous agents. Current trends in reinforcement learning (RL) focus on complex representations of dynamics and policies, which have yielded impressive results in solving a variety of hard control tasks. However, this new sophistication and the use of extremely over-parameterized models have come at the cost of an overall reduction in our ability to interpret the resulting policies. In this paper, we take inspiration from the control community and apply the principles of hybrid switching systems in order to break down complex dynamics into simpler components. We exploit the rich representational power of probabilistic graphical models and derive an expectation-maximization (EM) algorithm for learning a sequence model to capture the temporal structure of the data and automatically decompose nonlinear dynamics into stochastic switching linear dynamical systems. Moreover, we show how this framework of switching models enables extracting hierarchies of Markovian and auto-regressive locally linear controllers from nonlinear experts in an imitation learning scenario.

Comment: 2nd Annual Conference on Learning for Dynamics and Control

arXiv.org e-Print Archive

MPG.PuRe
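
To make the term "stochastic switching linear dynamical system" from the abstract above concrete, here is a minimal sketch of the generative side of such a model only: a discrete mode follows a Markov chain, and each mode applies its own linear (affine) update to a scalar state. The mode parameters, transition probabilities, and the function name `simulate` are illustrative assumptions, not values or code from the paper, and the EM learning procedure the abstract describes is not shown.

```python
import random

# Hypothetical two-mode scalar system (illustrative values only):
# in mode z the state evolves as x_next = a * x + b + Gaussian noise.
MODES = {0: (0.9, 0.0), 1: (0.5, 1.0)}

# Hypothetical Markov transition probabilities between modes:
# TRANSITIONS[z] = (P(next mode = 0 | z), P(next mode = 1 | z)).
TRANSITIONS = {0: (0.95, 0.05), 1: (0.10, 0.90)}


def simulate(steps, x0=0.0, z0=0, noise_std=0.01, seed=0):
    """Sample a trajectory of states xs and discrete modes zs."""
    rng = random.Random(seed)
    x, z = x0, z0
    xs, zs = [x], [z]
    for _ in range(steps):
        # First switch the discrete mode according to the Markov chain.
        z = rng.choices((0, 1), weights=TRANSITIONS[z])[0]
        # Then apply the linear dynamics of the newly selected mode.
        a, b = MODES[z]
        x = a * x + b + rng.gauss(0.0, noise_std)
        xs.append(x)
        zs.append(z)
    return xs, zs


if __name__ == "__main__":
    xs, zs = simulate(10)
    for x, z in zip(xs, zs):
        print(f"mode={z} x={x:.3f}")
```

Each mode on its own is a simple, interpretable linear system; the overall trajectory is nonlinear only through the switching, which is the kind of decomposition the abstract refers to.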

A Benchmark Environment Motivated by Industrial Control Problems

Authors: Hein Daniel; Depeweg Stefan; Tokic Michel; Udluft Steffen; Hentschel Alexander; Runkler Thomas A.; Sterzing Volkmar
Publication date: 01/01/2018

In the research area of reinforcement learning (RL), novel and promising methods are frequently developed and introduced to the RL community. However, although many researchers are keen to apply their methods to real-world problems, implementing such methods in real industry environments is often a frustrating and tedious process. Generally, academic research groups have only limited access to real industrial data and applications. For this reason, new methods are usually developed, evaluated, and compared using artificial software benchmarks. On the one hand, these benchmarks are designed to provide interpretable RL training scenarios and detailed insight into the learning process of the method at hand. On the other hand, they usually do not share much similarity with industrial real-world applications. For this reason, we used our industry experience to design a benchmark which bridges the gap between freely available, documented, and motivated artificial benchmarks and properties of real industrial problems. The resulting industrial benchmark (IB) has been made publicly available to the RL community by publishing its Java and Python code, including an OpenAI Gym wrapper, on GitHub. In this paper, we motivate and describe in detail the IB's dynamics and identify prototypic experimental settings that capture common situations in real-world industry control problems.

arXiv.org e-Print Archive

Crossref

FigShare

A Benchmark Environment Motivated by Industrial Control Problems

Authors: Depeweg Stefan; Hein Daniel; Hentschel Alexander; Runkler Thomas A.; Sterzing Volkmar; Tokic Michel; Udluft Steffen
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 06/02/2018

arXiv.org e-Print Archive

Crossref

Completely-Positive Non-Markovian Decoherence

Authors: C. W. Gardiner; Doyeol Ahn; Helen McAneney; Inbo Kim; Jinhyoung Lee; M. S. Kim; N. Vats
Publication venue: American Physical Society (APS)
Publication date: 01/01/2004

We propose an effective Hamiltonian approach to investigate decoherence of a quantum system in a non-Markovian reservoir, naturally imposing complete positivity on the reduced dynamics of the system. The formalism is based on the notion of an effective reservoir, i.e., certain collective degrees of freedom in the reservoir that are responsible for the decoherence. As examples of completely positive decoherence, we present three typical decoherence processes for a qubit: dephasing, depolarizing, and amplitude damping. The effects of the non-Markovian decoherence are compared with those of the Markovian decoherence.

Comment: 8 pages, 1 figure

arXiv.org e-Print Archive

Queen's University Belfast Research Portal

Crossref

CERN Document Server

Algorithmic quantum simulation of memory effects

Authors: Alvarez-Rodriguez U.; Casanova J.; Di Candia R.; Sanz M.; Solano E.
Publication venue: American Physical Society (APS)
Publication date: 01/01/2017

We propose a method for the algorithmic quantum simulation of memory effects described by integrodifferential evolution equations. It consists of the systematic use of perturbation theory techniques and a Markovian quantum simulator. Our method aims to efficiently simulate both completely positive and nonpositive dynamics without the requirement of engineering non-Markovian environments. Finally, we find that small error bounds can be reached with polynomially scaling resources, evaluated as the time required for the simulation.

arXiv.org e-Print Archive

Institutional Repository of the Freie Universität Berlin

LTLf/LDLf Non-Markovian Rewards

Authors: Brafman Ronen Israel; De Giacomo Giuseppe; Patrizi Fabio
Publication venue: AAAI Press
Publication date: 01/01/2018

In Markov Decision Processes (MDPs), the reward obtained in a state is Markovian, i.e., it depends only on the last state and action. This dependency makes it difficult to reward more interesting long-term behaviors, such as always closing a door after it has been opened, or providing coffee only following a request. Extending MDPs to handle non-Markovian reward functions was the subject of two previous lines of work. Both use LTL variants to specify the reward function and then compile the new model back into a Markovian model. Building on recent progress in temporal logics over finite traces, we adopt LDLf for specifying non-Markovian rewards and provide an elegant automata construction for building a Markovian model, which extends that of previous work and offers strong minimality and compositionality guarantees.

Archivio della ricerca - Università di Roma La Sapienza
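
The idea in the LTLf/LDLf abstract above, compiling a history-dependent reward back into a Markovian one, can be illustrated with a small sketch. It assumes a hand-written two-state automaton for the abstract's "close a door after it has been opened" example; the automaton, the event names, and the helpers `step` and `total_reward` are hypothetical and are not the paper's LDLf-based construction. Tracking the automaton state next to the environment state makes the reward depend only on the current (extended) state and event.

```python
# Hypothetical automaton for "close the door after it has been opened".
# q0: no open door is pending; q1: the door was opened and not yet closed.
DOOR_DFA = {
    ("q0", "open"): "q1",
    ("q0", "close"): "q0",
    ("q0", "other"): "q0",
    ("q1", "open"): "q1",
    ("q1", "close"): "q0",
    ("q1", "other"): "q1",
}


def step(q, event):
    """Advance the automaton and return (next state, reward).

    The reward is Markovian in the extended state: it depends only on
    the current automaton state q and the current event.
    """
    q_next = DOOR_DFA[(q, event)]
    reward = 1.0 if (q == "q1" and q_next == "q0") else 0.0
    return q_next, reward


def total_reward(events):
    """Sum the rewards collected along a finite trace of events."""
    q, total = "q0", 0.0
    for event in events:
        q, r = step(q, event)
        total += r
    return total


if __name__ == "__main__":
    # Closing pays off only if the door was opened earlier in the trace.
    print(total_reward(["open", "other", "close"]))
    print(total_reward(["close", "close"]))
```

Without the automaton state, the reward for "close" would depend on the whole history; with it, a standard MDP solver could in principle be applied to the product of environment state and automaton state.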