Selectively decentralized reinforcement learning
Indiana University-Purdue University Indianapolis (IUPUI)
The main contributions of this thesis are the selectively decentralized method for solving multi-agent reinforcement learning problems and the discretized Markov-decision-process (MDP) algorithm for computing sub-optimal learning policies in completely unknown learning and control problems. These contributions tackle several challenges in multi-agent reinforcement learning: the unknown and dynamic nature of the learning environment, the difficulty of computing a closed-form solution to the learning problem, slow learning in large-scale systems, and the questions of how, when, and with whom the learning agents should communicate. The selectively decentralized method, which evaluates all possible communicative strategies, not only increases learning speed and achieves better learning goals but also learns a communicative policy for each agent. Compared with other state-of-the-art approaches, this thesis's contributions offer two advantages. First, the selectively decentralized method can incorporate a wide range of well-known single-agent reinforcement learning algorithms, including the discretized MDP, whereas state-of-the-art approaches can usually be applied to only one class of algorithms. Second, the discretized MDP algorithm can compute a sub-optimal learning policy when the environment is described in a general nonlinear form, whereas other state-of-the-art approaches often assume that the environment is in a restricted form, particularly feedback-linearization form. This thesis also discusses several alternative approaches to multi-agent learning, including Multidisciplinary Optimization. In addition, it shows how the selectively decentralized method can solve several real-world problems, particularly in mechanical and biological systems.
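The thesis's discretized-MDP algorithm itself is not reproduced in this abstract, but the single-agent building block it rests on can be illustrated with standard value iteration on a discretized state-action space. Everything below (the toy transition model P, rewards R, and discount) is illustrative, not taken from the thesis:

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-8):
    """Optimal values/policy for a finite (discretized) MDP.

    P: (A, S, S) transition probabilities P[a, s, s'];
    R: (A, S) expected one-step rewards.
    """
    V = np.zeros(P.shape[1])
    while True:
        Q = R + gamma * (P @ V)        # Q[a, s] = R[a, s] + gamma * E[V(s')]
        V_new = Q.max(axis=0)          # greedy Bellman backup
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)
        V = V_new

# Toy 2-state, 2-action chain; all numbers are illustrative only.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
V, pi = value_iteration(P, R)
```

In a continuous control problem, P and R would come from discretizing the state space into a grid; the finer the grid, the closer the resulting policy approaches the optimum, at exponential cost in dimension.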
Continuous-Time Reinforcement Learning: New Design Algorithms with Theoretical Insights and Performance Guarantees
Continuous-time nonlinear optimal control problems hold great promise in
real-world applications. After decades of development, reinforcement learning
(RL) has achieved some of the greatest successes as a general nonlinear control
design method. However, a recent comprehensive analysis of state-of-the-art
continuous-time RL (CT-RL) methods, namely, adaptive dynamic programming
(ADP)-based CT-RL algorithms, reveals they face significant design challenges
due to their complexity, numerical conditioning, and dimensional scaling
issues. Despite advanced theoretical results, existing ADP CT-RL synthesis
methods are inadequate in solving even small, academic problems. The goal of
this work is thus to introduce a suite of new CT-RL algorithms for control of
affine nonlinear systems. Our design approach relies on two important factors.
First, our methods are applicable to physical systems that can be partitioned
into smaller subproblems. This constructive consideration results in reduced
dimensionality and greatly improved intuitiveness of design. Second, we
introduce a new excitation framework to improve persistence of excitation (PE)
and numerical conditioning performance via classical input/output insights.
Such a design-centric approach is the first of its kind in the ADP CT-RL
community. In this paper, we progressively introduce a suite of (decentralized)
excitable integral reinforcement learning (EIRL) algorithms. We provide
convergence and closed-loop stability guarantees, and we demonstrate these
guarantees on a significant application problem of controlling an unstable,
nonminimum-phase hypersonic vehicle (HSV).
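The excitation framework itself is not detailed in this abstract, but persistence of excitation is conventionally assessed through the Gram matrix of the regressor samples: PE holds when its minimum eigenvalue is bounded away from zero, and the eigenvalue ratio measures numerical conditioning. A minimal sketch of such a check, with illustrative signals rather than the paper's EIRL probing inputs:

```python
import numpy as np

def excitation_metrics(phi_samples):
    """Gram-matrix check for persistence of excitation (PE).

    phi_samples: (N, d) array of regressor vectors sampled along a
    trajectory. PE requires G = (1/N) * Phi^T Phi to have a minimum
    eigenvalue bounded away from zero; the eigenvalue ratio is the
    conditioning of the resulting least-squares problem.
    """
    N = phi_samples.shape[0]
    G = phi_samples.T @ phi_samples / N
    eigvals = np.linalg.eigvalsh(G)      # ascending order
    return eigvals[0], eigvals[-1] / eigvals[0]

# A two-tone probing signal excites a 2-d regressor; a constant one does not.
t = np.linspace(0.0, 10.0, 500)
rich = np.column_stack([np.sin(t), np.sin(2.0 * t)])
poor = np.column_stack([np.ones_like(t), np.ones_like(t)])
lam_rich, cond_rich = excitation_metrics(rich)
lam_poor, _ = excitation_metrics(poor)
```

The classical input/output insight the abstract alludes to is exactly this: an input with enough distinct frequency content keeps the Gram matrix well conditioned, while a degenerate input makes it singular.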
Learning and Management for Internet-of-Things: Accounting for Adaptivity and Scalability
Internet-of-Things (IoT) envisions an intelligent infrastructure of networked
smart devices offering task-specific monitoring and control services. The
unique features of IoT include extreme heterogeneity, a massive number of
devices, and unpredictable dynamics, partially due to human interaction. These
call for foundational innovations in network design and management. Ideally,
the network should allow efficient adaptation to changing environments and
low-cost implementation scalable to a massive number of devices, subject to
stringent latency constraints. To this end, the overarching goal of this paper is to
outline a unified framework for online learning and management policies in IoT
through joint advances in communication, networking, learning, and
optimization. From the network architecture vantage point, the unified
framework leverages a promising fog architecture that enables smart devices to
have proximity access to cloud functionalities at the network edge, along the
cloud-to-things continuum. From the algorithmic perspective, key innovations
target online approaches adaptive to different degrees of nonstationarity in
IoT dynamics, and their scalable model-free implementation under limited
feedback that motivates blind or bandit approaches. The proposed framework
aspires to offer a stepping stone that leads to systematic designs and analysis
of task-specific learning and management schemes for IoT, along with a host of
new research directions to build on.
Comment: Submitted on June 15 to the Proceedings of the IEEE Special Issue on
Adaptive and Scalable Communication Networks.
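The bandit approaches the framework motivates can be illustrated with the simplest instance: an epsilon-greedy agent that, each round, offloads a task to one of several fog nodes and observes only the chosen node's noisy utility. The node utilities and parameters below are hypothetical:

```python
import random

def epsilon_greedy(true_means, rounds=20000, epsilon=0.1, seed=1):
    """Epsilon-greedy bandit: each round, play one arm (e.g. offload a task
    to one fog node) and observe only that arm's noisy reward."""
    rng = random.Random(seed)
    K = len(true_means)
    counts, est = [0] * K, [0.0] * K
    total = 0.0
    for _ in range(rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(K)                     # explore
        else:
            arm = max(range(K), key=lambda a: est[a])  # exploit
        reward = true_means[arm] + rng.gauss(0.0, 0.1)  # bandit feedback only
        counts[arm] += 1
        est[arm] += (reward - est[arm]) / counts[arm]   # running mean
        total += reward
    return est, counts, total / rounds

# Three hypothetical fog nodes with unknown mean utility per offloaded task.
est, counts, avg = epsilon_greedy([0.3, 0.5, 0.7])
```

This captures the "limited feedback" constraint in miniature: the agent never sees the utilities of the nodes it did not pick, yet its empirical estimates concentrate on the best node over time.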
Reinforcement Learning in Different Phases of Quantum Control
The ability to prepare a physical system in a desired quantum state is
central to many areas of physics such as nuclear magnetic resonance, cold
atoms, and quantum computing. Yet, preparing states quickly and with high
fidelity remains a formidable challenge. In this work we implement cutting-edge
Reinforcement Learning (RL) techniques and show that their performance is
comparable to optimal control methods in the task of finding short,
high-fidelity driving protocols from an initial to a target state in
non-integrable many-body quantum systems of interacting qubits. RL methods
learn about the underlying physical system solely through a single scalar
reward (the fidelity of the resulting state) calculated from numerical
simulations of the physical system. We further show that quantum state
manipulation, viewed as an optimization problem, exhibits a spin-glass-like
phase transition in the space of protocols as a function of the protocol
duration. Our RL-aided approach helps identify variational protocols with
nearly optimal fidelity, even in the glassy phase, where optimal state
manipulation is exponentially hard. This study highlights the potential
usefulness of RL for applications in out-of-equilibrium quantum physics.
Comment: A legend for the videos referred to in the paper is available at
https://mgbukov.github.io/RL_movies
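The paper's RL agent cannot be reproduced from the abstract alone, but the reward structure it describes (a single scalar fidelity computed by simulating the system) can be illustrated on a toy single qubit. The sketch below uses simple stochastic hill climbing over bang-bang protocols as a stand-in for the RL agent; the Hamiltonian, field strengths, and duration are illustrative, not the paper's many-body setup:

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

def step_unitary(h, dt):
    """exp(-i*(sx + h*sz)*dt), in closed form since (sx + h*sz)^2 = (1+h^2)*I."""
    omega = np.sqrt(1.0 + h * h)
    n = (sx + h * sz) / omega
    return np.cos(omega * dt) * I2 - 1j * np.sin(omega * dt) * n

def fidelity(protocol, psi0, psit, dt=0.05):
    """Scalar reward: overlap of the evolved state with the target state."""
    psi = psi0.copy()
    for h in protocol:
        psi = step_unitary(h, dt) @ psi
    return abs(np.vdot(psit, psi)) ** 2

def ground_state(h0):
    """Lowest-energy eigenstate of sx + h0*sz (eigh sorts ascending)."""
    _, vecs = np.linalg.eigh(sx + h0 * sz)
    return vecs[:, 0]

psi0, psit = ground_state(-2.0), ground_state(2.0)   # illustrative endpoints

# Stochastic hill climbing over bang-bang protocols h(t) in {-4, +4}:
# flip one bang at a time, keep the flip if the scalar reward improves.
rng = np.random.default_rng(0)
proto = rng.choice([-4.0, 4.0], size=40)
f_init = fidelity(proto, psi0, psit)
best = f_init
for _ in range(2000):
    cand = proto.copy()
    cand[rng.integers(len(cand))] *= -1.0
    f = fidelity(cand, psi0, psit)
    if f > best:
        proto, best = cand, f
```

The glassy phase the paper identifies is precisely the regime where such greedy local search stalls in poor local optima of the protocol landscape, which is what motivates the RL approach.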
Computational intelligence approaches to robotics, automation, and control [Volume guest editors]
No abstract available
Reduction of Markov Chains using a Value-of-Information-Based Approach
In this paper, we propose an approach to obtain reduced-order models of
Markov chains. Our approach is composed of two information-theoretic processes.
The first is a means of comparing pairs of stationary chains on different state
spaces, which is done via the negative Kullback-Leibler divergence defined on a
model joint space. Model reduction is achieved by solving a
value-of-information criterion with respect to this divergence. Optimizing the
criterion leads to a probabilistic partitioning of the states in the high-order
Markov chain. A single free parameter that emerges through the optimization
process dictates both the partition uncertainty and the number of state groups.
We provide a data-driven means of choosing the `optimal' value of this free
parameter, which sidesteps the need to know a priori the number of state groups
in an arbitrary chain.
Comment: Submitted to Entropy
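The value-of-information criterion itself is not given in the abstract. As a simpler stand-in for intuition, the sketch below reduces a chain by greedily merging the pair of state groups whose outgoing transition rows are closest in symmetrized KL divergence; note this yields a hard partition with a fixed group count, unlike the paper's probabilistic partitioning governed by a free parameter:

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """Kullback-Leibler divergence between two discrete distributions."""
    p = np.clip(p, eps, None)
    q = np.clip(q, eps, None)
    return float(np.sum(p * np.log(p / q)))

def aggregate_states(P, n_groups):
    """Greedy Markov-chain reduction: repeatedly merge the pair of state
    groups whose outgoing transition rows are closest in symmetrized KL."""
    groups = [[s] for s in range(P.shape[0])]
    rows = [P[s].copy() for s in range(P.shape[0])]
    weights = [1.0] * len(groups)
    while len(groups) > n_groups:
        best, pair = np.inf, None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                d = kl(rows[i], rows[j]) + kl(rows[j], rows[i])
                if d < best:
                    best, pair = d, (i, j)
        i, j = pair
        w = weights[i] + weights[j]
        rows[i] = (weights[i] * rows[i] + weights[j] * rows[j]) / w  # merged row
        weights[i] = w
        groups[i] += groups[j]
        del groups[j], rows[j], weights[j]
    return groups

# A 4-state chain with two nearly lumpable blocks {0,1} and {2,3}.
P = np.array([[0.45, 0.45, 0.05, 0.05],
              [0.44, 0.46, 0.06, 0.04],
              [0.05, 0.05, 0.45, 0.45],
              [0.04, 0.06, 0.44, 0.46]])
groups = aggregate_states(P, 2)
```

In the paper's formulation, the number of groups is not fixed in advance but emerges from the single free parameter trading off partition uncertainty against divergence.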
International Conference on Continuous Optimization (ICCOPT) 2019 Conference Book
The Sixth International Conference on Continuous Optimization took place on the campus of the Technical University of Berlin, August 3-8, 2019. The ICCOPT is a flagship conference of the Mathematical Optimization Society (MOS), organized every three years. ICCOPT 2019 was hosted by the Weierstrass Institute for Applied Analysis and Stochastics (WIAS) Berlin. It included a Summer School and a Conference with a series of plenary and semi-plenary talks, organized and contributed sessions, and poster sessions.
This book comprises the full conference program. It contains the scientific program, both in survey form and in full detail, as well as information on the social program, the venue, special meetings, and more.
Chaotic exploration and learning of locomotor behaviours
Recent developments in the embodied approach to understanding the generation of
adaptive behaviour suggest that the design of adaptive neural circuits for
rhythmic motor patterns should not be done in isolation from an appreciation,
and indeed exploitation, of neural-body-environment interactions. Utilising
spontaneous mutual entrainment between neural systems and physical bodies
provides a useful passage to the regions of phase space that are naturally
structured by neural-body-environment interactions. A growing body of work has
provided evidence that chaotic dynamics can be useful in allowing embodied
systems to spontaneously explore potentially useful motor patterns. However,
until now there has been no general integrated neural system that allows
goal-directed, online, real-time exploration and capture of motor patterns
without recourse to external monitoring, evaluation, or training methods. For
the first time, we introduce such a system: a fully dynamic neural system,
exploiting intrinsic chaotic dynamics, for the exploration and learning of the
possible locomotion patterns of an articulated robot of arbitrary morphology in
an unknown environment. The controller is modelled as a network of neural
oscillators coupled only through physical embodiment, and goal-directed
exploration of coordinated motor patterns is achieved by a chaotic search using
adaptive bifurcation. The phase space of the indirectly coupled
neural-body-environment system contains multiple transient or permanent
self-organised dynamics, each of which is a candidate for a locomotion
behaviour. The adaptive bifurcation enables the system orbit to wander through
various phase-coordinated states, using its intrinsic chaotic dynamics as a
driving force, and stabilises the system on one of the states matching the
given goal criteria. To improve the sustainability of useful transient
patterns, sensory homeostasis has been introduced, which results in an
increased diversity of motor outputs, thus achieving multi-scale exploration. A
rhythmic pattern discovered by this process is memorised and sustained by
changing the wiring between initially disconnected oscillators using an
adaptive synchronisation method. The dynamical nature of the weak coupling
through physical embodiment allows this adaptive weight learning to be easily
integrated, thus forming a continuous exploration-learning system. Our results
show that the novel neuro-robotic system is able to create and learn a number
of emergent locomotion behaviours for a wide range of body configurations and
physical environments, and can re-adapt after sustaining damage. The
implications and analyses of these results for investigating the generality and
limitations of the proposed system are discussed.
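The chaotic-search-with-adaptive-bifurcation idea can be caricatured in a few lines: a chaotic map explores a parameter space, and a bifurcation parameter is lowered toward the ordered regime as performance improves, so the orbit settles once a good region is found. This is a loose one-dimensional sketch of the exploration/stabilisation principle, not the paper's neural-oscillator controller; the objective and constants are invented:

```python
import math

def chaotic_search(objective, r_hi=3.99, r_lo=2.8, steps=3000):
    """Toy adaptive-bifurcation search driven by a logistic map x -> r*x*(1-x).

    Poor performance keeps r in the chaotic regime (wide exploration);
    good performance lowers r toward the ordered regime, so the orbit
    settles near a state satisfying the goal (capture).
    """
    x = 0.3
    best_x, best_f = x, objective(x)
    for _ in range(steps):
        f = objective(x)
        if f > best_f:
            best_x, best_f = x, f
        # map performance in [0, 1] to the bifurcation parameter:
        # good performance -> low r (ordered), poor -> high r (chaotic)
        r = r_hi - (r_hi - r_lo) * max(0.0, min(1.0, f))
        x = r * x * (1.0 - x)
    return best_x, best_f

def score(x):
    """Hypothetical goal criterion: gait quality peaks at x = 0.62."""
    return math.exp(-50.0 * (x - 0.62) ** 2)

best_x, best_f = chaotic_search(score)
```

The same feedback loop, with the robot's body and environment replacing the logistic map and sensory goal criteria replacing the toy objective, is what lets the full system both discover and stabilise locomotion patterns without an external trainer.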