Selectively decentralized reinforcement learning
Indiana University-Purdue University Indianapolis (IUPUI)
The main contributions of this thesis are the selectively decentralized method for solving multi-agent reinforcement learning problems and the discretized Markov-decision-process (MDP) algorithm for computing sub-optimal learning policies in completely unknown learning and control problems. These contributions tackle several challenges in multi-agent reinforcement learning: the unknown and dynamic nature of the learning environment, the difficulty of computing a closed-form solution to the learning problem, slow learning in large-scale systems, and the questions of how, when, and with whom the learning agents should communicate. The selectively decentralized method, which evaluates all possible communicative strategies, not only increases learning speed and achieves better learning goals but also learns a communicative policy for each agent. Compared with other state-of-the-art approaches, this thesis's contributions offer two advantages. First, the selectively decentralized method can incorporate a wide range of well-known single-agent reinforcement learning algorithms, including the discretized MDP, whereas state-of-the-art approaches can usually be applied to only one class of algorithms. Second, the discretized MDP algorithm can compute a sub-optimal learning policy when the environment is described in a general nonlinear form, whereas other state-of-the-art approaches often assume that the environment is in a restricted form, particularly feedback-linearization form. This thesis also discusses several alternative approaches to multi-agent learning, including Multidisciplinary Optimization. In addition, it shows how the selectively decentralized method can solve several real-world problems, particularly in mechanical and biological systems.
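The thesis's discretized-MDP algorithm itself is not reproduced in this abstract, but the single-agent building block it rests on can be illustrated with standard value iteration on a discretized state-action space. Everything below (the toy transition model P, rewards R, and discount) is illustrative, not taken from the thesis:

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-8):
    """Optimal values/policy for a finite (discretized) MDP.

    P: (A, S, S) transition probabilities P[a, s, s'];
    R: (A, S) expected one-step rewards.
    """
    V = np.zeros(P.shape[1])
    while True:
        Q = R + gamma * (P @ V)        # Q[a, s] = R[a, s] + gamma * E[V(s')]
        V_new = Q.max(axis=0)          # greedy Bellman backup
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)
        V = V_new

# Toy 2-state, 2-action chain; all numbers are illustrative only.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
V, pi = value_iteration(P, R)
```

In a continuous control problem, P and R would come from discretizing the state space into a grid; the finer the grid, the closer the resulting policy approaches the optimum, at exponential cost in dimension.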
Continuous-Time Reinforcement Learning: New Design Algorithms with Theoretical Insights and Performance Guarantees
Continuous-time nonlinear optimal control problems hold great promise in
real-world applications. After decades of development, reinforcement learning
(RL) has achieved some of the greatest successes as a general nonlinear control
design method. However, a recent comprehensive analysis of state-of-the-art
continuous-time RL (CT-RL) methods, namely, adaptive dynamic programming
(ADP)-based CT-RL algorithms, reveals they face significant design challenges
due to their complexity, numerical conditioning, and dimensional scaling
issues. Despite advanced theoretical results, existing ADP CT-RL synthesis
methods are inadequate in solving even small, academic problems. The goal of
this work is thus to introduce a suite of new CT-RL algorithms for control of
affine nonlinear systems. Our design approach relies on two important factors.
First, our methods are applicable to physical systems that can be partitioned
into smaller subproblems. This constructive consideration results in reduced
dimensionality and greatly improved intuitiveness of design. Second, we
introduce a new excitation framework to improve persistence of excitation (PE)
and numerical conditioning performance via classical input/output insights.
Such a design-centric approach is the first of its kind in the ADP CT-RL
community. In this paper, we progressively introduce a suite of (decentralized)
excitable integral reinforcement learning (EIRL) algorithms. We provide
convergence and closed-loop stability guarantees, and we demonstrate these
guarantees on a significant application problem of controlling an unstable,
nonminimum-phase hypersonic vehicle (HSV).
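The excitation framework itself is not detailed in this abstract, but persistence of excitation is conventionally assessed through the Gram matrix of the regressor samples: PE holds when its minimum eigenvalue is bounded away from zero, and the eigenvalue ratio measures numerical conditioning. A minimal sketch of such a check, with illustrative signals rather than the paper's EIRL probing inputs:

```python
import numpy as np

def excitation_metrics(phi_samples):
    """Gram-matrix check for persistence of excitation (PE).

    phi_samples: (N, d) array of regressor vectors sampled along a
    trajectory. PE requires G = (1/N) * Phi^T Phi to have a minimum
    eigenvalue bounded away from zero; the eigenvalue ratio is the
    conditioning of the resulting least-squares problem.
    """
    N = phi_samples.shape[0]
    G = phi_samples.T @ phi_samples / N
    eigvals = np.linalg.eigvalsh(G)      # ascending order
    return eigvals[0], eigvals[-1] / eigvals[0]

# A two-tone probing signal excites a 2-d regressor; a constant one does not.
t = np.linspace(0.0, 10.0, 500)
rich = np.column_stack([np.sin(t), np.sin(2.0 * t)])
poor = np.column_stack([np.ones_like(t), np.ones_like(t)])
lam_rich, cond_rich = excitation_metrics(rich)
lam_poor, _ = excitation_metrics(poor)
```

The classical input/output insight the abstract alludes to is exactly this: an input with enough distinct frequency content keeps the Gram matrix well conditioned, while a degenerate input makes it singular.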
Learning and Management for Internet-of-Things: Accounting for Adaptivity and Scalability
Internet-of-Things (IoT) envisions an intelligent infrastructure of networked
smart devices offering task-specific monitoring and control services. The
unique features of IoT include extreme heterogeneity, a massive number of
devices, and unpredictable dynamics, partially due to human interaction. These
call for foundational innovations in network design and management. Ideally,
the network should allow efficient adaptation to changing environments and
low-cost implementation scalable to a massive number of devices, subject to
stringent latency constraints. To this end, the overarching goal of this paper is to
outline a unified framework for online learning and management policies in IoT
through joint advances in communication, networking, learning, and
optimization. From the network architecture vantage point, the unified
framework leverages a promising fog architecture that enables smart devices to
have proximity access to cloud functionalities at the network edge, along the
cloud-to-things continuum. From the algorithmic perspective, key innovations
target online approaches adaptive to different degrees of nonstationarity in
IoT dynamics, and their scalable model-free implementation under limited
feedback that motivates blind or bandit approaches. The proposed framework
aspires to offer a stepping stone that leads to systematic designs and analysis
of task-specific learning and management schemes for IoT, along with a host of
new research directions to build on.
Comment: Submitted on June 15 to the Proceedings of the IEEE Special Issue on
Adaptive and Scalable Communication Networks.
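The bandit approaches the framework motivates can be illustrated with the simplest instance: an epsilon-greedy agent that, each round, offloads a task to one of several fog nodes and observes only the chosen node's noisy utility. The node utilities and parameters below are hypothetical:

```python
import random

def epsilon_greedy(true_means, rounds=20000, epsilon=0.1, seed=1):
    """Epsilon-greedy bandit: each round, play one arm (e.g. offload a task
    to one fog node) and observe only that arm's noisy reward."""
    rng = random.Random(seed)
    K = len(true_means)
    counts, est = [0] * K, [0.0] * K
    total = 0.0
    for _ in range(rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(K)                     # explore
        else:
            arm = max(range(K), key=lambda a: est[a])  # exploit
        reward = true_means[arm] + rng.gauss(0.0, 0.1)  # bandit feedback only
        counts[arm] += 1
        est[arm] += (reward - est[arm]) / counts[arm]   # running mean
        total += reward
    return est, counts, total / rounds

# Three hypothetical fog nodes with unknown mean utility per offloaded task.
est, counts, avg = epsilon_greedy([0.3, 0.5, 0.7])
```

This captures the "limited feedback" constraint in miniature: the agent never sees the utilities of the nodes it did not pick, yet its empirical estimates concentrate on the best node over time.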
Reinforcement Learning in Different Phases of Quantum Control
The ability to prepare a physical system in a desired quantum state is
central to many areas of physics such as nuclear magnetic resonance, cold
atoms, and quantum computing. Yet, preparing states quickly and with high
fidelity remains a formidable challenge. In this work we implement cutting-edge
Reinforcement Learning (RL) techniques and show that their performance is
comparable to optimal control methods in the task of finding short,
high-fidelity driving protocols from an initial to a target state in
non-integrable many-body quantum systems of interacting qubits. RL methods
learn about the underlying physical system solely through a single scalar
reward (the fidelity of the resulting state) calculated from numerical
simulations of the physical system. We further show that quantum state
manipulation, viewed as an optimization problem, exhibits a spin-glass-like
phase transition in the space of protocols as a function of the protocol
duration. Our RL-aided approach helps identify variational protocols with
nearly optimal fidelity, even in the glassy phase, where optimal state
manipulation is exponentially hard. This study highlights the potential
usefulness of RL for applications in out-of-equilibrium quantum physics.
Comment: A legend for the videos referred to in the paper is available at
https://mgbukov.github.io/RL_movies
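The paper's RL agent cannot be reproduced from the abstract alone, but the reward structure it describes (a single scalar fidelity computed by simulating the system) can be illustrated on a toy single qubit. The sketch below uses simple stochastic hill climbing over bang-bang protocols as a stand-in for the RL agent; the Hamiltonian, field strengths, and duration are illustrative, not the paper's many-body setup:

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

def step_unitary(h, dt):
    """exp(-i*(sx + h*sz)*dt), in closed form since (sx + h*sz)^2 = (1+h^2)*I."""
    omega = np.sqrt(1.0 + h * h)
    n = (sx + h * sz) / omega
    return np.cos(omega * dt) * I2 - 1j * np.sin(omega * dt) * n

def fidelity(protocol, psi0, psit, dt=0.05):
    """Scalar reward: overlap of the evolved state with the target state."""
    psi = psi0.copy()
    for h in protocol:
        psi = step_unitary(h, dt) @ psi
    return abs(np.vdot(psit, psi)) ** 2

def ground_state(h0):
    """Lowest-energy eigenstate of sx + h0*sz (eigh sorts ascending)."""
    _, vecs = np.linalg.eigh(sx + h0 * sz)
    return vecs[:, 0]

psi0, psit = ground_state(-2.0), ground_state(2.0)   # illustrative endpoints

# Stochastic hill climbing over bang-bang protocols h(t) in {-4, +4}:
# flip one bang at a time, keep the flip if the scalar reward improves.
rng = np.random.default_rng(0)
proto = rng.choice([-4.0, 4.0], size=40)
f_init = fidelity(proto, psi0, psit)
best = f_init
for _ in range(2000):
    cand = proto.copy()
    cand[rng.integers(len(cand))] *= -1.0
    f = fidelity(cand, psi0, psit)
    if f > best:
        proto, best = cand, f
```

The glassy phase the paper identifies is precisely the regime where such greedy local search stalls in poor local optima of the protocol landscape, which is what motivates the RL approach.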
Computational intelligence approaches to robotics, automation, and control [Volume guest editors]
No abstract available
Reduction of Markov Chains using a Value-of-Information-Based Approach
In this paper, we propose an approach to obtain reduced-order models of
Markov chains. Our approach is composed of two information-theoretic processes.
The first is a means of comparing pairs of stationary chains on different state
spaces, which is done via the negative Kullback-Leibler divergence defined on a
model joint space. Model reduction is achieved by solving a
value-of-information criterion with respect to this divergence. Optimizing the
criterion leads to a probabilistic partitioning of the states in the high-order
Markov chain. A single free parameter that emerges through the optimization
process dictates both the partition uncertainty and the number of state groups.
We provide a data-driven means of choosing the `optimal' value of this free
parameter, which sidesteps the need to know a priori the number of state groups
in an arbitrary chain.
Comment: Submitted to Entropy
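The value-of-information criterion itself is not given in the abstract. As a simpler stand-in for intuition, the sketch below reduces a chain by greedily merging the pair of state groups whose outgoing transition rows are closest in symmetrized KL divergence; note this yields a hard partition with a fixed group count, unlike the paper's probabilistic partitioning governed by a free parameter:

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """Kullback-Leibler divergence between two discrete distributions."""
    p = np.clip(p, eps, None)
    q = np.clip(q, eps, None)
    return float(np.sum(p * np.log(p / q)))

def aggregate_states(P, n_groups):
    """Greedy Markov-chain reduction: repeatedly merge the pair of state
    groups whose outgoing transition rows are closest in symmetrized KL."""
    groups = [[s] for s in range(P.shape[0])]
    rows = [P[s].copy() for s in range(P.shape[0])]
    weights = [1.0] * len(groups)
    while len(groups) > n_groups:
        best, pair = np.inf, None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                d = kl(rows[i], rows[j]) + kl(rows[j], rows[i])
                if d < best:
                    best, pair = d, (i, j)
        i, j = pair
        w = weights[i] + weights[j]
        rows[i] = (weights[i] * rows[i] + weights[j] * rows[j]) / w  # merged row
        weights[i] = w
        groups[i] += groups[j]
        del groups[j], rows[j], weights[j]
    return groups

# A 4-state chain with two nearly lumpable blocks {0,1} and {2,3}.
P = np.array([[0.45, 0.45, 0.05, 0.05],
              [0.44, 0.46, 0.06, 0.04],
              [0.05, 0.05, 0.45, 0.45],
              [0.04, 0.06, 0.44, 0.46]])
groups = aggregate_states(P, 2)
```

In the paper's formulation, the number of groups is not fixed in advance but emerges from the single free parameter trading off partition uncertainty against divergence.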
International Conference on Continuous Optimization (ICCOPT) 2019 Conference Book
The Sixth International Conference on Continuous Optimization took place on the campus of the Technical University of Berlin, August 3-8, 2019. The ICCOPT is a flagship conference of the Mathematical Optimization Society (MOS), organized every three years. ICCOPT 2019 was hosted by the Weierstrass Institute for Applied Analysis and Stochastics (WIAS) Berlin. It included a Summer School and a Conference with a series of plenary and semi-plenary talks, organized and contributed sessions, and poster sessions.
This book comprises the full conference program. It contains the scientific program, both in survey form and in full detail, as well as information on the social program, the venue, special meetings, and more.
Chaotic exploration and learning of locomotor behaviours
Recent developments in the embodied approach to understanding the generation of
adaptive behaviour suggest that the design of adaptive neural circuits for
rhythmic motor patterns should not be done in isolation from an appreciation,
and indeed exploitation, of neural-body-environment interactions. Utilising
spontaneous mutual entrainment between neural systems and physical bodies
provides a useful passage to the regions of phase space that are naturally
structured by neural-body-environment interactions. A growing body of work has
provided evidence that chaotic dynamics can be useful in allowing embodied
systems to spontaneously explore potentially useful motor patterns. However,
until now there has been no general integrated neural system that allows
goal-directed, online, real-time exploration and capture of motor patterns
without recourse to external monitoring, evaluation, or training methods. For
the first time, we introduce such a system: a fully dynamic neural system,
exploiting intrinsic chaotic dynamics, for the exploration and learning of the
possible locomotion patterns of an articulated robot of arbitrary morphology in
an unknown environment. The controller is modelled as a network of neural
oscillators coupled only through physical embodiment, and goal-directed
exploration of coordinated motor patterns is achieved by a chaotic search using
adaptive bifurcation. The phase space of the indirectly coupled
neural-body-environment system contains multiple transient or permanent
self-organised dynamics, each of which is a candidate for a locomotion
behaviour. The adaptive bifurcation enables the system orbit to wander through
various phase-coordinated states, using its intrinsic chaotic dynamics as a
driving force, and stabilises the system on one of the states matching the
given goal criteria. To improve the sustainability of useful transient
patterns, sensory homeostasis has been introduced, which results in an
increased diversity of motor outputs, thus achieving multi-scale exploration. A
rhythmic pattern discovered by this process is memorised and sustained by
changing the wiring between initially disconnected oscillators using an
adaptive synchronisation method. The dynamical nature of the weak coupling
through physical embodiment allows this adaptive weight learning to be easily
integrated, thus forming a continuous exploration-learning system. Our results
show that the novel neuro-robotic system is able to create and learn a number
of emergent locomotion behaviours for a wide range of body configurations and
physical environments, and can re-adapt after sustaining damage. The
implications and analyses of these results for investigating the generality and
limitations of the proposed system are discussed.
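The chaotic-search-with-adaptive-bifurcation idea can be caricatured in a few lines: a chaotic map explores a parameter space, and a bifurcation parameter is lowered toward the ordered regime as performance improves, so the orbit settles once a good region is found. This is a loose one-dimensional sketch of the exploration/stabilisation principle, not the paper's neural-oscillator controller; the objective and constants are invented:

```python
import math

def chaotic_search(objective, r_hi=3.99, r_lo=2.8, steps=3000):
    """Toy adaptive-bifurcation search driven by a logistic map x -> r*x*(1-x).

    Poor performance keeps r in the chaotic regime (wide exploration);
    good performance lowers r toward the ordered regime, so the orbit
    settles near a state satisfying the goal (capture).
    """
    x = 0.3
    best_x, best_f = x, objective(x)
    for _ in range(steps):
        f = objective(x)
        if f > best_f:
            best_x, best_f = x, f
        # map performance in [0, 1] to the bifurcation parameter:
        # good performance -> low r (ordered), poor -> high r (chaotic)
        r = r_hi - (r_hi - r_lo) * max(0.0, min(1.0, f))
        x = r * x * (1.0 - x)
    return best_x, best_f

def score(x):
    """Hypothetical goal criterion: gait quality peaks at x = 0.62."""
    return math.exp(-50.0 * (x - 0.62) ** 2)

best_x, best_f = chaotic_search(score)
```

The same feedback loop, with the robot's body and environment replacing the logistic map and sensory goal criteria replacing the toy objective, is what lets the full system both discover and stabilise locomotion patterns without an external trainer.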