Gym-ANM: Open-source software to leverage reinforcement learning for power system management in research and education
Gym-ANM is a Python package that facilitates the design of reinforcement
learning (RL) environments that model active network management (ANM) tasks in
electricity networks. Here, we describe how to implement new environments and
how to write code to interact with pre-existing ones. We also provide an
overview of ANM6-Easy, an environment designed to highlight common ANM
challenges. Finally, we discuss the potential impact of Gym-ANM on the
scientific community, both in terms of research and education. We hope this
package will facilitate collaboration between the power system and RL
communities in the search for algorithms to control future energy systems.
Comment: 5 pages, 2 figures, 2 code sample
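The abstract mentions writing code to interact with pre-existing environments. Gym-ANM environments follow the standard Gym `reset`/`step` interface, so the agent-environment loop looks like the sketch below; the `ToyANMEnv` stub, its dimensions, and the zero-action placeholder policy are illustrative assumptions, not the package's actual dynamics or API.

```python
import numpy as np

class ToyANMEnv:
    """Minimal stand-in with the standard Gym step/reset API.
    (Illustrative only -- not the real Gym-ANM network model.)"""
    def __init__(self, n_obs=6, n_act=2, horizon=10):
        self.n_obs, self.n_act, self.horizon = n_obs, n_act, horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return np.zeros(self.n_obs)

    def step(self, action):
        self.t += 1
        obs = np.random.randn(self.n_obs)           # next network state
        reward = -float(np.sum(np.square(action)))  # e.g. penalize large set-points
        done = self.t >= self.horizon
        return obs, reward, done, {}

# The agent-environment loop used with any Gym-style environment,
# including (per the abstract) those built with Gym-ANM:
env = ToyANMEnv()
obs, done, total = env.reset(), False, 0.0
while not done:
    action = np.zeros(env.n_act)   # placeholder policy
    obs, reward, done, info = env.step(action)
    total += reward
print(total)
```

An RL agent would replace the placeholder policy with one mapping `obs` to `action`; everything else in the loop stays the same.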
Deep learning approach to control of prosthetic hands with electromyography signals
Natural muscles provide mobility in response to nerve impulses.
Electromyography (EMG) measures the electrical activity of muscles in response
to a nerve's stimulation. In the past few decades, EMG signals have been used
extensively in the identification of user intention to potentially control
assistive devices such as smart wheelchairs, exoskeletons, and prosthetic
devices. In the design of conventional assistive devices, developers optimize
multiple subsystems independently. Feature extraction and feature description
are essential subsystems of this approach. Therefore, researchers proposed
various hand-crafted features to interpret EMG signals. However, the
performance of conventional assistive devices is still unsatisfactory. In this
paper, we propose a deep learning approach to control prosthetic hands with raw
EMG signals. We use a novel deep convolutional neural network to eschew the
feature-engineering step. Removing the feature extraction and feature
description is an important step toward the paradigm of end-to-end
optimization. Fine-tuning and personalization are additional advantages of our
approach. The proposed approach is implemented in Python with the TensorFlow
deep learning library, and it runs in real time on the general-purpose graphics
processing unit of an NVIDIA Jetson TX2 developer kit. Our results demonstrate
the ability of our system to predict finger positions from raw EMG signals. We
anticipate our EMG-based control system to be a starting point to design more
sophisticated prosthetic hands. For example, a pressure measurement unit can be
added to transfer the perception of the environment to the user. Furthermore,
our system can be modified for other prosthetic devices.
Comment: Conference. Houston, Texas, USA. September, 201
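As a rough illustration of the eschew-feature-engineering idea, the sketch below regresses finger positions directly from a raw multi-channel EMG window using a single randomly initialized 1D convolution, global pooling, and a linear head. All shapes, channel counts, and the plain-NumPy layers are assumptions for illustration, not the paper's actual deep network.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, w, b):
    """Valid-mode 1D convolution with ReLU: x (channels, T), w (out, in, k), b (out,)."""
    out_ch, in_ch, k = w.shape
    T = x.shape[1] - k + 1
    y = np.zeros((out_ch, T))
    for o in range(out_ch):
        for t in range(T):
            y[o, t] = np.sum(w[o] * x[:, t:t + k]) + b[o]
    return np.maximum(y, 0.0)

# Hypothetical shapes: 8 EMG channels, a 200-sample raw window, 5 finger positions.
emg_window = rng.standard_normal((8, 200))
w1 = rng.standard_normal((16, 8, 9)) * 0.1   # 16 filters, kernel size 9
b1 = np.zeros(16)
w_out = rng.standard_normal((5, 16)) * 0.1
b_out = np.zeros(5)

features = conv1d(emg_window, w1, b1)        # (16, 192) learned feature map
pooled = features.mean(axis=1)               # global average pooling
finger_pos = w_out @ pooled + b_out          # regression head -> 5 positions
print(finger_pos.shape)
```

The point of the sketch is structural: no hand-crafted feature extraction sits between the raw signal and the regression target, so the whole mapping can be trained end to end.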
End-to-End Learning of Speech 2D Feature-Trajectory for Prosthetic Hands
Speech is one of the most common forms of communication in humans. Speech
commands are essential parts of multimodal controlling of prosthetic hands. In
the past decades, researchers used automatic speech recognition systems for
controlling prosthetic hands with speech commands. Automatic speech
recognition systems learn to map human speech to text; natural language
processing or a look-up table then maps the estimated text to a trajectory.
However, the performance of conventional speech-controlled
prosthetic hands is still unsatisfactory. Recent advancements in
general-purpose graphics processing units (GPGPUs) enable intelligent devices
to run deep neural networks in real-time. Thus, architectures of intelligent
systems have rapidly transformed from the paradigm of composite subsystems
optimization to the paradigm of end-to-end optimization. In this paper, we
propose an end-to-end convolutional neural network (CNN) that maps speech 2D
features directly to trajectories for prosthetic hands. The proposed
convolutional neural network is lightweight, and thus it runs in real-time in
an embedded GPGPU. The proposed method can use any type of speech 2D feature
that has local correlations in each dimension such as spectrogram, MFCC, or
PNCC. We omit the speech-to-text step in controlling the prosthetic hand in
this paper. The network is written in Python with the Keras library on a
TensorFlow backend. We optimized the CNN for the NVIDIA Jetson TX2 developer
kit.
Our experiment on this CNN demonstrates a root-mean-square error of 0.119 and
a 20 ms running time to produce trajectory outputs corresponding to the voice
input data. To achieve a lower error in real time, we can optimize a similar
CNN for a more powerful embedded GPGPU such as the NVIDIA AGX Xavier.
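The abstract notes that any speech 2D feature with local correlations in both dimensions (spectrogram, MFCC, or PNCC) can feed the CNN. Below is a minimal NumPy sketch of the simplest such feature, a magnitude spectrogram; the frame/hop sizes are assumptions, and a synthetic tone stands in for a voice command.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude STFT: rows = frequency bins, cols = time frames.
    Adjacent cells are correlated along both axes -- the local-correlation
    property the paper's CNN relies on."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T  # (freq_bins, n_frames)

# 1 s of a synthetic 440 Hz tone at 16 kHz (placeholder input data).
fs = 16000
t = np.arange(fs) / fs
sig = np.sin(2 * np.pi * 440 * t)
feat = spectrogram(sig)
print(feat.shape)  # (freq_bins, n_frames)
```

The resulting 2D array is what a convolutional front end would consume in place of raw audio or recognized text.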
Combining Model-Based and Model-Free Methods for Nonlinear Control: A Provably Convergent Policy Gradient Approach
Model-free learning-based control methods have seen great success recently. However, such methods typically suffer from poor sample complexity and limited convergence guarantees. This is in sharp contrast to classical model-based control, which has a rich theory but typically requires strong modeling assumptions. In this paper, we combine the two approaches to achieve the best of both worlds. We consider a dynamical system with both linear and non-linear components and develop a novel approach to use the linear model to define a warm start for a model-free policy gradient method. We show this hybrid approach outperforms the model-based controller while avoiding the convergence issues associated with model-free approaches via both numerical experiments and theoretical analyses, in which we derive sufficient conditions on the non-linear component such that our approach is guaranteed to converge to the (nearly) globally optimal controller.
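A toy sketch of the warm-start idea on a scalar system (all constants, the tanh residual, and the finite-difference gradient are illustrative assumptions, not the paper's setting): solve the Riccati equation for the linear part to obtain an LQR gain, then refine that gain with a model-free, zeroth-order policy gradient on rollout cost.

```python
import numpy as np

# Scalar system x' = a*x + b*u + f_nl(x): linear part (a, b) is modeled,
# the small nonlinear residual f_nl is not.
a, b = 0.9, 1.0
f_nl = lambda x: 0.1 * np.tanh(x)            # unknown to the model-based step
q, r = 1.0, 1.0                              # stage cost q*x^2 + r*u^2
horizon, x0 = 30, 1.0

def cost(k):
    """Rollout cost of the static feedback law u = -k * x on the true system."""
    x, c = x0, 0.0
    for _ in range(horizon):
        u = -k * x
        c += q * x**2 + r * u**2
        x = a * x + b * u + f_nl(x)
    return c

# Model-based warm start: scalar discrete Riccati fixed point for (a, b, q, r).
p = q
for _ in range(200):
    p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
k = a * b * p / (r + b * b * p)              # LQR gain, ignores f_nl
warm_cost = cost(k)

# Model-free refinement: finite-difference policy gradient on k.
eps, lr = 1e-4, 1e-3
for _ in range(200):
    grad = (cost(k + eps) - cost(k - eps)) / (2 * eps)
    k -= lr * grad
print(k, warm_cost, cost(k))
```

The warm start puts the policy search inside a stabilizing region, which is the intuition behind the convergence guarantee the abstract describes.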
Distributed Reinforcement Learning in Multi-Agent Networked Systems
We study distributed reinforcement learning (RL) for a network of agents. The objective is to find localized policies that maximize the (discounted) global reward. In general, scalability is a challenge in this setting because the size of the global state/action space can be exponential in the number of agents. Scalable algorithms are only known in cases where dependencies are local, e.g., between neighbors. In this work, we propose a Scalable Actor-Critic framework that applies in settings where the dependencies are non-local and provide a finite-time error bound that shows how the convergence rate depends on the depth of the dependencies in the network. Additionally, as a byproduct of our analysis, we obtain novel finite-time convergence results for a general stochastic approximation scheme and for temporal difference learning with state aggregation that apply beyond the setting of RL in networked systems.
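The notion of a localized policy whose dependencies extend a bounded depth into the network can be sketched as a policy that reads only the k-hop slice of the global state. The line graph, BFS helper, and averaging rule below are illustrative assumptions, not the paper's algorithm.

```python
from collections import deque

def k_hop_neighborhood(adj, i, k):
    """Agents within graph distance <= k of agent i (breadth-first search)."""
    dist = {i: 0}
    q = deque([i])
    while q:
        u = q.popleft()
        if dist[u] == k:
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return sorted(dist)

# Line graph on 6 agents: 0-1-2-3-4-5 (toy network for illustration).
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

def localized_policy(global_state, i, k=1):
    """A localized policy sees only its k-hop slice of the global state,
    so its input size is independent of the total number of agents."""
    nbrs = k_hop_neighborhood(adj, i, k)
    local_obs = [global_state[j] for j in nbrs]
    return sum(local_obs) / len(local_obs)   # placeholder decision rule

state = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
print(k_hop_neighborhood(adj, 2, 1))
print(localized_policy(state, 2, k=1))
```

Increasing k trades a larger per-agent input for dependencies that reach deeper into the network, which is the quantity the paper's finite-time error bound is expressed in.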
Gym-ANM: Reinforcement Learning Environments for Active Network Management Tasks in Electricity Distribution Systems
Active network management (ANM) of electricity distribution networks involves
many complex stochastic sequential optimization problems. These problems must
be solved to integrate renewable energy sources and distributed storage into
future electrical grids. In this work, we introduce Gym-ANM, a framework for
designing reinforcement learning (RL) environments that model ANM tasks in
electricity distribution networks. These environments provide new playgrounds
for RL research in the management of electricity networks that do not require
extensive knowledge of the underlying dynamics of such systems. Along with
this work, we are releasing an implementation of an introductory
toy environment, ANM6-Easy, designed to emphasize common challenges in ANM. We
also show that state-of-the-art RL algorithms can already achieve good
performance on ANM6-Easy when compared against a model predictive control (MPC)
approach. Finally, we provide guidelines to create new Gym-ANM environments
differing in terms of (a) the distribution network topology and parameters, (b)
the observation space, (c) the modelling of the stochastic processes present in
the system, and (d) a set of hyperparameters influencing the reward signal.
Gym-ANM can be downloaded at https://github.com/robinhenry/gym-anm.
Comment: 15 main pages, 17 pages of appendix, 10 figures, GitHub repository:
https://github.com/robinhenry/gym-an
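One hypothetical way to organize the four variation points (a)-(d) when designing a new environment is a single specification object, as sketched below. This is an illustrative grouping only, not Gym-ANM's actual API; every field name here is an assumption.

```python
from dataclasses import dataclass, field

@dataclass
class ANMEnvSpec:
    """Hypothetical grouping of the four variation points the abstract lists
    for new Gym-ANM environments (not the package's real interface)."""
    # (a) distribution network topology and parameters
    buses: list = field(default_factory=list)          # e.g. bus IDs
    lines: list = field(default_factory=list)          # (from_bus, to_bus, rating)
    # (b) observation space
    observed_vars: list = field(default_factory=list)  # e.g. ["P_dev", "V_bus"]
    # (c) stochastic processes present in the system
    process_names: list = field(default_factory=list)  # e.g. load, renewable output
    # (d) hyperparameters influencing the reward signal
    reward_hyperparams: dict = field(default_factory=dict)

spec = ANMEnvSpec(
    buses=[0, 1, 2],
    lines=[(0, 1, 32.0), (1, 2, 18.0)],
    observed_vars=["P_dev", "Q_dev", "V_bus"],
    process_names=["load", "solar"],
    reward_hyperparams={"lambda_penalty": 1e3},
)
print(len(spec.lines))
```

Keeping the four knobs in one place makes it easy to generate families of environments that differ along exactly one of the axes (a)-(d).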
Scalable Multi-Agent Reinforcement Learning for Networked Systems with Average Reward
It has long been recognized that multi-agent reinforcement learning (MARL)
faces significant scalability issues because the state and action spaces are
exponentially large in the number of agents. In this
paper, we identify a rich class of networked MARL problems where the model
exhibits a local dependence structure that allows it to be solved in a scalable
manner. Specifically, we propose a Scalable Actor-Critic (SAC) method that can
learn a near-optimal localized policy for optimizing the average reward with
complexity scaling with the state-action space size of local neighborhoods, as
opposed to the entire network. Our result centers around identifying and
exploiting an exponential decay property that ensures the effect of agents on
each other decays exponentially fast in their graph distance.
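The exponential decay property can be illustrated on a linear networked system x_{t+1} = A x_t with nearest-neighbor coupling (the tridiagonal A and horizon below are assumptions for illustration, not the paper's model): the influence of agent j's initial state on agent i after T steps is |(A^T)_{ij}|, which shrinks rapidly with graph distance |i - j|.

```python
import numpy as np

n, T = 12, 6
# Tridiagonal A: each agent's next state depends only on itself and its
# immediate graph neighbors (nearest-neighbor coupling on a line graph).
A = np.zeros((n, n))
for i in range(n):
    A[i, i] = 0.5
    if i > 0:
        A[i, i - 1] = 0.1
    if i < n - 1:
        A[i, i + 1] = 0.1

# After T steps of x_{t+1} = A x_t, the sensitivity of agent 0's state to
# agent j's initial state is |(A^T)_{0j}|.
M = np.linalg.matrix_power(A, T)
influence = [abs(M[0, j]) for j in range(n)]

# Influence shrinks rapidly with graph distance, and vanishes exactly
# beyond distance T (no walk of length T reaches that far).
print(influence[:8])
```

This decay is what lets a localized policy with a bounded neighborhood approximate the global optimum well: agents beyond the neighborhood contribute only an exponentially small correction.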