CURIOUS: Intrinsically Motivated Modular Multi-Goal Reinforcement Learning
In open-ended environments, autonomous learning agents must set their own
goals and build their own curriculum through intrinsically motivated
exploration. They may consider a large diversity of goals, aiming to discover
what is controllable in their environments, and what is not. Because some goals
might prove easy and some impossible, agents must actively select which goal to
practice at any moment, to maximize their overall mastery of the set of
learnable goals. This paper proposes CURIOUS, an algorithm that leverages 1) a
modular Universal Value Function Approximator with hindsight learning to
achieve a diversity of goals of different kinds within a single policy and 2)
an automated curriculum learning mechanism that biases the attention of the
agent towards goals maximizing the absolute learning progress. Agents focus
sequentially on goals of increasing complexity, and return to goals that
are being forgotten. Experiments conducted in a new modular-goal robotic
environment show the resulting developmental self-organization of a learning
curriculum, and demonstrate robustness to distracting goals, forgetting, and
changes in body properties.
Comment: Accepted at ICML 201
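The curriculum mechanism described above can be sketched in a few lines. This is a minimal sketch, not the paper's implementation: it assumes each goal module keeps a list of binary success outcomes, that absolute learning progress (|LP|) is the absolute difference between success rates over the two most recent windows, and that goals are sampled proportionally to |LP| mixed with a uniform term `eps`. The window size, `eps`, and all function names are illustrative assumptions.

```python
import random
from typing import Dict, List


def absolute_learning_progress(outcomes: List[int], window: int = 10) -> float:
    """|LP| = |success rate over the last window - success rate over the window before|."""
    if len(outcomes) < 2 * window:
        return 0.0  # not enough data to estimate progress yet
    recent = outcomes[-window:]
    older = outcomes[-2 * window:-window]
    return abs(sum(recent) / window - sum(older) / window)


def goal_probabilities(history: Dict[str, List[int]], eps: float = 0.2,
                       window: int = 10) -> Dict[str, float]:
    """Mix uniform exploration (eps) with sampling proportional to |LP|."""
    lp = {g: absolute_learning_progress(h, window) for g, h in history.items()}
    total = sum(lp.values())
    n = len(history)
    probs = {}
    for g in history:
        exploit = lp[g] / total if total > 0 else 1.0 / n
        probs[g] = eps / n + (1 - eps) * exploit
    return probs


def sample_goal(history: Dict[str, List[int]], rng: random.Random,
                eps: float = 0.2, window: int = 10) -> str:
    probs = goal_probabilities(history, eps, window)
    goals = list(probs)
    return rng.choices(goals, weights=[probs[g] for g in goals], k=1)[0]


if __name__ == "__main__":
    history = {
        "reach": [1] * 20,            # mastered: no progress
        "push": [0] * 10 + [1] * 10,  # improving: high |LP|
        "stack": [0] * 20,            # not learnable so far: no progress
    }
    print(goal_probabilities(history))
```

Because the absolute value is used, a goal whose success rate is dropping (for example `[1] * 10 + [0] * 10`) also gets a high |LP|, which is one way to obtain the "return to forgotten goals" behaviour the abstract describes.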
GAMES: A new Scenario for Software and Knowledge Reuse
Games are a well-known test bed for search algorithms and learning methods, and many authors have given numerous reasons for research in this area. Nevertheless, games have not received the attention they deserve as software projects.
In this paper, we analyze the applicability of software
and knowledge reuse in the games domain. Besides the
need to find a good evaluation function, search algorithms
and interface design can be considered the primary concerns.
In addition, we discuss the current state of the main
statistical learning methods and how they can be addressed
from a software engineering point of view. Accordingly, this paper
proposes a reliable environment and the tools needed to achieve high levels of reuse in the games domain.
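As a small illustration of the kind of reuse discussed here, a search algorithm can be written once against an abstract game interface and then applied to any game that implements it. This is a hypothetical sketch, not the environment or tools proposed in the paper: the `Game` protocol, its method names, and the toy Nim game are assumptions chosen for illustration.

```python
from typing import Hashable, List, Protocol


class Game(Protocol):
    """Minimal game interface; any game implementing it can reuse the search below."""
    def legal_moves(self, state: Hashable) -> List[Hashable]: ...
    def apply(self, state: Hashable, move: Hashable) -> Hashable: ...
    def is_terminal(self, state: Hashable) -> bool: ...
    def evaluate(self, state: Hashable) -> float: ...  # from the view of the player to move


def negamax(game: Game, state: Hashable, depth: int) -> float:
    """Generic depth-limited negamax; it knows nothing about the specific game."""
    if depth == 0 or game.is_terminal(state):
        return game.evaluate(state)
    return max(-negamax(game, game.apply(state, m), depth - 1)
               for m in game.legal_moves(state))


def best_move(game: Game, state: Hashable, depth: int) -> Hashable:
    return max(game.legal_moves(state),
               key=lambda m: -negamax(game, game.apply(state, m), depth - 1))


class Nim:
    """Take 1-3 stones per turn; whoever takes the last stone wins."""
    def legal_moves(self, stones: int) -> List[int]:
        return [k for k in (1, 2, 3) if k <= stones]

    def apply(self, stones: int, move: int) -> int:
        return stones - move

    def is_terminal(self, stones: int) -> bool:
        return stones == 0

    def evaluate(self, stones: int) -> float:
        # With 0 stones left, the player to move has lost (the opponent took the last one).
        return -1.0 if stones == 0 else 0.0
```

Only `Nim` is game-specific; supporting a new game means implementing the four interface methods, while `negamax` and `best_move` are reused unchanged.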
LeTS-Drive: Driving in a Crowd by Learning from Tree Search
Autonomous driving in a crowded environment, e.g., a busy traffic
intersection, is an unsolved challenge for robotics. The robot vehicle must
contend with a dynamic and partially observable environment, noisy sensors, and
many agents. A principled approach is to formalize it as a Partially Observable
Markov Decision Process (POMDP) and solve it through online belief-tree search.
To handle a large crowd and achieve real-time performance in this very
challenging setting, we propose LeTS-Drive, which integrates online POMDP
planning and deep learning. It consists of two phases. In the offline phase, we
learn a policy and the corresponding value function by imitating the belief
tree search. In the online phase, the learned policy and value function guide
the belief tree search. LeTS-Drive leverages the robustness of planning and the
runtime efficiency of learning to enhance the performance of both. Experimental
results in simulation show that LeTS-Drive outperforms either planning or
imitation learning alone and develops sophisticated driving skills.
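One common way for a learned policy and value function to guide tree search is a PUCT-style selection rule, as popularized by AlphaGo Zero: the policy supplies prior probabilities that bias which branch is explored, and the value estimate replaces a random rollout at the leaf. The abstract does not give LeTS-Drive's exact selection formula or belief representation, so the sketch below assumes the PUCT rule with a constant `c_puct`, ignores discounting, and uses hypothetical action names.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    prior: float                 # P(a | b) from the learned policy
    visits: int = 0
    value_sum: float = 0.0
    children: Dict[str, "Node"] = field(default_factory=dict)

    def q(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0


def puct_score(parent: Node, child: Node, c_puct: float = 1.0) -> float:
    """Q(b, a) + c_puct * P(a | b) * sqrt(N(b)) / (1 + N(b, a))."""
    exploration = c_puct * child.prior * math.sqrt(parent.visits) / (1 + child.visits)
    return child.q() + exploration


def select_action(node: Node, c_puct: float = 1.0) -> str:
    return max(node.children, key=lambda a: puct_score(node, node.children[a], c_puct))


def backup(path: List[Node], leaf_value: float) -> None:
    """Propagate a value-function estimate (instead of a rollout) along the visited path."""
    for n in path:
        n.visits += 1
        n.value_sum += leaf_value
```

With no visits yet, the prior alone decides the first expansion; as a branch's backed-up values turn out poor, the exploration bonus of the alternatives eventually wins, which is how the learned components steer rather than replace the search.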
Learning to Engage with Interactive Systems: A Field Study on Deep Reinforcement Learning in a Public Museum
Physical agents that can autonomously generate engaging, life-like behaviour
will lead to more responsive and interesting robots and other autonomous
systems. Although many advances have been made for one-to-one interactions in
well-controlled settings, future physical agents should be capable of
interacting with humans in natural settings, including group interaction. In
order to generate engaging behaviours, the autonomous system must first be able
to estimate its human partners' engagement level. In this paper, we propose an
approach for estimating engagement during group interaction by simultaneously
taking into account active and passive interaction, i.e., occupancy, and use the
measure as the reward signal within a reinforcement learning framework to learn
engaging interactive behaviours. The proposed approach is implemented in an
interactive sculptural system in a museum setting. We compare the learning
system to a baseline using pre-scripted interactive behaviours. Analysis based
on sensory data and survey data shows that adaptable behaviours within an
expert-designed action space can achieve higher engagement and likeability.
Comment: 29 pages, 19 figures, under review
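The idea of using an engagement estimate as the reinforcement learning reward can be sketched as follows. This is a minimal illustration under assumptions the abstract does not state: engagement is taken as a weighted sum of active interaction (the fraction of present visitors actively interacting) and passive interaction measured as occupancy (visitors present relative to a hypothetical `capacity`), the weights `w_active` and `w_passive` are hypothetical, and a plain tabular Q-learning update stands in for whatever learner the authors used.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, Tuple


def engagement(active: int, present: int, capacity: int,
               w_active: float = 0.7, w_passive: float = 0.3) -> float:
    """Hypothetical engagement score in [0, 1] from active and passive (occupancy) signals."""
    if capacity <= 0:
        return 0.0
    occupancy = min(present, capacity) / capacity
    active_ratio = min(active, present) / present if present else 0.0
    return w_active * active_ratio + w_passive * occupancy


def q_update(q: DefaultDict[Tuple[str, str], float], state: str, action: str,
             reward: float, next_state: str, actions: Iterable[str],
             alpha: float = 0.1, gamma: float = 0.9) -> None:
    """Standard tabular Q-learning step with the engagement estimate as the reward."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])


if __name__ == "__main__":
    q: DefaultDict[Tuple[str, str], float] = defaultdict(float)
    actions = ["light_pulse", "sound", "idle"]  # hypothetical behaviours
    r = engagement(active=2, present=4, capacity=10)
    q_update(q, "crowded", "sound", r, "crowded", actions)
    print(r, q[("crowded", "sound")])
```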
Constrained Exploration and Recovery from Experience Shaping
We consider the problem of reinforcement learning under safety requirements,
in which an agent is trained to complete a given task, typically formalized as
the maximization of a reward signal over time, while concurrently avoiding
undesirable actions or states associated with lower rewards or penalties. The
construction and balancing of different reward components can be difficult in
the presence of multiple objectives, yet is crucial for producing a satisfying
policy. For example, in reaching a target while avoiding obstacles, low
collision penalties can lead to reckless movements while high penalties can
discourage exploration. To circumvent this limitation, we examine the effect of
past actions in terms of safety to estimate which are acceptable or should be
avoided in the future. We then actively reshape the action space of the agent
during reinforcement learning, so that reward-driven exploration is constrained
within safety limits. We propose an algorithm enabling the learning of such
safety constraints in parallel with reinforcement learning and demonstrate its
effectiveness in terms of both task completion and training time.
Comment: Code: https://github.com/IBM/constrained-r
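Reshaping the action space from past experience can be illustrated with a simple, hypothetical scheme: count how often each discretized (state, action) pair led to a constraint violation, mask out actions whose estimated violation rate exceeds a threshold, and let epsilon-greedy exploration choose only among the remaining actions. The counting estimator, `threshold`, `min_visits`, and the fallback rule are assumptions for illustration; the abstract does not specify the paper's own constraint-learning model.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple


class SafetyEstimator:
    """Tracks how often each (state, action) pair led to an unsafe outcome."""

    def __init__(self, threshold: float = 0.2, min_visits: int = 5):
        self.threshold = threshold
        self.min_visits = min_visits
        self.tries: Dict[Tuple[Hashable, Hashable], int] = defaultdict(int)
        self.violations: Dict[Tuple[Hashable, Hashable], int] = defaultdict(int)

    def record(self, state: Hashable, action: Hashable, unsafe: bool) -> None:
        self.tries[(state, action)] += 1
        self.violations[(state, action)] += int(unsafe)

    def violation_rate(self, state: Hashable, action: Hashable) -> float:
        n = self.tries[(state, action)]
        return self.violations[(state, action)] / n if n else 0.0

    def is_allowed(self, state: Hashable, action: Hashable) -> bool:
        if self.tries[(state, action)] < self.min_visits:
            return True  # too little evidence: keep the action available
        return self.violation_rate(state, action) <= self.threshold

    def allowed_actions(self, state: Hashable, actions: List[Hashable]) -> List[Hashable]:
        safe = [a for a in actions if self.is_allowed(state, a)]
        if safe:
            return safe
        # Hypothetical fallback: if everything looks unsafe, keep only the least risky action.
        return [min(actions, key=lambda a: self.violation_rate(state, a))]


def constrained_epsilon_greedy(q: Dict[Tuple[Hashable, Hashable], float], state: Hashable,
                               actions: List[Hashable], safety: SafetyEstimator,
                               rng: random.Random, eps: float = 0.1) -> Hashable:
    """Reward-driven exploration restricted to actions currently judged safe."""
    candidates = safety.allowed_actions(state, actions)
    if rng.random() < eps:
        return rng.choice(candidates)
    return max(candidates, key=lambda a: q.get((state, a), 0.0))
```

Because unsafe actions are removed before the reward-maximizing choice is made, the collision penalty no longer has to be tuned against exploration: a high Q-value on a masked action cannot pull the agent into it.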
Deep Reinforcement Learning for Six Degree-of-Freedom Planetary Powered Descent and Landing
Future Mars missions will require advanced guidance, navigation, and control
algorithms for the powered descent phase to target specific surface locations
and achieve pinpoint accuracy (landing error ellipse <5 m radius). The
latter requires both a navigation system capable of estimating the lander's
state in real-time and a guidance and control system that can map the estimated
lander state to a commanded thrust for each lander engine. In this paper, we
present a novel integrated guidance and control algorithm designed by applying
the principles of reinforcement learning theory. Reinforcement learning is used to learn a
policy mapping the lander's estimated state directly to a commanded thrust for
each engine, with the policy resulting in accurate and fuel-efficient
trajectories. Specifically, we use proximal policy optimization, a policy
gradient method, to learn the policy. Another contribution of this paper is the
use of different discount rates for terminal and shaping rewards, which
significantly enhances optimization performance. We present simulation results
demonstrating the guidance and control system's performance in a 6-DOF
simulation environment and demonstrate robustness to noise and system parameter
uncertainty.
Comment: 37 pages
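The use of different discount rates for terminal and shaping rewards can be made concrete by computing the two return components separately and summing them. This sketch assumes dense per-step shaping rewards discounted with `gamma_shaping` and a single terminal reward, received on the last step, discounted with `gamma_terminal`; the paper's actual reward terms and discount values are not given in the abstract.

```python
from typing import List


def split_discounted_returns(shaping: List[float], terminal_reward: float,
                             gamma_shaping: float, gamma_terminal: float) -> List[float]:
    """Return G_t for every step t of one episode of length T:

    G_t = sum_{k >= t} gamma_shaping**(k - t) * shaping[k]
          + gamma_terminal**(T - 1 - t) * terminal_reward,
    where the terminal reward arrives on the last step T - 1.
    """
    T = len(shaping)
    returns = [0.0] * T
    g_shaping = 0.0
    for t in reversed(range(T)):
        g_shaping = shaping[t] + gamma_shaping * g_shaping
        g_terminal = (gamma_terminal ** (T - 1 - t)) * terminal_reward
        returns[t] = g_shaping + g_terminal
    return returns


if __name__ == "__main__":
    print(split_discounted_returns([1.0, 1.0, 1.0], 10.0,
                                   gamma_shaping=0.5, gamma_terminal=1.0))
```

With `gamma_terminal` close to 1, the terminal (landing) reward still reaches early steps of a long descent, while a smaller `gamma_shaping` keeps the accumulated dense terms from dominating; this is one plausible reading of why separate rates help, not the paper's stated rationale.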
Reinforcement Evolutionary Learning Method for self-learning
In statistical modelling, the biggest threat is concept drift, which causes a
model's performance to deteriorate gradually over time. State-of-the-art
methodologies exist to detect the impact of concept drift; however, the general
strategy for overcoming the resulting loss of performance is to rebuild or
re-calibrate the model periodically, as the variable patterns the model relies
on change significantly due to shifts in the market, consumer behavior, and so
on. Quantitative research is the most widespread application of data science in
the marketing and financial domains, where applying state-of-the-art
reinforcement learning for automatic learning remains a less explored paradigm.
Reinforcement learning depends heavily on having a simulated environment,
mostly available for gaming or online systems, in which to learn from live
feedback. However, some research has been carried out in areas such as online
advertising and pricing, where the nature of the online learning environment
has allowed the scope of reinforcement learning to be explored. Our proposed
solution is a reinforcement-learning-based, true self-learning algorithm that
adapts to data changes, or concept drift, and automatically learns and
self-calibrates to new patterns in the data, thereby addressing the problem of
concept drift.
Keywords - Reinforcement learning, Genetic Algorithm, Q-learning,
Classification modelling, CMA-ES, NES, Multi-objective optimization, Concept
drift, Population stability index, Incremental learning, F1-measure, Predictive
Modelling, Self-learning, MCTS, AlphaGo, AlphaZero
Comment: 5 figures
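The population stability index listed among the keywords is a common way to decide when drift warrants recalibration. A minimal sketch of the standard formula, PSI = Σ_i (a_i − e_i) · ln(a_i / e_i), over bins of expected (baseline) and actual (current) proportions. The fixed bin edges, the `eps` floor for empty bins, and the 0.25 cutoff for a "significant shift" are widely used conventions assumed here, not values from the paper.

```python
import math
from typing import List, Sequence


def bin_proportions(values: Sequence[float], edges: Sequence[float],
                    eps: float = 1e-4) -> List[float]:
    """Fraction of values per bin [edges[i], edges[i+1]); the last bin is closed on the right.

    Values outside [edges[0], edges[-1]] are not counted in any bin.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            last = i == len(edges) - 2
            if edges[i] <= v < edges[i + 1] or (last and v == edges[-1]):
                counts[i] += 1
                break
    n = max(len(values), 1)
    # Floor empty bins at eps so the log term stays finite.
    return [max(c / n, eps) for c in counts]


def psi(expected: Sequence[float], actual: Sequence[float]) -> float:
    """Population stability index between two binned distributions."""
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))


def needs_recalibration(baseline: Sequence[float], current: Sequence[float],
                        edges: Sequence[float], cutoff: float = 0.25) -> bool:
    expected = bin_proportions(baseline, edges)
    actual = bin_proportions(current, edges)
    return psi(expected, actual) > cutoff
```

In a self-calibrating loop, a check like `needs_recalibration` would decide when to trigger re-learning instead of rebuilding the model on a fixed schedule.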
LIFT: Reinforcement Learning in Computer Systems by Learning From Demonstrations
Reinforcement learning approaches have long appealed to the data management
community due to their ability to learn to control dynamic behavior from raw
system performance. Recent successes in combining deep neural networks with
reinforcement learning have sparked significant new interest in this domain.
However, practical solutions remain elusive due to large training data
requirements, algorithmic instability, and lack of standard tools. In this
work, we introduce LIFT, an end-to-end software stack for applying deep
reinforcement learning to data management tasks. While prior work has
frequently explored applications in simulations, LIFT centers on utilizing
human expertise to learn from demonstrations, thus lowering online training
times. We further introduce TensorForce, a TensorFlow library for applied deep
reinforcement learning exposing a unified declarative interface to common RL
algorithms, thus providing a backend to LIFT. We demonstrate the utility of
LIFT in two case studies in database compound indexing and resource management
in stream processing. Results show LIFT controllers initialized from
demonstrations can outperform human baselines and heuristics across latency
metrics and space usage by up to 70%.
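One standard way to initialize a value-based controller from demonstrations is the large-margin supervised loss from Deep Q-learning from Demonstrations (DQfD), which pushes the Q-value of the demonstrated action above every other action by a margin. The abstract does not say which demonstration loss LIFT uses, so this is an illustrative assumption. The sketch is plain Python rather than TensorForce, and the margin value and action names are hypothetical.

```python
from typing import Dict, Hashable


def large_margin_loss(q_values: Dict[Hashable, float], expert_action: Hashable,
                      margin: float = 0.8) -> float:
    """J_E(Q) = max_a [Q(s, a) + l(a_E, a)] - Q(s, a_E),
    where l(a_E, a) = margin if a != a_E, else 0.

    The loss is zero only when the demonstrated action beats every other
    action by at least `margin`.
    """
    augmented = max(q + (0.0 if a == expert_action else margin)
                    for a, q in q_values.items())
    return augmented - q_values[expert_action]


if __name__ == "__main__":
    # Hypothetical Q-values for one database state.
    q = {"create_index": 1.0, "no_op": 0.5}
    print(large_margin_loss(q, "create_index"))
```

Minimizing this term on demonstration data, typically alongside the usual TD loss, makes the controller imitate the demonstrator before any online interaction, which is what lowers online training time.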
Routing Networks and the Challenges of Modular and Compositional Computation
Compositionality is a key strategy for addressing combinatorial complexity
and the curse of dimensionality. Recent work has shown that compositional
solutions can be learned and offer substantial gains across a variety of
domains, including multi-task learning, language modeling, visual question
answering, machine comprehension, and others. However, such models present
unique challenges during training when both the module parameters and their
composition must be learned jointly. In this paper, we identify several of
these issues and analyze their underlying causes. Our discussion focuses on
routing networks, a general approach to this problem, and empirically examines
the interplay of these challenges and a variety of design decisions. In
particular, we consider the effect of how the algorithm decides on module
composition, how the algorithm updates the modules, and whether the algorithm
uses regularization.
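A routing network can be sketched as a router that, at each depth, picks one module from a shared pool, so that the output is a composition of the chosen modules. This minimal sketch assumes a tabular router conditioned on (task, depth) and trained from a scalar reward with incremental value estimates, which is only one of several possible routing-decision algorithms. The modules are toy functions standing in for neural network layers, and the reward is hypothetical. Module-parameter updates are deliberately left out, even though joint training of modules and routing is where the challenges discussed above arise.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Module = Callable[[float], float]

# Toy module pool standing in for neural network layers.
MODULES: Dict[str, Module] = {
    "double": lambda x: 2 * x,
    "inc": lambda x: x + 1,
    "neg": lambda x: -x,
}


class TabularRouter:
    """Chooses one module per depth, conditioned on (task, depth)."""

    def __init__(self, depth: int, eps: float = 0.1, lr: float = 0.5, seed: int = 0):
        self.depth, self.eps, self.lr = depth, eps, lr
        self.values: Dict[Tuple[str, int, str], float] = defaultdict(float)
        self.rng = random.Random(seed)

    def route(self, task: str, greedy: bool = False) -> List[str]:
        path = []
        for d in range(self.depth):
            if not greedy and self.rng.random() < self.eps:
                path.append(self.rng.choice(list(MODULES)))  # explore
            else:
                path.append(max(MODULES, key=lambda m: self.values[(task, d, m)]))
        return path

    def update(self, task: str, path: List[str], reward: float) -> None:
        # Every routing decision on the path is credited with the same final reward.
        for d, m in enumerate(path):
            key = (task, d, m)
            self.values[key] += self.lr * (reward - self.values[key])


def forward(x: float, path: List[str]) -> float:
    """Apply the chosen modules in order: the model is their composition."""
    for name in path:
        x = MODULES[name](x)
    return x


if __name__ == "__main__":
    router = TabularRouter(depth=2, eps=0.3)
    for _ in range(500):
        path = router.route("task_a")
        reward = 1.0 if forward(3.0, path) == 8.0 else 0.0  # hypothetical target
        router.update("task_a", path, reward)
    print(router.route("task_a", greedy=True))
```

Order matters in the composition (`["double", "inc"]` maps 3 to 7, `["inc", "double"]` maps 3 to 8), and crediting all decisions with one shared reward is a simple but noisy credit-assignment choice, one of the design decisions whose effects the paper examines.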
Annotated bibliography of Software Engineering Laboratory literature
An annotated bibliography of technical papers, documents, and memorandums produced by or related to the Software Engineering Laboratory is given. More than 100 publications are summarized. These publications cover many areas of software engineering and range from research reports to software documentation. All materials have been grouped into eight general subject areas for easy reference: The Software Engineering Laboratory; The Software Engineering Laboratory: Software Development Documents; Software Tools; Software Models; Software Measurement; Technology Evaluations; Ada Technology; and Data Collection. Subject and author indexes further classify these documents by specific topic and individual author.