100,207 research outputs found
Incorporating Behavioral Constraints in Online AI Systems
AI systems that learn through reward feedback about the actions they take are
increasingly deployed in domains that have significant impact on our daily
life. However, in many cases the online rewards should not be the only guiding
criteria, as there are additional constraints and/or priorities imposed by
regulations, values, preferences, or ethical principles. We detail a novel
online agent that learns a set of behavioral constraints by observation and
uses these learned constraints as a guide when making decisions in an online
setting while still being reactive to reward feedback. To define this agent, we
propose to adopt a novel extension to the classical contextual multi-armed
bandit setting and we provide a new algorithm called Behavior Constrained
Thompson Sampling (BCTS) that allows for online learning while obeying
exogenous constraints. Our agent learns a constrained policy that implements
the observed behavioral constraints demonstrated by a teacher agent, and then
uses this constrained policy to guide the reward-based online exploration and
exploitation. We characterize the upper bound on the expected regret of the
contextual bandit algorithm that underlies our agent and provide a case study
with real world data in two application domains. Our experiments show that the
designed agent is able to act within the set of behavior constraints without
significantly degrading its overall reward performance.Comment: 9 pages, 6 figure
Decentralized Control of Partially Observable Markov Decision Processes using Belief Space Macro-actions
The focus of this paper is on solving multi-robot planning problems in
continuous spaces with partial observability. Decentralized partially
observable Markov decision processes (Dec-POMDPs) are general models for
multi-robot coordination problems, but representing and solving Dec-POMDPs is
often intractable for large problems. To allow for a high-level representation
that is natural for multi-robot problems and scalable to large discrete and
continuous problems, this paper extends the Dec-POMDP model to the
decentralized partially observable semi-Markov decision process (Dec-POSMDP).
The Dec-POSMDP formulation allows asynchronous decision-making by the robots,
which is crucial in multi-robot domains. We also present an algorithm for
solving this Dec-POSMDP which is much more scalable than previous methods since
it can incorporate closed-loop belief space macro-actions in planning. These
macro-actions are automatically constructed to produce robust solutions. The
proposed method's performance is evaluated on a complex multi-robot package
delivery problem under uncertainty, showing that our approach can naturally
represent multi-robot problems and provide high-quality solutions for
large-scale problems
Bandit Models of Human Behavior: Reward Processing in Mental Disorders
Drawing an inspiration from behavioral studies of human decision making, we
propose here a general parametric framework for multi-armed bandit problem,
which extends the standard Thompson Sampling approach to incorporate reward
processing biases associated with several neurological and psychiatric
conditions, including Parkinson's and Alzheimer's diseases,
attention-deficit/hyperactivity disorder (ADHD), addiction, and chronic pain.
We demonstrate empirically that the proposed parametric approach can often
outperform the baseline Thompson Sampling on a variety of datasets. Moreover,
from the behavioral modeling perspective, our parametric framework can be
viewed as a first step towards a unifying computational model capturing reward
processing abnormalities across multiple mental conditions.Comment: Conference on Artificial General Intelligence, AGI-1
Driving with Style: Inverse Reinforcement Learning in General-Purpose Planning for Automated Driving
Behavior and motion planning play an important role in automated driving.
Traditionally, behavior planners instruct local motion planners with predefined
behaviors. Due to the high scene complexity in urban environments,
unpredictable situations may occur in which behavior planners fail to match
predefined behavior templates. Recently, general-purpose planners have been
introduced, combining behavior and local motion planning. These general-purpose
planners allow behavior-aware motion planning given a single reward function.
However, two challenges arise: First, this function has to map a complex
feature space into rewards. Second, the reward function has to be manually
tuned by an expert. Manually tuning this reward function becomes a tedious
task. In this paper, we propose an approach that relies on human driving
demonstrations to automatically tune reward functions. This study offers
important insights into the driving style optimization of general-purpose
planners with maximum entropy inverse reinforcement learning. We evaluate our
approach based on the expected value difference between learned and
demonstrated policies. Furthermore, we compare the similarity of human driven
trajectories with optimal policies of our planner under learned and
expert-tuned reward functions. Our experiments show that we are able to learn
reward functions exceeding the level of manual expert tuning without prior
domain knowledge.Comment: Appeared at IROS 2019. Accepted version. Added/updated footnote,
minor correction in preliminarie
Human Motion Trajectory Prediction: A Survey
With growing numbers of intelligent autonomous systems in human environments,
the ability of such systems to perceive, understand and anticipate human
behavior becomes increasingly important. Specifically, predicting future
positions of dynamic agents and planning considering such predictions are key
tasks for self-driving vehicles, service robots and advanced surveillance
systems. This paper provides a survey of human motion trajectory prediction. We
review, analyze and structure a large selection of work from different
communities and propose a taxonomy that categorizes existing methods based on
the motion modeling approach and level of contextual information used. We
provide an overview of the existing datasets and performance metrics. We
discuss limitations of the state of the art and outline directions for further
research.Comment: Submitted to the International Journal of Robotics Research (IJRR),
37 page
Accelerating Cooperative Planning for Automated Vehicles with Learned Heuristics and Monte Carlo Tree Search
Efficient driving in urban traffic scenarios requires foresight. The
observation of other traffic participants and the inference of their possible
next actions depending on the own action is considered cooperative prediction
and planning. Humans are well equipped with the capability to predict the
actions of multiple interacting traffic participants and plan accordingly,
without the need to directly communicate with others. Prior work has shown that
it is possible to achieve effective cooperative planning without the need for
explicit communication. However, the search space for cooperative plans is so
large that most of the computational budget is spent on exploring the search
space in unpromising regions that are far away from the solution. To accelerate
the planning process, we combined learned heuristics with a cooperative
planning method to guide the search towards regions with promising actions,
yielding better solutions at lower computational costs
Translating Neuralese
Several approaches have recently been proposed for learning decentralized
deep multiagent policies that coordinate via a differentiable communication
channel. While these policies are effective for many tasks, interpretation of
their induced communication strategies has remained a challenge. Here we
propose to interpret agents' messages by translating them. Unlike in typical
machine translation problems, we have no parallel data to learn from. Instead
we develop a translation model based on the insight that agent messages and
natural language strings mean the same thing if they induce the same belief
about the world in a listener. We present theoretical guarantees and empirical
evidence that our approach preserves both the semantics and pragmatics of
messages by ensuring that players communicating through a translation layer do
not suffer a substantial loss in reward relative to players with a common
language.Comment: Fixes typos and cleans ups some model presentation detail
- …