DeepTraffic: Crowdsourced Hyperparameter Tuning of Deep Reinforcement Learning Systems for Multi-Agent Dense Traffic Navigation
We present a traffic simulation named DeepTraffic where the planning systems
for a subset of the vehicles are handled by a neural network as part of a
model-free, off-policy reinforcement learning process. The primary goal of
DeepTraffic is to make the hands-on study of deep reinforcement learning
accessible to thousands of students, educators, and researchers in order to
inspire and fuel the exploration and evaluation of deep Q-learning network
variants and hyperparameter configurations through large-scale, open
competition. This paper investigates the crowd-sourced hyperparameter tuning of
the policy network that resulted from the first iteration of the DeepTraffic
competition where thousands of participants actively searched through the
hyperparameter space.
Comment: Neural Information Processing Systems (NIPS 2018) Deep Reinforcement Learning Workshop
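As a minimal illustration of the kind of Q-learning machinery whose hyperparameters DeepTraffic participants tuned, here is a tabular sketch. The learning rate, discount factor, and exploration rate below are illustrative values, not settings from the competition:

```python
import random

# Hypothetical hyperparameters of the kind participants searched over:
# learning rate, discount factor, and exploration rate.
ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.05

def q_update(q, state, action, reward, next_state, n_actions):
    """One tabular Q-learning step:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q.get((next_state, a), 0.0) for a in range(n_actions))
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + ALPHA * (reward + GAMMA * best_next - old)

def epsilon_greedy(q, state, n_actions):
    """Explore with probability EPSILON, otherwise exploit the best known action."""
    if random.random() < EPSILON:
        return random.randrange(n_actions)
    return max(range(n_actions), key=lambda a: q.get((state, a), 0.0))

q = {}
q_update(q, state=0, action=1, reward=1.0, next_state=1, n_actions=3)
# Q(0,1) = 0 + 0.1 * (1.0 + 0.99 * 0 - 0) = 0.1
```

In deep Q-learning the table is replaced by a neural network, but the same hyperparameters govern the update.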
Crowd-Robot Interaction: Crowd-aware Robot Navigation with Attention-based Deep Reinforcement Learning
Navigating in an effective and socially compliant manner is an essential yet
challenging task for robots operating in crowded spaces. Recent works have
shown the power of deep reinforcement learning techniques to learn socially
cooperative policies. However, their cooperation ability deteriorates as the
crowd grows, since they typically simplify the problem into one-way Human-Robot
interaction. In this work, we go beyond first-order Human-Robot
interaction and more explicitly model Crowd-Robot Interaction (CRI). We propose
to (i) rethink pairwise interactions with a self-attention mechanism, and (ii)
jointly model Human-Robot as well as Human-Human interactions in the deep
reinforcement learning framework. Our model captures the Human-Human
interactions occurring in dense crowds that indirectly affect the robot's
anticipation capability. Our proposed attentive pooling mechanism learns the
collective importance of neighboring humans with respect to their future
states. Various experiments demonstrate that our model can anticipate human
dynamics and navigate in crowds with time efficiency, outperforming
state-of-the-art methods.
Comment: Accepted at ICRA2019. Copyright may be transferred without notice, after which this version may no longer be accessible.
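The attentive pooling idea can be sketched as a softmax-weighted combination of neighbor features. The scoring function below (a plain sum of each feature vector) is a stand-in for the learned importance score in the paper:

```python
import math

def attention_pool(neighbor_feats):
    """Softmax-weighted pooling: score each neighbor, then combine their
    features with the resulting attention weights."""
    scores = [sum(f) for f in neighbor_feats]  # stand-in for a learned scorer
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]   # shift by max for stability
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(neighbor_feats[0])
    return [sum(w * f[i] for w, f in zip(weights, neighbor_feats))
            for i in range(dim)]

pooled = attention_pool([[1.0, 0.0], [0.0, 1.0]])
# equal scores give equal weights, so pooled = [0.5, 0.5]
```

The weights express the collective importance of each neighboring human, and the pooled vector feeds the value estimation downstream.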
Fully Distributed Multi-Robot Collision Avoidance via Deep Reinforcement Learning for Safe and Efficient Navigation in Complex Scenarios
In this paper, we present a decentralized sensor-level collision avoidance
policy for multi-robot systems, which shows promising results in practical
applications. In particular, our policy directly maps raw sensor measurements
to an agent's steering commands, expressed as a movement velocity. As a first
step toward reducing the performance gap between decentralized and centralized
methods, we present a multi-scenario multi-stage training framework to learn an
optimal policy. The policy is trained over a large number of robots in rich,
complex environments simultaneously using a policy gradient based reinforcement
learning algorithm. The learning algorithm is also integrated into a hybrid
control framework to further improve the policy's robustness and effectiveness.
We validate the learned sensor-level collision avoidance policy in a variety
of simulated and real-world scenarios with thorough performance evaluations for
large-scale multi-robot systems. The generalization of the learned policy is
verified in a set of unseen scenarios including the navigation of a group of
heterogeneous robots and a large-scale scenario with 100 robots. Although the
policy is trained using simulation data only, we have successfully deployed it
on physical robots with shapes and dynamic characteristics different from those
of the simulated agents, in order to demonstrate the controller's robustness
against the sim-to-real modeling error. Finally, we show that the
collision-avoidance policy learned from multi-robot navigation tasks provides
an excellent solution to the safe and effective autonomous navigation for a
single robot working in a dense real human crowd. Our learned policy enables a
robot to make effective progress in a crowd without getting stuck. Videos are
available at https://sites.google.com/view/hybridmrc
L2B: Learning to Balance the Safety-Efficiency Trade-off in Interactive Crowd-aware Robot Navigation
This work presents a deep reinforcement learning framework for interactive
navigation in a crowded place. Our proposed approach, the Learning to Balance
(L2B) framework, enables mobile robot agents to steer safely towards their
destinations by avoiding collisions with a crowd, while actively clearing a
path by asking nearby pedestrians to make room, if necessary, to keep their
travel efficient. We observe that the safety and efficiency requirements in
crowd-aware navigation have a trade-off in the presence of social dilemmas
between the agent and the crowd. On the one hand, intervening in pedestrian
paths too often for the sake of immediate efficiency disrupts the natural crowd
flow and may eventually put everyone, including the robot itself, at risk of
collision. On the other hand, remaining silent and passively avoiding every
collision makes the agent's travel inefficient. Based on this observation,
our L2B framework augments the reward function used in learning an interactive
navigation policy to penalize frequent active path clearing and passive
collision avoidance, which substantially improves the balance of the
safety-efficiency trade-off. We evaluate our L2B framework in a challenging
crowd simulation and demonstrate its superiority, in terms of both navigation
success and collision rate, over a state-of-the-art navigation approach.
Comment: Accepted at IROS2020. Project site: https://denkiwakame.github.io/l2b
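The reward augmentation described above can be sketched as follows; the penalty magnitudes and function name are illustrative, not values from the paper:

```python
def shaped_reward(base_reward, asked_to_clear, passively_avoided,
                  ask_penalty=0.1, avoid_penalty=0.05):
    """Augment the navigation reward: penalize both active path clearing
    (asking pedestrians to move) and purely passive collision avoidance,
    so the learned policy balances safety against efficiency."""
    r = base_reward
    if asked_to_clear:
        r -= ask_penalty      # discourage over-using the "make room" action
    if passively_avoided:
        r -= avoid_penalty    # discourage always yielding and losing progress
    return r
```

Tuning the two penalties against each other is what moves the policy along the safety-efficiency trade-off.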
Realtime Collision Avoidance for Mobile Robots in Dense Crowds using Implicit Multi-sensor Fusion and Deep Reinforcement Learning
We present a novel learning-based collision avoidance algorithm, CrowdSteer,
for mobile robots operating in dense and crowded environments. Our approach is
end-to-end and uses multiple perception sensors such as a 2-D lidar along with
a depth camera to sense surrounding dynamic agents and compute collision-free
velocities. Our training approach is based on the sim-to-real paradigm and uses
high fidelity 3-D simulations of pedestrians and the environment to train a
policy using Proximal Policy Optimization (PPO). We show that our learned
navigation model is directly transferable to previously unseen virtual and
dense real-world environments. We have integrated our algorithm with
differential drive robots and evaluated its performance in narrow scenarios
such as dense crowds, narrow corridors, T-junctions, L-junctions, etc. In
practice, our approach can perform real-time collision avoidance and generate
smooth trajectories in such complex scenarios. We also compare the performance
with prior methods based on metrics such as trajectory length, mean time to
goal, success rate, and smoothness, and observe considerable improvement.
Comment: 8 pages, 7 figures
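CrowdSteer trains with Proximal Policy Optimization. Per sample, PPO's clipped surrogate loss can be sketched as below (the clipping threshold value is illustrative):

```python
def ppo_clip_loss(ratio, advantage, eps=0.2):
    """PPO clipped surrogate, per sample:
    -min(r * A, clip(r, 1 - eps, 1 + eps) * A),
    where r is the new/old policy probability ratio and A the advantage."""
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps)
    return -min(ratio * advantage, clipped * advantage)
```

Clipping the ratio keeps each policy update close to the data-collecting policy, which is what makes PPO stable enough for long sim-to-real training runs.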
Getting Robots Unfrozen and Unlost in Dense Pedestrian Crowds
We aim to enable a mobile robot to navigate through environments with dense
crowds, e.g., shopping malls, canteens, train stations, or airport terminals.
In these challenging environments, existing approaches suffer from two common
problems: the robot may get frozen and cannot make any progress toward its
goal, or it may get lost due to severe occlusions inside a crowd. Here we
propose a navigation framework that handles the robot-freezing and
navigation-lost problems simultaneously. First, we enhance the robot's mobility
and unfreeze the robot in the crowd using a reinforcement learning based local
navigation policy developed in our previous work~\cite{long2017towards}, which
naturally takes into account the coordination between the robot and the human.
Second, the robot takes advantage of its excellent local mobility to recover
from its localization failure. In particular, it dynamically chooses to
approach a set of recovery positions with rich features. To the best of our
knowledge, our method is the first approach that simultaneously solves the
freezing problem and the navigation-lost problem in dense crowds. We evaluate
our method in both simulated and real-world environments and demonstrate that
it outperforms the state-of-the-art approaches. Videos are available at
https://sites.google.com/view/rlslam
User Modeling for Task Oriented Dialogues
We introduce end-to-end neural network based models for simulating users of
task-oriented dialogue systems. User simulation in dialogue systems is crucial
from two different perspectives: (i) automatic evaluation of different dialogue
models, and (ii) training task-oriented dialogue systems. We design a
hierarchical sequence-to-sequence model that first encodes the initial user
goal and system turns into fixed length representations using Recurrent Neural
Networks (RNN). It then encodes the dialogue history using another RNN layer.
At each turn, user responses are decoded from the hidden representations of the
dialogue-level RNN. This hierarchical user simulator (HUS) approach allows the
model to capture undiscovered parts of the user goal without the need for
explicit dialogue state tracking. We further develop several variants: a latent
variable model that injects random variations into user responses to promote
their diversity, and a novel goal regularization mechanism that penalizes
divergence of user responses from the initial user goal. We evaluate the
proposed models on the movie ticket booking domain by systematically
interacting each user simulator with various dialogue system policies trained
with different objectives and users.
Comment: Accepted at SLT 2018
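The hierarchical idea can be roughly sketched with mean pooling standing in for the turn-level RNN and an exponential moving average standing in for the dialogue-level RNN. Both are simplified stand-ins, not the paper's architecture:

```python
def encode_turn(token_vecs):
    """Stand-in for the turn-level RNN: mean-pool token vectors into
    one fixed-length turn encoding."""
    dim, n = len(token_vecs[0]), len(token_vecs)
    return [sum(v[i] for v in token_vecs) / n for i in range(dim)]

def dialogue_state(turn_encodings, decay=0.5):
    """Stand-in for the dialogue-level RNN: fold the sequence of turn
    encodings into a single running dialogue state."""
    state = [0.0] * len(turn_encodings[0])
    for enc in turn_encodings:
        state = [decay * s + (1 - decay) * e for s, e in zip(state, enc)]
    return state
```

The point of the hierarchy is the same in both the sketch and the real model: turn-level encodings are computed first, and a second, slower recurrence over turns carries the dialogue context from which user responses are decoded.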
LeTS-Drive: Driving in a Crowd by Learning from Tree Search
Autonomous driving in a crowded environment, e.g., a busy traffic
intersection, is an unsolved challenge for robotics. The robot vehicle must
contend with a dynamic and partially observable environment, noisy sensors, and
many agents. A principled approach is to formalize it as a Partially Observable
Markov Decision Process (POMDP) and solve it through online belief-tree search.
To handle a large crowd and achieve real-time performance in this very
challenging setting, we propose LeTS-Drive, which integrates online POMDP
planning and deep learning. It consists of two phases. In the offline phase, we
learn a policy and the corresponding value function by imitating the belief
tree search. In the online phase, the learned policy and value function guide
the belief tree search. LeTS-Drive leverages the robustness of planning and the
runtime efficiency of learning to enhance the performance of both. Experimental
results in simulation show that LeTS-Drive outperforms either planning or
imitation learning alone, and develops sophisticated driving skills.
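LeTS-Drive guides belief-tree search with a learned policy and value function. One standard way a policy prior guides node selection is an AlphaZero-style PUCT rule; the generic rule below is a sketch of that idea, not necessarily the paper's exact formula:

```python
import math

def puct_select(children, c=1.0):
    """Pick the child maximizing Q + c * prior * sqrt(N) / (1 + n),
    where `prior` comes from a learned policy network, Q = w / n is the
    mean value estimate, and N is the total visit count at the parent."""
    total = sum(ch["n"] for ch in children)
    def score(ch):
        q = ch["w"] / ch["n"] if ch["n"] else 0.0
        return q + c * ch["prior"] * math.sqrt(total) / (1 + ch["n"])
    return max(children, key=score)

# An unvisited child with a high prior is explored before a mediocre
# but well-visited one.
best = puct_select([{"n": 0, "w": 0.0, "prior": 0.9},
                    {"n": 10, "w": 5.0, "prior": 0.1}])
```

The learned value function plays the complementary role of evaluating leaves so the search does not need deep rollouts.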
IntelligentCrowd: Mobile Crowdsensing via Multi-agent Reinforcement Learning
The prosperity of smart mobile devices has made mobile crowdsensing (MCS) a
promising paradigm for completing complex sensing and computation tasks. In the
past, great efforts have been made on the design of incentive mechanisms and
task allocation strategies from MCS platform's perspective to motivate mobile
users' participation. However, in practice, MCS participants face many
uncertainties coming from their sensing environment as well as other
participants' strategies, and how they interact with each other and make
sensing decisions is not well understood. In this paper, we take MCS
participants' perspective to derive an online sensing policy to maximize their
payoffs via MCS participation. Specifically, we model the interactions of
mobile users and sensing environments as a multi-agent Markov decision process.
Each participant cannot observe others' decisions, but must decide its
effort level in sensing tasks based only on local information, e.g., its own
record of sensed signals' quality. To cope with the stochastic sensing
environment, we develop an intelligent crowdsensing algorithm IntelligentCrowd
by leveraging the power of multi-agent reinforcement learning (MARL). Our
algorithm leads to the optimal sensing policy for each user to maximize the
expected payoff against stochastic sensing environments, and can be implemented
at individual participant's level in a distributed fashion. Numerical
simulations demonstrate that IntelligentCrowd significantly improves users'
payoffs in sequential MCS tasks under various sensing dynamics.
Comment: In Submission
Emotional Contagion-Aware Deep Reinforcement Learning for Antagonistic Crowd Simulation
Antagonistic behavior in a crowd usually exacerbates the seriousness of
sudden riots, where antagonistic emotional contagion and behavioral decision
making play very important roles. However, the complex mechanism by which
antagonistic emotion influences decision making, especially in sudden
confrontations, has not yet been explored very
clearly. In this paper, we propose an Emotional contagion-aware Deep
reinforcement learning model for Antagonistic Crowd Simulation (ACSED).
First, we build a group emotional contagion module based on an improved
Susceptible-Infected-Susceptible (SIS) epidemic model, and estimate
the emotional state of the group at each time step during the simulation. Then,
the tendency of antagonistic crowd actions is estimated with a Deep Q-Network
(DQN), where each agent learns its action autonomously and leverages mean
field theory to quickly calculate the influence of other surrounding
individuals on the central one. Finally, the rationality of the predicted
actions by DQN is further analyzed in combination with group emotion, and the
final action of the agent is determined. The proposed method in this paper is
verified through several experiments with different settings. The results show
that antagonistic emotion has a vital impact on group combat, and that positive
emotional states are more conducive to combat. Moreover, comparing the
simulation results with real scenes further confirms the feasibility of our
method, which can provide a useful reference for formulating battle plans and
improving the win rate of righteous groups in a variety of situations.
Comment: 14 pages, 9 figures
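The SIS-style contagion update can be sketched as a discrete step on the infected (here, antagonistically aroused) fraction of the group. The infection and recovery rates below are illustrative, not the paper's calibrated values:

```python
def sis_step(infected_frac, beta=0.3, gamma=0.1):
    """One discrete SIS step: susceptibles become infected at rate
    beta * I * S, and the infected recover back to susceptible at
    rate gamma * I. The result is clamped to [0, 1]."""
    s = 1.0 - infected_frac
    new_i = infected_frac + beta * infected_frac * s - gamma * infected_frac
    return min(max(new_i, 0.0), 1.0)
```

Iterating this step over simulation time yields the group emotional state that ACSED then combines with the DQN's predicted actions.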