7,886 research outputs found

    DeepTraffic: Crowdsourced Hyperparameter Tuning of Deep Reinforcement Learning Systems for Multi-Agent Dense Traffic Navigation

    We present a traffic simulation named DeepTraffic where the planning systems for a subset of the vehicles are handled by a neural network as part of a model-free, off-policy reinforcement learning process. The primary goal of DeepTraffic is to make the hands-on study of deep reinforcement learning accessible to thousands of students, educators, and researchers in order to inspire and fuel the exploration and evaluation of deep Q-learning network variants and hyperparameter configurations through large-scale, open competition. This paper investigates the crowd-sourced hyperparameter tuning of the policy network that resulted from the first iteration of the DeepTraffic competition where thousands of participants actively searched through the hyperparameter space. Comment: Neural Information Processing Systems (NIPS 2018) Deep Reinforcement Learning Workshop

    Crowd-Robot Interaction: Crowd-aware Robot Navigation with Attention-based Deep Reinforcement Learning

    Mobility in an effective and socially-compliant manner is an essential yet challenging task for robots operating in crowded spaces. Recent works have shown the power of deep reinforcement learning techniques to learn socially cooperative policies. However, their cooperation ability deteriorates as the crowd grows since they typically relax the problem as a one-way Human-Robot interaction problem. In this work, we want to go beyond first-order Human-Robot interaction and more explicitly model Crowd-Robot Interaction (CRI). We propose to (i) rethink pairwise interactions with a self-attention mechanism, and (ii) jointly model Human-Robot as well as Human-Human interactions in the deep reinforcement learning framework. Our model captures the Human-Human interactions occurring in dense crowds that indirectly affect the robot's anticipation capability. Our proposed attentive pooling mechanism learns the collective importance of neighboring humans with respect to their future states. Various experiments demonstrate that our model can anticipate human dynamics and navigate in crowds with time efficiency, outperforming state-of-the-art methods. Comment: Accepted at ICRA2019. Copyright may be transferred without notice, after which this version may no longer be accessible
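The attentive pooling idea in the abstract above can be illustrated with a minimal sketch: score each neighboring human's embedding, normalize the scores with a softmax, and return the weighted sum. This is not the paper's architecture. The function name `attentive_pool`, the linear scoring vector `score_weights` (standing in for a learned scoring network), and the toy embeddings are all assumptions made for illustration.

```python
import math


def attentive_pool(embeddings, score_weights):
    """Softmax-weighted sum of per-human embeddings.

    embeddings: list of equal-length float lists, one per neighboring human.
    score_weights: hypothetical linear scoring vector (a stand-in for a
        learned scoring network; an assumption, not the paper's model).
    Returns (attention_weights, pooled_embedding).
    """
    if not embeddings:
        raise ValueError("need at least one embedding")

    # One scalar score per human: dot product with the scoring vector.
    scores = [sum(w * x for w, x in zip(score_weights, e)) for e in embeddings]

    # Numerically stable softmax over the scores.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [v / total for v in exps]

    # Weighted sum of embeddings gives a fixed-size crowd representation,
    # regardless of how many humans are present.
    dim = len(embeddings[0])
    pooled = [sum(a * e[i] for a, e in zip(weights, embeddings)) for i in range(dim)]
    return weights, pooled
```

Because the output has a fixed size for any number of neighbors, a pooled vector of this kind can be fed to a value or policy network without changing its input dimension as the crowd grows.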

    Fully Distributed Multi-Robot Collision Avoidance via Deep Reinforcement Learning for Safe and Efficient Navigation in Complex Scenarios

    In this paper, we present a decentralized sensor-level collision avoidance policy for multi-robot systems, which shows promising results in practical applications. In particular, our policy directly maps raw sensor measurements to an agent's steering commands in terms of the movement velocity. As a first step toward reducing the performance gap between decentralized and centralized methods, we present a multi-scenario multi-stage training framework to learn an optimal policy. The policy is trained over a large number of robots in rich, complex environments simultaneously using a policy gradient based reinforcement learning algorithm. The learning algorithm is also integrated into a hybrid control framework to further improve the policy's robustness and effectiveness. We validate the learned sensor-level collision avoidance policy in a variety of simulated and real-world scenarios with thorough performance evaluations for large-scale multi-robot systems. The generalization of the learned policy is verified in a set of unseen scenarios including the navigation of a group of heterogeneous robots and a large-scale scenario with 100 robots. Although the policy is trained using simulation data only, we have successfully deployed it on physical robots with shapes and dynamics characteristics that are different from the simulated agents, in order to demonstrate the controller's robustness against the sim-to-real modeling error. Finally, we show that the collision-avoidance policy learned from multi-robot navigation tasks provides an excellent solution to the safe and effective autonomous navigation for a single robot working in a dense real human crowd. Our learned policy enables a robot to make effective progress in a crowd without getting stuck. Videos are available at https://sites.google.com/view/hybridmrc

    L2B: Learning to Balance the Safety-Efficiency Trade-off in Interactive Crowd-aware Robot Navigation

    This work presents a deep reinforcement learning framework for interactive navigation in a crowded place. Our proposed approach, the Learning to Balance (L2B) framework, enables mobile robot agents to steer safely towards their destinations by avoiding collisions with a crowd, while actively clearing a path by asking nearby pedestrians to make room, if necessary, to keep their travel efficient. We observe that the safety and efficiency requirements in crowd-aware navigation have a trade-off in the presence of social dilemmas between the agent and the crowd. On the one hand, intervening in pedestrian paths too much to achieve instant efficiency will result in collapsing a natural crowd flow and may eventually put everyone, including the agent itself, at risk of collisions. On the other hand, staying silent to avoid every single collision will lead to the agent's inefficient travel. With this observation, our L2B framework augments the reward function used in learning an interactive navigation policy to penalize frequent active path clearing and passive collision avoidance, which substantially improves the balance of the safety-efficiency trade-off. We evaluate our L2B framework in a challenging crowd simulation and demonstrate its superiority, in terms of both navigation success and collision rate, over a state-of-the-art navigation approach. Comment: Accepted at IROS2020. Project site: https://denkiwakame.github.io/l2b

    Realtime Collision Avoidance for Mobile Robots in Dense Crowds using Implicit Multi-sensor Fusion and Deep Reinforcement Learning

    We present a novel learning-based collision avoidance algorithm, CrowdSteer, for mobile robots operating in dense and crowded environments. Our approach is end-to-end and uses multiple perception sensors such as a 2-D lidar along with a depth camera to sense surrounding dynamic agents and compute collision-free velocities. Our training approach is based on the sim-to-real paradigm and uses high fidelity 3-D simulations of pedestrians and the environment to train a policy using Proximal Policy Optimization (PPO). We show that our learned navigation model is directly transferable to previously unseen virtual and dense real-world environments. We have integrated our algorithm with differential drive robots and evaluated its performance in narrow scenarios such as dense crowds, narrow corridors, T-junctions, L-junctions, etc. In practice, our approach can perform real-time collision avoidance and generate smooth trajectories in such complex scenarios. We also compare the performance with prior methods based on metrics such as trajectory length, mean time to goal, success rate, and smoothness and observe considerable improvement. Comment: 8 pages, 7 figures

    Getting Robots Unfrozen and Unlost in Dense Pedestrian Crowds

    We aim to enable a mobile robot to navigate through environments with dense crowds, e.g., shopping malls, canteens, train stations, or airport terminals. In these challenging environments, existing approaches suffer from two common problems: the robot may get frozen and cannot make any progress toward its goal, or it may get lost due to severe occlusions inside a crowd. Here we propose a navigation framework that handles the robot freezing and the navigation lost problems simultaneously. First, we enhance the robot's mobility and unfreeze the robot in the crowd using a reinforcement learning based local navigation policy developed in our previous work~\cite{long2017towards}, which naturally takes into account the coordination between the robot and the human. Secondly, the robot takes advantage of its excellent local mobility to recover from its localization failure. In particular, it dynamically chooses to approach a set of recovery positions with rich features. To the best of our knowledge, our method is the first approach that simultaneously solves the freezing problem and the navigation lost problem in dense crowds. We evaluate our method in both simulated and real-world environments and demonstrate that it outperforms the state-of-the-art approaches. Videos are available at https://sites.google.com/view/rlslam

    User Modeling for Task Oriented Dialogues

    We introduce end-to-end neural network based models for simulating users of task-oriented dialogue systems. User simulation in dialogue systems is crucial from two different perspectives: (i) automatic evaluation of different dialogue models, and (ii) training task-oriented dialogue systems. We design a hierarchical sequence-to-sequence model that first encodes the initial user goal and system turns into fixed length representations using Recurrent Neural Networks (RNN). It then encodes the dialogue history using another RNN layer. At each turn, user responses are decoded from the hidden representations of the dialogue level RNN. This hierarchical user simulator (HUS) approach allows the model to capture undiscovered parts of the user goal without the need for explicit dialogue state tracking. We further develop several variants by utilizing a latent variable model to inject random variations into user responses to promote diversity in simulated user responses and a novel goal regularization mechanism to penalize divergence of user responses from the initial user goal. We evaluate the proposed models on the movie ticket booking domain by systematically interacting each user simulator with various dialogue system policies trained with different objectives and users. Comment: Accepted at SLT 201

    LeTS-Drive: Driving in a Crowd by Learning from Tree Search

    Autonomous driving in a crowded environment, e.g., a busy traffic intersection, is an unsolved challenge for robotics. The robot vehicle must contend with a dynamic and partially observable environment, noisy sensors, and many agents. A principled approach is to formalize it as a Partially Observable Markov Decision Process (POMDP) and solve it through online belief-tree search. To handle a large crowd and achieve real-time performance in this very challenging setting, we propose LeTS-Drive, which integrates online POMDP planning and deep learning. It consists of two phases. In the offline phase, we learn a policy and the corresponding value function by imitating the belief tree search. In the online phase, the learned policy and value function guide the belief tree search. LeTS-Drive leverages the robustness of planning and the runtime efficiency of learning to enhance the performance of both. Experimental results in simulation show that LeTS-Drive outperforms either planning or imitation learning alone and develops sophisticated driving skills.

    IntelligentCrowd: Mobile Crowdsensing via Multi-agent Reinforcement Learning

    The prosperity of smart mobile devices has made mobile crowdsensing (MCS) a promising paradigm for completing complex sensing and computation tasks. In the past, great efforts have been made on the design of incentive mechanisms and task allocation strategies from the MCS platform's perspective to motivate mobile users' participation. However, in practice, MCS participants face many uncertainties coming from their sensing environment as well as other participants' strategies, and how they interact with each other and make sensing decisions is not well understood. In this paper, we take the MCS participants' perspective to derive an online sensing policy to maximize their payoffs via MCS participation. Specifically, we model the interactions of mobile users and sensing environments as a multi-agent Markov decision process. Each participant cannot observe others' decisions, but needs to decide their effort level in sensing tasks based only on local information, e.g., their own record of sensed signals' quality. To cope with the stochastic sensing environment, we develop an intelligent crowdsensing algorithm, IntelligentCrowd, by leveraging the power of multi-agent reinforcement learning (MARL). Our algorithm leads to the optimal sensing policy for each user to maximize the expected payoff against stochastic sensing environments, and can be implemented at the individual participant's level in a distributed fashion. Numerical simulations demonstrate that IntelligentCrowd significantly improves users' payoffs in sequential MCS tasks under various sensing dynamics. Comment: In Submission

    Emotional Contagion-Aware Deep Reinforcement Learning for Antagonistic Crowd Simulation

    The antagonistic behavior in the crowd usually exacerbates the seriousness of the situation in sudden riots, where the antagonistic emotional contagion and behavioral decision making play very important roles. However, the complex mechanism of antagonistic emotion influencing decision making, especially in the environment of sudden confrontation, has not yet been explored very clearly. In this paper, we propose an Emotional contagion-aware Deep reinforcement learning model for Antagonistic Crowd Simulation (ACSED). Firstly, we build a group emotional contagion module based on the improved Susceptible Infected Susceptible (SIS) infection disease model, and estimate the emotional state of the group at each time step during the simulation. Then, the tendency of crowd antagonistic action is estimated based on Deep Q Network (DQN), where the agent learns the action autonomously, and leverages the mean field theory to quickly calculate the influence of other surrounding individuals on the central one. Finally, the rationality of the predicted actions by DQN is further analyzed in combination with group emotion, and the final action of the agent is determined. The proposed method in this paper is verified through several experiments with different settings. The results prove that the antagonistic emotion has a vital impact on the group combat, and positive emotional states are more conducive to combat. Moreover, by comparing the simulation results with real scenes, the feasibility of our method is further confirmed, which can provide good reference to formulate battle plans and improve the win rate of righteous groups in a variety of situations. Comment: 14 pages, 9 figures
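For readers unfamiliar with the SIS model mentioned in the abstract above, here is a minimal sketch of the standard (textbook) SIS dynamics, stepped with forward Euler. It is not the "improved" variant the paper describes, whose details the abstract does not give. The names `sis_step` and `simulate`, and the parameters `beta` (contagion rate), `gamma` (recovery rate), and `dt`, are assumptions chosen for illustration.

```python
def sis_step(infected_fraction, beta, gamma, dt=1.0):
    """One forward-Euler step of the standard SIS model.

    Continuous form: di/dt = beta * i * (1 - i) - gamma * i,
    where i is the fraction of the group currently "infected"
    (here: carrying the contagious emotion).
    """
    i = infected_fraction
    di = beta * i * (1.0 - i) - gamma * i
    # Clamp so the result stays a valid fraction.
    return min(1.0, max(0.0, i + dt * di))


def simulate(i0, beta, gamma, steps, dt=1.0):
    """Return the trajectory [i_0, i_1, ..., i_steps]."""
    trajectory = [i0]
    for _ in range(steps):
        trajectory.append(sis_step(trajectory[-1], beta, gamma, dt))
    return trajectory
```

In the standard model, when `beta > gamma` the infected fraction settles at `1 - gamma / beta`, and when `beta <= gamma` it decays toward zero. A per-time-step group state of this kind is the sort of signal the abstract describes combining with the DQN's predicted actions.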