
    Driving in Dense Traffic with Model-Free Reinforcement Learning

    Traditional planning and control methods can fail to find a feasible trajectory for an autonomous vehicle to execute amongst dense traffic on roads, because in these scenarios the obstacle-free volume in spacetime through which the vehicle can drive is very small. However, that does not mean the task is infeasible: human drivers are known to be able to drive amongst dense traffic by leveraging the cooperativeness of other drivers to open a gap. Traditional methods fail to take into account the fact that the actions taken by an agent affect the behaviour of other vehicles on the road. In this work, we rely on the ability of deep reinforcement learning to implicitly model such interactions and learn a continuous control policy over the action space of an autonomous vehicle. The application we consider requires our agent to negotiate and open a gap in the road in order to successfully merge or change lanes. Our policy learns to repeatedly probe into the target road lane while trying to find a safe spot to move into. We compare against two model-predictive-control-based algorithms and show that our policy outperforms them in simulation.
    Comment: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2020. Updated GitHub repository link.
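    Below is a hedged, minimal sketch of the kind of continuous-control policy learning the abstract describes: a Gaussian policy over acceleration and steering trained with a REINFORCE-style update. The observation contents, network sizes, and the update rule are illustrative assumptions, not the paper's exact method.

```python
# A hedged sketch: a Gaussian policy over continuous controls
# (e.g. acceleration, steering) trained with a REINFORCE-style update.
# Observation contents, network sizes, and the update rule are
# illustrative assumptions, not the paper's exact method.
import torch
import torch.nn as nn

class GaussianPolicy(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.mean_net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.Tanh(),
            nn.Linear(64, 64), nn.Tanh(),
            nn.Linear(64, act_dim),          # mean of the action Gaussian
        )
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def dist(self, obs: torch.Tensor) -> torch.distributions.Normal:
        return torch.distributions.Normal(self.mean_net(obs), self.log_std.exp())

def reinforce_update(policy, optimizer, obs, actions, returns):
    """One policy-gradient step from a batch of (obs, action, return)."""
    log_prob = policy.dist(obs).log_prob(actions).sum(dim=-1)
    loss = -(log_prob * (returns - returns.mean())).mean()   # centred baseline
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```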

    Sequential decision making in artificial musical intelligence

    Over the past 60 years, artificial intelligence has grown from a largely academic field of research into a ubiquitous array of tools and approaches used in everyday technology. Despite its many recent successes and growing prevalence, certain meaningful facets of computational intelligence have not been as thoroughly explored. Such facets cover a wide array of complex mental tasks which humans carry out easily, yet which are difficult for computers to mimic. A prime example of a domain in which human intelligence thrives, but machine understanding is still fairly limited, is music. Over the last decade, many researchers have applied computational tools to carry out tasks such as genre identification, music summarization, music database querying, and melodic segmentation. While these are all useful algorithmic solutions, we are still a long way from constructing complete music agents able to mimic (at least partially) the complexity with which humans approach music. One key aspect that has not been sufficiently studied is that of sequential decision making in musical intelligence. This thesis strives to answer the following question: can a sequential decision making perspective guide us in the creation of better music agents, and social agents in general? And if so, how? More specifically, this thesis focuses on two aspects of musical intelligence: music recommendation and human-agent (and more generally agent-agent) interaction in the context of music. The key contributions of this thesis are the design of better music playlist recommendation algorithms; the design of algorithms for tracking user preferences over time; new approaches for modeling people's behavior in situations that involve music; and the design of agents capable of meaningful interaction with humans and other agents in settings where music plays a role (either directly or indirectly). Though motivated primarily by music-related tasks, and focusing largely on people's musical preferences, this thesis also establishes that insights from music-specific case studies can be applicable in other concrete social domains, such as different types of content recommendation. Showing the generality of insights from musical data in other contexts serves as evidence for the utility of music domains as testbeds for the development of general artificial intelligence techniques. Ultimately, this thesis demonstrates the overall usefulness of taking a sequential decision making approach in settings previously unexplored from this perspective.
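    As a toy illustration of the sequential-decision framing the abstract advocates, the sketch below treats next-song selection as an ε-greedy decision over learned song-transition values. The state (the previous song), the reward signal, and the update rule are simplified assumptions, not the thesis's actual recommendation algorithms.

```python
# A toy sketch only: next-song choice as an epsilon-greedy sequential
# decision over learned transition values. The state (previous song),
# reward, and update rule are simplified assumptions, not the thesis's
# actual recommendation algorithms.
import random
from collections import defaultdict

class PlaylistAgent:
    def __init__(self, songs, epsilon=0.1, lr=0.1):
        self.songs = list(songs)
        self.epsilon = epsilon        # exploration rate
        self.lr = lr                  # learning rate
        # q[(prev_song, next_song)] ~ estimated enjoyment of the transition
        self.q = defaultdict(float)

    def pick_next(self, prev_song):
        if random.random() < self.epsilon:
            return random.choice(self.songs)          # explore
        return max(self.songs, key=lambda s: self.q[(prev_song, s)])

    def update(self, prev_song, song, reward):
        key = (prev_song, song)
        self.q[key] += self.lr * (reward - self.q[key])

# Usage (assumed reward scheme): 1.0 for a full listen, 0.0 for a skip.
agent = PlaylistAgent(["song_a", "song_b", "song_c"])
nxt = agent.pick_next("song_a")
agent.update("song_a", nxt, reward=1.0)
```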

    Large-Scale Unmanned Aerial Systems Traffic Density Prediction and Management

    In recent years, applications of Unmanned Aerial Systems (UAS) have become more and more popular. We envision that in the near future, complicated, high-density UAS traffic will impose a significant burden on air traffic management. Many works focus on application development for individual Small Unmanned Aerial Systems (sUAS) or on sUAS management policy; however, the study of UAS cluster behaviors, such as forecasting and managing UAS traffic, has generally not been addressed. Addressing this issue calls for three research directions. The first is to develop a high-fidelity simulator for evaluating UAS cluster behavior. The second is to study real-time trajectory planning algorithms that mitigate high-density UAS traffic. The last is to investigate techniques that rapidly and accurately forecast future UAS traffic patterns. In this thesis, we elaborate on these three research topics and present a universal paradigm to predict and manage traffic for large-scale unmanned aerial systems. To enable research in UAS traffic management and prediction, a Java-based Multi-Agent Air Traffic and Resource Usage Simulation (MATRUS) framework is first developed. We use two types of UAS trajectories, Point-to-Point (P2P) and Manhattan, as case studies to demonstrate the capabilities of the presented framework. Various communication and propagation models (e.g., log-distance path loss) can be integrated with the framework to model the communication between UASs and base stations. The results show that MATRUS can evaluate different sUAS traffic management policies and can provide insights into the relationships between air traffic and communication resource usage for further studies. Moreover, the framework can be extended to study the effect of sUAS Detect-and-Avoid (DAA) mechanisms, to implement additional traffic management policies, and to handle more complex traffic demands and geographical distributions. Based on the MATRUS framework, we propose a Sparse Represented Temporal-Spatial (SRTS) UAS trajectory planning algorithm. The SRTS algorithm allows an sUAS to avoid static no-fly areas (i.e., static obstacles) and areas with congested air or communication traffic. The core functionality of the routing algorithm supports instant refresh of the in-flight environment, making it appropriate for highly dynamic air traffic scenarios (see the routing sketch after this abstract). The characterization of routing time and memory usage demonstrates that the SRTS algorithm outperforms a traditional temporal-spatial routing algorithm. Since deep learning has shown outstanding success in many areas, we first investigated applying a deep neural network to predict the trajectory of a single vehicle in a given traffic scene. A new trajectory prediction model is developed that allows information sharing among vehicles using a graph neural network. The prediction is based on an embedding feature derived from multi-dimensional input sequences, including the historical trajectories of the target and neighboring vehicles and their relative positions. Compared to other existing trajectory prediction methods, the proposed approach can reduce the prediction error by up to 50.00%. Then, we present a deep neural network model that extracts features from both the spatial and temporal domains to predict UAS traffic density. In addition, a novel input representation of future sUAS mission information is proposed, in which pre-scheduled missions are categorized into 3 types according to their launching times. The results show that our presented model outperforms all of the baseline models. Meanwhile, the qualitative results demonstrate that our model can accurately predict the hot spots in future traffic maps.
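    To make the temporal-spatial routing idea concrete, here is a minimal spacetime breadth-first search that plans over (x, y, t) cells while skipping blocked ones. The grid abstraction, the 4-way-plus-hover move set, and the blocked() predicate are illustrative assumptions, not the SRTS algorithm itself.

```python
# A minimal spacetime search in the spirit of temporal-spatial routing:
# breadth-first search over (x, y, t) cells, skipping blocked ones.
# The grid abstraction, 4-way moves plus hovering, and the blocked()
# predicate are illustrative assumptions, not the SRTS algorithm itself.
from collections import deque

def spacetime_route(start, goal, width, height, horizon, blocked):
    """blocked(x, y, t) -> True if the cell is no-fly or congested at t."""
    moves = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]   # hover or 4-way
    queue = deque([(start[0], start[1], 0, [start])])
    seen = {(start[0], start[1], 0)}
    while queue:
        x, y, t, path = queue.popleft()
        if (x, y) == goal:
            return path                   # sequence of (x, y) waypoints
        if t + 1 > horizon:
            continue
        for dx, dy in moves:
            nx, ny, nt = x + dx, y + dy, t + 1
            if (0 <= nx < width and 0 <= ny < height
                    and (nx, ny, nt) not in seen and not blocked(nx, ny, nt)):
                seen.add((nx, ny, nt))
                queue.append((nx, ny, nt, path + [(nx, ny)]))
    return None   # no conflict-free route within the time horizon
```

    A real planner would also weight cells by air-traffic and communication congestion and replan as the in-flight environment refreshes; this sketch only shows the spacetime state space that makes such constraints expressible.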

    Many-agent Reinforcement Learning

    Multi-agent reinforcement learning (RL) solves the problem of how each agent should behave optimally in a stochastic environment in which multiple agents are learning simultaneously. It is an interdisciplinary domain with a long history that lies at the intersection of psychology, control theory, game theory, reinforcement learning, and deep learning. Following the remarkable success of the AlphaGo series in single-agent RL, 2019 was a booming year that witnessed significant advances in multi-agent RL techniques; impressive breakthroughs were made in developing AIs that outperform humans on many challenging tasks, especially multi-player video games. Nonetheless, one of the key challenges of multi-agent RL techniques is scalability; it is still non-trivial to design efficient learning algorithms that can solve tasks involving far more than two agents (N ≫ 2), which I term many-agent reinforcement learning (MARL; I use the word "MARL" to denote multi-agent reinforcement learning with a particular focus on the case of many agents, and "Multi-Agent RL" otherwise) problems. In this thesis, I contribute to tackling MARL problems from four aspects. Firstly, I offer a self-contained overview of multi-agent RL techniques from a game-theoretical perspective. This overview fills the research gap that most existing work either fails to cover the recent advances since 2010 or does not pay adequate attention to game theory, which I believe is the cornerstone of solving many-agent learning problems. Secondly, I develop a tractable policy evaluation algorithm, α^α-Rank, for many-agent systems. The critical advantage of α^α-Rank is that it can compute the solution concept of α-Rank tractably in multi-player general-sum games with no need to store the entire pay-off matrix. This is in contrast to classic solution concepts such as the Nash equilibrium, which is known to be PPAD-hard even in two-player cases. α^α-Rank allows us, for the first time, to practically conduct large-scale multi-agent evaluations. Thirdly, I introduce a scalable policy learning algorithm, mean-field MARL, for many-agent systems. The mean-field MARL method takes advantage of the mean-field approximation from physics, and it is the first provably convergent algorithm that tries to break the curse of dimensionality for MARL tasks (a minimal sketch of the mean-field idea appears after this abstract). With the proposed algorithm, I report the first result of solving the Ising model and multi-agent battle games through a MARL approach. Fourthly, I investigate the many-agent learning problem in open-ended meta-games (i.e., the game of a game in the policy space). Specifically, I focus on modelling the behavioural diversity in meta-games and on developing algorithms that are guaranteed to enlarge diversity during training. The proposed metric, based on determinantal point processes, serves as the first mathematically rigorous definition of diversity. Importantly, the diversity-aware learning algorithms beat the existing state-of-the-art game solvers in terms of exploitability by a large margin. On top of the algorithmic developments, I also contribute two real-world applications of MARL techniques. Specifically, I demonstrate the great potential of applying MARL to study the emergent population dynamics in nature, and to model diverse and realistic interactions in autonomous driving. Both applications embody the prospect that MARL techniques could achieve huge impact in the real physical world, beyond purely video games.
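    As a concrete illustration of the mean-field approximation mentioned above, the sketch below conditions each agent's tabular Q-function on its own action and a discretised mean of its neighbours' actions, so the N-agent interaction collapses into a pairwise one. The tabular setting and the constants are assumptions for exposition, not the thesis's full algorithm.

```python
# A sketch of the mean-field idea: each agent's Q-function conditions on
# its own action and a discretised *mean* of its neighbours' actions,
# collapsing an N-agent interaction into a pairwise one. Tabular states
# and the constants are assumptions; the thesis's algorithm is richer.
def mean_field_q_update(q, state, action, mean_bin, reward,
                        next_state, next_mean_bin, n_actions,
                        alpha=0.1, gamma=0.95):
    """q: dict mapping (state, action, mean_action_bin) -> value."""
    # Greedy bootstrap against the neighbourhood's next mean action.
    next_best = max(q.get((next_state, a, next_mean_bin), 0.0)
                    for a in range(n_actions))
    key = (state, action, mean_bin)
    old = q.get(key, 0.0)
    q[key] = old + alpha * (reward + gamma * next_best - old)
```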

    Cooperation in Multi-Agent Reinforcement Learning

    As progress in reinforcement learning (RL) gives rise to increasingly general and powerful artificial intelligence, society needs to anticipate a possible future in which multiple RL agents must learn and interact in a shared multi-agent environment. When a single principal has oversight of the multi-agent system, how should agents learn to cooperate via centralized training to achieve individual and global objectives? When agents belong to self-interested principals with imperfectly aligned objectives, how can cooperation emerge from fully decentralized learning? This dissertation addresses both questions by proposing novel methods for multi-agent reinforcement learning (MARL) and demonstrating the empirical effectiveness of these methods in high-dimensional simulated environments. To address the first case, we propose new algorithms for fully cooperative MARL in the paradigm of centralized training with decentralized execution. Firstly, we propose a method based on multi-agent curriculum learning and multi-agent credit assignment to address the setting where global optimality is defined as the attainment of all individual goals. Secondly, we propose a hierarchical MARL algorithm to discover and learn interpretable and useful skills for a multi-agent team to optimize a single team objective. Extensive experiments with ablations show the strengths of our approaches over state-of-the-art baselines. To address the second case, we propose learning algorithms to attain cooperation within a population of self-interested RL agents. We propose the design of a new agent equipped with the ability to incentivize other RL agents and to explicitly account for the other agents' learning processes. This agent overcomes the challenging limitation of fully decentralized training and generates emergent cooperation in difficult social dilemmas. Then, we extend and apply this technique to the problem of incentive design, where a central incentive designer explicitly optimizes a global objective only by intervening on the rewards of a population of independent RL agents. Experiments on the problem of optimal taxation in a simulated market economy demonstrate the effectiveness of this approach.
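    To illustrate the incentive-design loop from the last paragraph, here is a self-contained toy: two ε-greedy learners face a repeated public-goods dilemma, while a central designer grid-searches a contribution subsidy to maximize total welfare. All payoffs, constants, and the grid-search designer are placeholder assumptions, not the dissertation's method or environments.

```python
# A self-contained toy of the incentive-design loop: two epsilon-greedy
# learners face a repeated public-goods dilemma, and a central designer
# grid-searches a contribution subsidy to maximize total welfare.
# All payoffs and constants are placeholder assumptions, not the
# dissertation's method or environments.
import random

def run_episode(subsidy, epsilon=0.1, lr=0.1, rounds=500):
    q = [[0.0, 0.0], [0.0, 0.0]]      # per-agent values for defect/contribute
    welfare = 0.0
    for _ in range(rounds):
        acts = [random.randint(0, 1) if random.random() < epsilon
                else int(q[i][1] > q[i][0]) for i in (0, 1)]
        pot = 1.5 * sum(acts)          # contributions grow and are shared
        for i in (0, 1):
            r = pot / 2 - acts[i] + subsidy * acts[i]   # designer intervenes
            q[i][acts[i]] += lr * (r - q[i][acts[i]])
            welfare += pot / 2 - acts[i]   # welfare excludes the subsidy
    return welfare

# Designer: pick the subsidy level that yields the highest welfare.
best = max([0.0, 0.25, 0.5, 0.75, 1.0], key=run_episode)
print(f"best subsidy level: {best:.2f}")
```

    With a zero subsidy the learners drift toward mutual defection; once the subsidy covers the private cost of contributing, cooperation becomes individually rational and welfare rises, which is the essence of intervening on rewards to optimize a global objective.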