Modeling Human Driving Behavior through Generative Adversarial Imitation Learning
Imitation learning is an approach for generating intelligent behavior when
the cost function is unknown or difficult to specify. Building upon work in
inverse reinforcement learning (IRL), Generative Adversarial Imitation Learning
(GAIL) aims to provide effective imitation even for problems with large or
continuous state and action spaces. Driver modeling is one example of a problem
where the state and action spaces are continuous. Human driving behavior is
characterized by non-linearity and stochasticity, and the underlying cost
function is unknown. As a result, learning from human driving demonstrations is
a promising approach for generating human-like driving behavior. This article
describes the use of GAIL for learning-based driver modeling. Because driver
modeling is inherently a multi-agent problem, where the interaction between
agents needs to be modeled, this paper describes a parameter-sharing extension
of GAIL called PS-GAIL to tackle multi-agent driver modeling. In addition, GAIL
is domain agnostic, making it difficult to encode specific knowledge relevant
to driving in the learning process. This paper describes Reward Augmented
Imitation Learning (RAIL), which modifies the reward signal to provide
domain-specific knowledge to the agent. Finally, human demonstrations are
dependent upon latent factors that may not be captured by GAIL. This paper
describes Burn-InfoGAIL, which allows for disentanglement of latent variability
in demonstrations. Imitation learning experiments are performed using NGSIM, a
real-world highway driving dataset. Experiments show that these modifications
to GAIL can successfully model highway driving behavior, accurately replicating
human demonstrations and generating realistic, emergent behavior in the traffic
flow arising from the interaction between driving agents.
Comment: 28 pages, 8 figures. arXiv admin note: text overlap with arXiv:1803.0104
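The reward augmentation idea behind RAIL can be sketched as adding hand-crafted driving penalties to the discriminator-based imitation reward. The particular penalty events and weights below are illustrative assumptions for the driving domain, not the paper's exact formulation:

```python
def augmented_reward(discriminator_reward, off_road, collided,
                     hard_brake, penalty_weight=1.0):
    """RAIL-style reward: imitation signal plus domain-specific penalties.

    The penalty events and magnitudes here are illustrative assumptions;
    RAIL's contribution is injecting such domain knowledge into the
    otherwise domain-agnostic GAIL reward signal.
    """
    domain_penalty = 0.0
    if off_road:
        domain_penalty -= 1.0   # leaving the roadway
    if collided:
        domain_penalty -= 2.0   # colliding with another vehicle
    if hard_brake:
        domain_penalty -= 0.5   # uncomfortably hard deceleration
    return discriminator_reward + penalty_weight * domain_penalty
```

The agent then maximizes this combined signal, so trajectories that fool the discriminator but violate driving constraints are discouraged.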
RITA: Boost Autonomous Driving Simulators with Realistic Interactive Traffic Flow
High-quality traffic flow generation is the core module in building
simulators for autonomous driving. However, the majority of available
simulators are incapable of replicating traffic patterns that accurately
reflect the various features of real-world data while also simulating
human-like reactive responses to the tested autopilot driving strategies.
Taking one step forward to addressing such a problem, we propose Realistic
Interactive TrAffic flow (RITA) as an integrated component of existing driving
simulators to provide high-quality traffic flow for the evaluation and
optimization of the tested driving strategies. RITA is developed with
consideration of three key features, i.e., fidelity, diversity, and
controllability, and consists of two core modules called RITABackend and
RITAKit. RITABackend is built to support vehicle-wise control and provide
traffic generation models from real-world datasets, while RITAKit is developed
with easy-to-use interfaces for controllable traffic generation via
RITABackend. We demonstrate RITA's capacity to create diversified and
high-fidelity traffic simulations in several highly interactive highway
scenarios. The experimental findings demonstrate that our produced RITA traffic
flows exhibit all three key features, hence enhancing the completeness of
driving strategy evaluation. Moreover, we showcase the possibility for further
improvement of baseline strategies through online fine-tuning with RITA traffic
flows.
Comment: 8 pages, 5 figures, 3 tables
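The split between the two modules suggests a layered interface in which the kit exposes controllable knobs (e.g. a mix of driving styles) and delegates per-vehicle control to backend models learned from data. All class and parameter names below are hypothetical, sketched only to illustrate that division of responsibility; they are not RITA's actual API:

```python
import random

class TrafficBackend:
    """Hypothetical stand-in for a RITABackend-style component: holds
    per-style, per-vehicle generation models learned from real data."""
    def __init__(self, models):
        self.models = models  # e.g. {"aggressive": fn, "cautious": fn}

    def step_vehicle(self, style, state):
        # Each model maps a vehicle state to its next action.
        return self.models[style](state)

class TrafficKit:
    """Hypothetical RITAKit-style front end: exposes a controllable
    driving-style mix and delegates vehicle-wise control to the backend."""
    def __init__(self, backend, style_mix):
        self.backend = backend
        self.style_mix = style_mix  # e.g. {"aggressive": 0.3, "cautious": 0.7}

    def step(self, states):
        actions = {}
        for vid, state in states.items():
            # Draw a driving style per vehicle according to the mix.
            style = random.choices(list(self.style_mix),
                                   weights=list(self.style_mix.values()))[0]
            actions[vid] = self.backend.step_vehicle(style, state)
        return actions
```

Controllability then amounts to changing `style_mix` (or swapping backend models) without touching the simulator loop.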
A Survey on Causal Reinforcement Learning
While Reinforcement Learning (RL) achieves tremendous success in sequential
decision-making problems of many domains, it still faces key challenges of data
inefficiency and the lack of interpretability. Interestingly, many researchers
have leveraged insights from the causality literature recently, bringing forth
flourishing works to unify the merits of causality and address well the
challenges from RL. As such, it is of great necessity and significance to
collate these Causal Reinforcement Learning (CRL) works, offer a review of CRL
methods, and investigate the potential functionality from causality toward RL.
In particular, we divide existing CRL approaches into two categories according
to whether their causality-based information is given in advance or not. We
further analyze each category in terms of the formalization of different
models, including the Markov Decision Process (MDP), the Partially Observable
Markov Decision Process (POMDP), Multi-Armed Bandits (MAB), and the Dynamic
Treatment Regime (DTR). Moreover, we summarize the evaluation metrics and
open-source resources, and we discuss emerging applications along with
promising prospects for the future development of CRL.
Comment: 29 pages, 20 figures
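Of the formalizations the survey covers, the multi-armed bandit is the simplest to sketch. The epsilon-greedy learner below is a generic illustration of the MAB setting, not a causal method from the survey; causal bandits would additionally exploit known causal structure among the arms:

```python
import random

def epsilon_greedy_bandit(arm_means, steps=5000, epsilon=0.1, seed=0):
    """Minimal multi-armed bandit baseline: keep a running mean of each
    arm's reward and pull the current best arm most of the time."""
    rng = random.Random(seed)
    counts = [0] * len(arm_means)
    values = [0.0] * len(arm_means)
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(arm_means))                       # explore
        else:
            arm = max(range(len(arm_means)), key=values.__getitem__)  # exploit
        reward = arm_means[arm] + rng.gauss(0, 0.1)   # noisy reward draw
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # running mean
    return max(range(len(arm_means)), key=values.__getitem__)

# With well-separated arm means, the learner identifies the best arm.
best = epsilon_greedy_bandit([0.1, 0.5, 0.9])
```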
ATMS: Algorithmic Trading-Guided Market Simulation
The effective construction of an Algorithmic Trading (AT) strategy often
relies on market simulators, which remains challenging due to existing methods'
inability to adapt to the sequential and dynamic nature of trading activities.
This work fills this gap by proposing a metric to quantify market discrepancy.
This metric measures the difference between causal effects arising from the
underlying markets' unique characteristics, and it is evaluated through the
interaction between the AT agent and the market. Most importantly, we introduce Algorithmic
Trading-guided Market Simulation (ATMS) by optimizing our proposed metric.
Inspired by SeqGAN, ATMS formulates the simulator as a stochastic policy in
reinforcement learning (RL) to account for the sequential nature of trading.
Moreover, ATMS utilizes the policy gradient update to bypass differentiating
the proposed metric, which involves non-differentiable operations such as order
deletion from the market. Through extensive experiments on semi-real market
data, we demonstrate the effectiveness of our metric and show that ATMS
generates market data with improved similarity to reality compared to the
state-of-the-art conditional Wasserstein Generative Adversarial Network (cWGAN)
approach. Furthermore, ATMS produces market data with more balanced BUY and
SELL volumes, mitigating the bias of the cWGAN baseline approach, where a
simple strategy can exploit the BUY/SELL imbalance for profit.
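Using the policy gradient to sidestep differentiating the metric follows the standard REINFORCE idea: the reward (here, the discrepancy-based signal) only needs to be evaluated, never differentiated. A minimal sketch with a toy categorical policy and a black-box reward function (the three-action setup and learning rate are illustrative assumptions):

```python
import math
import random

def reinforce_step(logits, reward_fn, lr=0.5, rng=random):
    """One REINFORCE update on a categorical (softmax) policy.

    reward_fn is treated as a black box, like a discrepancy metric that
    involves non-differentiable operations (e.g. order deletion): we only
    evaluate it, never backpropagate through it.
    """
    exps = [math.exp(l) for l in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    # Sample an action from the policy and query the black-box reward.
    action = rng.choices(range(len(probs)), weights=probs)[0]
    r = reward_fn(action)
    # grad of log pi(action) w.r.t. the logits is onehot(action) - probs.
    return [l + lr * r * ((1.0 if i == action else 0.0) - probs[i])
            for i, l in enumerate(logits)]

# Toy usage: only action 1 is rewarded, so the policy shifts toward it.
logits = [0.0, 0.0, 0.0]
rng = random.Random(0)
for _ in range(200):
    logits = reinforce_step(logits, lambda a: 1.0 if a == 1 else 0.0, rng=rng)
```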
Intelligent and High-Performance Behavior Design of Autonomous Systems via Learning, Optimization and Control
Nowadays, great societal demands have rapidly boosted the development of autonomous systems that densely interact with humans in many application domains, from manufacturing to transportation and from workplaces to daily life. The shift from isolated working environments to human-dominated spaces requires autonomous systems to handle not only environmental uncertainties, such as external vibrations, but also interaction uncertainties arising from human behavior, which is probabilistic in nature, causal but not strictly rational, internally hierarchical, and socially compliant.

This dissertation is concerned with the design of intelligent and high-performance behavior for such autonomous systems, leveraging strengths from control, optimization, learning, and cognitive science. The work consists of two parts. In Part I, the problem of high-level hybrid human-machine behavior design is addressed. The goal is to achieve safe, efficient, and human-like interaction with people. A framework based on theory of mind, utility theories, and imitation learning is proposed to efficiently represent and learn the complicated behavior of humans. Built upon that, machine behaviors at three different levels - the perceptual level, the reasoning level, and the action level - are designed via imitation learning, optimization, and online adaptation, allowing the system to interpret, reason, and behave like a human, particularly when a variety of uncertainties exist. Applications to autonomous driving are considered throughout Part I.

Part II is concerned with the design of high-performance low-level individual machine behavior in the presence of model uncertainties and external disturbances. Advanced control laws based on adaptation, iterative learning, and the internal structure of uncertainties and disturbances are developed to ensure that the high-level interactive behaviors can be reliably executed. Applications to robot manipulators and high-precision motion systems are discussed in this part.