Modeling Human Driving Behavior through Generative Adversarial Imitation Learning
Imitation learning is an approach for generating intelligent behavior when
the cost function is unknown or difficult to specify. Building upon work in
inverse reinforcement learning (IRL), Generative Adversarial Imitation Learning
(GAIL) aims to provide effective imitation even for problems with large or
continuous state and action spaces. Driver modeling is one example of a problem
where the state and action spaces are continuous. Human driving behavior is
characterized by non-linearity and stochasticity, and the underlying cost
function is unknown. As a result, learning from human driving demonstrations is
a promising approach for generating human-like driving behavior. This article
describes the use of GAIL for learning-based driver modeling. Because driver
modeling is inherently a multi-agent problem, where the interaction between
agents needs to be modeled, this paper describes a parameter-sharing extension
of GAIL called PS-GAIL to tackle multi-agent driver modeling. In addition, GAIL
is domain agnostic, making it difficult to encode specific knowledge relevant
to driving in the learning process. This paper describes Reward Augmented
Imitation Learning (RAIL), which modifies the reward signal to provide
domain-specific knowledge to the agent. Finally, human demonstrations are
dependent upon latent factors that may not be captured by GAIL. This paper
describes Burn-InfoGAIL, which allows for disentanglement of latent variability
in demonstrations. Imitation learning experiments are performed using NGSIM, a
real-world highway driving dataset. Experiments show that these modifications
to GAIL can successfully model highway driving behavior, accurately replicating
human demonstrations and generating realistic, emergent behavior in the traffic
flow arising from the interaction between driving agents.
Comment: 28 pages, 8 figures. arXiv admin note: text overlap with arXiv:1803.0104
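The reward-augmentation idea behind RAIL can be sketched in a few lines: the agent's learning signal is the usual GAIL surrogate reward from the discriminator, minus hand-specified penalties that encode driving-domain knowledge. The penalty terms and weights below are illustrative placeholders, not the paper's exact formulation.

```python
import math

def gail_reward(d_score):
    # Surrogate imitation reward from the discriminator output D(s, a) in (0, 1):
    # larger when the discriminator believes the state-action pair came
    # from the expert demonstrations.
    return -math.log(1.0 - d_score + 1e-8)

def augmented_reward(d_score, off_road, collision, w_off=2.0, w_col=2.0):
    # RAIL-style shaping: the imitation reward minus domain-specific
    # penalties (off-road driving, collisions). Weights are illustrative.
    penalty = w_off * float(off_road) + w_col * float(collision)
    return gail_reward(d_score) - penalty
```

Because the penalties enter only through the reward signal, the underlying GAIL training loop is unchanged; the agent simply learns to avoid states the designer has marked as undesirable.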
A Probabilistic Framework for Imitating Human Race Driver Behavior
Understanding and modeling human driver behavior is crucial for advanced
vehicle development. However, unique driving styles, inconsistent behavior, and
complex decision processes render it a challenging task, and existing
approaches often lack variability or robustness. To approach this problem, we
propose Probabilistic Modeling of Driver behavior (ProMoD), a modular framework
which splits the task of driver behavior modeling into multiple modules. A
global target trajectory distribution is learned with Probabilistic Movement
Primitives, clothoids are utilized for local path generation, and the
corresponding choice of actions is performed by a neural network. Experiments
in a simulated car racing setting show considerable advantages in imitation
accuracy and robustness compared to other imitation learning algorithms. The
modular architecture of the proposed framework facilitates straightforward
extensibility in driving line adaptation and sequencing of multiple movement
primitives for future research.
Comment: updated references [17] and [33]; added journal inf
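The clothoids used for local path generation have a simple defining property: curvature varies linearly with arc length, kappa(s) = kappa0 + c*s. A minimal numeric sketch (forward-Euler integration of heading and position; function and parameter names are illustrative, not from the paper):

```python
import math

def clothoid_points(kappa0, c, length, n=100, x=0.0, y=0.0, theta=0.0):
    # Integrate a clothoid with curvature kappa(s) = kappa0 + c * s:
    # heading accumulates curvature, position accumulates the heading
    # direction, over n Euler steps of size ds along the arc.
    ds = length / n
    pts = [(x, y)]
    s = 0.0
    for _ in range(n):
        theta += (kappa0 + c * s) * ds
        x += math.cos(theta) * ds
        y += math.sin(theta) * ds
        s += ds
        pts.append((x, y))
    return pts
```

With kappa0 = c = 0 this degenerates to a straight line; a nonzero c gives the smoothly tightening spiral that makes clothoids a natural primitive for drivable local paths.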
Residual Skill Policies: Learning an Adaptable Skill-based Action Space for Reinforcement Learning for Robotics
Skill-based reinforcement learning (RL) has emerged as a promising strategy
to leverage prior knowledge for accelerated robot learning. Skills are
typically extracted from expert demonstrations and are embedded into a latent
space from which they can be sampled as actions by a high-level RL agent.
However, this skill space is expansive, and not all skills are relevant for a
given robot state, making exploration difficult. Furthermore, the downstream RL
agent is limited to learning structurally similar tasks to those used to
construct the skill space. We firstly propose accelerating exploration in the
skill space using state-conditioned generative models to directly bias the
high-level agent towards only sampling skills relevant to a given state based
on prior experience. Next, we propose a low-level residual policy for
fine-grained skill adaptation enabling downstream RL agents to adapt to unseen
task variations. Finally, we validate our approach across four challenging
manipulation tasks that differ from those used to build the skill space,
demonstrating our ability to learn across task variations while significantly
accelerating exploration, outperforming prior works. Code and videos are
available on our project website: https://krishanrana.github.io/reskill.
Comment: 6th Conference on Robot Learning (CoRL), 202
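The residual-policy idea reduces to a simple composition: a pretrained skill decoder maps the high-level agent's chosen latent to a base action, and a learned low-level residual corrects that action for unseen task variations. A minimal sketch, with all names as illustrative placeholders rather than the paper's API:

```python
def residual_action(decode_skill, residual_policy, state, skill_latent):
    # The high-level agent selects a skill latent z; the frozen decoder
    # maps (state, z) to a base action; the low-level residual policy
    # outputs a small correction that is added element-wise.
    base = decode_skill(state, skill_latent)
    delta = residual_policy(state, base)
    return [b + d for b, d in zip(base, delta)]
```

Keeping the decoder frozen preserves the prior knowledge embedded in the skill space, while the residual gives the downstream agent fine-grained control it would otherwise lack.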
Exploiting Structure for Scalable and Robust Deep Learning
Deep learning has seen great success training deep neural networks for complex prediction problems, such as large-scale image recognition, short-term time-series forecasting, and learning behavioral models for games with simple dynamics. However, neural networks have a number of weaknesses: 1) they are not sample-efficient and 2) they are often not robust against (adversarial) input perturbations. Hence, it is challenging to train neural networks for problems with exponential complexity, such as multi-agent games, complex long-term spatiotemporal dynamics, or noisy high-resolution image data.
This thesis contributes methods to improve the sample efficiency, expressive power, and robustness of neural networks, by exploiting various forms of low-dimensional structure, such as spatiotemporal hierarchy and multi-agent coordination. We show the effectiveness of this approach in multiple learning paradigms: in both the supervised learning (e.g., imitation learning) and reinforcement learning settings.
First, we introduce hierarchical neural networks that model both short-term actions and long-term goals from data, and can learn human-level behavioral models for spatiotemporal multi-agent games, such as basketball, using imitation learning.
Second, in reinforcement learning, we show that behavioral policies with a hierarchical latent structure can efficiently learn forms of multi-agent coordination, which enables a form of structured exploration for faster learning.
Third, we showcase tensor-train recurrent neural networks that can model high-order multiplicative structure in dynamical systems (e.g., Lorenz dynamics). We show that this model class gives state-of-the-art long-term forecasting performance with very long time horizons for both simulation and real-world traffic and climate data.
Finally, we demonstrate two methods for neural network robustness: 1) stability training, a form of stochastic data augmentation to make neural networks more robust, and 2) neural fingerprinting, a method that detects adversarial examples by validating the network's behavior in the neighborhood of any given input.
In sum, this thesis takes a step to enable machine learning for the next scale of problem complexity, such as rich spatiotemporal multi-agent games and large-scale robust predictions.
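Stability training, the first robustness method above, can be sketched as a regularized objective: the usual task loss plus a penalty for the model's output changing under a small random input perturbation. The function names and the squared-distance penalty are illustrative assumptions, not the thesis's exact formulation.

```python
import random

def stability_loss(model, task_loss, x, y, alpha=0.1, noise_scale=0.01):
    # Stability training: total loss = task_loss(f(x), y)
    #                                  + alpha * d(f(x), f(x + noise)),
    # where d is taken here as squared Euclidean distance and the noise
    # is i.i.d. Gaussian on each input component.
    y_hat = model(x)
    x_noisy = [xi + random.gauss(0.0, noise_scale) for xi in x]
    y_noisy = model(x_noisy)
    stability = sum((a - b) ** 2 for a, b in zip(y_hat, y_noisy))
    return task_loss(y_hat, y) + alpha * stability
```

The stability term needs no labels for the perturbed copy, which is what makes this a form of stochastic data augmentation rather than adversarial training.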
Engage D1.2 Final Project Results Report
This deliverable summarises the activities and results of Engage, the SESAR 2020 Knowledge Transfer Network (KTN). The KTN initiated and supported multiple activities for SESAR and the European air traffic management (ATM) community, including PhDs, focused catalyst fund projects, thematic workshops, summer schools and the launch of a wiki as the one-stop, go-to source for ATM research and knowledge in Europe. Key throughout was the integration of exploratory and industrial research, thus expediting the innovation pipeline and bringing researchers together. These activities laid valuable foundations for the SESAR Digital Academy.
“Economic man” in cross-cultural perspective: Behavioral experiments in 15 small-scale societies
Researchers from across the social sciences have found consistent deviations from the predictions of the canonical model of self-interest in hundreds of experiments from around the world. This research, however, cannot determine whether the uniformity results from universal patterns of human behavior or from the limited cultural variation available among the university students used in virtually all prior experimental work. To address this, we undertook a cross-cultural study of behavior in ultimatum, public goods, and dictator games in a range of small-scale societies exhibiting a wide variety of economic and cultural conditions. We found, first, that the canonical model, based on self-interest, fails in all of the societies studied. Second, our data reveal substantially more behavioral variability across social groups than has been found in previous research. Third, group-level differences in economic organization and the structure of social interactions explain a substantial portion of the behavioral variation across societies: the higher the degree of market integration and the higher the payoffs to cooperation in everyday life, the greater the level of prosociality expressed in experimental games. Fourth, the available individual-level economic and demographic variables do not consistently explain game behavior, either within or across groups. Fifth, in many cases experimental play appears to reflect the common interactional patterns of everyday life.
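The ultimatum game at the heart of these experiments has a simple payoff structure, which makes the canonical prediction easy to state: a purely self-interested responder should accept any positive offer, since rejection yields nothing. A minimal sketch (standard game definition; function names are illustrative):

```python
def ultimatum_payoffs(pie, offer, accepted):
    # One round: the proposer offers a share of the pie to the responder.
    # If accepted, the split is realized; if rejected, both get nothing.
    if accepted:
        return (pie - offer, offer)
    return (0.0, 0.0)

def self_interested_responder(offer):
    # The canonical self-interest model predicts accepting any positive
    # offer; the cross-cultural data described above contradict this.
    return offer > 0
```

The experimental finding is precisely that real responders across all fifteen societies reject positive offers they consider unfair, at rates that vary systematically with market integration and everyday payoffs to cooperation.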