Verification for Machine Learning, Autonomy, and Neural Networks Survey
This survey presents an overview of verification techniques for autonomous
systems, with a focus on safety-critical autonomous cyber-physical systems
(CPS) and subcomponents thereof. Autonomy in CPS is enabled by recent advances
in artificial intelligence (AI) and machine learning (ML) through approaches
such as deep neural networks (DNNs), embedded in so-called learning-enabled
components (LECs) that accomplish tasks from classification to control.
Recently, the formal methods and formal verification community has developed
methods to characterize behaviors in these LECs with eventual goals of formally
verifying specifications for LECs, and this article presents a survey of many
of these recent approaches.
Asynchronous stochastic approximations with asymptotically biased errors and deep multi-agent learning
Asynchronous stochastic approximations (SAs) are an important class of
model-free algorithms popular in multi-agent and distributed control scenarios.
To counter Bellman's curse of dimensionality, such algorithms are coupled with
function approximations. Although the learning/control problem becomes more
tractable, function approximations
affect stability and convergence. In this paper, we present verifiable
sufficient conditions for stability and convergence of asynchronous SAs with
biased approximation errors. The theory developed herein is used to analyze
Policy Gradient methods and noisy Value Iteration schemes. Specifically, we
analyze the asynchronous approximate counterparts of the policy gradient (A2PG)
and value iteration (A2VI) schemes. It is shown that the stability of these
algorithms is unaffected by biased approximation errors, provided they are
asymptotically bounded. With respect to convergence (of A2VI and A2PG), a
relationship between the limiting set and the approximation errors is
established. Finally, experimental results are presented that support the
theory.
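As a minimal illustration of the setting described above (not the paper's A2VI analysis itself), the sketch below runs asynchronous approximate value iteration on a toy chain MDP with a linear function approximator and an injected, bounded approximation error. The MDP, features, and step-size schedule are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy chain MDP: states 0..4, a single "move right" action, reward at the end.
n, gamma = 5, 0.9
def reward(s): return 1.0 if s == n - 1 else 0.0
def nxt(s): return min(s + 1, n - 1)

phi = np.eye(n)      # features (tabular here; a coarse basis in general)
w = np.zeros(n)      # approximator weights, V(s) ~= phi[s] @ w

for k in range(5000):
    s = int(rng.integers(n))                 # asynchronous: one state per step
    target = reward(s) + gamma * phi[nxt(s)] @ w
    noise = 0.01 * rng.normal()              # bounded approximation error
    alpha = 1.0 / (1 + k // 100)             # decaying step size
    w += alpha * (target + noise - phi[s] @ w) * phi[s]

# The exact values are V(s) = gamma**(n - 1 - s) / (1 - gamma), e.g. V(4) = 10.
```

Despite the injected noise, the iterates settle near the true values, consistent with the abstract's claim that asymptotically bounded errors do not destroy stability.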
Learning Model Predictive Control for Competitive Autonomous Racing
The goal of this thesis is to design a learning model predictive controller
(LMPC) that allows multiple agents to race competitively on a predefined race
track in real time. This thesis addresses two major shortcomings of the
existing single-agent formulation. Previously, the agent determines a locally
optimal trajectory but does not explore the state space, which may be necessary
for overtaking maneuvers. Additionally, obstacle avoidance for LMPC has been
achieved in the past by using a non-convex terminal set, which increases the
complexity for determining a solution to the optimization problem. The proposed
algorithm for multi-agent racing explores the state space by executing the LMPC
for multiple different initializations, which yields a richer terminal safe
set. Furthermore, a new method for selecting states in the terminal set is
developed, which preserves the convexity of the terminal safe set and allows
for including suboptimal states.
Energy-Based Continuous Inverse Optimal Control
The problem of continuous optimal control (over finite time horizon) is to
minimize a given cost function over the sequence of continuous control
variables. The problem of continuous inverse optimal control is to learn the
unknown cost function from expert demonstrations. In this article, we study
this fundamental problem in the framework of energy-based models, where the
observed expert trajectories are assumed to be random samples from a
probability density function defined as the exponential of the negative cost
function up to a normalizing constant. The parameters of the cost function are
learned by maximum likelihood via an "analysis by synthesis" scheme, which
iterates the following two steps: (1) Synthesis step: sample the synthesized
trajectories from the current probability density using the Langevin dynamics
via back-propagation through time. (2) Analysis step: update the model
parameters based on the statistical difference between the synthesized
trajectories and the observed trajectories. Given the fact that an efficient
optimization algorithm is usually available for an optimal control problem, we
also consider a convenient approximation of the above learning method, where we
replace the sampling in the synthesis step by optimization. To make the
sampling or optimization more efficient, we propose to train the energy-based
model simultaneously with a top-down trajectory generator via cooperative
learning, where the trajectory generator is used to fast initialize the
sampling step or optimization step of the energy-based model. We demonstrate
the proposed methods on autonomous driving tasks and show that they can learn
suitable cost functions for optimal control.
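The "analysis by synthesis" loop described above can be sketched in a toy form, under stated assumptions: a cost linear in hand-picked features, one-dimensional trajectories, and a short Langevin chain. The features, demonstrations, and hyperparameters here are invented for illustration and are not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

def features(traj):
    # hypothetical features: magnitude and smoothness of the trajectory
    return np.array([np.sum(traj**2), np.sum(np.diff(traj)**2)])

def cost(theta, traj):
    return theta @ features(traj)    # cost linear in the parameters theta

def grad_traj(theta, traj, h=1e-4):
    # finite-difference gradient of the cost w.r.t. the trajectory
    g = np.zeros_like(traj)
    for i in range(len(traj)):
        e = np.zeros_like(traj); e[i] = h
        g[i] = (cost(theta, traj + e) - cost(theta, traj - e)) / (2 * h)
    return g

def synthesize(theta, T=20, steps=50, eps=0.01):
    # Synthesis step: Langevin dynamics targeting p(x) ~ exp(-cost(theta, x))
    traj = rng.normal(size=T)
    for _ in range(steps):
        traj = (traj - 0.5 * eps * grad_traj(theta, traj)
                + np.sqrt(eps) * rng.normal(size=T))
    return traj

# "Expert" demonstrations: low-magnitude, smooth trajectories
demos = [0.1 * rng.normal(size=20) for _ in range(10)]
theta = np.array([0.1, 0.1])
for _ in range(20):
    synth = [synthesize(theta) for _ in range(5)]
    # Analysis step: ascend the log-likelihood by matching feature statistics
    grad = (np.mean([features(s) for s in synth], axis=0)
            - np.mean([features(d) for d in demos], axis=0))
    theta = np.maximum(theta + 0.01 * grad, 1e-3)  # keep cost weights positive
```

Replacing `synthesize` with a gradient-based minimization of `cost` gives the optimization-based approximation the abstract mentions.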
CM3: Cooperative Multi-goal Multi-stage Multi-agent Reinforcement Learning
A variety of cooperative multi-agent control problems require agents to
achieve individual goals while contributing to collective success. This
multi-goal multi-agent setting poses difficulties for recent algorithms, which
primarily target settings with a single global reward, due to two new
challenges: efficient exploration for learning both individual goal attainment
and cooperation for others' success, and credit-assignment for interactions
between actions and goals of different agents. To address both challenges, we
restructure the problem into a novel two-stage curriculum, in which
single-agent goal attainment is learned prior to learning multi-agent
cooperation, and we derive a new multi-goal multi-agent policy gradient with a
credit function for localized credit assignment. We use a function augmentation
scheme to bridge value and policy functions across the curriculum. The complete
architecture, called CM3, learns significantly faster than direct adaptations
of existing algorithms on three challenging multi-goal multi-agent problems:
cooperative navigation in difficult formations, negotiating multi-vehicle lane
changes in the SUMO traffic simulator, and strategic cooperation in a Checkers
environment.
Comment: Published at International Conference on Learning Representations 202
A Survey and Critique of Multiagent Deep Reinforcement Learning
Deep reinforcement learning (RL) has achieved outstanding results in recent
years. This has led to a dramatic increase in the number of applications and
methods. Recent works have explored learning beyond single-agent scenarios and
have considered multiagent learning (MAL) scenarios. Initial results report
successes in complex multiagent domains, although there are several challenges
to be addressed. The primary goal of this article is to provide a clear
overview of current multiagent deep reinforcement learning (MDRL) literature.
Additionally, we complement the overview with a broader analysis: (i) we
revisit previous key components, originally presented in MAL and RL, and
highlight how they have been adapted to multiagent deep reinforcement learning
settings. (ii) We provide general guidelines to new practitioners in the area:
describing lessons learned from MDRL works, pointing to recent benchmarks, and
outlining open avenues of research. (iii) We take a more critical tone raising
practical challenges of MDRL (e.g., implementation and computational demands).
We expect this article will help unify and motivate future research to take
advantage of the abundant literature that exists (e.g., RL and MAL) in a joint
effort to promote fruitful research in the multiagent community.
Comment: Under review since Oct 2018. Earlier versions of this work had the
title: "Is multiagent deep reinforcement learning the answer or the question?
A brief survey"
Off-Policy General Value Functions to Represent Dynamic Role Assignments in RoboCup 3D Soccer Simulation
Collecting and maintaining accurate world knowledge in a dynamic, complex,
adversarial, and stochastic environment such as the RoboCup 3D Soccer
Simulation is a challenging task. Knowledge must be learned in real time under
tight constraints. We use recently introduced Off-Policy Gradient Descent
algorithms within Reinforcement Learning that illustrate learnable knowledge
representations for dynamic role assignments. The results show that the agents
have learned competitive policies against the top teams from the RoboCup 2012
competitions for three vs three, five vs five, and seven vs seven agents. We
have explicitly used subsets of agents to identify the dynamics and the
semantics for which the agents learn to maximize their performance measures,
and to gather knowledge about different objectives, so that all agents
participate effectively and efficiently within the group.
Comment: 18 pages, 8 figures
Learning Curriculum Policies for Reinforcement Learning
Curriculum learning in reinforcement learning is a training methodology that
seeks to speed up learning of a difficult target task, by first training on a
series of simpler tasks and transferring the knowledge acquired to the target
task. Automatically choosing a sequence of such tasks (i.e. a curriculum) is an
open problem that has been the subject of much recent work in this area. In
this paper, we build upon a recent method for curriculum design, which
formulates the curriculum sequencing problem as a Markov Decision Process. We
extend this model to handle multiple transfer learning algorithms, and show for
the first time that a curriculum policy over this MDP can be learned from
experience. We explore various representations that make this possible, and
evaluate our approach by learning curriculum policies for multiple agents in
two different domains. The results show that our method produces curricula that
can train agents to perform on a target task as fast as or faster than
existing methods.
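The idea of a curriculum policy over a task-selection MDP can be sketched as follows. Everything below is hypothetical: a tiny task set, a stand-in for the underlying RL training runs, and an epsilon-greedy policy whose reward is the improvement on the target task; none of it reproduces the paper's transfer-learning algorithms.

```python
import random

random.seed(0)

tasks = ["easy", "medium", "target"]   # hypothetical task space
q = {t: 0.0 for t in tasks}            # value of choosing each task next
skill = {t: 0.0 for t in tasks}        # stand-in for agent competence

def train_on(task):
    # Stand-in for an RL training run with transfer: the target task only
    # improves quickly once the simpler tasks have been practiced.
    if task == "target":
        ready = skill["easy"] >= 2 and skill["medium"] >= 2
        skill["target"] += 1.0 if ready else 0.1
    else:
        skill[task] += 1.0
    return skill["target"]

prev = 0.0
for episode in range(30):
    # epsilon-greedy curriculum policy over the task-selection MDP
    t = random.choice(tasks) if random.random() < 0.2 else max(q, key=q.get)
    perf = train_on(t)
    q[t] += 0.1 * ((perf - prev) - q[t])   # reward = improvement on target
    prev = perf
```

The curriculum policy is learned from experience in exactly the sense of the abstract: task choices that yield larger improvements on the target accumulate higher value.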
Reinforcement Learning with Probabilistic Guarantees for Autonomous Driving
Designing reliable decision strategies for autonomous urban driving is
challenging. Reinforcement learning (RL) has been used to automatically derive
suitable behavior in uncertain environments, but it does not provide any
guarantee on the performance of the resulting policy. We propose a generic
approach to enforce probabilistic guarantees on an RL agent. An exploration
strategy is derived prior to training that constrains the agent to choose among
actions that satisfy a desired probabilistic specification expressed with
linear temporal logic (LTL). Reducing the search space to policies satisfying
the LTL formula helps training and simplifies reward design. This paper
outlines a case study of an intersection scenario involving multiple traffic
participants. The resulting policy outperforms a rule-based heuristic approach
in terms of efficiency while exhibiting strong guarantees on safety.
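The exploration-restriction idea above can be sketched minimally: before each step, the agent's action set is filtered to actions consistent with the specification. Here the probabilistic LTL model check is replaced by a trivial safety predicate on a toy 1-D system; the dynamics, spec, and names are invented for illustration.

```python
import random

random.seed(0)

def step(state, action):
    return state + action            # toy 1-D dynamics

def satisfies_spec(state, action):
    # Stand-in for checking the LTL specification; here the "spec" is
    # simply G(position < 5), i.e. position 5 is never reached.
    return step(state, action) < 5

def safe_actions(state, actions):
    allowed = [a for a in actions if satisfies_spec(state, a)]
    return allowed if allowed else actions   # fall back if the mask is empty

state, actions = 0, [-1, 0, 1]
for _ in range(100):
    # exploration is restricted to actions that keep the spec satisfied
    action = random.choice(safe_actions(state, actions))
    state = step(state, action)
```

In an actual implementation the predicate would be replaced by a probabilistic model check of the LTL formula, and the random choice by the RL policy being trained.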
Safe Reinforcement Learning with Scene Decomposition for Navigating Complex Urban Environments
Navigating urban environments represents a complex task for automated
vehicles. They must reach their goal safely and efficiently while considering a
multitude of traffic participants. We propose a modular decision making
algorithm to autonomously navigate intersections, addressing challenges of
existing rule-based and reinforcement learning (RL) approaches. We first
present a safe RL algorithm relying on a model-checker to ensure safety
guarantees. To make the decision strategy robust to perception errors and
occlusions, we introduce a belief update technique using a learning based
approach. Finally, we use a scene decomposition approach to scale our algorithm
to environments with multiple traffic participants. We empirically demonstrate
that our algorithm outperforms rule-based methods and reinforcement learning
techniques on a complex intersection scenario.
Comment: 8 pages; 7 figures