Graph-based Trajectory Prediction with Cooperative Information
For automated driving, predicting the future trajectories of other road users
in complex traffic situations is a hard problem. Modern neural networks use the
past trajectories of traffic participants, together with map data, to infer
likely driver intentions and maneuvers. With increasing
connectivity between cars and other traffic actors, cooperative information is
another source of data that can be used as inputs for trajectory prediction
algorithms. Connected actors might transmit their intended path or even
complete planned trajectories to other actors, which simplifies the prediction
problem due to the imposed constraints. In this work, we outline the benefits
of using this source of data for trajectory prediction and propose a
graph-based neural network architecture that can leverage this additional data.
We show that the network performance increases substantially if cooperative
data is present. Also, our proposed training scheme improves the network's
performance even for cases where no cooperative information is available. We
also show that the network can deal with inaccurate cooperative data, which
allows it to be used in real automated driving environments.
Comment: Accepted for publication at the 26th IEEE International Conference on Intelligent Transportation Systems 202
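The idea of feeding cooperative data into a graph-based predictor can be illustrated with a minimal sketch. The following toy example (not the paper's architecture; all shapes, the averaging aggregation, and the availability flag are assumptions) concatenates each agent's past trajectory with its announced future waypoints, zeroing the waypoints for non-connected actors, and runs one message-passing layer over a fully connected scene graph:

```python
import numpy as np

rng = np.random.default_rng(0)

def build_node_features(past, announced, has_coop):
    """Concatenate past-trajectory features with optional cooperative data.

    past:      (N, Tp, 2) observed past positions per agent
    announced: (N, Tf, 2) announced future waypoints (ignored if unavailable)
    has_coop:  (N,) 1.0 if the agent transmitted its planned trajectory
    """
    flat_past = past.reshape(len(past), -1)
    flat_ann = announced.reshape(len(announced), -1) * has_coop[:, None]
    # the flag itself lets the network distinguish "no data" from "zeros"
    return np.concatenate([flat_past, flat_ann, has_coop[:, None]], axis=1)

def message_passing(x, adj, w):
    # one graph layer: mean-aggregate neighbor features, then linear + ReLU
    agg = adj @ x / np.maximum(adj.sum(1, keepdims=True), 1.0)
    return np.maximum(np.concatenate([x, agg], axis=1) @ w, 0.0)

N, Tp, Tf = 4, 5, 6
past = rng.normal(size=(N, Tp, 2))
announced = rng.normal(size=(N, Tf, 2))
has_coop = np.array([1.0, 0.0, 1.0, 0.0])   # two connected actors
adj = np.ones((N, N)) - np.eye(N)           # fully connected scene graph

x = build_node_features(past, announced, has_coop)
w = rng.normal(size=(2 * x.shape[1], 16))
h = message_passing(x, adj, w)
print(h.shape)  # (4, 16) node embeddings
```

Zeroing the announced waypoints while keeping the availability flag is one simple way to let the same network handle both cases, mirroring the abstract's point that performance should degrade gracefully when no cooperative information is available.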
FLATLAND: A study of Deep Reinforcement Learning methods applied to the vehicle rescheduling problem in a railway environment
Reinforcement Learning studies how agents should take sequences of actions in an environment in order to maximize a numerical reward signal. Combining this learning process with neural networks has given rise to Deep Reinforcement Learning (DRL), which is now applied in many domains, from video games to robotics and self-driving cars.
This work investigates DRL approaches applied to Flatland, a multi-agent railway simulation in which the main task is to plan and reschedule train routes in order to optimize traffic flow within the network. The tasks introduced in Flatland are based on the Vehicle Rescheduling Problem, for which determining an optimal solution is an NP-complete problem in combinatorial optimization, and finding acceptably good solutions with heuristics and deterministic methods is not feasible in realistic railway systems.
In particular, we analyze two tasks: the navigation of a single agent that must reach a target station from a starting position in the minimum number of time steps, and the generalization of this task to a multi-agent setting, which introduces the new issue of conflict avoidance and resolution between agents.
To solve the problem, we developed specific observations of the environment that capture the information the network needs; trained with Deep Q-Learning and its variants, the network learns the best action for each agent, leading to the solution that maximizes the total reward.
The positive results obtained on small environments suggest several interpretations and directions for future development, showing that Reinforcement Learning has the potential to solve the problem from a new perspective.
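The single-agent navigation task can be illustrated with a minimal tabular Q-learning sketch. This is not the Flatland environment or the paper's observation design; it is a hypothetical one-dimensional corridor with assumed rewards (-1 per step, 0 on arrival), chosen only to show the update rule the abstract refers to:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 1-D corridor: agent starts at cell 0, target at cell 4.
# Actions: 0 = stay, 1 = move forward.
N_CELLS, TARGET = 5, 4

def step(s, a):
    s2 = min(s + a, TARGET)
    done = s2 == TARGET
    return s2, (0.0 if done else -1.0), done

q = np.zeros((N_CELLS, 2))
alpha, gamma, eps = 0.5, 0.95, 0.1

for _ in range(500):
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection
        a = int(rng.integers(2)) if rng.random() < eps else int(q[s].argmax())
        s2, r, done = step(s, a)
        # Q-learning update toward the bootstrapped target
        target = r if done else r + gamma * q[s2].max()
        q[s, a] += alpha * (target - q[s, a])
        s = s2

greedy = [int(q[s].argmax()) for s in range(TARGET)]
print(greedy)  # the learned greedy policy should move forward everywhere
```

Deep Q-Learning, as used in the work, replaces the table `q` with a neural network over the engineered observations, but the temporal-difference target has the same form.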
Societies in the wild: cooperation, norms, and hierarchies
International Mention in the doctoral degree. This research has been funded in part by MCIN/AEI/10.13039/501100011033 and by "ERDF A way of making Europe" through grant PGC2018-098186-B-I00 (BASIC). Doctoral Program in Mathematical Engineering, Universidad Carlos III de Madrid. Defense committee: President: Sandro Meloni; Secretary: Francesca Lipari; Member: Giulia Andrighett
The Wasserstein Believer: Learning Belief Updates for Partially Observable Environments through Reliable Latent Space Models
Partially Observable Markov Decision Processes (POMDPs) are used to model
environments where the full state cannot be perceived by an agent. As such,
the agent must reason over its past observations and actions.
However, simply remembering the full history is generally intractable due to
the exponential growth in the history space. Maintaining a probability
distribution that models the belief over what the true state is can be used as
a sufficient statistic of the history, but its computation requires access to
the model of the environment and is often intractable. While SOTA algorithms
use Recurrent Neural Networks to compress the observation-action history aiming
to learn a sufficient statistic, they lack guarantees of success and can lead
to sub-optimal policies. To overcome this, we propose the Wasserstein Belief
Updater, an RL algorithm that learns a latent model of the POMDP and an
approximation of the belief update. Our approach comes with theoretical
guarantees on the quality of our approximation ensuring that our outputted
beliefs allow for learning the optimal value function.
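The exact belief update that the Wasserstein Belief Updater approximates is the standard Bayes filter, which requires the transition and observation models the abstract says are usually unavailable. A minimal sketch on a toy POMDP (all numbers hypothetical):

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Exact POMDP belief update: b'(s') ∝ O[a][s'][o] * sum_s T[a][s][s'] * b[s].

    b: (S,) current belief; T: (A, S, S) transition probs; O: (A, S, Obs) obs probs.
    """
    pred = b @ T[a]               # predict: marginalize over the previous state
    post = pred * O[a][:, o]      # correct: weight by the observation likelihood
    return post / post.sum()      # renormalize to a distribution

# toy 2-state, 1-action, 2-observation POMDP
T = np.array([[[0.9, 0.1],
               [0.2, 0.8]]])
O = np.array([[[0.8, 0.2],
               [0.3, 0.7]]])
b = np.array([0.5, 0.5])

b1 = belief_update(b, a=0, o=0, T=T, O=O)
print(b1)  # observation 0 shifts mass toward state 0
```

Because the number of reachable beliefs grows with the history, and the true `T` and `O` are unknown to the agent, learning a latent model plus an approximate version of this update (as the paper proposes) is what makes the approach practical.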
Learning Control Policies for Stochastic Systems with Reach-avoid Guarantees
We study the problem of learning controllers for discrete-time non-linear
stochastic dynamical systems with formal reach-avoid guarantees. This work
presents the first method for providing formal reach-avoid guarantees, which
combine and generalize stability and safety guarantees, with a tolerable
probability threshold over the infinite time horizon. Our method
leverages advances in the machine learning literature and represents formal
certificates as neural networks. In particular, we learn a certificate in the
form of a reach-avoid supermartingale (RASM), a novel notion that we introduce
in this work. Our RASMs provide reachability and avoidance guarantees by
imposing constraints on what can be viewed as a stochastic extension of level
sets of Lyapunov functions for deterministic systems. Our approach solves
several important problems -- it can be used to learn a control policy from
scratch, to verify a reach-avoid specification for a fixed control policy, or
to fine-tune a pre-trained policy if it does not satisfy the reach-avoid
specification. We validate our approach on stochastic non-linear
reinforcement learning tasks.
Comment: Accepted at AAAI 202