Real scenario and simulations on GLOSA traffic light system for reduced CO2 emissions, waiting time and travel time
Cooperative ITS is enabling vehicles to communicate with the infrastructure
to provide improvements in traffic control. A promising approach consists in
anticipating the road profile and the upcoming dynamic events like traffic
lights. This topic has been addressed in the French public project Co-Drive
through functions developed by Valeo named Green Light Optimal Speed Advisor
(GLOSA). The system advises the optimal speed to pass the next traffic light
without stopping. This paper presents results of its performance in different
scenarios through simulations and real driving measurements. The evaluation is
scaled up to an urban area, with different penetration rates of
vehicular-communication equipment in vehicles and infrastructure. Our simulation results
indicate that GLOSA can reduce CO2 emissions, waiting time and travel time,
both in experimental conditions and in real traffic conditions. Comment: in 22nd ITS World Congress, Oct 2015, Bordeaux, France. 201
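The abstract above describes GLOSA only at a high level: advise a speed that lets the vehicle pass the next traffic light without stopping. As a rough illustration of that idea, and not the Co-Drive or Valeo implementation, the sketch below picks the highest constant speed that reaches the stop line while the light is green. The function name, the parameters (distance to the light, seconds until green starts and ends, speed bounds) and the constant-speed assumption are all hypothetical.

```python
from typing import Optional


def glosa_advisory_speed(
    distance_m: float,
    time_to_green_s: float,
    time_to_red_s: float,
    v_min_mps: float,
    v_max_mps: float,
) -> Optional[float]:
    """Return a constant speed (m/s) that reaches the light during green.

    Hypothetical model: the light is green from `time_to_green_s` until
    `time_to_red_s` (both measured from now), and the vehicle holds one
    constant speed. Returns None when no speed within
    [v_min_mps, v_max_mps] arrives inside the green window.
    """
    if distance_m <= 0:
        raise ValueError("distance_m must be positive")
    if time_to_red_s <= 0:
        # The green window has already closed.
        return None

    # Arriving no earlier than the start of green caps the speed from above.
    if time_to_green_s > 0:
        upper = min(v_max_mps, distance_m / time_to_green_s)
    else:
        upper = v_max_mps  # already green: only the speed limit applies

    # Arriving before green ends bounds the speed from below.
    lower = max(v_min_mps, distance_m / time_to_red_s)

    if lower > upper:
        return None  # a stop cannot be avoided under these assumptions
    return upper  # fastest feasible speed keeps travel time low
```

For example, with the light 200 m ahead, green starting in 20 s and ending in 40 s, and speeds limited to 5 to 13.9 m/s, the sketch advises 10 m/s, so the vehicle arrives just as the light turns green. A real system would also account for acceleration limits, queues at the light and signal timing uncertainty.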
Towards Optimally Decentralized Multi-Robot Collision Avoidance via Deep Reinforcement Learning
Developing a safe and efficient collision avoidance policy for multiple
robots is challenging in decentralized scenarios where each robot generates
its paths without observing other robots' states and intents. While other
distributed multi-robot collision avoidance systems exist, they often require
extracting agent-level features to plan a local collision-free action, which
can be computationally prohibitive and not robust. More importantly, in
practice the performance of these methods is much lower than that of their centralized
counterparts.
We present a decentralized sensor-level collision avoidance policy for
multi-robot systems, which directly maps raw sensor measurements to an agent's
steering commands in terms of movement velocity. As a first step toward
reducing the performance gap between decentralized and centralized methods, we
present a multi-scenario multi-stage training framework to find an optimal
policy which is trained over a large number of robots on rich, complex
environments simultaneously using a policy gradient based reinforcement
learning algorithm. We validate the learned sensor-level collision avoidance
policy in a variety of simulated scenarios with thorough performance
evaluations and show that the final learned policy is able to find time
efficient, collision-free paths for a large-scale robot system. We also
demonstrate that the learned policy generalizes well to new scenarios
that never appear during training, including navigating a
heterogeneous group of robots and a large-scale scenario with 100 robots.
Videos are available at https://sites.google.com/view/drlmac
Deep neural learning based distributed predictive control for offshore wind farm using high fidelity LES data
The paper explores a deep neural learning (DNL) based predictive control approach for offshore wind farms using high-fidelity large eddy simulation (LES) data. The DNL architecture combines Long Short-Term Memory (LSTM) units with Convolutional Neural Networks (CNN) for feature extraction and prediction of the offshore wind farm. This hybrid CNN-LSTM model is developed based on the dynamic models of the wind farm and wind turbines as well as higher-fidelity LES data. Then, distributed and decentralized model predictive control (MPC) methods are developed based on the hybrid model to maximize the wind farm power generation and minimize the usage of the control commands. Extensive simulations of a two-turbine and a nine-turbine wind farm demonstrate the high prediction accuracy (97% or more) of the trained CNN-LSTM models. They also show that the distributed MPC can achieve up to a 38% increase in farm-scale power generation over the decentralized MPC. The computational time of the distributed MPC is around 0.7 s at each time step, which is sufficiently fast for a real-time control solution for wind farm operations.
Counterfactual Multi-Agent Policy Gradients
Cooperative multi-agent systems can be naturally used to model many real
world problems, such as network packet routing and the coordination of
autonomous vehicles. There is a great need for new reinforcement learning
methods that can efficiently learn decentralised policies for such systems. To
this end, we propose a new multi-agent actor-critic method called
counterfactual multi-agent (COMA) policy gradients. COMA uses a centralised
critic to estimate the Q-function and decentralised actors to optimise the
agents' policies. In addition, to address the challenges of multi-agent credit
assignment, it uses a counterfactual baseline that marginalises out a single
agent's action, while keeping the other agents' actions fixed. COMA also uses a
critic representation that allows the counterfactual baseline to be computed
efficiently in a single forward pass. We evaluate COMA in the testbed of
StarCraft unit micromanagement, using a decentralised variant with significant
partial observability. COMA significantly improves average performance over
other multi-agent actor-critic methods in this setting, and the best performing
agents are competitive with state-of-the-art centralised controllers that get
access to the full state
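The counterfactual baseline described above can be illustrated for a single agent. The sketch below assumes the centralised critic has already produced one Q-value per candidate action of that agent, with the other agents' actions held fixed; the baseline is the policy-weighted average of those values, and the advantage is the Q-value of the action actually taken minus that baseline. The function and argument names are hypothetical, and this is a toy calculation rather than the authors' implementation.

```python
from typing import Sequence


def counterfactual_advantage(
    q_values: Sequence[float],
    policy: Sequence[float],
    taken_action: int,
) -> float:
    """Advantage of one agent's chosen action against a counterfactual baseline.

    q_values[u]: critic estimate of Q when this agent takes action u and
                 all other agents' actions stay fixed (assumed given).
    policy[u]:   probability this agent's policy assigns to action u.
    """
    if len(q_values) != len(policy):
        raise ValueError("q_values and policy must have the same length")
    if abs(sum(policy) - 1.0) > 1e-6:
        raise ValueError("policy must sum to 1")

    # Marginalise out this agent's action under its own policy.
    baseline = sum(p * q for p, q in zip(policy, q_values))
    return q_values[taken_action] - baseline
```

With `q_values = [1.0, 2.0, 4.0]` and `policy = [0.5, 0.25, 0.25]`, the baseline is 2.0, so taking action 2 yields an advantage of 2.0 and taking action 0 yields -1.0. Because the baseline depends only on this agent's own action distribution, the resulting advantage isolates that agent's contribution to the shared return.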
Decentralization of Multiagent Policies by Learning What to Communicate
Effective communication is required for teams of robots to solve
sophisticated collaborative tasks. In practice it is typical for both the
encoding and semantics of communication to be manually defined by an expert;
this is true regardless of whether the behaviors themselves are bespoke,
optimization based, or learned. We present an agent architecture and training
methodology using neural networks to learn task-oriented communication
semantics based on the example of a communication-unaware expert policy. A
perimeter defense game illustrates the system's ability to handle dynamically
changing numbers of agents and its graceful degradation in performance as
communication constraints are tightened or the expert's observability
assumptions are broken. Comment: 7 page