20,614 research outputs found
Decentralized Cooperative Planning for Automated Vehicles with Hierarchical Monte Carlo Tree Search
Today's automated vehicles lack the ability to cooperate implicitly with
others. This work presents a Monte Carlo Tree Search (MCTS) based approach for
decentralized cooperative planning using macro-actions for automated vehicles
in heterogeneous environments. Based on cooperative modeling of other agents
and Decoupled-UCT (a variant of MCTS), the algorithm evaluates the
state-action-values of each agent in a cooperative and decentralized manner,
explicitly modeling the interdependence of actions between traffic
participants. Macro-actions allow for temporal extension over multiple time
steps and increase the effective search depth requiring fewer iterations to
plan over longer horizons. Without predefined policies for macro-actions, the
algorithm simultaneously learns policies over and within macro-actions. The
proposed method is evaluated under several conflict scenarios, showing that the
algorithm can achieve effective cooperative planning with learned macro-actions
in heterogeneous environments
A Hierarchical Reinforcement Learning Method for Persistent Time-Sensitive Tasks
Reinforcement learning has been applied to many interesting problems such as
the famous TD-gammon and the inverted helicopter flight. However, little effort
has been put into developing methods to learn policies for complex persistent
tasks and tasks that are time-sensitive. In this paper, we take a step towards
solving this problem by using signal temporal logic (STL) as task
specification, and taking advantage of the temporal abstraction feature that
the options framework provide. We show via simulation that a relatively easy to
implement algorithm that combines STL and options can learn a satisfactory
policy with a small number of training case
A hierarchical reinforcement learning method for persistent time-sensitive tasks
Reinforcement learning has been applied to many interesting problems such as the famous TD-gammon and the inverted helicopter flight. However, little effort has been put into developing methods to learn policies for complex persistent tasks and tasks that are time-sensitive. In this paper, we take a step towards solving this problem by using signal temporal logic (STL) as task specification, and taking advantage of the temporal abstraction feature that the options framework provide. We show via simulation that a relatively easy to implement algorithm that combines STL and options can learn a satisfactory policy with a small number of training cases
- …