Decentralized Control of Partially Observable Markov Decision Processes using Belief Space Macro-actions
The focus of this paper is on solving multi-robot planning problems in
continuous spaces with partial observability. Decentralized partially
observable Markov decision processes (Dec-POMDPs) are general models for
multi-robot coordination problems, but representing and solving Dec-POMDPs is
often intractable for large problems. To allow for a high-level representation
that is natural for multi-robot problems and scalable to large discrete and
continuous problems, this paper extends the Dec-POMDP model to the
decentralized partially observable semi-Markov decision process (Dec-POSMDP).
The Dec-POSMDP formulation allows asynchronous decision-making by the robots,
which is crucial in multi-robot domains. We also present an algorithm for
solving this Dec-POSMDP which is much more scalable than previous methods since
it can incorporate closed-loop belief space macro-actions in planning. These
macro-actions are automatically constructed to produce robust solutions. The
proposed method's performance is evaluated on a complex multi-robot package
delivery problem under uncertainty, showing that our approach can naturally
represent multi-robot problems and provide high-quality solutions for
large-scale problems.
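As a rough illustration of the semi-Markov element described above (a sketch, not the paper's algorithm), the snippet below shows a single Bellman backup in which each macro-action carries a completion duration tau and is discounted by gamma**tau, which is what distinguishes a semi-Markov backup from an ordinary MDP backup. All state names, macro-action names, rewards, and durations are invented for illustration.

```python
# Toy semi-Markov Bellman backup over macro-actions (illustrative only):
# a macro-action that runs for tau steps discounts the successor value by
# GAMMA ** tau instead of a single-step GAMMA.
GAMMA = 0.95

def smdp_backup(V, state, macro_actions):
    """One semi-Markov Bellman backup at `state`.

    macro_actions: dict mapping name -> (reward, tau, next_state), where
    tau is the (here deterministic) number of primitive steps the
    macro-action takes to complete.
    """
    best = float("-inf")
    for name, (reward, tau, nxt) in macro_actions.items():
        q = reward + GAMMA ** tau * V[nxt]
        best = max(best, q)
    return best

# Hypothetical values and macro-actions for a two-state example.
V = {"s0": 0.0, "s1": 10.0}
macros = {"go_fast": (1.0, 1, "s1"), "go_safe": (2.0, 3, "s1")}
```

Here the slower macro-action wins the backup because its extra reward outweighs the additional discounting, the kind of trade-off a macro-action planner must weigh.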
Decentralized Cooperative Planning for Automated Vehicles with Hierarchical Monte Carlo Tree Search
Today's automated vehicles lack the ability to cooperate implicitly with
others. This work presents a Monte Carlo Tree Search (MCTS) based approach for
decentralized cooperative planning using macro-actions for automated vehicles
in heterogeneous environments. Based on cooperative modeling of other agents
and Decoupled-UCT (a variant of MCTS), the algorithm evaluates the
state-action-values of each agent in a cooperative and decentralized manner,
explicitly modeling the interdependence of actions between traffic
participants. Macro-actions allow for temporal extension over multiple time
steps and increase the effective search depth, requiring fewer iterations to
plan over longer horizons. Without predefined policies for macro-actions, the
algorithm simultaneously learns policies over and within macro-actions. The
proposed method is evaluated under several conflict scenarios, showing that the
algorithm can achieve effective cooperative planning with learned macro-actions
in heterogeneous environments.
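The decoupled aspect of Decoupled-UCT can be sketched in a few lines: each agent keeps independent visit counts and value estimates over its own actions, so the joint action is assembled from per-agent UCB1 choices rather than searched over the exponentially large joint-action space. The statistics below are invented, and this is only the selection step, not a full MCTS.

```python
import math

def decoupled_uct_select(agent_stats, c=1.4):
    """Per-agent UCB1 selection (illustrative Decoupled-UCT sketch).

    agent_stats: list with one dict per agent, mapping
    action -> (visit_count, total_value). Returns one action per agent.
    """
    joint = []
    for stats in agent_stats:
        n_total = sum(n for n, _ in stats.values())

        def ucb(item):
            action, (n, total) = item
            if n == 0:
                return float("inf")  # unvisited actions are tried first
            return total / n + c * math.sqrt(math.log(n_total) / n)

        joint.append(max(stats.items(), key=ucb)[0])
    return joint
```

Because each agent's statistics table grows with its own action set only, the selection cost is additive in the number of agents instead of multiplicative.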
Multiscale Markov Decision Problems: Compression, Solution, and Transfer Learning
Many problems in sequential decision making and stochastic control have
natural multiscale structure: sub-tasks are assembled together to accomplish
complex goals. Systematically inferring and leveraging hierarchical structure,
particularly beyond a single level of abstraction, has remained a longstanding
challenge. We describe a fast multiscale procedure for repeatedly compressing,
or homogenizing, Markov decision processes (MDPs), wherein a hierarchy of
sub-problems at different scales is automatically determined. Coarsened MDPs
are themselves independent, deterministic MDPs, and may be solved using
existing algorithms. The multiscale representation delivered by this procedure
decouples sub-tasks from each other and can lead to substantial improvements in
convergence rates both locally within sub-problems and globally across
sub-problems, yielding significant computational savings. A second fundamental
aspect of this work is that these multiscale decompositions yield new transfer
opportunities across different problems, where solutions of sub-tasks at
different levels of the hierarchy may be amenable to transfer to new problems.
Localized transfer of policies and potential operators at arbitrary scales is
emphasized. Finally, we demonstrate compression and transfer in a collection of
illustrative domains, including examples involving discrete and continuous
state spaces.
Comment: 86 pages, 15 figures
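A minimal two-level sketch of the idea that coarsened problems are themselves independent, deterministic problems (the cluster layout and costs here are invented, and this stands in for the paper's homogenization, not reproduces it): assume each cluster's internal sub-problem has already been solved, yielding a traversal cost per cluster-to-cluster transition; the coarse problem is then a deterministic shortest path over clusters, solvable by plain value iteration.

```python
def solve_coarse(clusters, edges, goal):
    """Value iteration on a deterministic coarsened problem.

    clusters: list of cluster ids.
    edges: dict (c_from, c_to) -> cost of traversing c_from to reach c_to,
           assumed to come from already-solved sub-problems.
    goal: the target cluster. Returns cost-to-go per cluster.
    """
    V = {c: float("inf") for c in clusters}
    V[goal] = 0.0
    # |clusters| sweeps suffice for a deterministic shortest-path problem.
    for _ in range(len(clusters)):
        for (a, b), cost in edges.items():
            V[a] = min(V[a], cost + V[b])
    return V
```

The point of the decomposition is visible even at this scale: the fine-grained dynamics inside each cluster never appear in the coarse solve.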
Monte-Carlo methods for NLTE spectral synthesis of supernovae
We present JEKYLL, a new code for modelling of supernova (SN) spectra and
lightcurves based on Monte-Carlo (MC) techniques for the radiative transfer.
The code assumes spherical symmetry, homologous expansion and steady state for
the matter, but is otherwise capable of solving the time-dependent radiative
transfer problem in non-local-thermodynamic-equilibrium (NLTE). The method used
was introduced in a series of papers by Lucy, but its full time-dependent NLTE
capabilities have never been tested. Here, we have extended the method to
include non-thermal excitation and ionization as well as charge-transfer and
two-photon processes. Based on earlier work, the non-thermal rates are
calculated by solving the Spencer-Fano equation. Using a method previously
developed for the SUMO code, macroscopic mixing of the material is taken into
account in a statistical sense. In addition, a statistical Markov-chain model
is used to sample the emission frequency, and we introduce a method to control
the sampling of the radiation field. In addition to a description of JEKYLL, we
provide comparisons with the ARTIS, SUMO and CMFGEN codes, which show good
agreement in the calculated spectra as well as the state of the gas. In
particular, the comparison with CMFGEN, which is similar in terms of physics
but uses a different technique, shows that the Lucy method does indeed converge
in the time-dependent NLTE case. Finally, as an example of the time-dependent
NLTE capabilities of JEKYLL, we present a model of a Type IIb SN, taken from a
set of models presented and discussed in detail in an accompanying paper. Based
on this model we investigate the effects of NLTE, in particular those arising
from non-thermal excitation and ionization, and find strong effects even on the
bolometric lightcurve. This highlights the need for full NLTE calculations when
simulating the spectra and lightcurves of SNe.
Comment: Accepted for publication by Astronomy & Astrophysics
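The Markov-chain emission sampling mentioned above can be illustrated schematically (this is a generic macro-atom-style random walk, not JEKYLL's actual implementation; the state names and transition probabilities are invented): an absorbed packet hops between internal states according to transition probabilities until it lands in an absorbing emission channel, whose identity determines the emission frequency.

```python
import random

def sample_emission(transitions, start, rng):
    """Random walk on a discrete Markov chain until an emission channel.

    transitions: state -> list of (next_state, probability); probabilities
    per state sum to 1. States whose name starts with 'emit' are absorbing
    emission channels. rng: a random.Random instance (seedable for tests).
    """
    state = start
    while not state.startswith("emit"):
        r = rng.random()
        acc = 0.0
        for nxt, p in transitions[state]:
            acc += p
            if r < acc:
                state = nxt
                break
    return state
```

Sampling the chain rather than solving it explicitly fits naturally into a Monte-Carlo transport code, where each packet already carries its own random history.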
Learning for Multi-robot Cooperation in Partially Observable Stochastic Environments with Macro-actions
This paper presents a data-driven approach for multi-robot coordination in
partially-observable domains based on Decentralized Partially Observable Markov
Decision Processes (Dec-POMDPs) and macro-actions (MAs). Dec-POMDPs provide a
general framework for cooperative sequential decision making under uncertainty
and MAs allow temporally extended and asynchronous action execution. To date,
most methods assume the underlying Dec-POMDP model is known a priori or a full
simulator is available during planning time. Previous methods which aim to
address these issues suffer from local optimality and sensitivity to initial
conditions. Additionally, few hardware demonstrations involving a large team of
heterogeneous robots and with long planning horizons exist. This work addresses
these gaps by proposing an iterative sampling based Expectation-Maximization
algorithm (iSEM) to learn policies using only trajectory data containing
observations, MAs, and rewards. Our experiments show the algorithm is able to
achieve better solution quality than the state-of-the-art learning-based
methods. We implement two variants of multi-robot Search and Rescue (SAR)
domains (with and without obstacles) on hardware to demonstrate the learned
policies can effectively control a team of distributed robots to cooperate in a
partially observable stochastic environment.
Comment: Accepted to the 2017 IEEE/RSJ International Conference on Intelligent
Robots and Systems (IROS 2017)
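The flavor of learning policies from trajectory data alone can be sketched with a single reward-weighted EM-style update (a simplification, not the iSEM algorithm itself, and it assumes strictly positive returns): the E-step weights each observed (observation, macro-action) pair by its trajectory's return, and the M-step renormalizes those weights into a policy.

```python
from collections import defaultdict

def em_policy_update(trajectories):
    """One reward-weighted policy update from trajectory data.

    trajectories: list of (return, [(observation, macro_action), ...]),
    with returns assumed positive. Returns a dict mapping
    (observation, macro_action) -> probability of choosing that
    macro-action under that observation.
    """
    weights = defaultdict(float)  # return-weighted pair counts (E-step)
    totals = defaultdict(float)   # normalizers per observation
    for ret, steps in trajectories:
        for obs, ma in steps:
            weights[(obs, ma)] += ret
            totals[obs] += ret
    # M-step: renormalize weights into a conditional policy.
    return {(obs, ma): w / totals[obs] for (obs, ma), w in weights.items()}
```

Iterating such updates shifts probability mass toward macro-actions that appear in high-return trajectories, which is the basic mechanism behind EM-based policy search.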
A Policy Switching Approach to Consolidating Load Shedding and Islanding Protection Schemes
In recent years there have been many improvements in the reliability of
critical infrastructure systems. Despite these improvements, the power systems
industry has seen relatively small advances in this regard. For instance, power
quality deficiencies, a high number of localized contingencies, and large
cascading outages are still too widespread. Though progress has been made in
improving generation, transmission, and distribution infrastructure, remedial
action schemes (RAS) remain non-standardized and are often not uniformly
implemented across different utilities, ISOs, and RTOs. Traditionally, load
shedding and islanding have been successful protection measures in restraining
propagation of contingencies and large cascading outages. This paper proposes a
novel, algorithmic approach to selecting RAS policies to optimize the operation
of the power network during and after a contingency. Specifically, we use
policy-switching to consolidate traditional load shedding and islanding
schemes. In order to model and simulate the functionality of the proposed power
systems protection algorithm, we conduct Monte-Carlo, time-domain simulations
using Siemens PSS/E. The algorithm is tested via experiments on the IEEE-39
topology to demonstrate that the proposed approach achieves optimal power
system performance during emergency situations, given a specific set of RAS
policies.
Comment: Full Paper Accepted to PSCC 2014 - IEEE Co-Sponsored Conference. 7
Pages, 2 Figures, 2 Tables
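The policy-switching idea reduces to a simple selection rule: given a contingency, evaluate each candidate RAS policy in simulation and execute the one with the best score. The sketch below uses invented policy names and a stand-in simulator in place of the paper's PSS/E time-domain runs.

```python
def switch_policy(policies, simulate, contingency):
    """Pick the best remedial action scheme for a given contingency.

    policies: dict name -> policy object.
    simulate: callable (policy, contingency) -> scalar score, higher is
    better (a stand-in for a time-domain simulation of the network).
    Returns the name of the highest-scoring policy.
    """
    return max(policies, key=lambda name: simulate(policies[name], contingency))

# Hypothetical usage: two candidate schemes scored by a toy simulator.
candidates = {"load_shed": lambda c: 0.7, "island": lambda c: 0.9}
toy_simulate = lambda policy, contingency: policy(contingency)
```

Because the candidate set is fixed and each evaluation is independent, the simulations can be run in parallel and the switch itself is a single argmax.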