6,763 research outputs found
Producing efficient error-bounded solutions for transition independent decentralized MDPs
pages 539-546International audienceThere has been substantial progress on algorithms for single-agent sequential decision making problems represented as partially observable Markov decision processes (POMDPs). A number of efficient algorithms for solving POMDPs share two desirable properties: error-bounds and fast convergence rates. Despite significant efforts, no algorithms for solving decentralized POMDPs benefit from these properties, leading to either poor solution quality or limited scalability. This paper presents the first approach for solving transition independent decentralized Markov decision processes (MDPs), that inherits these properties. Two related algorithms illustrate this approach. The first recasts the original problem as a finite-horizon deterministic and completely observable Markov decision process. In this form, the original problem is solved by combining heuristic search with constraint optimization to quickly converge into a near-optimal policy. This algorithm also provides the foundation for the first algorithm for solving infinite-horizon transition independent decentralized MDPs. We demonstrate that both methods outperform state-of-the-art algorithms by multiple orders of magnitude, and for infinite-horizon decentralized MDPs, the algorithm is able to construct more concise policies by searching cyclic policy graphs
MAA*: A Heuristic Search Algorithm for Solving Decentralized POMDPs
We present multi-agent A* (MAA*), the first complete and optimal heuristic
search algorithm for solving decentralized partially-observable Markov decision
problems (DEC-POMDPs) with finite horizon. The algorithm is suitable for
computing optimal plans for a cooperative group of agents that operate in a
stochastic environment such as multirobot coordination, network traffic
control, `or distributed resource allocation. Solving such problems efiectively
is a major challenge in the area of planning under uncertainty. Our solution is
based on a synthesis of classical heuristic search and decentralized control
theory. Experimental results show that MAA* has significant advantages. We
introduce an anytime variant of MAA* and conclude with a discussion of
promising extensions such as an approach to solving infinite horizon problems.Comment: Appears in Proceedings of the Twenty-First Conference on Uncertainty
in Artificial Intelligence (UAI2005
Stick-Breaking Policy Learning in Dec-POMDPs
Expectation maximization (EM) has recently been shown to be an efficient
algorithm for learning finite-state controllers (FSCs) in large decentralized
POMDPs (Dec-POMDPs). However, current methods use fixed-size FSCs and often
converge to maxima that are far from optimal. This paper considers a
variable-size FSC to represent the local policy of each agent. These
variable-size FSCs are constructed using a stick-breaking prior, leading to a
new framework called \emph{decentralized stick-breaking policy representation}
(Dec-SBPR). This approach learns the controller parameters with a variational
Bayesian algorithm without having to assume that the Dec-POMDP model is
available. The performance of Dec-SBPR is demonstrated on several benchmark
problems, showing that the algorithm scales to large problems while
outperforming other state-of-the-art methods
Access Policy Design for Cognitive Secondary Users under a Primary Type-I HARQ Process
In this paper, an underlay cognitive radio network that consists of an
arbitrary number of secondary users (SU) is considered, in which the primary
user (PU) employs Type-I Hybrid Automatic Repeat Request (HARQ). Exploiting the
redundancy in PU retransmissions, each SU receiver applies forward interference
cancelation to remove a successfully decoded PU message in the subsequent PU
retransmissions. The knowledge of the PU message state at the SU receivers and
the ACK/NACK message from the PU receiver are sent back to the transmitters.
With this approach and using a Constrained Markov Decision Process (CMDP) model
and Constrained Multi-agent MDP (CMMDP), centralized and decentralized optimum
access policies for SUs are proposed to maximize their average sum throughput
under a PU throughput constraint. In the decentralized case, the channel access
decision of each SU is unknown to the other SU. Numerical results demonstrate
the benefits of the proposed policies in terms of sum throughput of SUs. The
results also reveal that the centralized access policy design outperforms the
decentralized design especially when the PU can tolerate a low average long
term throughput. Finally, the difficulties in decentralized access policy
design with partial state information are discussed
Markov Decision Processes with Applications in Wireless Sensor Networks: A Survey
Wireless sensor networks (WSNs) consist of autonomous and resource-limited
devices. The devices cooperate to monitor one or more physical phenomena within
an area of interest. WSNs operate as stochastic systems because of randomness
in the monitored environments. For long service time and low maintenance cost,
WSNs require adaptive and robust methods to address data exchange, topology
formulation, resource and power optimization, sensing coverage and object
detection, and security challenges. In these problems, sensor nodes are to make
optimized decisions from a set of accessible strategies to achieve design
goals. This survey reviews numerous applications of the Markov decision process
(MDP) framework, a powerful decision-making tool to develop adaptive algorithms
and protocols for WSNs. Furthermore, various solution methods are discussed and
compared to serve as a guide for using MDPs in WSNs
- …