Anytime Point-Based Approximations for Large POMDPs
The Partially Observable Markov Decision Process has long been recognized as
a rich framework for real-world planning and control problems, especially in
robotics. However, exact solutions in this framework are typically
computationally intractable for all but the smallest problems. A well-known
technique for speeding up POMDP solving involves performing value backups at
specific belief points, rather than over the entire belief simplex. The
efficiency of this approach, however, depends greatly on the selection of
points. This paper presents a set of novel techniques for selecting informative
belief points that work well in practice. The point selection procedure is
combined with point-based value backups to form an effective anytime POMDP
algorithm called Point-Based Value Iteration (PBVI). The first aim of this
paper is to introduce this algorithm and present a theoretical analysis
justifying the choice of belief selection technique. The second aim of this
paper is to provide a thorough empirical comparison between PBVI and other
state-of-the-art POMDP methods, in particular the Perseus algorithm, in an
effort to highlight their similarities and differences. Evaluation is performed
using both standard POMDP domains and realistic robotic tasks.
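The core operation described above is a value backup restricted to a finite set of belief points, and it can be sketched as follows. This is a minimal pure-Python illustration that assumes a discrete POMDP given as nested lists. The names `point_based_backup`, `value`, `T`, `Z` and `R` are hypothetical and are not taken from the paper. The sketch runs backups over a fixed, hand-picked belief set and omits PBVI's belief-point selection, which is the paper's main contribution. The demonstration model is the standard two-state "tiger" benchmark.

```python
from typing import List, Sequence

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(u, v))


def value(b: Sequence[float], alphas: Sequence[Vector]) -> float:
    """Value at belief b: the upper envelope of the alpha vectors."""
    return max(dot(b, alpha) for alpha in alphas)


def point_based_backup(beliefs, alphas, T, Z, R, gamma):
    """One backup restricted to a fixed set of belief points.

    T[a][s][s2] = P(s2 | s, a), Z[a][s2][o] = P(o | s2, a), R[a][s] = reward.
    Returns exactly one new alpha vector per belief point.
    """
    n_a, n_s, n_o = len(T), len(T[0]), len(Z[0][0])
    # proj[a][o][i][s] = gamma * sum_s2 T[a][s][s2] * Z[a][s2][o] * alphas[i][s2]
    proj = [[[[gamma * sum(T[a][s][s2] * Z[a][s2][o] * alpha[s2] for s2 in range(n_s))
               for s in range(n_s)]
              for alpha in alphas]
             for o in range(n_o)]
            for a in range(n_a)]
    new_alphas = []
    for b in beliefs:
        best_vec, best_val = None, float("-inf")
        for a in range(n_a):
            vec = list(R[a])
            for o in range(n_o):
                # Keep only the projected vector that is best at this belief point.
                chosen = max(proj[a][o], key=lambda g: dot(b, g))
                vec = [x + y for x, y in zip(vec, chosen)]
            if dot(b, vec) > best_val:
                best_vec, best_val = vec, dot(b, vec)
        new_alphas.append(best_vec)
    return new_alphas


# Two-state "tiger" benchmark; state 0 = tiger behind the left door.
# Actions: 0 = listen, 1 = open left, 2 = open right. Opening resets the problem.
RESET = [[0.5, 0.5], [0.5, 0.5]]
T = [[[1.0, 0.0], [0.0, 1.0]], RESET, RESET]
Z = [[[0.85, 0.15], [0.15, 0.85]], RESET, RESET]
R = [[-1.0, -1.0], [-100.0, 10.0], [10.0, -100.0]]
gamma = 0.95

beliefs = [[1.0, 0.0], [0.75, 0.25], [0.5, 0.5], [0.25, 0.75], [0.0, 1.0]]
# Pessimistic start: every return is at least min(R) / (1 - gamma).
alphas = [[min(min(row) for row in R) / (1 - gamma)] * 2]
for _ in range(300):
    alphas = point_based_backup(beliefs, alphas, T, Z, R, gamma)

for b in beliefs:
    print(b, round(value(b, alphas), 2))
```

Starting from the pessimistic vector keeps every alpha vector a lower bound on the optimal value. The belief set is fixed here only for brevity. An anytime scheme would interleave these backups with steps that add new belief points.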
Optimal control and optimal sensor activation for Markov decision problems with costly observations
This paper considers partial observation Markov decision processes. Besides the classical control decisions influencing the transition probabilities of the Markov process, we also consider control actions that can activate the sensors to provide more or less accurate information about the system state, explicitly including the cost of activating sensors. We synthesize control laws that minimize a discounted operating cost of the system over an infinite interval of time, where the instantaneous cost function depends on the current state, the control influencing the transition probabilities, and the control actions activating the sensors. A general computationally efficient optimal solution for this problem is not known. Hence, we design suboptimal controllers that use only knowledge of the value function for the full state information Markov decision problem. Our solution guarantees that the discounted cost of operating the plant exceeds the minimal cost for the full state information problem by at most a bounded amount. A new concept of pinned conditional distributions of the state given the observed history of the plant is required in order to implement these control laws online.
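The sketch below is a rough illustration of two ingredients this abstract relies on. First, it tracks the conditional distribution of the state with a Bayes filter whose observation model depends on whether the sensor is active. Second, it picks the control with a QMDP-style rule that uses only the full-state Q-function. This is an assumption-laden stand-in, not the authors' controller. It does not implement pinned conditional distributions or their cost bound, and the sensor schedule is chosen by hand. The machine-maintenance model (`T`, `C`, `Z_OFF`, `Z_ON`) and all function names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def belief_update(b: Sequence[float], T_u: Matrix, Z_v: Matrix, o: int) -> List[float]:
    """Bayes filter step: predict with control u, then condition on observation o
    produced by sensor mode v. T_u[s][s2] = P(s2 | s, u); Z_v[s2][o] = P(o | s2, v)."""
    n = len(b)
    predicted = [sum(b[s] * T_u[s][s2] for s in range(n)) for s2 in range(n)]
    unnormalised = [predicted[s2] * Z_v[s2][o] for s2 in range(n)]
    total = sum(unnormalised)
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [p / total for p in unnormalised]


def full_state_q(T: List[Matrix], C: Matrix, gamma: float, iters: int = 500) -> Matrix:
    """Value iteration for the fully observed, cost-minimising MDP.
    T[u][s][s2] = P(s2 | s, u); C[u][s] = stage cost. Returns Q[s][u]."""
    n_u, n_s = len(T), len(T[0])
    V = [0.0] * n_s
    Q = [[0.0] * n_u for _ in range(n_s)]
    for _ in range(iters):
        Q = [[C[u][s] + gamma * sum(T[u][s][s2] * V[s2] for s2 in range(n_s))
              for u in range(n_u)] for s in range(n_s)]
        V = [min(row) for row in Q]
    return Q


def qmdp_control(b: Sequence[float], Q: Matrix) -> int:
    """Choose the control with the lowest belief-weighted full-state Q value."""
    n_u = len(Q[0])
    scores = [sum(b[s] * Q[s][u] for s in range(len(b))) for u in range(n_u)]
    return min(range(n_u), key=lambda u: scores[u])


# Hypothetical machine: state 0 = good, 1 = worn; control 0 = run, 1 = repair.
T = [[[0.9, 0.1], [0.0, 1.0]],    # run: a good machine may wear out
     [[1.0, 0.0], [1.0, 0.0]]]    # repair: always returns to good
C = [[0.0, 10.0],                 # running a worn machine is costly
     [5.0, 5.0]]                  # repairing has a fixed cost
Z_OFF = [[0.5, 0.5], [0.5, 0.5]]  # sensor off: observation is pure noise
Z_ON = [[0.9, 0.1], [0.2, 0.8]]   # sensor on: o = 1 suggests "worn"

Q = full_state_q(T, C, gamma=0.9)
b0 = [1.0, 0.0]
b_off = belief_update(b0, T[0], Z_OFF, o=1)  # run, sensor off
b_on = belief_update(b0, T[0], Z_ON, o=1)    # run, sensor on, reading says "worn"
print(b_off, qmdp_control(b_off, Q))
print(b_on, qmdp_control(b_on, Q))
```

With these numbers, an uninformative reading leaves the predicted belief unchanged and the rule keeps running the machine. A "worn" reading from the active sensor raises the worn probability to about 0.47, which is enough to trigger a repair. Because the QMDP rule assumes the state becomes fully known after one step, it assigns no value to information and cannot decide on its own when to pay for sensing. The paper's sensor-activation controls address that decision.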
Nonapproximability Results for Partially Observable Markov Decision Processes
We show that for several variations of partially observable Markov decision
processes, polynomial-time algorithms for finding control policies are unlikely
to have, or simply do not have, guarantees of finding policies within a constant
factor or a constant summand of optimal. Here "unlikely" means "unless some
complexity classes collapse," where the collapses considered are P=NP, P=PSPACE,
or P=EXP. Until or unless these collapses are shown to hold, any control-policy
designer must choose between such performance guarantees and efficient computation.
Reinforcement learning for efficient network penetration testing
Penetration testing (also known as pentesting or PT) is a common practice for actively assessing the defenses of a computer network by planning and executing all possible attacks to discover and exploit existing vulnerabilities. Current penetration testing methods are increasingly non-standard, composite and resource-consuming despite the use of evolving tools. In this paper, we propose and evaluate an AI-based pentesting system that uses machine learning techniques, namely reinforcement learning (RL), to learn and reproduce average and complex pentesting activities. The proposed system, named Intelligent Automated Penetration Testing System (IAPTS), consists of a module that integrates with industrial PT frameworks to enable them to capture information, learn from experience, and reproduce tests in similar future testing cases. IAPTS aims to save human resources while producing much-enhanced results in terms of time consumption, reliability and frequency of testing. IAPTS models PT environments and tasks as a partially observed Markov decision process (POMDP), which is solved by a POMDP solver. Although the scope of this paper is limited to network infrastructure PT planning rather than the entire practice, the obtained results support the hypothesis that RL can enhance PT beyond the capabilities of any human PT expert in terms of time consumed, covered attack vectors, accuracy and reliability of the outputs. In addition, this work tackles the complex problem of capturing and re-using expertise by allowing the IAPTS learning module to store and re-use PT policies in the same way that a human PT expert would learn, but more efficiently.
Value-Function Approximations for Partially Observable Markov Decision Processes
Partially observable Markov decision processes (POMDPs) provide an elegant
mathematical framework for modeling complex decision and planning problems in
stochastic domains in which states of the system are observable only
indirectly, via a set of imperfect or noisy observations. The modeling
advantage of POMDPs, however, comes at a price: exact methods for solving
them are computationally very expensive and thus applicable in practice only to
very simple problems. We focus on efficient approximation (heuristic) methods
that attempt to alleviate the computational problem and trade off accuracy for
speed. We have two objectives here. First, we survey various approximation
methods, analyze their properties and relations and provide some new insights
into their differences. Second, we present a number of new approximation
methods and novel refinements of existing techniques. The theoretical results
are supported by experiments on a problem from the agent navigation domain.
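Two of the cheapest approximations of this kind can be sketched in a few lines. The first is an upper bound that pretends the state becomes fully observable after the current step (the MDP/QMDP bound). The second is a lower bound from "blind" policies that repeat one action regardless of observations. The abstract does not say which specific approximations the survey covers, so these two are standard examples chosen for illustration. The sketch assumes a small discrete model given as nested lists. The function names are hypothetical, and the tiger benchmark serves only as a familiar test case.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(u, v))


def mdp_q_vectors(T: List[Matrix], R: Matrix, gamma: float, iters: int = 2000) -> Matrix:
    """Q-values of the fully observable MDP, one vector per action: Q[a][s]."""
    n_a, n_s = len(T), len(T[0])
    V = [0.0] * n_s
    Q = [[0.0] * n_s for _ in range(n_a)]
    for _ in range(iters):
        Q = [[R[a][s] + gamma * dot(T[a][s], V) for s in range(n_s)] for a in range(n_a)]
        V = [max(Q[a][s] for a in range(n_a)) for s in range(n_s)]
    return Q


def blind_policy_vectors(T: List[Matrix], R: Matrix, gamma: float, iters: int = 2000) -> Matrix:
    """Value of repeating one action forever, ignoring observations: alpha[a][s]."""
    n_a, n_s = len(T), len(T[0])
    alphas = [[0.0] * n_s for _ in range(n_a)]
    for _ in range(iters):
        alphas = [[R[a][s] + gamma * dot(T[a][s], alphas[a]) for s in range(n_s)]
                  for a in range(n_a)]
    return alphas


def envelope(b: Sequence[float], vectors: Matrix) -> float:
    """Both approximations evaluate a belief as a max over linear functions."""
    return max(dot(b, v) for v in vectors)


# Tiger benchmark: state 0 = tiger behind the left door;
# actions 0 = listen, 1 = open left, 2 = open right (opening resets the problem).
RESET = [[0.5, 0.5], [0.5, 0.5]]
T = [[[1.0, 0.0], [0.0, 1.0]], RESET, RESET]
R = [[-1.0, -1.0], [-100.0, 10.0], [10.0, -100.0]]
gamma = 0.95

upper = mdp_q_vectors(T, R, gamma)        # QMDP: V(b) <= max_a sum_s b(s) Q(s, a)
lower = blind_policy_vectors(T, R, gamma)  # blind: V(b) >= max_a b . alpha_a
for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    b = [p, 1.0 - p]
    print(p, round(envelope(b, lower), 3), round(envelope(b, upper), 3))
```

On the tiger problem, the two bounds bracket the optimal value at the uniform belief between -20 and 189. That wide gap shows how much accuracy such fast approximations can give up. Tighter methods spend more computation to narrow it.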
- …