Automated Experiment Design for Data-Efficient Verification of Parametric Markov Decision Processes
We present a new method for statistical verification of quantitative
properties over a partially unknown system with actions, utilising a
parameterised model (in this work, a parametric Markov decision process) and
data collected from experiments performed on the underlying system. We obtain
the confidence that the underlying system satisfies a given property, and show
that the method uses data efficiently and thus is robust to the amount of data
available. These characteristics are achieved by firstly exploiting parameter
synthesis to establish a feasible set of parameters for which the underlying
system will satisfy the property; secondly, by actively synthesising
experiments to increase the amount of information in the collected data that is
relevant to the property; and finally propagating this information over the
model parameters, obtaining a confidence that reflects our belief whether or
not the system parameters lie in the feasible set, thereby solving the
verification problem. Comment: QEST 2017, 18 pages, 7 figures
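The final step described above, turning collected data into a confidence that the parameters lie in the feasible set, can be illustrated with a minimal sketch. It is not the method of the paper: it assumes a single unknown transition probability, a uniform Beta(1, 1) prior, and a feasible set that is one interval [lo, hi]. The function names and the interval bounds are hypothetical.

```python
import math


def beta_pdf(x, a, b):
    """Density of a Beta(a, b) distribution at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def confidence_in_feasible_set(successes, failures, lo, hi, steps=20000):
    """Posterior probability that the unknown parameter lies in [lo, hi].

    Assumption: one Bernoulli parameter with a uniform Beta(1, 1) prior,
    so the posterior is Beta(1 + successes, 1 + failures). The integral
    is approximated with the midpoint rule.
    """
    a, b = 1 + successes, 1 + failures
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * width  # midpoint, strictly inside (lo, hi)
        total += beta_pdf(x, a, b)
    return total * width


# Hypothetical feasible interval, standing in for the output of parameter synthesis.
confidence = confidence_in_feasible_set(successes=80, failures=20, lo=0.7, hi=1.0)
print(f"confidence that the parameter is feasible: {confidence:.3f}")
```

With no data the confidence equals the prior mass of the interval, and it sharpens as experiments accumulate.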
Optimisation of stochastic networks with blocking: a functional-form approach
This paper introduces a class of stochastic networks with blocking, motivated
by applications arising in cellular network planning, mobile cloud computing,
and spare parts supply chains. Blocking results in lost revenue due to
customers or jobs being permanently removed from the system. We are interested
in striking a balance between mitigating blocking by increasing service
capacity, and maintaining low costs for service capacity. This problem is
further complicated by the stochastic nature of the system. Owing to the
complexity of the system there are no analytical results available that
formulate and solve the relevant optimization problem in closed form.
Traditional simulation-based methods may work well for small instances, but the
associated computational costs are prohibitive for networks of realistic size.
We propose a hybrid functional-form based approach for finding the optimal
resource allocation, combining the speed of an analytical approach with the
accuracy of simulation-based optimisation. The key insight is to replace the
computationally expensive gradient estimation in simulation optimisation with a
closed-form analytical approximation that is calibrated using a single
simulation run. We develop two implementations of this approach and conduct
extensive computational experiments on complex examples to show that it is
capable of substantially improving system performance. We also provide evidence
that our approach has substantially lower computational costs compared to
stochastic approximation
Decision-Theoretic Planning with non-Markovian Rewards
A decision process in which rewards depend on history rather than merely on
the current state is called a decision process with non-Markovian rewards
(NMRDP). In decision-theoretic planning, where many desirable behaviours are
more naturally expressed as properties of execution sequences rather than as
properties of states, NMRDPs form a more natural model than the commonly
adopted fully Markovian decision process (MDP) model. While the more tractable
solution methods developed for MDPs do not directly apply in the presence of
non-Markovian rewards, a number of solution methods for NMRDPs have been
proposed in the literature. These all exploit a compact specification of the
non-Markovian reward function in temporal logic, to automatically translate the
NMRDP into an equivalent MDP which is solved using efficient MDP solution
methods. This paper presents NMRDPP (Non-Markovian Reward Decision Process
Planner), a software platform for the development and experimentation of
methods for decision-theoretic planning with non-Markovian rewards. The current
version of NMRDPP implements, under a single interface, a family of methods
based on existing as well as new approaches which we describe in detail. These
include dynamic programming, heuristic search, and structured methods. Using
NMRDPP, we compare the methods and identify certain problem features that
affect their performance. NMRDPP's treatment of non-Markovian rewards is
inspired by the treatment of domain-specific search control knowledge in the
TLPlan planner, which it incorporates as a special case. In the First
International Probabilistic Planning Competition, NMRDPP was able to compete
and perform well in both the domain-independent and hand-coded tracks, using
search control knowledge in the latter
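The translation of an NMRDP into an equivalent MDP can be illustrated with a minimal sketch. It is not NMRDPP's algorithm: instead of a temporal-logic specification it hard-codes one history-dependent reward (paid only the first time a goal state is visited) and expands each state with a single Boolean flag. All names are hypothetical.

```python
def history_reward(history, goal):
    """Non-Markovian reward: 1.0 only the first time `goal` is visited."""
    return 1.0 if history[-1] == goal and goal not in history[:-1] else 0.0


def step_augmented(aug_state, next_state, goal):
    """One transition in the expanded MDP.

    The augmented state is (state, seen), where `seen` records whether the
    goal was already visited. The reward depends only on the augmented
    state and the successor, so it is Markovian.
    """
    _, seen = aug_state
    reward = 1.0 if next_state == goal and not seen else 0.0
    return (next_state, seen or next_state == goal), reward


def rewards_from_history(trajectory, goal):
    """Rewards computed from the full history at every step."""
    return [history_reward(trajectory[: t + 1], goal) for t in range(len(trajectory))]


def rewards_from_augmented(trajectory, goal):
    """The same rewards, computed from the expanded state alone."""
    first = trajectory[0]
    seen = first == goal
    rewards = [1.0 if seen else 0.0]
    aug = (first, seen)
    for state in trajectory[1:]:
        aug, reward = step_augmented(aug, state, goal)
        rewards.append(reward)
    return rewards


trajectory = ["a", "b", "g", "a", "g"]
print(rewards_from_history(trajectory, "g"))
print(rewards_from_augmented(trajectory, "g"))
```

Both functions assign the same reward sequence to the trajectory, which is the sense in which the expanded MDP is equivalent to the original process for this particular reward.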