
    Using mathematical programming to solve Factored Markov Decision Processes with Imprecise Probabilities

    This paper investigates Factored Markov Decision Processes with Imprecise Probabilities (MDPIPs); that is, Factored Markov Decision Processes (MDPs) where transition probabilities are imprecisely specified. We derive efficient approximate solutions for Factored MDPIPs based on mathematical programming. To do this, we extend previous linear programming approaches for linear approximations in Factored MDPs, resulting in a multilinear formulation for robust “maximin” linear approximations in Factored MDPIPs. By exploiting the factored structure in MDPIPs we are able to demonstrate orders-of-magnitude reductions in solution time over standard exact non-factored approaches, in exchange for relatively low approximation error, on a difficult class of benchmark problems with millions of states.
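    As a sketch of what the robust formulation can look like (standard approximate-linear-programming notation; the paper's exact construction may differ), the value function is approximated by a weighted sum of basis functions, V(s) ≈ Σ_i w_i h_i(s), and each ALP constraint is made robust by taking the worst case over the credal set K of transition models:

    \begin{align*}
      \min_{w}\;\; & \sum_{s} \alpha(s) \sum_{i} w_i h_i(s) \\
      \text{s.t.}\;\; & \sum_{i} w_i h_i(s) \ \ge\ R(s,a) \;+\; \gamma \min_{P \in K} \sum_{s'} P(s' \mid s,a) \sum_{i} w_i h_i(s') \qquad \forall\, s,a.
    \end{align*}

    Because the inner minimization multiplies the unknown distribution P by the unknown weights w, the constraints are multilinear rather than linear, which is where a multilinear programming formulation arises; the factored structure is what keeps the basis functions and constraints compactly representable.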

    Computational Approaches for Stochastic Shortest Path on Succinct MDPs

    We consider the stochastic shortest path (SSP) problem for succinct Markov decision processes (MDPs), where the MDP consists of a set of variables and a set of nondeterministic rules that update the variables. First, we show that several examples from the AI literature can be modeled as succinct MDPs. Then we present computational approaches for upper and lower bounds for the SSP problem: (a) for computing upper bounds, our method is polynomial-time in the implicit description of the MDP; (b) for lower bounds, we present a polynomial-time (in the size of the implicit description) reduction to quadratic programming. Our approach is applicable even to infinite-state MDPs. Finally, we present experimental results that demonstrate the effectiveness of our approach on several classical examples from the AI literature.
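    For reference (standard SSP notation, not specific to this paper), the quantity being bounded is the optimal expected cost to reach the target, which satisfies the Bellman equations

    \[
      V(s) = 0 \quad \text{for target states}, \qquad
      V(s) = \min_{a \in A(s)} \Big( c(s,a) + \sum_{s'} P(s' \mid s,a)\, V(s') \Big) \quad \text{otherwise},
    \]

    the distinguishing feature of the succinct setting being that the successor sum is never expanded state by state: the bounding methods operate on the variable-and-rule description directly.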

    Multi-Objective Approaches to Markov Decision Processes with Uncertain Transition Parameters

    Markov decision processes (MDPs) are a popular model for performance analysis and optimization of stochastic systems. The parameters describing the stochastic behavior of an MDP are estimated from empirical observations of a system; their values are not known precisely. Different types of MDPs with uncertain, imprecise or bounded transition rates or probabilities and rewards exist in the literature. Commonly, analysis of models with uncertainties amounts to searching for the most robust policy, which means that the goal is to generate a policy with the greatest lower bound on performance (or, symmetrically, the lowest upper bound on costs). However, hedging against an unlikely worst case may lead to losses in other situations. In general, one is interested in policies that behave well in all situations, which results in a multi-objective view on decision making. In this paper, we consider policies for the expected discounted reward measure of MDPs with uncertain parameters. In particular, the approach is defined for bounded-parameter MDPs (BMDPs) [8]. In this setting the worst-case, best-case and average-case performances of a policy are analyzed simultaneously, which yields a multi-scenario multi-objective optimization problem. The paper presents and evaluates approaches to compute the Pareto-optimal pure policies in the value vector space. (Comment: 9 pages, 5 figures, preprint for VALUETOOLS 2017.)
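    Concretely (standard bounded-parameter-MDP notation; details may differ from the paper), each transition probability is only known to lie in an interval, and a policy \pi is scored by a vector of objectives such as

    \[
      \Big( \min_{P \in \mathcal{P}} V^{\pi}_{P}, \;\; V^{\pi}_{\bar P}, \;\; \max_{P \in \mathcal{P}} V^{\pi}_{P} \Big),
    \]

    where \mathcal{P} is the set of transition matrices consistent with the interval bounds, V^{\pi}_{P} is the expected discounted reward of \pi under P, and \bar P is a reference (e.g., average-case) model. A policy is Pareto-optimal if no other policy is at least as good in every component of this vector and strictly better in at least one.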

    Operational Decision Making under Uncertainty: Inferential, Sequential, and Adversarial Approaches

    Modern security threats are characterized by a stochastic, dynamic, partially observable, and ambiguous operational environment. This dissertation addresses such complex security threats using operations research techniques for decision making under uncertainty in operations planning, analysis, and assessment. First, this research develops a new method for robust queue inference with partially observable, stochastic arrival and departure times, motivated by cybersecurity and terrorism applications. In the dynamic setting, this work develops a new variant of Markov decision processes and an algorithm for robust information collection in dynamic, partially observable and ambiguous environments, with an application to a cybersecurity detection problem. In the adversarial setting, this work presents a new application of counterfactual regret minimization and robust optimization to a multi-domain cyber and air defense problem in a partially observable environment.

    ISIPTA'07: Proceedings of the Fifth International Symposium on Imprecise Probability: Theories and Applications


    Policy Explanation and Model Refinement in Decision-Theoretic Planning

    Decision-theoretic systems, such as Markov Decision Processes (MDPs), are used for sequential decision-making under uncertainty. MDPs provide a generic framework that can be applied in various domains to compute optimal policies. This thesis presents techniques that offer explanations of optimal policies for MDPs and then refine decision-theoretic models (Bayesian networks and MDPs) based on feedback from experts.

    Explaining policies for sequential decision-making problems is difficult due to the presence of stochastic effects, multiple possibly competing objectives, and long-range effects of actions. However, explanations are needed to assist experts in validating that the policy is correct and to help users develop trust in the choices recommended by the policy. A set of domain-independent templates for justifying a policy recommendation is presented, along with a process to identify the minimum number of templates that must be populated to completely justify the policy.

    The rejection of an explanation by a domain expert indicates a deficiency in the model that led to the generation of the rejected policy. This thesis presents techniques to refine the model parameters so that the optimal policy computed from the refined parameters conforms with the expert feedback. The expert feedback is translated into constraints on the model parameters that are used during refinement; these constraints are non-convex for both Bayesian networks and MDPs. For Bayesian networks, the refinement approach is based on Gibbs sampling and stochastic hill climbing, and it learns a model that obeys the expert constraints. For MDPs, the parameter space is partitioned so that alternating linear optimization can be applied to learn model parameters that lead to a policy in accordance with the expert feedback.

    In practice, the state space of an MDP can be very large, which is an issue for real-world problems. Factored MDPs are often used to deal with this issue: state variables represent the state space, and dynamic Bayesian networks model the transition functions, avoiding the exponential growth in representation size associated with large and complex problems. The approaches for explanation and refinement are extended to the factored case to demonstrate their use in real-world applications. Empirical evaluations are presented for course advising of undergraduate students, assisted hand-washing for people with dementia, and diagnostics for manufacturing.
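    The abstract leaves the refinement algorithms at a high level; as a toy illustration of one ingredient, the sketch below shows constrained stochastic hill climbing over a single conditional probability distribution, assuming expert feedback has already been translated into predicate constraints. All names and the toy scoring function are hypothetical, not the thesis's code.

    import numpy as np

    def refine_distribution(theta, score, constraints, steps=2000, noise=0.2, seed=0):
        """Stochastic hill climbing: perturb theta, renormalize onto the
        simplex, and keep the move only if it satisfies every expert
        constraint and improves the score."""
        rng = np.random.default_rng(seed)
        best = np.asarray(theta, dtype=float)
        best_score = score(best)
        for _ in range(steps):
            cand = np.clip(best + rng.normal(0.0, noise, size=best.shape), 1e-6, None)
            cand /= cand.sum()                        # project back onto the simplex
            if not all(c(cand) for c in constraints):
                continue                              # reject infeasible moves
            s = score(cand)
            if s > best_score:                        # greedy uphill acceptance
                best, best_score = cand, s
        return best

    # Toy use: expert feedback "the second outcome has probability >= 0.3".
    constraints = [lambda t: t[1] >= 0.3]
    score = lambda t: -float(np.sum((t - np.array([0.7, 0.3])) ** 2))  # fit to data
    print(refine_distribution(np.array([0.9, 0.1]), score, constraints))

    In the thesis the constraints are non-convex and the models far larger, so the actual approach (Gibbs sampling combined with hill climbing) is more involved; this sketch only conveys the accept-if-feasible-and-better loop.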

    A general framework integrating techniques for scheduling under uncertainty

    For the last few years, a number of research investigations on task planning and scheduling under uncertainty have been conducted. This research domain comprises a large number of models, resolution techniques, and systems, and it is difficult to compare them since the existing terminologies are incomplete. However, we identified general families of approaches that can be used to structure the literature along three perpendicular axes. This new classification of the state of the art is based on the way decisions are made. In addition, we propose a generation and execution model for scheduling under uncertainty that combines these three families of approaches. This model is an automaton that develops when the current schedule is no longer executable or when particular conditions are met. The third part of this thesis concerns our experimental study: on top of ILOG Solver and Scheduler, we implemented a software prototype in C++, directly instantiated from our generation and execution model. We present new probabilistic scheduling problems and a constraint-based approach, combined with simulation, to solve some instances thereof.