Techniques for the allocation of resources under uncertainty
Resource allocation is a ubiquitous problem that arises whenever limited resources have to be distributed among multiple autonomous entities (e.g., people, companies, robots). The standard approaches to determining the optimal resource allocation are computationally prohibitive. The goal of this thesis is to propose computationally efficient algorithms for allocating consumable and non-consumable resources among autonomous agents whose preferences for these resources are induced by a stochastic process. Towards this end, we have developed new models of planning problems, based on the framework of Markov Decision Processes (MDPs), in which the action sets are explicitly parameterized by the available resources. Given these models, we have designed algorithms based on dynamic programming and real-time heuristic search to formulate resource allocations for agents acting in stochastic environments. In particular, we have used the acyclic property of task creation to decompose the resource allocation problem. We have also proposed an approximate decomposition strategy in which the agents consider positive and negative interactions as well as simultaneous actions among the agents managing the resources. The main contribution of this thesis, however, is the adoption of stochastic real-time heuristic search for resource allocation. To this end, we have developed an approach based on Q-decomposition (distributed Q-values) with tight bounds that drastically reduces the planning time needed to formulate an optimal policy; these tight bounds allow us to prune the action space for the agents.
We show analytically and empirically that our proposed approaches lead to drastic (in many cases, exponential) improvements in computational efficiency over standard planning methods. Finally, we have tested real-time heuristic search in SADM, a resource allocation simulator for a frigate.
An approximate dynamic programming approach to the admission control of elective patients
In this paper, we propose an approximate dynamic programming (ADP) algorithm
to solve a Markov decision process (MDP) formulation for the admission control
of elective patients. To manage the elective patients from multiple specialties
equitably and efficiently, we establish a waiting list and assign each patient
a time-dependent dynamic priority score. Then, taking the random arrivals of
patients into account, sequential decisions are made on a weekly basis. At the
end of each week, we select the patients to be treated in the following week
from the waiting list. By minimizing the cost function of the MDP over an
infinite horizon, we seek to achieve the best trade-off between the patients'
waiting times and the over-utilization of surgical resources. Considering the
curses of dimensionality resulting from the large scale of realistically sized
problems, we first analyze the structural properties of the MDP and propose an
algorithm that facilitates the search for best actions. We then develop a novel
reinforcement-learning-based ADP algorithm as the solution technique.
Experimental results reveal that the proposed algorithms require much less
computation time than conventional dynamic programming methods.
Additionally, the algorithms are shown to be capable of computing
high-quality near-optimal policies for realistically sized problems.
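The time-dependent dynamic priority rule can be sketched as follows. The exact scoring function in the paper may differ; the linear urgency-times-waiting-time score and the names below are illustrative assumptions.

```python
def weekly_selection(waiting_list, capacity):
    """Select patients to treat next week by dynamic priority score.

    waiting_list: list of (patient_id, urgency_weight, weeks_waiting).
    The score grows with waiting time, scaled by urgency, so lower-urgency
    patients are still treated eventually (the equity requirement).
    """
    scored = sorted(
        waiting_list,
        key=lambda p: p[1] * p[2],   # urgency * weeks waited
        reverse=True,
    )
    return [pid for pid, _, _ in scored[:capacity]]
```

In the MDP, this selection is the weekly decision, with `capacity` reflecting the surgical resources whose over-utilization the cost function penalizes.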
Mission-Phasing Techniques for Constrained Agents in Stochastic Environments.
Resource constraints restrict the set of actions that an agent can take, such that the agent might
not be able to perform all its desired tasks. Computational time limitations restrict the number of
states that an agent can model and reason over, such that the agent might not be able to formulate
a policy that can respond to all possible eventualities. This work argues that, in either
situation, one effective way of improving the agent's performance is to adopt a phasing strategy.
Resource-constrained agents can choose to reconfigure resources and switch action sets for handling
upcoming events better when moving from phase to phase; time-limited agents can choose to focus
computation on high-value phases and to exploit additional computation time during the execution of
earlier phases to improve solutions for future phases.
This dissertation consists of two parts, corresponding to the aforementioned resource constraints
and computational time limitations. The first part of the dissertation focuses on the development
of automated resource-driven mission-phasing techniques for agents operating in
resource-constrained environments. We designed a suite of algorithms which not only can find
solutions to optimize the use of predefined phase-switching points, but can also automatically
determine where to establish such points, accounting for the cost of creating them, in complex
stochastic environments. By formulating the coupled problems of mission decomposition, resource
configuration, and policy formulation into a single compact mathematical formulation, the presented
algorithms can effectively exploit problem structure and often considerably reduce computational
cost for finding exact solutions.
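The core trade-off of automated phase placement, namely that each switch point costs something but lets a phase use a better-suited configuration, can be captured by a small dynamic program. This is a simplified stand-in, not the dissertation's formulation: it assumes a precomputed value for running each contiguous segment of the mission as a single phase.

```python
def best_phasing(n, segment_value, switch_cost):
    """Max mission value over all ways to cut steps [0, n) into phases.

    segment_value[(i, j)]: value of handling steps i..j-1 as one phase
    under its best fixed resource configuration (assumed precomputed).
    Each phase boundary incurs switch_cost for reconfiguration.
    """
    best = [float("-inf")] * (n + 1)
    best[0] = switch_cost            # cancel the cost charged to the first phase
    for j in range(1, n + 1):
        for i in range(j):
            best[j] = max(best[j],
                          best[i] - switch_cost + segment_value[(i, j)])
    return best[n]
```

The DP automatically decides both how many switch points to create and where, establishing a point only when the gain from reconfiguration exceeds its cost.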
The second part of this dissertation is the design of computation-driven mission-phasing techniques
for time-critical systems. We developed a new deliberation scheduling approach, which can
simultaneously solve the coupled problems of deciding both when to deliberate given its cost, and
which phase decision procedures to execute during deliberation intervals. Meanwhile, we designed a
heuristic search method to effectively utilize the allocated time within each phase. As illustrated
in experimental results, the computation-driven mission-phasing techniques, which
extend problem decomposition techniques with the across-phase deliberation scheduling and
inner-phase heuristic search methods mentioned above, can help an agent generate a better policy within the time limit.
Ph.D. dissertation, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/60650/1/jianhuiw_1.pd
Domain-Independent Planning for Markov Decision Processes with Factored State and Action Spaces
Markov Decision Processes (MDPs) are the de facto formalism for studying sequential decision-making problems under uncertainty, ranging from classical problems such as inventory control and path planning to more complex problems such as reservoir control under rainfall uncertainty and emergency response optimization for fire and medical emergencies. Most prior research has focused on exact and approximate solutions to MDPs with factored states, assuming a small number of actions. In contrast, many applications are most naturally modeled as having factored actions described in terms of multiple action variables. In this thesis we study domain-independent algorithms that leverage the factored action structure in the MDP dynamics and reward, and scale better than treating each of the exponentially many joint actions as atomic. Our contributions are three-fold, based on three fundamental approaches to MDP planning: exact solution using symbolic dynamic programming (DP), anytime online planning using heuristic search, and online action selection using hindsight optimization.
The first part is focused on deriving optimal policies over all states for MDPs whose state and action space are described in terms of multiple discrete random variables. In order to capture the factored action structure, we introduce new symbolic operators for computing DP updates over all states
efficiently by leveraging the abstract and symbolic representation of Decision Diagrams. Addressing the potential bottleneck of diagrammatic blowup in these operators, we present a novel
and optimal policy iteration algorithm that emphasizes the diagrammatic compactness of the intermediate value functions and policies. The impact is seen in experiments on the well-studied problems of inventory control and system administration where our algorithm is able to exploit the increasing compactness of the optimal policy with increasing complexity of the action space.
Under the framework of anytime planning, the second part expands the scalability of our approach to factored actions by restricting its attention to the reachable part of the state space. Our contribution is the introduction of new symbolic generalization operators that guarantee a more moderate use of space and time while providing non-trivial generalization. These operators yield anytime algorithms that guarantee convergence to the optimal value and action for the current world state, while guaranteeing bounded growth in the size of the symbolic representation. We empirically show that our online algorithm is successfully able to combine forward search from an initial state with backwards generalized DP updates on symbolic states.
The third part considers a general class of hybrid (mixed discrete and continuous) state and action (HSA) MDPs. While the insights from the above approaches are valid for hybrid MDPs as well, there are significant limitations inherent to the DP approach. Existing solvers for hybrid state and action MDPs are either limited to very restricted transition distributions, require knowledge of domain-specific basis functions to achieve good approximations, or do not scale. We explore a domain-independent approach based on the framework of hindsight optimization (HOP) for HSA-MDPs, which uses an upper bound on the finite-horizon action values for action selection. Our main contribution is a linear-time reduction to a Mixed Integer Linear Program (MILP) that encodes the HOP objective, when the dynamics are specified as location-scale probability distributions parametrized by Piecewise Linear (PWL) functions of states and actions. In addition, we show how to use the same machinery to select actions based on a lower bound generated by straight-line plans. Our empirical results show that the HSA-HOP approach effectively scales to high-dimensional problems and outperforms baselines that are capable of scaling to such large hybrid MDPs. In a concluding case study, we cast the real-time dispatch optimization problem faced by the Corvallis Fire Department as an HSA-MDP with factored actions. We show that our domain-independent planner significantly improves upon the responsiveness of the baseline that dispatches the nearest responders.
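The hindsight optimization principle used for action selection can be sketched generically. The deterministic solver below is a stand-in passed as a callback (the thesis compiles this step to a MILP for hybrid MDPs); all names are illustrative assumptions.

```python
def hop_action(state, actions, sample_future, solve_deterministic, k=30):
    """Select an action via hindsight optimization (HOP).

    For each candidate first action, sample k possible futures (fixing all
    randomness in hindsight), solve each resulting deterministic problem,
    and average. The average is an upper-bound estimate of the true
    action value, since each plan is tailored to a known future.
    """
    def hop_value(a):
        total = 0.0
        for _ in range(k):
            future = sample_future(state)
            total += solve_deterministic(state, a, future)
        return total / k
    return max(actions, key=hop_value)
```

Because each sampled future is solved as a separate deterministic problem, the approach scales with the deterministic solver rather than with the size of the stochastic state and action space.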
Temporal Markov Decision Problems : Formalization and Resolution
This thesis addresses the question of planning under uncertainty within a time-dependent, changing environment. The original motivation for this work came from the problem of building an autonomous agent able to coordinate with its
uncertain environment, an environment composed of other agents communicating their intentions, or of non-controllable processes for which some discrete-event model is available. We investigate several approaches for modeling continuous time-dependency in the framework of Markov Decision Processes (MDPs), leading us to a definition of Temporal Markov Decision Problems. Then our approach focuses on two separate paradigms. First, we investigate time-dependent problems as \emph{implicit-event} processes and describe them through the formalism of Time-dependent MDPs (TMDPs). We extend the existing results concerning optimality equations and present a new Value Iteration algorithm based on piecewise polynomial function representations in order to solve a more general class of TMDPs. This paves the way to a more general discussion on parametric actions in hybrid state and action space MDPs with continuous time. Second, we investigate the
option of separately modeling the concurrent contributions of exogenous events. This approach of \emph{explicit-event} modeling leads to the use of Generalized Semi-Markov Decision Processes (GSMDP). We establish a link between the general framework of Discrete Events Systems Specification (DEVS) and the formalism of GSMDP, allowing us to build sound discrete-event compatible simulators. Then we introduce a simulation-based Policy Iteration approach for
explicit-event Temporal Markov Decision Problems. This algorithmic contribution brings together results from simulation theory, forward search in MDPs, and statistical learning theory. The implicit-event approach was tested on a
specific version of the Mars rover planning problem and on a drone patrol mission planning problem, while the explicit-event approach was evaluated on a subway network control problem.
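To make the implicit-event idea concrete, here is a minimal sketch of value iteration for a time-dependent MDP over a discretized time axis. The thesis works with exact piecewise polynomial representations of the value function instead; this discretization, the deterministic transitions, and all names are simplifying assumptions.

```python
def tmdp_values(states, actions, horizon, reward, duration, successor):
    """Backward induction over (state, time) pairs.

    V[t][s] is the best value attainable from state s at discrete time t.
    reward(s, a, t) and duration(s, a, t) make the model time-dependent:
    the same action can be worth more, or take longer, at different times.
    """
    V = [{s: 0.0 for s in states} for _ in range(horizon + 1)]
    for t in range(horizon - 1, -1, -1):
        for s in states:
            best = 0.0  # waiting out the horizon is always available
            for a in actions:
                t2 = t + duration(s, a, t)
                if t2 <= horizon:
                    best = max(best, reward(s, a, t) + V[t2][successor(s, a)])
            V[t][s] = best
    return V
```

Piecewise polynomial representations avoid the discretization error introduced here by propagating closed-form value functions of continuous time through the Bellman backups.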
Coordination for Scalable Multiple Robot Planning Under Temporal Uncertainty
This dissertation incorporates coalition formation and probabilistic planning towards a domain-independent automated planning solution scalable to multiple heterogeneous robots in complex domains. The first research direction investigates the effectiveness of Task Fusion and introduces heuristics that improve task allocation and result in better-quality plans, while requiring lower computational cost than the baseline approaches. The heuristics incorporate relaxed plans to estimate coupling and determine which tasks to fuse. As a result, larger temporal continuous planning problems involving multiple robots can be solved. The second research direction introduces new coordination methods to merge plans and resolve conflicts while extending the framework to domains with stochastic action duration. Merging independently generated plans becomes computationally costly when task plans are tightly coupled, and conflicts arise due to dependencies between plan actions. Existing methods either scale poorly as the number of agents and tasks increases, or do not minimize makespan, the overall time necessary to execute all tasks. A new family of plan coordination and conflict resolution algorithms is introduced to merge independently generated plans, minimize the resulting makespan, and scale to a large number of tasks and agents in complex problems. A thorough algorithmic analysis and empirical evaluation demonstrate how the new conflict identification and resolution models can impact the resulting plan quality and computational cost across three heterogeneous multiagent domains and outperform the baseline algorithms.
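The plan-merging problem can be illustrated with a simple greedy scheduler that resolves resource conflicts by delaying actions, which is one naive baseline the dissertation's coordination algorithms improve upon; the data layout and names here are illustrative assumptions.

```python
def merge_plans(plans):
    """Merge per-robot plans, delaying actions that contend for resources.

    plans: {robot: [(action, duration, resource), ...]} in execution order.
    Each resource is exclusive, so a conflicting action starts only once
    the resource frees up. Returns (schedule, makespan), where makespan is
    the overall time needed to execute all tasks.
    """
    resource_free = {}          # resource -> earliest next start time
    schedule, makespan = [], 0.0
    for robot, actions in plans.items():
        t = 0.0                 # each robot runs its own plan in order
        for action, dur, res in actions:
            start = max(t, resource_free.get(res, 0.0))
            t = start + dur
            resource_free[res] = t
            schedule.append((robot, action, start, t))
        makespan = max(makespan, t)
    return schedule, makespan
```

Because this greedy merge fixes an arbitrary robot ordering, it can produce avoidable delays; the point of the new conflict identification and resolution models is to search over such orderings and delays to minimize the resulting makespan.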