Reinforcement Learning: A Survey
This paper surveys the field of reinforcement learning from a
computer-science perspective. It is written to be accessible to researchers
familiar with machine learning. Both the historical basis of the field and a
broad selection of current work are summarized. Reinforcement learning is the
problem faced by an agent that learns behavior through trial-and-error
interactions with a dynamic environment. The work described here has a
resemblance to work in psychology, but differs considerably in the details and
in the use of the word "reinforcement." The paper discusses central issues of
reinforcement learning, including trading off exploration and exploitation,
establishing the foundations of the field via Markov decision theory, learning
from delayed reinforcement, constructing empirical models to accelerate
learning, making use of generalization and hierarchy, and coping with hidden
state. It concludes with a survey of some implemented systems and an assessment
of the practical utility of current methods for reinforcement learning.
Comment: See http://www.jair.org/ for any accompanying file
Answer Set Planning Under Action Costs
Recently, planning based on answer set programming has been proposed as an
approach towards realizing declarative planning systems. In this paper, we
present the language Kc, which extends the declarative planning language K by
action costs. Kc provides the notions of admissible and optimal plans, which are
plans whose overall action costs are, respectively, within a given limit or
minimal over all plans (i.e., cheapest plans). As we demonstrate, this novel language allows
for expressing some nontrivial planning tasks in a declarative way.
Furthermore, it can be utilized for representing planning problems under other
optimality criteria, such as computing "shortest" plans (with the least
number of steps), and refinement combinations of cheapest and fastest plans. We
study complexity aspects of the language Kc and provide a transformation to
logic programs, such that planning problems are solved via answer set
programming. Furthermore, we report experimental results on selected problems.
Our experience is encouraging and suggests that answer set planning may be a
valuable approach to expressive planning systems in which intricate planning
problems can be naturally specified and solved.
Methods and algorithms for integrated multi-scale optimisation of production planning and scheduling
Techniques for the allocation of resources under uncertainty
Resource allocation is a ubiquitous problem that arises whenever limited resources have to be distributed among multiple autonomous entities (e.g., people, companies, robots, etc.). The standard approaches to determining the optimal resource allocation are computationally prohibitive. The goal of this thesis is to propose computationally efficient algorithms for allocating consumable and non-consumable resources among autonomous agents whose preferences for these resources are induced by a stochastic process. Towards this end, we have developed new models of planning problems, based on the framework of Markov Decision Processes (MDPs), where the action sets are explicitly parameterized by the available resources. Given these models, we have designed algorithms based on dynamic programming and real-time heuristic search to compute resource allocations for agents acting in stochastic environments. In particular, we have used the acyclic property of task creation to decompose the problem of resource allocation. We have also proposed an approximate decomposition strategy, where the agents consider positive and negative interactions as well as simultaneous actions among the agents managing the resources. However, the main contribution of this thesis is the adoption of stochastic real-time heuristic search for resource allocation. To this end, we have developed an approach based on distributed Q-values with tight bounds to drastically reduce the planning time needed to formulate the optimal policy. These tight bounds make it possible to prune the action space for the agents.
We show analytically and empirically that our proposed approaches lead to drastic (in many cases, exponential) improvements in computational efficiency over standard planning methods. Finally, we have tested real-time heuristic search in the SADM simulator, a simulator for the resource allocation of a platform.
Personaneinsatz- und Tourenplanung für Mitarbeiter mit Mehrfachqualifikationen
In workforce routing and scheduling there are many applications in which differently skilled workers must perform jobs that occur at different locations, where each job requires a particular combination of skills. In many such applications, a group of workers must be sent out to provide all skills required by a job. Examples are found in maintenance operations, the construction sector, health care operations, and consultancies. In this thesis, we analyze the combined problem of composing worker groups (teams) and routing these teams under goals expressing service, fairness, and cost objectives. We develop mathematical optimization models and heuristic solution methods for an integrated solution and a sequential solution of the teaming and routing subproblems. Computational experiments are conducted to identify the trade-off between solution quality and computational effort.