4 research outputs found
Lexicographic refinements in possibilistic decision trees and finite-horizon Markov decision processes
Possibilistic decision theory was proposed twenty years ago and has had several extensions since then. Although appealing for its ability to handle qualitative decision problems, possibilistic decision theory suffers from an important drawback: qualitative possibilistic utility criteria compare acts through min and max operators, which leads to a drowning effect. To overcome this lack of decision power, several refinements have been proposed. Lexicographic refinements are particularly appealing since they make it possible to benefit from the Expected Utility background while remaining qualitative. This article aims at extending lexicographic refinements to sequential decision problems, i.e., to possibilistic decision trees and possibilistic Markov decision processes, when the horizon is finite. We present two criteria that refine qualitative possibilistic utilities and provide dynamic programming algorithms for calculating lexicographically optimal policies.
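The drowning effect mentioned in this abstract can be illustrated with a minimal sketch. It assumes the standard pessimistic qualitative utility, min over states of max(1 - pi(s), u(s)), and a simplified leximin tie-break. The function names and the two-state example are hypothetical and are not taken from the article, whose criteria are more elaborate.

```python
def pessimistic_utility(possibility, utility):
    """Pessimistic qualitative utility: min over states of max(1 - pi(s), u(s))."""
    return min(max(1 - p, u) for p, u in zip(possibility, utility))


def leximin_key(possibility, utility):
    """Per-state values in ascending order (simplified leximin key, an assumption)."""
    return sorted(max(1 - p, u) for p, u in zip(possibility, utility))


# Hypothetical two-state example: possibility degrees and utilities in [0, 1].
possibility = [1.0, 0.8]
act_a = [0.3, 0.9]  # as bad as act_b in the first state, much better in the second
act_b = [0.3, 0.4]

# The min operator "drowns" the difference in the second state: both acts tie.
tie = pessimistic_utility(possibility, act_a) == pessimistic_utility(possibility, act_b)

# Comparing the sorted vectors lexicographically breaks the tie in favour of act_a.
a_preferred = leximin_key(possibility, act_a) > leximin_key(possibility, act_b)

print(tie, a_preferred)
```

Both acts receive the same pessimistic utility because only the worst state counts, whereas the lexicographic comparison looks past the shared minimum and separates them.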
Lexicographic refinements in possibilistic sequential decision-making models
This work contributes to possibilistic decision theory and more specifically to sequential decision-making under possibilistic uncertainty, at both the theoretical and practical levels. Although appealing for its ability to handle qualitative decision problems, possibilistic decision theory suffers from an important drawback: qualitative possibilistic utility criteria compare acts through min and max operators, which leads to a drowning effect. To overcome this lack of decision power, several refinements have been proposed in the literature. Lexicographic refinements are particularly appealing since they make it possible to benefit from the expected utility background while remaining "qualitative". However, these refinements are defined only for non-sequential decision problems.
In this thesis, we present results on the extension of lexicographic preference relations to sequential decision problems, in particular to possibilistic Decision Trees and Markov Decision Processes. This leads to new planning algorithms that are more "decisive" than their original possibilistic counterparts. We first present optimistic and pessimistic lexicographic preference relations between policies, with and without intermediate utilities, that refine the optimistic and pessimistic qualitative utilities respectively. We prove that the proposed criteria satisfy the principle of Pareto efficiency as well as the property of strict monotonicity. The latter guarantees that dynamic programming can be used to calculate lexicographically optimal policies. Considering the problem of policy optimization in possibilistic decision trees and finite-horizon Markov decision processes, we provide adaptations of the dynamic programming algorithm that compute a lexicographically optimal policy in polynomial time. These algorithms are based on the lexicographic comparison of the matrices of trajectories associated with the sub-policies. This algorithmic work is complemented by an experimental study that shows the feasibility and the interest of the proposed approach. We then prove that the lexicographic criteria still benefit from an Expected Utility grounding and can be represented by infinitesimal expected utilities.
The last part of our work is devoted to policy optimization in (possibly infinite) stationary Markov Decision Processes. We propose a value iteration algorithm for the computation of lexicographically optimal policies, and we extend these results to the infinite-horizon case. Since the size of the matrices increases exponentially (which is especially problematic in the infinite-horizon case), we propose an approximation algorithm that keeps only the most interesting part of each matrix of trajectories, namely its first rows and columns. Finally, we report experimental results that show the effectiveness of the algorithms based on truncating the matrices.
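The lexicographic comparison of matrices of trajectories described in this abstract might be sketched as follows. The sketch assumes an optimistic, leximax-of-leximin style ordering in which each row holds the possibility degrees and the utility met along one trajectory. The function names and the example matrices are hypothetical, and the thesis's exact definitions may differ.

```python
def optimistic_utility(matrix):
    """Optimistic qualitative utility: max over trajectories of the min of each row."""
    return max(min(row) for row in matrix)


def lmax_lmin_key(matrix):
    """Comparison key (an assumption): sort each row ascending, then rows descending."""
    rows = [sorted(row) for row in matrix]  # leximin inside a trajectory
    return sorted(rows, reverse=True)       # leximax across trajectories


# Hypothetical matrices of trajectories for two policies, one row per trajectory.
policy_1 = [[1.0, 0.5], [0.2, 0.2]]
policy_2 = [[0.5, 1.0], [0.1, 0.9]]

# Max and min alone cannot separate the two policies.
same_utility = optimistic_utility(policy_1) == optimistic_utility(policy_2)

# The lexicographic key compares the remaining rows once the best rows tie.
policy_1_preferred = lmax_lmin_key(policy_1) > lmax_lmin_key(policy_2)

print(same_utility, policy_1_preferred)
```

Under these assumptions the two policies share the same optimistic utility, and the tie is broken by their second-best trajectories.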
Lexicographic refinements in possibilistic decision trees
Possibilistic decision theory was proposed twenty years ago and has had several extensions since then. Because of the lack of decision power of possibilistic decision theory, several refinements have been proposed. Unfortunately, these refinements do not overcome this difficulty when the decision problem is sequential. In this article, we propose to extend lexicographic refinements to possibilistic decision trees. We show, in particular, that they still benefit from an Expected Utility (EU) grounding. We also provide qualitative dynamic programming algorithms to compute lexicographically optimal strategies. The paper is complemented by an experimental study that shows the feasibility and the interest of the approach.
Lexicographic refinements in possibilistic decision trees and finite-horizon Markov decision processes
Possibilistic decision theory was proposed twenty years ago and has had several extensions since then. Although appealing for its ability to handle qualitative decision problems, possibilistic decision theory suffers from an important drawback: qualitative possibilistic utility criteria compare acts through min and max operators, which leads to a drowning effect. To overcome this lack of decision power, several refinements have been proposed. Lexicographic refinements are particularly appealing since they make it possible to benefit from the Expected Utility background while remaining qualitative. This article aims at extending lexicographic refinements to sequential decision problems, i.e., to possibilistic decision trees and possibilistic Markov decision processes, when the horizon is finite. We present two criteria that refine qualitative possibilistic utilities and provide dynamic programming algorithms for calculating lexicographically optimal policies.