(m,n)-Semirings and a Generalized Fault Tolerance Algebra of Systems
We propose a new class of mathematical structures called (m,n)-semirings
(which generalize the usual semirings), and describe their basic properties. We
also define partial ordering, and generalize the concepts of congruence,
homomorphism, ideals, etc., for (m,n)-semirings. Following earlier work by Rao,
we consider a system as made up of several components whose failures may cause
it to fail, and represent the set of systems algebraically as an
(m,n)-semiring. Based on the characteristics of these components we present a
formalism to compare the fault tolerance behaviour of two systems using our
framework of a partially ordered (m,n)-semiring.
Comment: 26 pages; extension of arXiv:0907.3194v1 [math.GM
Robust Temporal Difference Learning for Critical Domains
We present a new Q-function operator for temporal difference (TD) learning
methods that explicitly encodes robustness against significant rare events
(SRE) in critical domains. The operator allows a robust policy to be learned
in a model-based fashion without actually
observing the SRE. We introduce single- and multi-agent robust TD methods using
this operator. We prove convergence of the operator to the optimal
robust Q-function with respect to the model using the theory of Generalized
Markov Decision Processes. In addition we prove convergence to the optimal
Q-function of the original MDP given that the probability of SREs vanishes.
Empirical evaluations demonstrate the superior performance of TD methods based
on the operator, both in the early learning phase and in the final converged
stage. In addition we show robustness of the proposed method to small model
errors, as well as its applicability in a multi-agent context.
Comment: AAMAS 201