2,180 research outputs found
Safety-Aware Apprenticeship Learning
Apprenticeship learning (AL) is a kind of Learning from Demonstration
techniques where the reward function of a Markov Decision Process (MDP) is
unknown to the learning agent and the agent has to derive a good policy by
observing an expert's demonstrations. In this paper, we study the problem of
how to make AL algorithms inherently safe while still meeting its learning
objective. We consider a setting where the unknown reward function is assumed
to be a linear combination of a set of state features, and the safety property
is specified in Probabilistic Computation Tree Logic (PCTL). By embedding
probabilistic model checking inside AL, we propose a novel
counterexample-guided approach that can ensure safety while retaining
performance of the learnt policy. We demonstrate the effectiveness of our
approach on several challenging AL scenarios where safety is essential.Comment: Accepted by International Conference on Computer Aided Verification
(CAV) 201
Quantitative Safety: Linking Proof-Based Verification with Model Checking for Probabilistic Systems
This paper presents a novel approach for augmenting proof-based verification
with performance-style analysis of the kind employed in state-of-the-art model
checking tools for probabilistic systems. Quantitative safety properties
usually specified as probabilistic system invariants and modeled in proof-based
environments are evaluated using bounded model checking techniques.
Our specific contributions include the statement of a theorem that is central
to model checking safety properties of proof-based systems, the establishment
of a procedure; and its full implementation in a prototype system (YAGA) which
readily transforms a probabilistic model specified in a proof-based environment
to its equivalent verifiable PRISM model equipped with reward structures. The
reward structures capture the exact interpretation of the probabilistic
invariants and can reveal succinct information about the model during
experimental investigations. Finally, we demonstrate the novelty of the
technique on a probabilistic library case study
An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning
In this paper we introduce the idea of improving the performance of
parametric temporal-difference (TD) learning algorithms by selectively
emphasizing or de-emphasizing their updates on different time steps. In
particular, we show that varying the emphasis of linear TD()'s updates
in a particular way causes its expected update to become stable under
off-policy training. The only prior model-free TD methods to achieve this with
per-step computation linear in the number of function approximation parameters
are the gradient-TD family of methods including TDC, GTD(), and
GQ(). Compared to these methods, our _emphatic TD()_ is
simpler and easier to use; it has only one learned parameter vector and one
step-size parameter. Our treatment includes general state-dependent discounting
and bootstrapping functions, and a way of specifying varying degrees of
interest in accurately valuing different states.Comment: 29 pages This is a significant revision based on the first set of
reviews. The most important change was to signal early that the main result
is about stability, not convergenc
Counterexample Generation in Probabilistic Model Checking
Providing evidence for the refutation of a property is an essential, if not the most important, feature of model checking. This paper considers algorithms for counterexample generation for probabilistic CTL formulae in discrete-time Markov chains. Finding the strongest evidence (i.e., the most probable path) violating a (bounded) until-formula is shown to be reducible to a single-source (hop-constrained) shortest path problem. Counterexamples of smallest size that deviate most from the required probability bound can be obtained by applying (small amendments to) k-shortest (hop-constrained) paths algorithms. These results can be extended to Markov chains with rewards, to LTL model checking, and are useful for Markov decision processes. Experimental results show that typically the size of a counterexample is excessive. To obtain much more compact representations, we present a simple algorithm to generate (minimal) regular expressions that can act as counterexamples. The feasibility of our approach is illustrated by means of two communication protocols: leader election in an anonymous ring network and the Crowds protocol
Human Motivation in Thomas Reid
According to Reid (1969, 283) motives are an ens
rationis. Because of that they may influence to action, but
they do not act as causes or as agents, that is motives are
only advisory (cf. Seebaß 1993, 329; Lehrer 1989, 210).
Instead motives presuppose an efficient cause, namely an
agent (cf. Rowe 1991, chapter 4), and the agent"s freedom
(Reid 1969, 284). In opposition to Leibniz (1994, 84-85)
who defends subtle reasons Reid (1969) claims that
motives have to be conscious (cf. Seebaß 1993, 269). For
to "be influenced by a motive of which I am not conscious,
is, ..., an arbitrary supposition without any evidence, ... ."
(Reid 1969, 285
Safety-aware apprenticeship learning
It is well acknowledged in the AI community that finding a good reward function for reinforcement learning is extremely challenging. Apprenticeship learning (AL) is a class of “learning from demonstration” techniques where the reward function of a Markov Decision Process (MDP) is unknown to the learning agent and the agent uses inverse reinforcement learning (IRL) methods to recover expert policy from a set of expert demonstrations. However, as the agent learns exclusively from observations, given a constraint on the probability of the agent running into unwanted situations, there is no verification, nor guarantee, for the learnt policy on the satisfaction of the restriction. In this dissertation, we study the problem of how to guide AL to learn a policy that is inherently safe while still meeting its learning objective. By combining formal methods with imitation learning, a Counterexample-Guided Apprenticeship Learning algorithm is proposed. We consider a setting where the unknown reward function is assumed to be a linear combination of a set of state features, and the safety property is specified in Probabilistic Computation Tree Logic (PCTL). By embedding probabilistic model checking inside AL, we propose a novel counterexample-guided approach that can ensure both safety and performance of the learnt policy. This algorithm guarantees that given some formal safety specification defined by probabilistic temporal logic, the learnt policy shall satisfy this specification. We demonstrate the effectiveness of our approach on several challenging AL scenarios where safety is essential
- …