50,077 research outputs found
Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition
This paper presents the MAXQ approach to hierarchical reinforcement learning
based on decomposing the target Markov decision process (MDP) into a hierarchy
of smaller MDPs and decomposing the value function of the target MDP into an
additive combination of the value functions of the smaller MDPs. The paper
defines the MAXQ hierarchy, proves formal results on its representational
power, and establishes five conditions for the safe use of state abstractions.
The paper presents an online model-free learning algorithm, MAXQ-Q, and proves
that it converges wih probability 1 to a kind of locally-optimal policy known
as a recursively optimal policy, even in the presence of the five kinds of
state abstraction. The paper evaluates the MAXQ representation and MAXQ-Q
through a series of experiments in three domains and shows experimentally that
MAXQ-Q (with state abstractions) converges to a recursively optimal policy much
faster than flat Q learning. The fact that MAXQ learns a representation of the
value function has an important benefit: it makes it possible to compute and
execute an improved, non-hierarchical policy via a procedure similar to the
policy improvement step of policy iteration. The paper demonstrates the
effectiveness of this non-hierarchical execution experimentally. Finally, the
paper concludes with a comparison to related work and a discussion of the
design tradeoffs in hierarchical reinforcement learning.Comment: 63 pages, 15 figure
A Backward Analysis for Constraint Logic Programs
One recurring problem in program development is that of understanding how to
re-use code developed by a third party. In the context of (constraint) logic
programming, part of this problem reduces to figuring out how to query a
program. If the logic program does not come with any documentation, then the
programmer is forced to either experiment with queries in an ad hoc fashion or
trace the control-flow of the program (backward) to infer the modes in which a
predicate must be called so as to avoid an instantiation error. This paper
presents an abstract interpretation scheme that automates the latter technique.
The analysis presented in this paper can infer moding properties which if
satisfied by the initial query, come with the guarantee that the program and
query can never generate any moding or instantiation errors. Other applications
of the analysis are discussed. The paper explains how abstract domains with
certain computational properties (they condense) can be used to trace
control-flow backward (right-to-left) to infer useful properties of initial
queries. A correctness argument is presented and an implementation is reported.Comment: 32 page
Quadtrees as an Abstract Domain
Quadtrees have proved popular in computer graphics and spatial databases as a way of representing regions in two dimensional space. This hierarchical data-structure is flexible enough to support non-convex and even disconnected regions, therefore it is natural to ask whether this datastructure can form the basis of an abstract domain. This paper explores this question and suggests that quadtrees offer a new approach to weakly relational domains whilst their hierarchical structure naturally lends itself to representation with boolean functions
Contractual allocation of decision rights and incentives: The case of automobile distribution
We analyze empirically the allocation of rights and monetary incentives in automobile franchise contracts. These contracts substantially restrict the decision rights of dealers and grant manufacturers extensive contractual completion and enforcement powers, converting the manufacturers, de facto, in a sort of quasi-judiciary instance. Variation in the allocation of decision rights and incentive intensity is explained by the incidence of moral hazard in the relation. In particular, when the cost of dealer moral hazard is higher and the risk of manufacturer opportunism is lower, manufacturers enjoy more discretion in determining the performance required from their dealers and in using mechanisms such as monitoring, termination and monetary incentives to ensure such performance is provided. We also explore the existence of interdependencies between the different elements of the system. and find some complementarities between completion and termination rights, and between monitoring rights and the intensity of incentives.Franchising, contracts, self-enforcement, incentives, complementarities, automobiles
A Dual Digraph Approach for Leaderless Atomic Broadcast (Extended Version)
Many distributed systems work on a common shared state; in such systems,
distributed agreement is necessary for consistency. With an increasing number
of servers, these systems become more susceptible to single-server failures,
increasing the relevance of fault-tolerance. Atomic broadcast enables
fault-tolerant distributed agreement, yet it is costly to solve. Most practical
algorithms entail linear work per broadcast message. AllConcur -- a leaderless
approach -- reduces the work, by connecting the servers via a sparse resilient
overlay network; yet, this resiliency entails redundancy, limiting the
reduction of work. In this paper, we propose AllConcur+, an atomic broadcast
algorithm that lifts this limitation: During intervals with no failures, it
achieves minimal work by using a redundancy-free overlay network. When failures
do occur, it automatically recovers by switching to a resilient overlay
network. In our performance evaluation of non-failure scenarios, AllConcur+
achieves comparable throughput to AllGather -- a non-fault-tolerant distributed
agreement algorithm -- and outperforms AllConcur, LCR and Libpaxos both in
terms of throughput and latency. Furthermore, our evaluation of failure
scenarios shows that AllConcur+'s expected performance is robust with regard to
occasional failures. Thus, for realistic use cases, leveraging redundancy-free
distributed agreement during intervals with no failures improves performance
significantly.Comment: Overview: 24 pages, 6 sections, 3 appendices, 8 figures, 3 tables.
Modifications from previous version: extended the evaluation of AllConcur+
with a simulation of a multiple datacenters deploymen
- âŚ