50,077 research outputs found

    Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition

    Full text link
    This paper presents the MAXQ approach to hierarchical reinforcement learning based on decomposing the target Markov decision process (MDP) into a hierarchy of smaller MDPs and decomposing the value function of the target MDP into an additive combination of the value functions of the smaller MDPs. The paper defines the MAXQ hierarchy, proves formal results on its representational power, and establishes five conditions for the safe use of state abstractions. The paper presents an online model-free learning algorithm, MAXQ-Q, and proves that it converges wih probability 1 to a kind of locally-optimal policy known as a recursively optimal policy, even in the presence of the five kinds of state abstraction. The paper evaluates the MAXQ representation and MAXQ-Q through a series of experiments in three domains and shows experimentally that MAXQ-Q (with state abstractions) converges to a recursively optimal policy much faster than flat Q learning. The fact that MAXQ learns a representation of the value function has an important benefit: it makes it possible to compute and execute an improved, non-hierarchical policy via a procedure similar to the policy improvement step of policy iteration. The paper demonstrates the effectiveness of this non-hierarchical execution experimentally. Finally, the paper concludes with a comparison to related work and a discussion of the design tradeoffs in hierarchical reinforcement learning.Comment: 63 pages, 15 figure

    A Backward Analysis for Constraint Logic Programs

    Get PDF
    One recurring problem in program development is that of understanding how to re-use code developed by a third party. In the context of (constraint) logic programming, part of this problem reduces to figuring out how to query a program. If the logic program does not come with any documentation, then the programmer is forced to either experiment with queries in an ad hoc fashion or trace the control-flow of the program (backward) to infer the modes in which a predicate must be called so as to avoid an instantiation error. This paper presents an abstract interpretation scheme that automates the latter technique. The analysis presented in this paper can infer moding properties which if satisfied by the initial query, come with the guarantee that the program and query can never generate any moding or instantiation errors. Other applications of the analysis are discussed. The paper explains how abstract domains with certain computational properties (they condense) can be used to trace control-flow backward (right-to-left) to infer useful properties of initial queries. A correctness argument is presented and an implementation is reported.Comment: 32 page

    Quadtrees as an Abstract Domain

    Get PDF
    Quadtrees have proved popular in computer graphics and spatial databases as a way of representing regions in two dimensional space. This hierarchical data-structure is flexible enough to support non-convex and even disconnected regions, therefore it is natural to ask whether this datastructure can form the basis of an abstract domain. This paper explores this question and suggests that quadtrees offer a new approach to weakly relational domains whilst their hierarchical structure naturally lends itself to representation with boolean functions

    Terminations After World War I

    Get PDF

    Contractual allocation of decision rights and incentives: The case of automobile distribution

    Get PDF
    We analyze empirically the allocation of rights and monetary incentives in automobile franchise contracts. These contracts substantially restrict the decision rights of dealers and grant manufacturers extensive contractual completion and enforcement powers, converting the manufacturers, de facto, in a sort of quasi-judiciary instance. Variation in the allocation of decision rights and incentive intensity is explained by the incidence of moral hazard in the relation. In particular, when the cost of dealer moral hazard is higher and the risk of manufacturer opportunism is lower, manufacturers enjoy more discretion in determining the performance required from their dealers and in using mechanisms such as monitoring, termination and monetary incentives to ensure such performance is provided. We also explore the existence of interdependencies between the different elements of the system. and find some complementarities between completion and termination rights, and between monitoring rights and the intensity of incentives.Franchising, contracts, self-enforcement, incentives, complementarities, automobiles

    A Dual Digraph Approach for Leaderless Atomic Broadcast (Extended Version)

    Full text link
    Many distributed systems work on a common shared state; in such systems, distributed agreement is necessary for consistency. With an increasing number of servers, these systems become more susceptible to single-server failures, increasing the relevance of fault-tolerance. Atomic broadcast enables fault-tolerant distributed agreement, yet it is costly to solve. Most practical algorithms entail linear work per broadcast message. AllConcur -- a leaderless approach -- reduces the work, by connecting the servers via a sparse resilient overlay network; yet, this resiliency entails redundancy, limiting the reduction of work. In this paper, we propose AllConcur+, an atomic broadcast algorithm that lifts this limitation: During intervals with no failures, it achieves minimal work by using a redundancy-free overlay network. When failures do occur, it automatically recovers by switching to a resilient overlay network. In our performance evaluation of non-failure scenarios, AllConcur+ achieves comparable throughput to AllGather -- a non-fault-tolerant distributed agreement algorithm -- and outperforms AllConcur, LCR and Libpaxos both in terms of throughput and latency. Furthermore, our evaluation of failure scenarios shows that AllConcur+'s expected performance is robust with regard to occasional failures. Thus, for realistic use cases, leveraging redundancy-free distributed agreement during intervals with no failures improves performance significantly.Comment: Overview: 24 pages, 6 sections, 3 appendices, 8 figures, 3 tables. Modifications from previous version: extended the evaluation of AllConcur+ with a simulation of a multiple datacenters deploymen

    Gardiner, Town of and United Public Service Employees Union (UPSEU), (2004) (MOA)

    Get PDF
    • …
    corecore