1,027 research outputs found

    Deep Residual Reinforcement Learning

    Full text link
    We revisit residual algorithms in both model-free and model-based reinforcement learning settings. We propose the bidirectional target network technique to stabilize residual algorithms, yielding a residual version of DDPG that significantly outperforms vanilla DDPG in the DeepMind Control Suite benchmark. Moreover, we find the residual algorithm an effective approach to the distribution mismatch problem in model-based planning. Compared with the existing TD(kk) method, our residual-based method makes weaker assumptions about the model and yields a greater performance boost.Comment: AAMAS 202

    End-to-end Driving via Conditional Imitation Learning

    Get PDF
    Deep networks trained on demonstrations of human driving have learned to follow roads and avoid obstacles. However, driving policies trained via imitation learning cannot be controlled at test time. A vehicle trained end-to-end to imitate an expert cannot be guided to take a specific turn at an upcoming intersection. This limits the utility of such systems. We propose to condition imitation learning on high-level command input. At test time, the learned driving policy functions as a chauffeur that handles sensorimotor coordination but continues to respond to navigational commands. We evaluate different architectures for conditional imitation learning in vision-based driving. We conduct experiments in realistic three-dimensional simulations of urban driving and on a 1/5 scale robotic truck that is trained to drive in a residential area. Both systems drive based on visual input yet remain responsive to high-level navigational commands. The supplementary video can be viewed at https://youtu.be/cFtnflNe5fMComment: Published at the International Conference on Robotics and Automation (ICRA), 201

    A note on compositional refinement

    Get PDF

    A note on compositional refinement

    Get PDF

    Automated Synthesis of Timed and Distributed Fault-Tolerant Systems

    Get PDF
    This dissertation concentrates on the problem of automated synthesis and repair of fault-tolerant systems. In particular, given the required specification of the system, our goal is to synthesize a fault-tolerant system, or repair an existing one. We study this problem for two classes of timed and distributed systems. In the context of timed systems, we focus on efficient synthesis of fault-tolerant timed models from their fault-intolerant version. Although the complexity of the synthesis problem is known to be polynomial time in the size of the time-abstract bisimulation of the input model, the state of the art lacked synthesis algorithms that can be efficiently implemented. This is in part due to the fact that synthesis is in general a challenging problem and its complexity is significantly magnified in the context of timed systems. We propose an algorithm that takes a timed automaton, a set of fault actions, and a set of safety and bounded-time response properties as input, and utilizes a space-efficient symbolic representation of the timed automaton (called the zone graph) to synthesize a fault-tolerant timed automaton as output. The output automaton satisfies strict phased recovery, where it is guaranteed that the output model behaves similarly to the input model in the absence of faults and in the presence of faults, fault recovery is achieved in two phases, each satisfying certain safety and timing constraints. In the context of distributed systems, we study the problem of synthesizing fault-tolerant systems from their intolerant versions, when the number of processes is unknown. To synthesize a distributed fault-tolerant protocol that works for systems with any number of processes, we use counter abstraction. Using this abstraction, we deal with a finite-state abstract model to do the synthesis. Applying our proposed algorithm, we successfully synthesized a fault-tolerant distributed agreement protocol in the presence of Byzantine fault. Although the synthesis problem is known to be NP-complete in the state space of the input protocol (due to partial observability of processes) in the non-parameterized setting, our parameterized algorithm manages to synthesize a solution for a complex problem such as Byzantine agreement within less than two minutes. A system may reach a bad state due to wrong initialization or fault occurrence. One of the well-known types of distributed fault-tolerant systems are self-stabilizing systems. These are the systems that converge to their legitimate states starting from any state, and if no fault occurs, stay in legitimate states thereafter. We propose an automated sound and complete method to synthesize self-stabilizing systems starting from the desired topology and type of the system. Our proposed method is based on SMT-solving, where the desired specification of the system is formulated as SMT constraints. We used the Alloy solver to implement our method, and successfully synthesized some of the well-known self-stabilizing algorithms. We extend our method to support a type of stabilizing algorithm called ideal-stabilization, and also the case when the set of legitimate states is not explicitly known. Quantitative metrics such as recovery time are crucial in self-stabilizing systems when used in practice (such as in networking applications). One of these metrics is the average recovery time. Our automated method for synthesizing self-stabilizing systems generate some solution that respects the desired system specification, but it does not take into account any quantitative metrics. We study the problem of repairing self-stabilizing systems (where only removal of transitions is allowed) to satisfy quantitative limitations. The metric under study is average recovery time, which characterizes the performance of stabilizing programs. We show that the repair problem is NP-complete in the state space of the given system

    Constraint and Restoring Force

    Get PDF
    Long-lived sensor network applications must be able to self-repair and adapt to changing demands. We introduce a new approach for doing so: Constraint and Restoring Force. CRF is a physics-inspired framework for computing scalar fields across a sensor network with occasional changes. We illustrate CRFs usefulness by applying it to gradients, a common building block for sensor network systems. The resulting algorithm, CRF-Gradient, determines locally when to self-repair and when to stop and save energy. CRF-Gradient is self-stabilizing, converges in O(diameter) time, and has been verified experimentally in simulation and on a network of Mica2 motes. Finally we show how CRF can be applied to other algorithms as well, such as the calculation of probability fields

    Structural Invariants for the Verification of Systems with Parameterized Architectures

    Full text link
    We consider parameterized concurrent systems consisting of a finite but unknown number of components, obtained by replicating a given set of finite state automata. Components communicate by executing atomic interactions whose participants update their states simultaneously. We introduce an interaction logic to specify both the type of interactions (e.g.\ rendez-vous, broadcast) and the topology of the system (e.g.\ pipeline, ring). The logic can be easily embedded in monadic second order logic of finitely many successors, and is therefore decidable. Proving safety properties of such a parameterized system, like deadlock freedom or mutual exclusion, requires to infer an inductive invariant that contains all reachable states of all system instances, and no unsafe state. We present a method to automatically synthesize inductive invariants directly from the formula describing the interactions, without costly fixed point iterations. We experimentally prove that this invariant is strong enough to verify safety properties of a large number of systems including textbook examples (dining philosophers, synchronization schemes), classical mutual exclusion algorithms, cache-coherence protocols and self-stabilization algorithms, for an arbitrary number of components.Comment: preprint; to be published in the proceedings of TACAS2
    • …
    corecore