18,623 research outputs found

    Composable Deep Reinforcement Learning for Robotic Manipulation

    Full text link
    Model-free deep reinforcement learning has been shown to exhibit good performance in domains ranging from video games to simulated robotic manipulation and locomotion. However, model-free methods are known to perform poorly when the interaction time with the environment is limited, as is the case for most real-world robotic tasks. In this paper, we study how maximum entropy policies trained using soft Q-learning can be applied to real-world robotic manipulation. The application of this method to real-world manipulation is facilitated by two important features of soft Q-learning. First, soft Q-learning can learn multimodal exploration strategies by learning policies represented by expressive energy-based models. Second, we show that policies learned with soft Q-learning can be composed to create new policies, and that the optimality of the resulting policy can be bounded in terms of the divergence between the composed policies. This compositionality provides an especially valuable tool for real-world manipulation, where constructing new policies by composing existing skills can provide a large gain in efficiency over training from scratch. Our experimental evaluation demonstrates that soft Q-learning is substantially more sample efficient than prior model-free deep reinforcement learning methods, and that compositionality can be performed for both simulated and real-world tasks.Comment: Videos: https://sites.google.com/view/composing-real-world-policies

    Composing features by managing inconsistent requirements

    Get PDF
    One approach to system development is to decompose the requirements into features and specify the individual features before composing them. A major limitation of deferring feature composition is that inconsistency between the solutions to individual features may not be uncovered early in the development, leading to unwanted feature interactions. Syntactic inconsistencies arising from the way software artefacts are described can be addressed by the use of explicit, shared, domain knowledge. However, behavioural inconsistencies are more challenging: they may occur within the requirements associated with two or more features as well as at the level of individual features. Whilst approaches exist that address behavioural inconsistencies at design time, these are overrestrictive in ruling out all possible conflicts and may weaken the requirements further than is desirable. In this paper, we present a lightweight approach to dealing with behavioural inconsistencies at run-time. Requirement Composition operators are introduced that specify a run-time prioritisation to be used on occurrence of a feature interaction. This prioritisation can be static or dynamic. Dynamic prioritisation favours some requirement according to some run-time criterion, for example, the extent to which it is already generating behaviour

    SNAP: Stateful Network-Wide Abstractions for Packet Processing

    Full text link
    Early programming languages for software-defined networking (SDN) were built on top of the simple match-action paradigm offered by OpenFlow 1.0. However, emerging hardware and software switches offer much more sophisticated support for persistent state in the data plane, without involving a central controller. Nevertheless, managing stateful, distributed systems efficiently and correctly is known to be one of the most challenging programming problems. To simplify this new SDN problem, we introduce SNAP. SNAP offers a simpler "centralized" stateful programming model, by allowing programmers to develop programs on top of one big switch rather than many. These programs may contain reads and writes to global, persistent arrays, and as a result, programmers can implement a broad range of applications, from stateful firewalls to fine-grained traffic monitoring. The SNAP compiler relieves programmers of having to worry about how to distribute, place, and optimize access to these stateful arrays by doing it all for them. More specifically, the compiler discovers read/write dependencies between arrays and translates one-big-switch programs into an efficient internal representation based on a novel variant of binary decision diagrams. This internal representation is used to construct a mixed-integer linear program, which jointly optimizes the placement of state and the routing of traffic across the underlying physical topology. We have implemented a prototype compiler and applied it to about 20 SNAP programs over various topologies to demonstrate our techniques' scalability
    • …
    corecore