18,623 research outputs found
Composable Deep Reinforcement Learning for Robotic Manipulation
Model-free deep reinforcement learning has been shown to exhibit good
performance in domains ranging from video games to simulated robotic
manipulation and locomotion. However, model-free methods are known to perform
poorly when the interaction time with the environment is limited, as is the
case for most real-world robotic tasks. In this paper, we study how maximum
entropy policies trained using soft Q-learning can be applied to real-world
robotic manipulation. The application of this method to real-world manipulation
is facilitated by two important features of soft Q-learning. First, soft
Q-learning can learn multimodal exploration strategies by learning policies
represented by expressive energy-based models. Second, we show that policies
learned with soft Q-learning can be composed to create new policies, and that
the optimality of the resulting policy can be bounded in terms of the
divergence between the composed policies. This compositionality provides an
especially valuable tool for real-world manipulation, where constructing new
policies by composing existing skills can provide a large gain in efficiency
over training from scratch. Our experimental evaluation demonstrates that soft
Q-learning is substantially more sample efficient than prior model-free deep
reinforcement learning methods, and that compositionality can be performed for
both simulated and real-world tasks.Comment: Videos: https://sites.google.com/view/composing-real-world-policies
Composing features by managing inconsistent requirements
One approach to system development is to decompose the requirements into features and specify the individual features before composing them. A major limitation of deferring feature composition is that inconsistency between the solutions to individual features may not be uncovered early in the development, leading to unwanted feature interactions. Syntactic inconsistencies arising from the way software artefacts are described can be addressed by the use of explicit, shared, domain knowledge. However, behavioural inconsistencies are more challenging: they may occur within the requirements associated with two or more features as well as at the level of individual features. Whilst approaches exist that address behavioural inconsistencies at design time, these are overrestrictive in ruling out all possible conflicts and may weaken the requirements further than is desirable. In this paper, we present a lightweight approach to dealing with behavioural inconsistencies at run-time. Requirement Composition operators are introduced that specify a run-time prioritisation to be used on occurrence of a feature interaction. This prioritisation can be static or dynamic. Dynamic prioritisation favours some requirement according to some run-time criterion, for example, the extent to which it is already generating behaviour
SNAP: Stateful Network-Wide Abstractions for Packet Processing
Early programming languages for software-defined networking (SDN) were built
on top of the simple match-action paradigm offered by OpenFlow 1.0. However,
emerging hardware and software switches offer much more sophisticated support
for persistent state in the data plane, without involving a central controller.
Nevertheless, managing stateful, distributed systems efficiently and correctly
is known to be one of the most challenging programming problems. To simplify
this new SDN problem, we introduce SNAP.
SNAP offers a simpler "centralized" stateful programming model, by allowing
programmers to develop programs on top of one big switch rather than many.
These programs may contain reads and writes to global, persistent arrays, and
as a result, programmers can implement a broad range of applications, from
stateful firewalls to fine-grained traffic monitoring. The SNAP compiler
relieves programmers of having to worry about how to distribute, place, and
optimize access to these stateful arrays by doing it all for them. More
specifically, the compiler discovers read/write dependencies between arrays and
translates one-big-switch programs into an efficient internal representation
based on a novel variant of binary decision diagrams. This internal
representation is used to construct a mixed-integer linear program, which
jointly optimizes the placement of state and the routing of traffic across the
underlying physical topology. We have implemented a prototype compiler and
applied it to about 20 SNAP programs over various topologies to demonstrate our
techniques' scalability
- …