Asymmetric Actor Critic for Image-Based Robot Learning
Deep reinforcement learning (RL) has proven a powerful technique in many
sequential decision-making domains. However, robotics poses many challenges for
RL. Most notably, training on a physical system can be expensive and dangerous,
which has sparked significant interest in learning control policies using a
physics simulator. While several recent works have shown promising results in
transferring policies trained in simulation to the real world, they often do
not fully utilize the advantage of working with a simulator. In this work, we
exploit the full state observability in the simulator to train better policies
which take as input only partial observations (RGBD images). We do this by
employing an actor-critic training algorithm in which the critic is trained on
full states while the actor (or policy) gets rendered images as input. We show
experimentally on a range of simulated tasks that using these asymmetric inputs
significantly improves performance. Finally, we combine this method with domain
randomization and show real robot experiments for several tasks like picking,
pushing, and moving a block. We achieve this simulation to real world transfer
without training on any real world data.
Comment: Videos of experiments can be found at http://www.goo.gl/b57WT
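The asymmetric-input idea in this abstract (critic trained on full states, actor fed rendered observations) can be illustrated with a minimal sketch. This is not the authors' implementation: the linear actor and critic, the dimensions, the `render` projection, and the learning rates are all hypothetical stand-ins chosen to keep the example small.

```python
import numpy as np

STATE_DIM, OBS_DIM, ACTION_DIM = 4, 16, 2  # hypothetical sizes

rng = np.random.default_rng(0)
# Hypothetical stand-in for the simulator's renderer: a fixed linear
# projection of the full state into a flattened "image" observation.
RENDER = rng.normal(size=(OBS_DIM, STATE_DIM))


def render(state):
    return RENDER @ state


class Actor:
    """Deterministic policy that sees only the rendered observation."""

    def __init__(self):
        self.W = rng.normal(scale=0.1, size=(ACTION_DIM, OBS_DIM))

    def act(self, obs):
        return np.tanh(self.W @ obs)

    def update(self, obs, dq_da, lr=0.01):
        a = self.act(obs)
        # Chain rule through tanh: dQ/dW = ((1 - a^2) * dQ/da) obs^T
        self.W += lr * np.outer((1.0 - a**2) * dq_da, obs)


class Critic:
    """Linear Q-function that sees the full simulator state and the action."""

    def __init__(self):
        self.w = np.zeros(STATE_DIM + ACTION_DIM)

    def q(self, state, action):
        return float(self.w @ np.concatenate([state, action]))

    def grad_action(self):
        # For a linear Q, dQ/da is the action part of the weight vector.
        return self.w[STATE_DIM:].copy()

    def td_update(self, state, action, reward, next_state, next_action,
                  gamma=0.99, lr=0.01):
        target = reward + gamma * self.q(next_state, next_action)
        td_error = target - self.q(state, action)
        self.w += lr * td_error * np.concatenate([state, action])
        return td_error


def train_step(actor, critic, state, reward, next_state):
    """One update: the critic uses states, the actor uses observations."""
    obs, next_obs = render(state), render(next_state)
    action, next_action = actor.act(obs), actor.act(next_obs)
    td_error = critic.td_update(state, action, reward, next_state, next_action)
    actor.update(obs, critic.grad_action())
    return td_error
```

Only `Actor.act` is needed at deployment time, so the full state is required during training in simulation but not on the real robot.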
Linking students' timing of engagement to learning design and academic performance
In recent years, the connection between Learning Design (LD) and Learning Analytics (LA) has been emphasized by many scholars as it could enhance our interpretation of LA findings and translate them to meaningful interventions. Together with numerous conceptual studies, a gradual accumulation of empirical evidence has indicated a strong connection between how instructors design for learning and student behaviour. Nonetheless, students' timing of engagement and its relation to LD and academic performance have received limited attention. Therefore, this study investigates to what extent students' timing of engagement aligned with instructor learning design, and how engagement varied across different levels of performance. The analysis was conducted over 28 weeks using trace data from 387 students, and replicated over two semesters in 2015 and 2016. Our findings revealed a mismatch between how instructors designed for learning and how students studied in reality. In most weeks, students spent less time studying the assigned materials on the VLE than the number of hours recommended by instructors. The timing of engagement also varied, from studying in advance to catching up. High-performing students spent more time studying in advance, while low-performing students spent a higher proportion of their time on catching-up activities. This study reinforced the importance of pedagogical context to transform analytics into actionable insights.
Oracles and query lower bounds in generalised probabilistic theories
We investigate the connection between interference and computational power
within the operationally defined framework of generalised probabilistic
theories. To compare the computational abilities of different theories within
this framework we show that any theory satisfying three natural physical
principles possesses a well-defined oracle model. Indeed, we prove a subroutine
theorem for oracles in such theories which is a necessary condition for the
oracle to be well-defined. The three principles are: causality (roughly, no
signalling from the future), purification (each mixed state arises as the
marginal of a pure state of a larger system), and strong symmetry (existence of
non-trivial reversible transformations). Sorkin has defined a hierarchy of
conceivable interference behaviours, where the order in the hierarchy
corresponds to the number of paths that have an irreducible interaction in a
multi-slit experiment. Given our oracle model, we show that if a classical
computer requires at least n queries to solve a learning problem, then the
corresponding lower bound in theories lying at the kth level of Sorkin's
hierarchy is n/k. Hence, lower bounds on the number of queries to a quantum
oracle needed to solve certain problems are not optimal in the space of all
generalised probabilistic theories, although it is not yet known whether the
optimal bounds are achievable in general. Hence searches for higher-order
interference are not only foundationally motivated, but constitute a search for
a computational resource beyond that offered by quantum computation.
Comment: 17+7 pages. Comments welcome. Published in the special issue
"Foundational Aspects of Quantum Information" in Foundations of Physics.
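The n/k scaling stated in this abstract can be made concrete with a small worked sketch. The function names are hypothetical, and rounding up to a whole number of queries is an assumption added here, not something the abstract states. Classical theory sits at level k = 1 and quantum theory at level k = 2 of Sorkin's hierarchy.

```python
from fractions import Fraction
from math import ceil


def query_lower_bound(n_classical, k):
    """Lower bound n/k for a theory at level k of Sorkin's hierarchy,
    given that a classical computer needs at least n_classical queries."""
    if n_classical < 0 or k < 1:
        raise ValueError("need n_classical >= 0 and k >= 1")
    return Fraction(n_classical, k)


def min_whole_queries(n_classical, k):
    """Assumption: query counts are integers, so round the bound up."""
    return ceil(query_lower_bound(n_classical, k))


# A problem needing at least 10 classical queries:
for level in (1, 2, 3):
    print(level, query_lower_bound(10, level), min_whole_queries(10, level))
```

For n = 10 this gives a bound of 10 at k = 1, 5 at k = 2, and 10/3 (so at least 4 whole queries) at k = 3, which shows why the quantum bound need not be the smallest one over all such theories.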
Minimisation of Multiplicity Tree Automata
We consider the problem of minimising the number of states in a multiplicity
tree automaton over the field of rational numbers. We give a minimisation
algorithm that runs in polynomial time assuming unit-cost arithmetic. We also
show that a polynomial bound in the standard Turing model would require a
breakthrough in the complexity of polynomial identity testing by proving that
the latter problem is logspace equivalent to the decision version of
minimisation. The developed techniques also improve the state of the art in
multiplicity word automata: we give an NC algorithm for minimising multiplicity
word automata. Finally, we consider the minimal consistency problem: does there
exist an automaton with n states that is consistent with a given finite
sample of weight-labelled words or trees? We show that this decision problem is
complete for the existential theory of the rationals, both for words and for
trees of a fixed alphabet rank.
Comment: Paper to be published in Logical Methods in Computer Science. Minor
editing changes from previous version.
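For readers unfamiliar with the object being minimised, the following sketch evaluates a multiplicity word automaton over the rationals. It illustrates only how such an automaton assigns a weight to a word, not the minimisation algorithm from the paper. The example automaton (`ALPHA`, `MU`, `ETA`) is hypothetical and was chosen so that the weight of a word equals its number of occurrences of "a".

```python
from fractions import Fraction as F


def row_times_matrix(row, matrix):
    """Multiply a row vector by a matrix, using exact rational arithmetic."""
    return [
        sum(row[i] * matrix[i][j] for i in range(len(row)))
        for j in range(len(matrix[0]))
    ]


def weight(word, alpha, mu, eta):
    """Weight of `word`: alpha * mu[w1] * ... * mu[wn] * eta."""
    row = list(alpha)
    for symbol in word:
        row = row_times_matrix(row, mu[symbol])
    return sum(r * e for r, e in zip(row, eta))


# Hypothetical 2-state automaton whose weight counts the letter "a".
ALPHA = [F(1), F(0)]                       # initial row vector
MU = {
    "a": [[F(1), F(1)], [F(0), F(1)]],     # reading "a" adds one to the count
    "b": [[F(1), F(0)], [F(0), F(1)]],     # reading "b" changes nothing
}
ETA = [F(0), F(1)]                         # final column vector

print(weight("abaa", ALPHA, MU, ETA))
```

The number of states is the dimension of the vectors (here 2); minimisation asks for the smallest dimension that realises the same weight function.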