Decoupled Learning of Environment Characteristics for Safe Exploration
Reinforcement learning is a proven technique for teaching an agent a task.
However, when learning a task with reinforcement learning, the agent cannot
distinguish the characteristics of the environment from those of the task. This
makes it harder to transfer skills between tasks in the same environment, and
it means prior experience does not reduce risk when training for a new task. In this
paper, we introduce an approach to decouple the environment characteristics
from the task-specific ones, allowing an agent to develop a sense of survival.
We evaluate our approach in an environment where an agent must learn a sequence
of collection tasks, and show that decoupled learning allows for a safer
utilization of prior knowledge.
Comment: 4 pages, 4 figures, ICML 2017 workshop on Reliable Machine Learning in the Wild
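The decoupling idea can be illustrated with a minimal sketch, assuming a tabular setting: keep two Q-tables, one for environment-level (survival) signals and one for task reward, act greedily on their sum, and carry only the environment table over to a new task. The class, method, and reward names below are hypothetical, not the paper's implementation.

```python
import random
from collections import defaultdict

class DecoupledQAgent:
    """Illustrative sketch: separate Q-tables for environment and task."""

    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q_env = defaultdict(float)   # hazards shared across tasks ("survival")
        self.q_task = defaultdict(float)  # reward specific to the current task

    def act(self, state):
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        # Greedy with respect to the combined survival + task value estimate.
        return max(self.actions,
                   key=lambda a: self.q_env[(state, a)] + self.q_task[(state, a)])

    def update(self, s, a, env_reward, task_reward, s_next):
        # Standard Q-learning update applied to each table with its own reward.
        for q, r in ((self.q_env, env_reward), (self.q_task, task_reward)):
            best_next = max(q[(s_next, b)] for b in self.actions)
            q[(s, a)] += self.alpha * (r + self.gamma * best_next - q[(s, a)])

    def new_task(self):
        # Transfer: keep the environment table, discard task-specific knowledge.
        self.q_task.clear()
```

In this sketch, switching tasks via `new_task()` retains the learned hazard estimates, so exploration of the new task starts with a "sense of survival" already in place.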
Multiscale Markov Decision Problems: Compression, Solution, and Transfer Learning
Many problems in sequential decision making and stochastic control often have
natural multiscale structure: sub-tasks are assembled together to accomplish
complex goals. Systematically inferring and leveraging hierarchical structure,
particularly beyond a single level of abstraction, has remained a longstanding
challenge. We describe a fast multiscale procedure for repeatedly compressing,
or homogenizing, Markov decision processes (MDPs), wherein a hierarchy of
sub-problems at different scales is automatically determined. Coarsened MDPs
are themselves independent, deterministic MDPs, and may be solved using
existing algorithms. The multiscale representation delivered by this procedure
decouples sub-tasks from each other and can lead to substantial improvements in
convergence rates both locally within sub-problems and globally across
sub-problems, yielding significant computational savings. A second fundamental
aspect of this work is that these multiscale decompositions yield new transfer
opportunities across different problems, where solutions of sub-tasks at
different levels of the hierarchy may be amenable to transfer to new problems.
Localized transfer of policies and potential operators at arbitrary scales is
emphasized. Finally, we demonstrate compression and transfer in a collection of
illustrative domains, including examples involving discrete and continuous
state spaces.
Comment: 86 pages, 15 figures
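As a rough illustration of the compression step, under strong simplifying assumptions and not the paper's homogenization procedure, one can aggregate states into clusters, average transitions and rewards within each cluster, and solve the resulting smaller problem. The partition and the toy MDP below are hypothetical.

```python
import numpy as np

def coarsen(P, R, partition):
    """Aggregate an MDP: P is an (S, S) transition matrix, R an (S,) reward
    vector, and partition a list of lists of state indices (the clusters)."""
    k = len(partition)
    P_c = np.zeros((k, k))
    R_c = np.zeros(k)
    for i, src in enumerate(partition):
        R_c[i] = np.mean([R[s] for s in src])
        for j, dst in enumerate(partition):
            # Average probability of moving from cluster i into cluster j.
            P_c[i, j] = np.mean([P[s, dst].sum() for s in src])
    return P_c, R_c

def solve_values(P, R, gamma=0.9, tol=1e-8):
    """Fixed-point iteration V = R + gamma * P V (single-action case,
    i.e. policy evaluation, kept deliberately simple for the sketch)."""
    V = np.zeros(len(R))
    while True:
        V_new = R + gamma * P @ V
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
```

Solving the coarsened problem is cheaper than solving the original, and its solution can seed or bound solutions of the fine-scale sub-problems, which is the spirit of the multiscale speedups described above.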
Orbital Frontal Cortex Projections to Secondary Motor Cortex Mediate Exploitation of Learned Rules.
Animals face the dilemma between exploiting known opportunities and exploring new ones, a decision-making process supported by cortical circuits. While different types of learning may bias exploration, the circumstances under which, and the degree to which, such bias occurs are unclear. We used an instrumental lever press task in mice to examine whether learned rules generalize to exploratory situations and which cortical circuits are involved. We first trained mice to press one lever for food and subsequently assessed how that learning influenced pressing of a second, novel lever. Using outcome devaluation procedures, we found that novel lever exploration was not dependent on the food value associated with the trained lever. Further, changes in the temporal uncertainty of when a lever press would produce food did not affect exploration. Instead, accrued experience with the instrumental contingency was strongly predictive of test lever pressing, with a positive correlation between experience and trained lever exploitation, but not novel lever exploration. Chemogenetic attenuation of orbital frontal cortex (OFC) projections to secondary motor cortex (M2) biased novel lever exploration, suggesting that experience increases OFC-M2-dependent exploitation of learned associations but leaves exploration constant. Our data suggest that exploitation and exploration are parallel decision-making systems that do not necessarily compete.
Practical Block-wise Neural Network Architecture Generation
Convolutional neural networks have achieved remarkable success in computer
vision. However, most usable network architectures are hand-crafted and usually
require expertise and elaborate design. In this paper, we provide a block-wise
network generation pipeline called BlockQNN which automatically builds
high-performance networks using the Q-Learning paradigm with an epsilon-greedy
exploration strategy. The optimal network block is constructed by the learning
agent, which is trained to sequentially choose component layers. We stack the
blocks to construct the whole auto-generated network. To accelerate the
generation process, we also propose a distributed asynchronous framework and an
early-stop strategy. The block-wise generation brings unique advantages: (1) it
achieves competitive results compared to hand-crafted state-of-the-art networks
on image classification; notably, the best network generated by BlockQNN
achieves a 3.54% top-1 error rate on CIFAR-10, beating all existing
auto-generated networks; (2) it offers a tremendous reduction of the search
space for designing networks, spending only 3 days with 32 GPUs; and (3) it has
strong generalizability: the network built on
CIFAR also performs well on the larger-scale ImageNet dataset.
Comment: Accepted to CVPR 2018
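The epsilon-greedy Q-learning loop described above can be sketched as follows, assuming a toy layer vocabulary and a stand-in reward function; in the real pipeline the reward is the validation accuracy of the trained block, and all names here are illustrative rather than BlockQNN's actual code.

```python
import random
from collections import defaultdict

LAYERS = ["conv3x3", "conv5x5", "maxpool", "identity"]  # toy search space
BLOCK_LEN = 3  # layers per block in this sketch

def sample_block(Q, epsilon):
    """Build a block layer by layer, epsilon-greedily w.r.t. the Q-table."""
    block = []
    for step in range(BLOCK_LEN):
        state = (step, tuple(block))
        if random.random() < epsilon:
            block.append(random.choice(LAYERS))  # explore
        else:
            block.append(max(LAYERS, key=lambda a: Q[(state, a)]))  # exploit
    return block

def update(Q, block, reward, alpha=0.1):
    # Monte-Carlo-style update: push every chosen (state, action) pair
    # toward the block's final reward.
    for step in range(BLOCK_LEN):
        state = (step, tuple(block[:step]))
        action = block[step]
        Q[(state, action)] += alpha * (reward - Q[(state, action)])

def toy_reward(block):
    # Hypothetical proxy for validation accuracy, for illustration only.
    return block.count("conv3x3") / BLOCK_LEN

Q = defaultdict(float)
for episode in range(200):
    b = sample_block(Q, epsilon=0.3)
    update(Q, b, toy_reward(b))
best_block = sample_block(Q, epsilon=0.0)  # greedy block after training
```

The distributed asynchronous framework and early-stop strategy from the abstract would replace `toy_reward` with parallel, truncated training runs of each sampled block.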