Outline Objects using Deep Reinforcement Learning
Image segmentation needs both local boundary position information and global
object context information. The recent state-of-the-art method, fully
convolutional networks, hits a performance bottleneck because a single network
must balance the two types of information simultaneously in an end-to-end
training style. To overcome this problem, we divide semantic image
segmentation into temporal subtasks: first, find a possible pixel position on
some object boundary; then trace the boundary step by step, within a limited
length, until the whole object is outlined. We present
the first deep reinforcement learning approach to semantic image segmentation,
called DeepOutline, which outperforms other algorithms on the COCO detection
leaderboard in the medium and large person categories on the COCO val2017
dataset. It also offers insight into applying divide-and-conquer reinforcement
learning to computer vision problems.
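The paper's tracing policy is learned by reinforcement learning; as a minimal sketch of the two-stage formulation (locate a boundary pixel, then outline step by step), the toy below uses a deterministic greedy tracer standing in for the learned policy. The mask, neighbourhood order, and function names are all illustrative, not the authors' implementation:

```python
import numpy as np

# 8-neighbourhood, clockwise starting to the right.
STEPS = [(0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1)]

def is_boundary(mask, y, x):
    """An object pixel is on the boundary if some 4-neighbour is background."""
    h, w = mask.shape
    if not mask[y, x]:
        return False
    for dy, dx in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
        ny, nx = y + dy, x + dx
        if not (0 <= ny < h and 0 <= nx < w) or not mask[ny, nx]:
            return True
    return False

def outline(mask, max_steps=1000):
    """Two-stage episode: find a starting boundary pixel, then step along
    the boundary until no unvisited boundary neighbour remains."""
    h, w = mask.shape
    start = next(((y, x) for y in range(h) for x in range(w)
                  if is_boundary(mask, y, x)), None)
    if start is None:
        return []
    path, visited = [start], {start}
    for _ in range(max_steps):
        y, x = path[-1]
        nxt = next(((y + dy, x + dx) for dy, dx in STEPS
                    if 0 <= y + dy < h and 0 <= x + dx < w
                    and (y + dy, x + dx) not in visited
                    and is_boundary(mask, y + dy, x + dx)), None)
        if nxt is None:
            break
        path.append(nxt)
        visited.add(nxt)
    return path

mask = np.zeros((5, 5), dtype=bool)
mask[1:4, 1:4] = True          # a 3x3 square object
path = outline(mask)           # visits the 8 perimeter pixels in order
```

In the actual method, each move of the tracer would be an action chosen by the trained agent rather than the fixed clockwise scan used here.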
Hacking Google reCAPTCHA v3 using Reinforcement Learning
We present a Reinforcement Learning (RL) methodology to bypass Google
reCAPTCHA v3. We formulate the problem as a grid world where the agent learns
how to move the mouse and click on the reCAPTCHA button to receive a high
score. We study the performance of the agent when we vary the cell size of the
grid world and show that the performance drops when the agent takes big steps
toward the goal. Finally, we use a divide-and-conquer strategy to defeat the
reCAPTCHA system at any grid resolution. Our proposed method achieves a
success rate of 97.4% on a 100x100 grid and 96.7% at a 1000x1000 screen
resolution.
Comment: Accepted for the Conference on Reinforcement Learning and Decision
Making (RLDM) 201
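The grid-world formulation described above can be sketched with tabular Q-learning: the cursor starts in one corner, the "button" sits in the opposite corner, and the grid size `n` controls how big each mouse step is. This is a hedged toy, not the paper's agent; the reward shaping and hyperparameters are assumptions:

```python
import numpy as np

def train_grid_agent(n=5, episodes=2000, alpha=0.5, gamma=0.95, eps=0.3, seed=0):
    """Tabular Q-learning on an n x n grid world: reach the bottom-right
    'button' cell from the top-left. Smaller n means coarser mouse steps."""
    rng = np.random.default_rng(seed)
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
    q = np.zeros((n, n, 4))
    for _ in range(episodes):
        y = x = 0
        for _ in range(4 * n):
            a = int(rng.integers(4)) if rng.random() < eps else int(np.argmax(q[y, x]))
            ny = min(max(y + moves[a][0], 0), n - 1)
            nx = min(max(x + moves[a][1], 0), n - 1)
            done = (ny, nx) == (n - 1, n - 1)
            r = 1.0 if done else -0.01           # proxy for the reCAPTCHA score
            target = r + (0.0 if done else gamma * q[ny, nx].max())
            q[y, x, a] += alpha * (target - q[y, x, a])
            y, x = ny, nx
            if done:
                break
    return q

def greedy_rollout(q, max_steps=50):
    """Follow the learned greedy policy from the start cell."""
    n = q.shape[0]
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    y = x = steps = 0
    while (y, x) != (n - 1, n - 1) and steps < max_steps:
        a = int(np.argmax(q[y, x]))
        y = min(max(y + moves[a][0], 0), n - 1)
        x = min(max(x + moves[a][1], 0), n - 1)
        steps += 1
    return (y, x), steps
```

The paper's divide-and-conquer step would then tile a fine screen resolution into several such small grids, each solved separately.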
Deep Reinforcement Learning for Doom using Unsupervised Auxiliary Tasks
Recent developments in deep reinforcement learning have enabled the creation
of agents for solving a large variety of games given a visual input. These
methods have proven successful for 2D games, like the Atari games, and for
simple tasks, like navigating mazes. How to address more complex environments,
in which the reward is sparse and the state space is huge, remains an open
question. In this paper we propose a divide-and-conquer deep reinforcement
learning solution and test our agent in the first-person shooter (FPS) game
Doom. Our work builds on previous work in deep reinforcement learning and on
Doom agents. We also show that our agent performs better in unknown
environments than a state-of-the-art reinforcement learning algorithm.
Comment: 4 pages, 3 figures, 3 tables
Subgoal Discovery for Hierarchical Dialogue Policy Learning
Developing agents to engage in complex goal-oriented dialogues is challenging
partly because the main learning signals are very sparse in long conversations.
In this paper, we propose a divide-and-conquer approach that discovers and
exploits the hidden structure of the task to enable efficient policy learning.
First, given successful example dialogues, we propose the Subgoal Discovery
Network (SDN) to divide a complex goal-oriented task into a set of simpler
subgoals in an unsupervised fashion. We then use these subgoals to learn a
multi-level policy by hierarchical reinforcement learning. We demonstrate our
method by building a dialogue agent for the composite task of travel planning.
Experiments with simulated and real users show that our approach performs
competitively against a state-of-the-art method that requires human-defined
subgoals. Moreover, we show that the learned subgoals are often human
comprehensible.
Comment: 11 pages, 6 figures, EMNLP 201
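The Subgoal Discovery Network in the paper is a learned model; as a minimal unsupervised stand-in, the sketch below segments successful dialogues at the points where a new task slot starts being filled and records the slots completed so far as candidate subgoals. The slot names and dialogue encoding are hypothetical:

```python
def discover_subgoals(trajectories):
    """Toy subgoal discovery: each trajectory is a list of states, where a
    state is the set of task slots filled so far. A segment boundary is
    placed wherever progress begins on a new slot after earlier progress."""
    subgoals = set()
    for traj in trajectories:
        filled = set()
        for state in traj:
            new = state - filled
            if new and filled:            # new slot started: segment here
                subgoals.add(frozenset(filled))
            filled |= state
    return subgoals

# Toy travel-planning dialogues (the paper's composite task domain).
dialogues = [
    [{"flight"}, {"flight", "hotel"}, {"flight", "hotel", "car"}],
    [{"flight"}, {"flight", "hotel"}],
]
subgoals = discover_subgoals(dialogues)
```

A hierarchical policy would then let a top-level controller pick one discovered subgoal at a time while a low-level policy handles the turns within it.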
Physicist's Journeys Through the AI World - A Topical Review. There is no royal road to unsupervised learning
Artificial Intelligence (AI), defined in its simplest form, is a technological
tool that makes machines intelligent. Since learning is at the core of
intelligence, machine learning poses itself as a core sub-field of AI. Deep
learning, in turn, is a subclass of machine learning that addresses the
limitations of its predecessors. AI has generally acquired its
prominence over the past few years due to its considerable progress in various
fields. AI has vastly invaded the realm of research. This has led physicists to
attentively direct their research towards implementing AI tools. Their central
aim has been to gain better understanding and enrich their intuition. This
review article is meant to supplement previously presented efforts to bridge
the gap between AI and physics, and to take a serious step toward filtering
out the "Babelian" clashes brought about by such gaps. This first requires
fundamental knowledge of common AI tools. To this end, the review's primary
focus is the deep learning models called artificial neural networks, which
train themselves through different learning processes. It also discusses the
concept of Markov decision processes. Finally, as a shortcut to the main goal,
the review thoroughly examines how these neural networks can construct a
physical theory describing some observations without applying any prior
physical knowledge.
Comment: 26 pages, 10 figures, 2 appendices, 5 algorithms
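Since the review covers Markov decision processes, a compact worked example may help: value iteration on a tiny two-state MDP. The transition matrices and rewards below are an illustrative toy, not taken from the review:

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Solve a Markov decision process by value iteration.
    P[a][s, s'] is the transition probability from s to s' under action a;
    R[s, a] is the immediate reward for taking action a in state s."""
    v = np.zeros(R.shape[0])
    while True:
        # Bellman backup: Q(s, a) = R(s, a) + gamma * sum_s' P(s'|s, a) V(s')
        q = R + gamma * np.stack([P[a] @ v for a in range(len(P))], axis=1)
        v_new = q.max(axis=1)
        if np.abs(v_new - v).max() < tol:
            return v_new, q.argmax(axis=1)
        v = v_new

# Two states, two actions: action 0 stays put, action 1 jumps to state 1.
# Staying in state 1 pays reward 2; everything else pays 0.
P = [np.eye(2), np.array([[0.0, 1.0], [0.0, 1.0]])]
R = np.array([[0.0, 0.0], [2.0, 0.0]])
values, policy = value_iteration(P, R)
```

The optimal policy jumps from state 0 to state 1 and then stays, giving values V(1) = 2/(1-0.9) = 20 and V(0) = 0.9 * 20 = 18.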
DC-NAS: Divide-and-Conquer Neural Architecture Search
Most applications demand high-performance deep neural architectures at limited
resource cost. Neural architecture search is a way of automatically finding
optimal deep neural networks in a given, huge search space. However,
all sub-networks are usually evaluated using the same criterion; that is, early
stopping on a small proportion of the training dataset, which is an inaccurate
and highly complex approach. In contrast to conventional methods, here we
present a divide-and-conquer (DC) approach to effectively and efficiently
search deep neural architectures. Given an arbitrary search space, we first
extract feature representations of all sub-networks according to changes in
parameters or output features of each layer, and then calculate the similarity
between two different sampled networks based on these representations. Then,
k-means clustering is conducted to aggregate similar architectures into the
same cluster, and sub-network evaluation is executed separately within each
cluster. The best architectures from each cluster are later merged to obtain
the optimal neural architecture. Experimental results on several benchmarks
illustrate that DC-NAS overcomes the inaccurate-evaluation problem, achieving
a higher top-1 accuracy on the ImageNet dataset than state-of-the-art methods
using the same search space.
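The cluster-then-evaluate loop can be sketched as follows; the feature vectors and proxy evaluation scores below are synthetic placeholders for the per-layer representations and accuracy estimates the paper actually uses:

```python
import numpy as np

def kmeans(feats, k, iters=20, seed=0):
    """Plain k-means over sub-network feature representations."""
    rng = np.random.default_rng(seed)
    centers = feats[rng.choice(len(feats), size=k, replace=False)].copy()
    labels = np.zeros(len(feats), dtype=int)
    for _ in range(iters):
        dists = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = feats[labels == j].mean(axis=0)
    return labels

def dc_nas_select(feats, scores, k=2):
    """Divide and conquer: evaluate sub-networks separately within clusters
    of similar architectures, then keep the overall best cluster winner."""
    labels = kmeans(feats, k)
    winners = [int(np.flatnonzero(labels == j)[scores[labels == j].argmax()])
               for j in range(k) if (labels == j).any()]
    return max(winners, key=lambda i: scores[i])

# Four sampled sub-networks forming two similar pairs, with proxy scores.
feats = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
scores = np.array([0.3, 0.9, 0.7, 0.5])
best = dc_nas_select(feats, scores)
```

Restricting comparisons to within-cluster candidates is what makes the relative evaluations more reliable than ranking every sampled architecture on one global early-stopping criterion.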
Evolving Culture vs Local Minima
We propose a theory that relates difficulty of learning in deep architectures
to culture and language. It is articulated around the following hypotheses: (1)
learning in an individual human brain is hampered by the presence of effective
local minima; (2) this optimization difficulty is particularly important when
it comes to learning higher-level abstractions, i.e., concepts that cover a
vast and highly-nonlinear span of sensory configurations; (3) such high-level
abstractions are best represented in brains by the composition of many levels
of representation, i.e., by deep architectures; (4) a human brain can learn
such high-level abstractions if guided by the signals produced by other humans,
which act as hints or indirect supervision for these high-level abstractions;
and (5), language and the recombination and optimization of mental concepts
provide an efficient evolutionary recombination operator, and this gives rise
to rapid search in the space of communicable ideas that help humans build up
better high-level internal representations of their world. These hypotheses put
together imply that human culture and the evolution of ideas have been crucial
to counter an optimization difficulty: this optimization difficulty would
otherwise make it very difficult for human brains to capture high-level
knowledge of the world. The theory is grounded in experimental observations of
the difficulties of training deep artificial neural networks. Plausible
consequences of this theory for the efficiency of cultural evolutions are
sketched.
Coordinated Exploration in Concurrent Reinforcement Learning
We consider a team of reinforcement learning agents that concurrently learn
to operate in a common environment. We identify three properties - adaptivity,
commitment, and diversity - which are necessary for efficient coordinated
exploration and demonstrate that straightforward extensions to single-agent
optimistic and posterior sampling approaches fail to satisfy them. As an
alternative, we propose seed sampling, which extends posterior sampling in a
manner that meets these requirements. Simulation results investigate how
per-agent regret decreases as the number of agents grows, establishing
substantial advantages of seed sampling over alternative exploration schemes.
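A minimal sketch of the seed-sampling idea, in a bandit setting with a Gaussian posterior over arm means (the posterior form and arm values are illustrative assumptions, not the paper's experimental setup):

```python
import numpy as np

def seed_sample_action(post_mean, post_std, agent_seed):
    """Seed sampling for one agent: a fixed per-agent random seed is mapped
    through the team's *current* posterior over arm means. Diversity comes
    from agents holding distinct seeds, adaptivity from sharing the posterior,
    and commitment from each seed staying fixed across time steps."""
    z = np.random.default_rng(agent_seed).standard_normal(len(post_mean))
    sample = np.asarray(post_mean) + np.asarray(post_std) * z
    return int(np.argmax(sample))

# Early on (wide posterior): agents with different seeds sample different
# models and therefore explore different arms.
early = {seed_sample_action([0.0, 0.0, 0.0], [1.0, 1.0, 1.0], s)
         for s in range(20)}
# Later (concentrated posterior): every agent's sample collapses onto the
# same best arm, without any seed having changed.
late = {seed_sample_action([0.1, 0.5, 0.2], [0.0, 0.0, 0.0], s)
        for s in range(20)}
```

Naive posterior sampling would have each agent redraw a fresh model every step, losing commitment; a shared optimistic estimate would send all agents to the same arm, losing diversity.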
Coordinated Exploration via Intrinsic Rewards for Multi-Agent Reinforcement Learning
Solving tasks with sparse rewards is one of the most important challenges in
reinforcement learning. In the single-agent setting, this challenge is
addressed by introducing intrinsic rewards that motivate agents to explore
unseen regions of their state spaces; however, applying these techniques
naively to the multi-agent setting results in agents exploring independently,
without any coordination among themselves. We argue that exploration in
cooperative multi-agent settings can be accelerated and improved if agents
coordinate with respect to the regions of the state space they explore. In this
paper we propose an approach for learning how to dynamically select between
proposed intrinsic reward types which consider not just what an individual
agent has explored, but all agents, such that the agents can coordinate their
exploration and maximize extrinsic returns. Concretely, we formulate the
approach as a hierarchical policy where a high-level controller selects among
sets of policies trained on diverse intrinsic rewards and the low-level
controllers learn the action policies of all agents under these specific
rewards. We demonstrate the effectiveness of the proposed approach in a
multi-agent GridWorld domain with sparse rewards and then show that our method
scales up to more complex settings by evaluating on the VizDoom platform.
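The high-level controller described above can be sketched as a bandit over intrinsic-reward types, credited with the extrinsic episode return. The reward-type names here are hypothetical labels, not the paper's terminology:

```python
import random

class RewardSelector:
    """High-level controller: a simple epsilon-greedy bandit over
    intrinsic-reward types. After each episode it is credited with the
    *extrinsic* return earned by the low-level policy trained under the
    selected reward, so it learns which exploration style pays off."""
    def __init__(self, reward_types, eps=0.1, seed=0):
        self.rng = random.Random(seed)
        self.types = list(reward_types)
        self.value = {t: 0.0 for t in self.types}   # running mean return
        self.count = {t: 0 for t in self.types}
        self.eps = eps

    def select(self):
        if self.rng.random() < self.eps:
            return self.rng.choice(self.types)
        return max(self.types, key=lambda t: self.value[t])

    def update(self, reward_type, extrinsic_return):
        self.count[reward_type] += 1
        n = self.count[reward_type]
        self.value[reward_type] += (extrinsic_return - self.value[reward_type]) / n

sel = RewardSelector(["independent", "minimum", "covering"], eps=0.0)
sel.update("minimum", 1.0)      # coordinated exploration paid off
sel.update("independent", 0.2)  # solo exploration paid less
```

Each reward type would correspond to a differently trained low-level policy set; the controller only decides which set acts in the next episode.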
Limits of End-to-End Learning
End-to-end learning refers to training a possibly complex learning system by
applying gradient-based learning to the system as a whole. An end-to-end
learning system is specifically designed so that all modules are
differentiable. In
effect, not only a central learning machine, but also all "peripheral" modules
like representation learning and memory formation are covered by a holistic
learning process. The power of end-to-end learning has been demonstrated on
many tasks, like playing a whole array of Atari video games with a single
architecture. While pushing for solutions to more challenging tasks, network
architectures keep growing more and more complex.
In this paper we ask the question whether and to what extent end-to-end
learning is a future-proof technique in the sense of scaling to complex and
diverse data processing architectures. We point out potential inefficiencies,
and we argue in particular that end-to-end learning does not make optimal use
of the modular design of present neural networks. Our surprisingly simple
experiments demonstrate these inefficiencies, up to the complete breakdown of
learning.
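The defining property under discussion, a single loss at the output with gradients flowing back through every module, can be shown in a minimal sketch of two chained linear "modules" trained jointly (a toy regression, not one of the paper's experiments):

```python
import numpy as np

def train_end_to_end(steps=200, lr=0.1, seed=0):
    """Two modules (linear maps) trained end to end: one loss at the very
    end, gradients propagated back through both, and no module-specific
    objective anywhere in between."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((50, 3))
    w_true = rng.standard_normal((3, 1))
    y = x @ w_true                            # synthetic regression target
    w1 = rng.standard_normal((3, 4)) * 0.1    # module 1: representation
    w2 = rng.standard_normal((4, 1)) * 0.1    # module 2: predictor
    losses = []
    for _ in range(steps):
        h = x @ w1                            # forward through module 1
        pred = h @ w2                         # forward through module 2
        err = pred - y
        losses.append(float((err ** 2).mean()))
        g_pred = 2 * err / len(x)             # gradient of the single loss
        g_w2 = h.T @ g_pred                   # reaches module 2 ...
        g_w1 = x.T @ (g_pred @ w2.T)          # ... and flows on into module 1
        w2 -= lr * g_w2
        w1 -= lr * g_w1
    return losses
```

The inefficiency the paper targets arises when such chains grow deep and heterogeneous: the only learning signal any module receives is whatever gradient survives the trip back from the final loss.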