5,425 research outputs found

    Outline Objects using Deep Reinforcement Learning

    Full text link
    Image segmentation needs both local boundary position information and global object context information. The performance of the recent state-of-the-art method, fully convolutional networks, reaches a bottleneck due to the neural network limit after balancing between the two types of information simultaneously in an end-to-end training style. To overcome this problem, we divide the semantic image segmentation into temporal subtasks. First, we find a possible pixel position of some object boundary; then trace the boundary at steps within a limited length until the whole object is outlined. We present the first deep reinforcement learning approach to semantic image segmentation, called DeepOutline, which outperforms other algorithms in Coco detection leaderboard in the middle and large size person category in Coco val2017 dataset. Meanwhile, it provides an insight into a divide and conquer way by reinforcement learning on computer vision problems

    Hacking Google reCAPTCHA v3 using Reinforcement Learning

    Full text link
    We present a Reinforcement Learning (RL) methodology to bypass Google reCAPTCHA v3. We formulate the problem as a grid world where the agent learns how to move the mouse and click on the reCAPTCHA button to receive a high score. We study the performance of the agent when we vary the cell size of the grid world and show that the performance drops when the agent takes big steps toward the goal. Finally, we used a divide and conquer strategy to defeat the reCAPTCHA system for any grid resolution. Our proposed method achieves a success rate of 97.4% on a 100x100 grid and 96.7% on a 1000x1000 screen resolution.Comment: Accepted for the Conference on Reinforcement Learning and Decision Making (RLDM) 201

    Deep Reinforcement Learning for Doom using Unsupervised Auxiliary Tasks

    Full text link
    Recent developments in deep reinforcement learning have enabled the creation of agents for solving a large variety of games given a visual input. These methods have been proven successful for 2D games, like the Atari games, or for simple tasks, like navigating in mazes. It is still an open question, how to address more complex environments, in which the reward is sparse and the state space is huge. In this paper we propose a divide and conquer deep reinforcement learning solution and we test our agent in the first person shooter (FPS) game of Doom. Our work is based on previous works in deep reinforcement learning and in Doom agents. We also present how our agent is able to perform better in unknown environments compared to a state of the art reinforcement learning algorithm.Comment: 4 pages, 3 figures, 3 table

    Subgoal Discovery for Hierarchical Dialogue Policy Learning

    Full text link
    Developing agents to engage in complex goal-oriented dialogues is challenging partly because the main learning signals are very sparse in long conversations. In this paper, we propose a divide-and-conquer approach that discovers and exploits the hidden structure of the task to enable efficient policy learning. First, given successful example dialogues, we propose the Subgoal Discovery Network (SDN) to divide a complex goal-oriented task into a set of simpler subgoals in an unsupervised fashion. We then use these subgoals to learn a multi-level policy by hierarchical reinforcement learning. We demonstrate our method by building a dialogue agent for the composite task of travel planning. Experiments with simulated and real users show that our approach performs competitively against a state-of-the-art method that requires human-defined subgoals. Moreover, we show that the learned subgoals are often human comprehensible.Comment: 11 pages, 6 figures, EMNLP 201

    Physicist's Journeys Through the AI World - A Topical Review. There is no royal road to unsupervised learning

    Full text link
    Artificial Intelligence (AI), defined in its most simple form, is a technological tool that makes machines intelligent. Since learning is at the core of intelligence, machine learning poses itself as a core sub-field of AI. Then there comes a subclass of machine learning, known as deep learning, to address the limitations of their predecessors. AI has generally acquired its prominence over the past few years due to its considerable progress in various fields. AI has vastly invaded the realm of research. This has led physicists to attentively direct their research towards implementing AI tools. Their central aim has been to gain better understanding and enrich their intuition. This review article is meant to supplement the previously presented efforts to bridge the gap between AI and physics, and take a serious step forward to filter out the "Babelian" clashes brought about from such gabs. This necessitates first to have fundamental knowledge about common AI tools. To this end, the review's primary focus shall be on deep learning models called artificial neural networks. They are deep learning models which train themselves through different learning processes. It discusses also the concept of Markov decision processes. Finally, shortcut to the main goal, the review thoroughly examines how these neural networks are capable to construct a physical theory describing some observations without applying any previous physical knowledge.Comment: 26 pages, 10 figures, 2 appendices, 5 algorithm

    DC-NAS: Divide-and-Conquer Neural Architecture Search

    Full text link
    Most applications demand high-performance deep neural architectures costing limited resources. Neural architecture searching is a way of automatically exploring optimal deep neural networks in a given huge search space. However, all sub-networks are usually evaluated using the same criterion; that is, early stopping on a small proportion of the training dataset, which is an inaccurate and highly complex approach. In contrast to conventional methods, here we present a divide-and-conquer (DC) approach to effectively and efficiently search deep neural architectures. Given an arbitrary search space, we first extract feature representations of all sub-networks according to changes in parameters or output features of each layer, and then calculate the similarity between two different sampled networks based on the representations. Then, a k-means clustering is conducted to aggregate similar architectures into the same cluster, separately executing sub-network evaluation in each cluster. The best architecture in each cluster is later merged to obtain the optimal neural architecture. Experimental results conducted on several benchmarks illustrate that DC-NAS can overcome the inaccurate evaluation problem, achieving a 75.1%75.1\% top-1 accuracy on the ImageNet dataset, which is higher than that of state-of-the-art methods using the same search space

    Evolving Culture vs Local Minima

    Full text link
    We propose a theory that relates difficulty of learning in deep architectures to culture and language. It is articulated around the following hypotheses: (1) learning in an individual human brain is hampered by the presence of effective local minima; (2) this optimization difficulty is particularly important when it comes to learning higher-level abstractions, i.e., concepts that cover a vast and highly-nonlinear span of sensory configurations; (3) such high-level abstractions are best represented in brains by the composition of many levels of representation, i.e., by deep architectures; (4) a human brain can learn such high-level abstractions if guided by the signals produced by other humans, which act as hints or indirect supervision for these high-level abstractions; and (5), language and the recombination and optimization of mental concepts provide an efficient evolutionary recombination operator, and this gives rise to rapid search in the space of communicable ideas that help humans build up better high-level internal representations of their world. These hypotheses put together imply that human culture and the evolution of ideas have been crucial to counter an optimization difficulty: this optimization difficulty would otherwise make it very difficult for human brains to capture high-level knowledge of the world. The theory is grounded in experimental observations of the difficulties of training deep artificial neural networks. Plausible consequences of this theory for the efficiency of cultural evolutions are sketched

    Coordinated Exploration in Concurrent Reinforcement Learning

    Full text link
    We consider a team of reinforcement learning agents that concurrently learn to operate in a common environment. We identify three properties - adaptivity, commitment, and diversity - which are necessary for efficient coordinated exploration and demonstrate that straightforward extensions to single-agent optimistic and posterior sampling approaches fail to satisfy them. As an alternative, we propose seed sampling, which extends posterior sampling in a manner that meets these requirements. Simulation results investigate how per-agent regret decreases as the number of agents grows, establishing substantial advantages of seed sampling over alternative exploration schemes

    Coordinated Exploration via Intrinsic Rewards for Multi-Agent Reinforcement Learning

    Full text link
    Solving tasks with sparse rewards is one of the most important challenges in reinforcement learning. In the single-agent setting, this challenge is addressed by introducing intrinsic rewards that motivate agents to explore unseen regions of their state spaces; however, applying these techniques naively to the multi-agent setting results in agents exploring independently, without any coordination among themselves. We argue that exploration in cooperative multi-agent settings can be accelerated and improved if agents coordinate with respect to the regions of the state space they explore. In this paper we propose an approach for learning how to dynamically select between proposed intrinsic reward types which consider not just what an individual agent has explored, but all agents, such that the agents can coordinate their exploration and maximize extrinsic returns. Concretely, we formulate the approach as a hierarchical policy where a high-level controller selects among sets of policies trained on diverse intrinsic rewards and the low-level controllers learn the action policies of all agents under these specific rewards. We demonstrate the effectiveness of the proposed approach in a multi-agent GridWorld domain with sparse rewards and then show that our method scales up to more complex settings by evaluating on the VizDoom platform

    Limits of End-to-End Learning

    Full text link
    End-to-end learning refers to training a possibly complex learning system by applying gradient-based learning to the system as a whole. End-to-end learning system is specifically designed so that all modules are differentiable. In effect, not only a central learning machine, but also all "peripheral" modules like representation learning and memory formation are covered by a holistic learning process. The power of end-to-end learning has been demonstrated on many tasks, like playing a whole array of Atari video games with a single architecture. While pushing for solutions to more challenging tasks, network architectures keep growing more and more complex. In this paper we ask the question whether and to what extent end-to-end learning is a future-proof technique in the sense of scaling to complex and diverse data processing architectures. We point out potential inefficiencies, and we argue in particular that end-to-end learning does not make optimal use of the modular design of present neural networks. Our surprisingly simple experiments demonstrate these inefficiencies, up to the complete breakdown of learning
    • …
    corecore