
    AI2-THOR: An Interactive 3D Environment for Visual AI

    We introduce The House Of inteRactions (THOR), a framework for visual AI research, available at http://ai2thor.allenai.org. AI2-THOR consists of near photo-realistic 3D indoor scenes in which AI agents can navigate and interact with objects to perform tasks. AI2-THOR enables research in many different domains, including but not limited to deep reinforcement learning, imitation learning, learning by interaction, planning, visual question answering, unsupervised representation learning, object detection and segmentation, and learning models of cognition. The goal of AI2-THOR is to facilitate the building of visually intelligent models and to push research in this domain forward.
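    The interaction pattern the framework enables can be sketched as an agent stepping through a scene with discrete navigation and object-interaction actions. The `StubScene` below is a hypothetical stand-in for the simulator, not the actual ai2thor API; it only illustrates the act-then-observe loop.

    ```python
    # Schematic agent-environment loop in the style AI2-THOR enables.
    # StubScene is a hypothetical stand-in; the real framework is
    # available at http://ai2thor.allenai.org.

    class StubScene:
        """Minimal stand-in for a 3D scene: a position and one openable object."""
        def __init__(self):
            self.agent_pos = 0
            self.fridge_open = False

        def step(self, action):
            if action == "MoveAhead":
                self.agent_pos += 1
            elif action == "OpenObject":
                self.fridge_open = True
            # Return an observation dict, mirroring the metadata an
            # interactive simulator exposes after each action.
            return {"agent_pos": self.agent_pos, "fridge_open": self.fridge_open}

    scene = StubScene()
    obs = None
    for action in ["MoveAhead", "MoveAhead", "OpenObject"]:
        obs = scene.step(action)

    print(obs)  # {'agent_pos': 2, 'fridge_open': True}
    ```

    A learned policy would replace the fixed action list, choosing each action from the observation returned by the previous step.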

    Hierarchical object detection with deep reinforcement learning

    We present a method for performing hierarchical object detection in images guided by a deep reinforcement learning agent. The key idea is to focus on those parts of the image that contain richer information and zoom in on them. We train an intelligent agent that, given an image window, is capable of deciding where to focus attention among five different predefined region candidates (smaller windows). This procedure is iterated, providing a hierarchical image analysis. We compare two different candidate proposal strategies to guide the object search: with and without overlap. Moreover, our work compares two different strategies to extract features from a convolutional neural network for each region proposal: a first one that computes new feature maps for each region proposal, and a second one that computes the feature maps for the whole image and later generates crops for each region proposal. Experiments indicate better results for the overlapping candidate proposal strategy, and a loss of performance for the cropped image features due to the loss of spatial resolution. We argue that, while this loss seems unavoidable when working with large amounts of object candidates, the much smaller number of region proposals generated by our reinforcement learning agent makes it feasible to extract features for each location without sharing convolutional computation among regions.
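    The five predefined region candidates can be sketched as four corner sub-windows plus a central one, each a scaled-down copy of the current window. The scale factor here is an assumption: a value above 0.5 (e.g. 0.75) yields the overlapping strategy, while 0.5 yields disjoint quadrants.

    ```python
    # Hedged sketch of the five candidate sub-windows the agent chooses
    # among at each step: four corners plus a center window. The scale
    # factor is an illustrative assumption, not the paper's exact value.

    def region_candidates(x, y, w, h, scale=0.75):
        sw, sh = w * scale, h * scale   # size of each child window
        return [
            (x, y, sw, sh),                                 # top-left
            (x + w - sw, y, sw, sh),                        # top-right
            (x, y + h - sh, sw, sh),                        # bottom-left
            (x + w - sw, y + h - sh, sw, sh),               # bottom-right
            (x + (w - sw) / 2, y + (h - sh) / 2, sw, sh),   # center
        ]

    # Iterating the agent's chosen candidate yields a hierarchical zoom:
    window = (0, 0, 224, 224)
    for _ in range(3):
        window = region_candidates(*window)[4]  # e.g. always pick the center
    print(window)
    ```

    At each iteration the window shrinks by the scale factor, so a few steps suffice to localize a small object without exhaustively scoring thousands of proposals.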

    Deep Object-Centric Representations for Generalizable Robot Learning

    Robotic manipulation in complex open-world scenarios requires both reliable physical manipulation skills and effective, generalizable perception. In this paper, we propose a method in which general-purpose pretrained visual models serve as an object-centric prior for the perception system of a learned policy. We devise an object-level attentional mechanism that can be used to determine relevant objects from a few trajectories or demonstrations, and then immediately incorporate those objects into a learned policy. A task-independent meta-attention locates possible objects in the scene, and a task-specific attention identifies which objects are predictive of the trajectories. The scope of the task-specific attention is easily adjusted by showing demonstrations with distractor objects or with diverse relevant objects. Our results indicate that this approach generalizes well across object instances using very few samples, and can be used to learn a variety of manipulation tasks via reinforcement learning.
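    The two-stage attention can be sketched as follows. This is a hedged toy illustration, not the paper's model: the meta-attention is stood in for by per-demonstration lists of detected object labels, and the task-specific attention weights each object by how consistently it appears across demonstrations, so distractors that vary between demos receive low weight.

    ```python
    # Toy sketch of task-specific attention over detected objects:
    # objects present in every demonstration are treated as task-relevant,
    # objects that vary across demos are treated as distractors.

    from collections import Counter

    def task_specific_attention(demo_detections):
        """demo_detections: one list of detected object labels per demo."""
        n = len(demo_detections)
        counts = Counter(label for demo in demo_detections for label in set(demo))
        # Weight = fraction of demonstrations in which the object appears.
        return {label: c / n for label, c in counts.items()}

    demos = [
        ["mug", "table", "book"],    # demo 1
        ["mug", "table", "phone"],   # demo 2 (different distractor)
        ["mug", "table"],            # demo 3
    ]
    weights = task_specific_attention(demos)
    relevant = sorted(o for o, w in weights.items() if w == 1.0)
    print(relevant)  # ['mug', 'table']
    ```

    Showing more demonstrations with varied distractors sharpens the weighting in exactly the way the abstract describes: consistent objects keep full weight, while incidental ones are discounted.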

    Semantic Segmentation in 2D Videogames

    This Master's thesis focuses on applying semantic segmentation, a computer vision technique, with the objective of improving the performance of deep reinforcement learning models, in particular on the original Super Mario Bros videogame. While a human playing a stage of a videogame like Super Mario Bros can quickly identify, from the elements on the screen, which object is the character they are controlling, which are enemies, and which are obstacles, this is not the case for neural networks, which require training to understand what is displayed on the screen. Using semantic segmentation, we can heavily simplify the frames from the videogame, reducing the visual information of on-screen elements to class and location, which is the most relevant information required to complete the game. In this work, a synthetic dataset generator that simulates frames from the Super Mario Bros videogame has been developed. This dataset has been used to train semantic segmentation deep-learning models, which have been incorporated into a deep reinforcement learning algorithm with the objective of improving its performance. We have found that applying semantic segmentation as a frame preprocessing method can help reinforcement learning models train more efficiently and generalize better. These results also suggest that other computer vision techniques, such as object detection or tracking, could help with the training of reinforcement learning algorithms, and they could be an interesting topic for future research.
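    The simplification the thesis describes can be sketched as mapping each pixel of a frame to a small class ID (player, enemy, obstacle, background), so the policy sees class and location rather than raw texture. The colour palette and class set below are hypothetical stand-ins, not taken from the thesis, where a trained segmentation model produces the class map.

    ```python
    # Hedged sketch: reduce an RGB frame to a grid of class IDs, the kind
    # of simplified input a segmentation model would hand to the RL agent.
    # The palette and class names are illustrative assumptions.

    CLASS_IDS = {"background": 0, "obstacle": 1, "enemy": 2, "player": 3}

    PALETTE = {                       # hypothetical sprite colours
        (107, 140, 255): "background",
        (188, 108, 12):  "obstacle",
        (128, 0, 0):     "enemy",
        (248, 56, 0):    "player",
    }

    def segment(frame):
        """Map an RGB frame (nested lists of pixel tuples) to class IDs."""
        return [[CLASS_IDS[PALETTE.get(px, "background")] for px in row]
                for row in frame]

    frame = [
        [(107, 140, 255), (128, 0, 0)],
        [(248, 56, 0),    (188, 108, 12)],
    ]
    print(segment(frame))  # [[0, 2], [3, 1]]
    ```

    Collapsing hundreds of possible pixel values into four classes shrinks the input space dramatically, which is what lets the reinforcement learning agent train more efficiently on the simplified frames.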