878 research outputs found

    ViZDoom Competitions: Playing Doom from Pixels

    Full text link
    This paper presents the first two editions of Visual Doom AI Competition, held in 2016 and 2017. The challenge was to create bots that compete in a multi-player deathmatch in a first-person shooter (FPS) game, Doom. The bots had to make their decisions based solely on visual information, i.e., a raw screen buffer. To play well, the bots needed to understand their surroundings, navigate, explore, and handle the opponents at the same time. These aspects, together with the competitive multi-agent aspect of the game, make the competition a unique platform for evaluating the state of the art reinforcement learning algorithms. The paper discusses the rules, solutions, results, and statistics that give insight into the agents' behaviors. Best-performing agents are described in more detail. The results of the competition lead to the conclusion that, although reinforcement learning can produce capable Doom bots, they still are not yet able to successfully compete against humans in this game. The paper also revisits the ViZDoom environment, which is a flexible, easy to use, and efficient 3D platform for research for vision-based reinforcement learning, based on a well-recognized first-person perspective game Doom

    The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision

    Full text link
    We propose the Neuro-Symbolic Concept Learner (NS-CL), a model that learns visual concepts, words, and semantic parsing of sentences without explicit supervision on any of them; instead, our model learns by simply looking at images and reading paired questions and answers. Our model builds an object-based scene representation and translates sentences into executable, symbolic programs. To bridge the learning of two modules, we use a neuro-symbolic reasoning module that executes these programs on the latent scene representation. Analogical to human concept learning, the perception module learns visual concepts based on the language description of the object being referred to. Meanwhile, the learned visual concepts facilitate learning new words and parsing new sentences. We use curriculum learning to guide the searching over the large compositional space of images and language. Extensive experiments demonstrate the accuracy and efficiency of our model on learning visual concepts, word representations, and semantic parsing of sentences. Further, our method allows easy generalization to new object attributes, compositions, language concepts, scenes and questions, and even new program domains. It also empowers applications including visual question answering and bidirectional image-text retrieval.Comment: ICLR 2019 (Oral). Project page: http://nscl.csail.mit.edu

    Align-RUDDER: Learning From Few Demonstrations by Reward Redistribution

    Full text link
    Reinforcement Learning algorithms require a large number of samples to solve complex tasks with sparse and delayed rewards. Complex tasks can often be hierarchically decomposed into sub-tasks. A step in the Q-function can be associated with solving a sub-task, where the expectation of the return increases. RUDDER has been introduced to identify these steps and then redistribute reward to them, thus immediately giving reward if sub-tasks are solved. Since the problem of delayed rewards is mitigated, learning is considerably sped up. However, for complex tasks, current exploration strategies as deployed in RUDDER struggle with discovering episodes with high rewards. Therefore, we assume that episodes with high rewards are given as demonstrations and do not have to be discovered by exploration. Typically the number of demonstrations is small and RUDDER's LSTM model as a deep learning method does not learn well. Hence, we introduce Align-RUDDER, which is RUDDER with two major modifications. First, Align-RUDDER assumes that episodes with high rewards are given as demonstrations, replacing RUDDER's safe exploration and lessons replay buffer. Second, we replace RUDDER's LSTM model by a profile model that is obtained from multiple sequence alignment of demonstrations. Profile models can be constructed from as few as two demonstrations as known from bioinformatics. Align-RUDDER inherits the concept of reward redistribution, which considerably reduces the delay of rewards, thus speeding up learning. Align-RUDDER outperforms competitors on complex artificial tasks with delayed reward and few demonstrations. On the MineCraft ObtainDiamond task, Align-RUDDER is able to mine a diamond, though not frequently. Github: https://github.com/ml-jku/align-rudder, YouTube: https://youtu.be/HO-_8ZUl-U

    深層強化学習における多次元混合行動空間のための行動分岐アーキテクチャの改善

    Get PDF
    学位の種別: 修士University of Tokyo(東京大学

    Designing a User Interface for Musical Gameplay

    Get PDF
    The graphical user interface, or GUI, is the method a game uses to communicate with the player and has a large impact on the gameplay experience. The goal of this project was to design a GUI for a music oriented game that allows players to construct a custom instrument using instruments they have acquired throughout the game. Based on our research of GUIs, we designed a prototype in Unity that incorporates a grid system that responds to keypress and mouse click events. We then performed a playtest and conducted a survey with students to acquire feedback about the simplicity and effectiveness of our design. We found that our design had some confusing elements, but was overall intuitive and easy to use

    R2-D2: ColoR-inspired Convolutional NeuRal Network (CNN)-based AndroiD Malware Detections

    Full text link
    The influence of Deep Learning on image identification and natural language processing has attracted enormous attention globally. The convolution neural network that can learn without prior extraction of features fits well in response to the rapid iteration of Android malware. The traditional solution for detecting Android malware requires continuous learning through pre-extracted features to maintain high performance of identifying the malware. In order to reduce the manpower of feature engineering prior to the condition of not to extract pre-selected features, we have developed a coloR-inspired convolutional neuRal networks (CNN)-based AndroiD malware Detection (R2-D2) system. The system can convert the bytecode of classes.dex from Android archive file to rgb color code and store it as a color image with fixed size. The color image is input to the convolutional neural network for automatic feature extraction and training. The data was collected from Jan. 2017 to Aug 2017. During the period of time, we have collected approximately 2 million of benign and malicious Android apps for our experiments with the help from our research partner Leopard Mobile Inc. Our experiment results demonstrate that the proposed system has accurate security analysis on contracts. Furthermore, we keep our research results and experiment materials on http://R2D2.TWMAN.ORG.Comment: Verison 2018/11/15, IEEE BigData 2018, Seattle, WA, USA, Dec 10-13, 2018. (Accepted
    corecore