878 research outputs found
ViZDoom Competitions: Playing Doom from Pixels
This paper presents the first two editions of Visual Doom AI Competition,
held in 2016 and 2017. The challenge was to create bots that compete in a
multi-player deathmatch in a first-person shooter (FPS) game, Doom. The bots
had to make their decisions based solely on visual information, i.e., a raw
screen buffer. To play well, the bots needed to understand their surroundings,
navigate, explore, and handle the opponents at the same time. These aspects,
together with the competitive multi-agent aspect of the game, make the
competition a unique platform for evaluating the state of the art reinforcement
learning algorithms. The paper discusses the rules, solutions, results, and
statistics that give insight into the agents' behaviors. Best-performing agents
are described in more detail. The results of the competition lead to the
conclusion that, although reinforcement learning can produce capable Doom bots,
they still are not yet able to successfully compete against humans in this
game. The paper also revisits the ViZDoom environment, which is a flexible,
easy to use, and efficient 3D platform for research for vision-based
reinforcement learning, based on a well-recognized first-person perspective
game Doom
The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision
We propose the Neuro-Symbolic Concept Learner (NS-CL), a model that learns
visual concepts, words, and semantic parsing of sentences without explicit
supervision on any of them; instead, our model learns by simply looking at
images and reading paired questions and answers. Our model builds an
object-based scene representation and translates sentences into executable,
symbolic programs. To bridge the learning of two modules, we use a
neuro-symbolic reasoning module that executes these programs on the latent
scene representation. Analogical to human concept learning, the perception
module learns visual concepts based on the language description of the object
being referred to. Meanwhile, the learned visual concepts facilitate learning
new words and parsing new sentences. We use curriculum learning to guide the
searching over the large compositional space of images and language. Extensive
experiments demonstrate the accuracy and efficiency of our model on learning
visual concepts, word representations, and semantic parsing of sentences.
Further, our method allows easy generalization to new object attributes,
compositions, language concepts, scenes and questions, and even new program
domains. It also empowers applications including visual question answering and
bidirectional image-text retrieval.Comment: ICLR 2019 (Oral). Project page: http://nscl.csail.mit.edu
Align-RUDDER: Learning From Few Demonstrations by Reward Redistribution
Reinforcement Learning algorithms require a large number of samples to solve
complex tasks with sparse and delayed rewards. Complex tasks can often be
hierarchically decomposed into sub-tasks. A step in the Q-function can be
associated with solving a sub-task, where the expectation of the return
increases. RUDDER has been introduced to identify these steps and then
redistribute reward to them, thus immediately giving reward if sub-tasks are
solved. Since the problem of delayed rewards is mitigated, learning is
considerably sped up. However, for complex tasks, current exploration
strategies as deployed in RUDDER struggle with discovering episodes with high
rewards. Therefore, we assume that episodes with high rewards are given as
demonstrations and do not have to be discovered by exploration. Typically the
number of demonstrations is small and RUDDER's LSTM model as a deep learning
method does not learn well. Hence, we introduce Align-RUDDER, which is RUDDER
with two major modifications. First, Align-RUDDER assumes that episodes with
high rewards are given as demonstrations, replacing RUDDER's safe exploration
and lessons replay buffer. Second, we replace RUDDER's LSTM model by a profile
model that is obtained from multiple sequence alignment of demonstrations.
Profile models can be constructed from as few as two demonstrations as known
from bioinformatics. Align-RUDDER inherits the concept of reward
redistribution, which considerably reduces the delay of rewards, thus speeding
up learning. Align-RUDDER outperforms competitors on complex artificial tasks
with delayed reward and few demonstrations. On the MineCraft ObtainDiamond
task, Align-RUDDER is able to mine a diamond, though not frequently. Github:
https://github.com/ml-jku/align-rudder, YouTube: https://youtu.be/HO-_8ZUl-U
深層強化学習における多次元混合行動空間のための行動分岐アーキテクチャの改善
学位の種別: 修士University of Tokyo(東京大学
Designing a User Interface for Musical Gameplay
The graphical user interface, or GUI, is the method a game uses to communicate with the player and has a large impact on the gameplay experience. The goal of this project was to design a GUI for a music oriented game that allows players to construct a custom instrument using instruments they have acquired throughout the game. Based on our research of GUIs, we designed a prototype in Unity that incorporates a grid system that responds to keypress and mouse click events. We then performed a playtest and conducted a survey with students to acquire feedback about the simplicity and effectiveness of our design. We found that our design had some confusing elements, but was overall intuitive and easy to use
R2-D2: ColoR-inspired Convolutional NeuRal Network (CNN)-based AndroiD Malware Detections
The influence of Deep Learning on image identification and natural language
processing has attracted enormous attention globally. The convolution neural
network that can learn without prior extraction of features fits well in
response to the rapid iteration of Android malware. The traditional solution
for detecting Android malware requires continuous learning through
pre-extracted features to maintain high performance of identifying the malware.
In order to reduce the manpower of feature engineering prior to the condition
of not to extract pre-selected features, we have developed a coloR-inspired
convolutional neuRal networks (CNN)-based AndroiD malware Detection (R2-D2)
system. The system can convert the bytecode of classes.dex from Android archive
file to rgb color code and store it as a color image with fixed size. The color
image is input to the convolutional neural network for automatic feature
extraction and training. The data was collected from Jan. 2017 to Aug 2017.
During the period of time, we have collected approximately 2 million of benign
and malicious Android apps for our experiments with the help from our research
partner Leopard Mobile Inc. Our experiment results demonstrate that the
proposed system has accurate security analysis on contracts. Furthermore, we
keep our research results and experiment materials on http://R2D2.TWMAN.ORG.Comment: Verison 2018/11/15, IEEE BigData 2018, Seattle, WA, USA, Dec 10-13,
2018. (Accepted
- …