Automated Game Design Learning
While general game playing is an active field of research, the learning of
game design has tended to be either a secondary goal of such research or it has
been solely the domain of humans. We propose a field of research, Automated
Game Design Learning (AGDL), with the direct purpose of learning game designs
through interaction with games in the mode that most people experience
games: via play. We detail existing work that touches the edges of this field,
describe current successful projects in AGDL and the theoretical foundations
that enable them, point to promising applications enabled by AGDL, and discuss
next steps for this exciting area of study. The key moves of AGDL are to use
game programs as the ultimate source of truth about their own design, and to
make these design properties available to other systems and avenues of inquiry.
Comment: 8 pages, 2 figures. Accepted for CIG 201
CLIC: Curriculum Learning and Imitation for object Control in non-rewarding environments
In this paper we study a new reinforcement learning setting where the
environment is non-rewarding, contains several possibly related objects of
various controllability, and where an apt agent Bob acts independently, with
non-observable intentions. We argue that this setting defines a realistic
scenario and we present a generic discrete-state discrete-action model of such
environments. To learn in this environment, we propose an unsupervised
reinforcement learning agent called CLIC for Curriculum Learning and Imitation
for Control. CLIC learns to control individual objects in its environment, and
imitates Bob's interactions with these objects. It selects objects to focus on
when training and imitating by maximizing its learning progress. We show that
CLIC is an effective baseline in our new setting. It can effectively observe
Bob to gain control of objects faster, even if Bob is not explicitly teaching.
It can also follow Bob when he acts as a mentor and provides ordered
demonstrations. Finally, when Bob controls objects that the agent cannot, or in
the presence of a hierarchy between objects in the environment, we show that CLIC
ignores non-reproducible and already mastered interactions with objects,
resulting in a greater benefit from imitation.
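The selection rule described in the abstract above (focus on the object with the greatest learning progress) can be illustrated with a minimal sketch. This is not the CLIC implementation: the function names, the windowed-mean definition of learning progress, and the toy competence histories are all assumptions made for illustration.

```python
def learning_progress(competence, window=3):
    """Absolute change between the mean of the last `window` competence
    values and the mean of the `window` values before them (assumed measure)."""
    if len(competence) < 2 * window:
        # Assumption: with too little data, treat the object as maximally interesting.
        return float("inf")
    recent = competence[-window:]
    older = competence[-2 * window:-window]
    return abs(sum(recent) / window - sum(older) / window)


def select_object(histories, window=3):
    """Return the name of the object whose competence is changing fastest."""
    return max(histories, key=lambda name: learning_progress(histories[name], window))


# Hypothetical competence histories (fraction of successful control attempts).
histories = {
    "light": [0.9, 0.9, 0.9, 0.9, 0.9, 0.9],  # already mastered: no progress
    "door": [0.1, 0.2, 0.3, 0.5, 0.6, 0.7],   # currently improving
    "wind": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],   # not controllable: no progress
}
chosen = select_object(histories)
```

Under this measure both mastered and uncontrollable objects score zero progress, so the selection falls on the object that is still being learned, which mirrors the behavior the abstract reports for mastered and non-reproducible interactions.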
Preference-Based Monte Carlo Tree Search
Monte Carlo tree search (MCTS) is a popular choice for solving sequential
anytime problems. However, it depends on a numeric feedback signal, which can
be difficult to define. Real-time MCTS is a variant which may only rarely
encounter states with an explicit, extrinsic reward. To deal with such cases,
the experimenter has to supply an additional numeric feedback signal in the
form of a heuristic, which intrinsically guides the agent. Recent work has
shown evidence that in different areas the underlying structure is ordinal and
not numerical. Hence erroneous and biased heuristics are inevitable, especially
in such domains. In this paper, we propose an MCTS variant which only depends on
qualitative feedback, and therefore opens up new applications for MCTS. We also
find indications that translating absolute into ordinal feedback may be
beneficial. Using a puzzle domain, we show that our preference-based MCTS
variant, which only receives qualitative feedback, is able to reach a
performance level comparable to a regular MCTS baseline, which obtains
quantitative feedback.
Comment: To be publishe
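To make the idea of translating absolute into ordinal feedback concrete, here is a minimal sketch that scores actions by pairwise comparisons of rollout returns instead of by their mean. It is not the algorithm from the paper above: the function name `preference_scores`, the win-fraction scoring rule, and the sample returns are assumptions for illustration only.

```python
from itertools import product


def preference_scores(returns_by_action):
    """Score each action by the fraction of pairwise comparisons its rollout
    returns win against the returns of every other action (ties count half)."""
    scores = {}
    for action, own in returns_by_action.items():
        wins, total = 0.0, 0
        for other, theirs in returns_by_action.items():
            if other == action:
                continue
            for x, y in product(own, theirs):
                wins += 1.0 if x > y else 0.5 if x == y else 0.0
                total += 1
        scores[action] = wins / total if total else 0.5
    return scores


# Hypothetical rollout returns: one outlier dominates the mean of "left".
returns = {"left": [1.0, 2.0, 100.0], "right": [3.0, 4.0, 5.0]}
means = {a: sum(r) / len(r) for a, r in returns.items()}
scores = preference_scores(returns)

# Only the ordering of returns matters, so a monotone rescaling changes nothing.
cubed = {a: [x ** 3 for x in r] for a, r in returns.items()}
scores_cubed = preference_scores(cubed)
```

In this toy data the mean favors "left" because of a single large return, while the pairwise score favors "right". The scores are also unchanged under any strictly increasing transformation of the returns, which is the property that makes purely qualitative feedback sufficient.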
On the Evolutionary Emergence of Optimism
Successful individuals were frequently found to be overly optimistic. This is puzzling because it might be thought that optimistic individuals who consistently overestimate their eventual payoffs will not do as well as realists who see the situation as it truly is and hence will not survive evolutionary pressures. We show that, contrary to this intuition, there is a large class of either competitive or cooperative strategic interactions between randomly matched pairs of individuals in the population, in which "cautiously" optimistic individuals not only survive but also prosper and take over the entire population. The reason for this result is that optimistic individuals who overestimate the impact of their actions on their payoffs behave more aggressively than realists and pessimists. When the interactions between individuals involve negative externalities (the payoff of one player decreases with the actions taken by another player) and the actions are strategic substitutes, being aggressive induces the opponent to be softer, so optimists gain a strategic advantage that, for moderate levels of optimism, outweighs the loss from having the wrong perception of the environment. Likewise, when the interactions between individuals involve positive externalities and the actions are strategic complements, being aggressive triggers a favorable aggressive behavior from the opponent. Hence, in both cases, cautiously optimistic types fare better on average than other types of individuals. We show that if the initial distribution of types is sufficiently wide, then over time it will converge in distribution to a mass point on some level of cautious optimism.
Synthetic steganography: Methods for generating and detecting covert channels in generated media
Issues of privacy in communication are becoming increasingly important. For many people and businesses, the use of strong cryptographic protocols is sufficient to protect their communications. However, the overt use of strong cryptography may be prohibited or individual entities may be prohibited from communicating directly. In these cases, a secure alternative to the overt use of strong cryptography is required. One promising alternative is to hide the use of cryptography by transforming ciphertext into innocuous-seeming messages to be transmitted in the clear. In this dissertation, we consider the problem of synthetic steganography: generating and detecting covert channels in generated media. We start by demonstrating how to generate synthetic time series data that not only mimic an authentic source of the data, but also hide data at any of several different locations in the reversible generation process. We then design a steganographic context-sensitive tiling system capable of hiding secret data in a variety of procedurally-generated multimedia objects. Next, we show how to securely hide data in the structure of a Huffman tree without affecting the length of the codes. Next, we present a method for hiding data in Sudoku puzzles, both in the solved board and the clue configuration. Finally, we present a general framework for exploiting steganographic capacity in structured interactions like online multiplayer games, network protocols, auctions, and negotiations. Recognizing that structured interactions represent a vast field of novel media for steganography, we also design and implement an open-source extensible software testbed for analyzing steganographic interactions and use it to measure the steganographic capacity of several classic games. We analyze the steganographic capacity and security of each method that we present and show that existing steganalysis techniques cannot accurately detect the usage of the covert channels.
We develop targeted steganalysis techniques which improve detection accuracy and then use the insights gained from those methods to improve the security of the steganographic systems. We find that secure synthetic steganography, and accurate steganalysis thereof, depends on having access to an accurate model of the cover media.
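One way to see how a Huffman tree's structure can carry data without changing code lengths is to let the left/right order of the two children at each internal node encode one bit. The sketch below illustrates that general idea only. It is not the dissertation's method and makes no security claim; the function names, the tie-breaking rule based on the smallest symbol in a subtree, and the sample frequencies are assumptions.

```python
import heapq
from itertools import count


def build_tree(freqs):
    """Standard Huffman construction. A leaf is a symbol (assumed not to be
    a tuple); an internal node is a (left, right) tuple."""
    tie = count()  # unique tie-breaker so heap entries never compare nodes
    heap = [(w, next(tie), sym) for sym, w in sorted(freqs.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tie), (a, b)))
    return heap[0][2]


def min_symbol(node):
    """Smallest symbol in a subtree; distinct for sibling subtrees."""
    return min(min_symbol(c) for c in node) if isinstance(node, tuple) else node


def embed(node, bits):
    """Visit internal nodes in preorder; bit 0 keeps the child with the
    smaller min_symbol on the left, bit 1 puts it on the right."""
    if not isinstance(node, tuple):
        return node
    lo, hi = sorted(node, key=min_symbol)
    left, right = (lo, hi) if next(bits, 0) == 0 else (hi, lo)
    return (embed(left, bits), embed(right, bits))


def extract(node, out):
    """Read the bits back in the same preorder used by embed."""
    if isinstance(node, tuple):
        left, right = node
        out.append(0 if min_symbol(left) < min_symbol(right) else 1)
        extract(left, out)
        extract(right, out)
    return out


def code_lengths(node, depth=0):
    """Map each symbol to its code length (its depth in the tree)."""
    if not isinstance(node, tuple):
        return {node: depth}
    lengths = {}
    for child in node:
        lengths.update(code_lengths(child, depth + 1))
    return lengths


# Hypothetical symbol frequencies: 6 symbols give 5 internal nodes, so 5 bits.
freqs = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
tree = build_tree(freqs)
secret = [1, 0, 1, 1, 0]
stego = embed(tree, iter(secret))
recovered = extract(stego, [])
```

Swapping siblings never changes the depth of any leaf, so every symbol keeps its code length and the compressed size is unaffected. The capacity of this simple scheme is one bit per internal node, that is, one less than the number of symbols.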