Arena: A General Evaluation Platform and Building Toolkit for Multi-Agent Intelligence
Learning agents that are capable not only of taking tests but also of
innovating are becoming a hot topic in AI. One of the most promising paths
towards this vision is multi-agent learning, where agents act as the
environment for each other, and improving each agent means proposing new
problems for others. However, existing evaluation platforms are either not
compatible with multi-agent settings, or limited to a specific game. That is,
there is not yet a general evaluation platform for research on multi-agent
intelligence. To this end, we introduce Arena, a general evaluation platform
for multi-agent intelligence with 35 games of diverse logics and
representations. Furthermore, multi-agent intelligence is still at the stage
where many problems remain unexplored. Therefore, we provide a building toolkit
for researchers to easily invent and build novel multi-agent problems from the
provided game set based on a GUI-configurable social tree and five basic
multi-agent reward schemes. Finally, we provide Python implementations of five
state-of-the-art deep multi-agent reinforcement learning baselines. Along with
the baseline implementations, we release a set of 100 best agents/teams that we
can train with different training schemes for each game, as the base for
evaluating agents with population performance. As such, the research community
can perform comparisons under a stable and uniform standard. All the
implementations and accompanying tutorials have been open-sourced for the
community at https://sites.google.com/view/arena-unity/
Integrating Human-Provided Information Into Belief State Representation Using Dynamic Factorization
In partially observed environments, it can be useful for a human to provide
the robot with declarative information that represents probabilistic relational
constraints on properties of objects in the world, augmenting the robot's
sensory observations. For instance, a robot tasked with a search-and-rescue
mission may be informed by the human that two victims are probably in the same
room. An important question arises: how should we represent the robot's
internal knowledge so that this information is correctly processed and combined
with raw sensory information? In this paper, we provide an efficient belief
state representation that dynamically selects an appropriate factoring,
combining aspects of the belief when they are correlated through information
and separating them when they are not. This strategy works in open domains, in
which the set of possible objects is not known in advance, and provides
significant improvements in inference time over a static factoring, leading to
more efficient planning for complex partially observed tasks. We validate our
approach experimentally in two open-domain planning problems: a 2D discrete
gridworld task and a 3D continuous cooking task. A supplementary video can be
found at http://tinyurl.com/chitnis-iros-18. Comment: IROS 2018 final version
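The dynamic factoring idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `factor_groups`, the variable names, and the representation of a constraint as a pair of variables are all assumptions made for illustration. Variables linked by a reported constraint are grouped into one joint factor, and variables with no linking information stay in separate factors. Recomputing the grouping from the current constraint set lets factors merge or split as information changes.

```python
from collections import defaultdict


def factor_groups(variables, constraints):
    """Partition variables into factors (hypothetical helper).

    Variables linked directly or transitively by a constraint share one
    joint factor; all other variables remain in singleton factors.
    """
    parent = {v: v for v in variables}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in constraints:
        parent[find(a)] = find(b)  # merge the two factors

    groups = defaultdict(set)
    for v in variables:
        groups[find(v)].add(v)
    # Sort for a deterministic result.
    return sorted(sorted(g) for g in groups.values())


# Illustrative variables only; the names are assumptions.
variables = ["victim1.room", "victim2.room", "door.open"]

# The human reports that the two victims are probably in the same room,
# so their room variables are tracked jointly.
print(factor_groups(variables, [("victim1.room", "victim2.room")]))

# With no relational information, every variable is its own factor.
print(factor_groups(variables, []))
```

The sketch covers only the bookkeeping of which variables are tracked jointly. It does not represent the probability distributions inside each factor or the open-domain handling of previously unknown objects that the abstract describes.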
Operational Decision Making under Uncertainty: Inferential, Sequential, and Adversarial Approaches
Modern security threats are characterized by a stochastic, dynamic, partially observable, and ambiguous operational environment. This dissertation addresses such complex security threats using operations research techniques for decision making under uncertainty in operations planning, analysis, and assessment. First, this research develops a new method for robust queue inference with partially observable, stochastic arrival and departure times, motivated by cybersecurity and terrorism applications. In the dynamic setting, this work develops a new variant of Markov decision processes and an algorithm for robust information collection in dynamic, partially observable, and ambiguous environments, with an application to a cybersecurity detection problem. In the adversarial setting, this work presents a new application of counterfactual regret minimization and robust optimization to a multi-domain cyber and air defense problem in a partially observable environment.