Multiagent Inverse Reinforcement Learning via Theory of Mind Reasoning
We approach the problem of understanding how people interact with each other
in collaborative settings, especially when individuals know little about their
teammates, via Multiagent Inverse Reinforcement Learning (MIRL), where the goal
is to infer the reward functions guiding the behavior of each individual given
trajectories of a team's behavior during some task. Unlike current MIRL
approaches, we do not assume that team members know each other's goals a
priori; rather, that they collaborate by adapting to the goals of others
perceived by observing their behavior, all while jointly performing a task. To
address this problem, we propose a novel approach to MIRL via Theory of Mind
(MIRL-ToM). For each agent, we first use ToM reasoning to estimate a posterior
distribution over baseline reward profiles given their demonstrated behavior.
We then perform MIRL via decentralized equilibrium by employing single-agent
Maximum Entropy IRL to infer a reward function for each agent, where we
simulate the behavior of other teammates according to the time-varying
distribution over profiles. We evaluate our approach in a simulated 2-player
search-and-rescue operation where the goal of the agents, playing different
roles, is to search for and evacuate victims in the environment. Our results
show that the choice of baseline profiles is paramount to the recovery of the
ground-truth rewards, and that MIRL-ToM is able to recover the rewards used by
agents interacting with both known and unknown teammates.
Comment: Accepted as a full paper at AAMAS202
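As a rough illustration of the ToM step described in this abstract, the posterior over baseline reward profiles can be sketched as a Bayesian update over a teammate's observed actions. This is a minimal sketch, not the authors' implementation: the softmax action model, the function names, the profile names, and the action values are all assumptions made for illustration.

```python
import math


def softmax(values, temperature=1.0):
    """Turn action values into action probabilities (assumed action model)."""
    highest = max(values)
    exps = [math.exp((v - highest) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def update_posterior(prior, action_values_by_profile, observed_action):
    """One Bayesian update of the belief over reward profiles.

    prior: dict mapping profile name -> probability.
    action_values_by_profile: dict mapping profile name -> list of action
        values in the current state under that profile (hypothetical input).
    observed_action: index of the action the teammate actually took.
    """
    unnormalized = {}
    for profile, probability in prior.items():
        likelihood = softmax(action_values_by_profile[profile])[observed_action]
        unnormalized[profile] = probability * likelihood
    evidence = sum(unnormalized.values())
    return {profile: p / evidence for profile, p in unnormalized.items()}


# Hypothetical two-profile example: action 0 is favored by the "rescuer" profile.
prior = {"rescuer": 0.5, "explorer": 0.5}
values = {"rescuer": [1.0, 0.0], "explorer": [0.0, 1.0]}
posterior = update_posterior(prior, values, observed_action=0)
```

Repeating the update at every time step gives a time-varying distribution over profiles of the kind the abstract refers to.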
Intelligent Agents for Active Malware Analysis
The main contribution of this thesis is to give a novel perspective on Active Malware Analysis modeled as a decision making process between intelligent agents. We propose solutions aimed at extracting the behaviors of malware agents with advanced Artificial Intelligence techniques. In particular, we devise novel action selection strategies for the analyzer agents that make it possible to analyze malware by selecting sequences of triggering actions aimed at maximizing the information acquired. The goal is to create informative models representing the behaviors of the malware agents observed while interacting with them during the analysis process. Such models can then be used to effectively compare one malware sample against others and to correctly identify the malware famil
Recommended from our members
Learning from action not taken in multiagent systems
Coordinating a large multiagent system to achieve a system-level goal is a critical challenge. Even when agents intend to cooperate, there is no guarantee that their actions will lead to good system-level performance, especially as the system grows large. One of the primary difficulties in such multiagent systems is the slow learning process. Agents need to learn how to interact with other agents in a complex and dynamic system while adapting in the presence of other agents that are simultaneously learning. Presented in this thesis is a unique multiagent learning approach that significantly improves both learning speed and system-level performance in multiagent systems by having an agent update its estimate of the reward (e.g., the value function in reinforcement learning) for all of its available actions, not just the action that was taken. The method is based on the agent receiving a reward for the actions it does not take, by estimating the counterfactual reward it would have received had it taken those actions. The experimental results illustrate that the rewards on such "actions not taken" (ANT) are helpful early in the learning process. The agents then use their team members to estimate these rewards, resulting in principally learning as a team. Finally, it is shown that fast learning is essential in a dynamic environment. The ANT reward with teams yields an improvement in learning speed that results in more stability in following the changes in such an environment
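The idea of updating every available action, not only the one taken, can be sketched as follows. This is a minimal sketch under strong assumptions, not the thesis's algorithm: it assumes the agent can evaluate a hypothetical `system_reward` function on counterfactual joint actions, and the function names, the learning rate, and the toy coverage reward are illustrative.

```python
def update_all_actions(values, joint_action, agent, system_reward, alpha=0.1):
    """Update the agent's value estimate for every action it could have taken.

    values: list of the agent's current value estimates, one per action.
    joint_action: list of the actions actually taken by all agents.
    agent: index of the learning agent within joint_action.
    system_reward: callable scoring a joint action (assumed to be available).
    """
    for action in range(len(values)):
        # Swap in the candidate action while holding the other agents fixed.
        counterfactual = list(joint_action)
        counterfactual[agent] = action
        reward = system_reward(counterfactual)
        values[action] += alpha * (reward - values[action])
    return values


def coverage(joint_action):
    """Toy system reward: number of distinct resources chosen by the team."""
    return len(set(joint_action))


# Both agents chose resource 0; agent 0 also learns about untaken action 1.
estimates = update_all_actions([0.0, 0.0], [0, 0], agent=0,
                               system_reward=coverage, alpha=0.5)
```

For the action that was actually taken, the counterfactual joint action equals the real one, so that entry receives the ordinary reward update.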
Theory and applications of difference evaluation functions
The credit assignment problem (which agents get credit or blame for system performance) is a key research area. For a team of agents collaborating to achieve a goal, the effectiveness of each individual agent must be calculated in order to give adequate feedback to each agent. We use the Difference Evaluation Function to find agent-specific feedback. The Difference Evaluation Function has given excellent empirical results in many domains, including air traffic control and mobile robot control. Though there has been some theoretical work that shows why Difference Evaluation Functions improve system performance, there has been no work to show when and under what conditions such improvements are realized. We apply an evolutionary game-theoretic analysis to show the theoretical advantages of the Difference Evaluation Function. We then focus on how to apply these multiagent learning methods to optimize distributed sensor networks in advanced power generation systems
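The abstract does not spell out the Difference Evaluation Function. In its commonly cited form it is the system reward minus the system reward obtained when the agent's action is removed or replaced by a default. The sketch below follows that form; the function names, the use of `None` to mark a removed agent, and the toy coverage reward are assumptions for illustration.

```python
def difference_reward(joint_action, agent, system_reward, default_action=None):
    """Agent-specific feedback: G(z) minus G(z with the agent's action replaced)."""
    actual = system_reward(joint_action)
    counterfactual = list(joint_action)
    counterfactual[agent] = default_action
    return actual - system_reward(counterfactual)


def coverage(joint_action):
    """Toy system reward: distinct resources covered, ignoring removed agents."""
    return len({a for a in joint_action if a is not None})


team = [0, 0, 1]
redundant_agent = difference_reward(team, agent=0, system_reward=coverage)
unique_agent = difference_reward(team, agent=2, system_reward=coverage)
```

In this toy case agent 0 duplicates agent 1's choice and so contributes nothing, while agent 2 is the only one covering resource 1.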
Beyond Rewards: a Hierarchical Perspective on Offline Multiagent Behavioral Analysis
Each year, expert-level performance is attained in increasingly complex
multiagent domains, with notable examples including Go, Poker, and StarCraft II.
This rapid progression is accompanied by a commensurate need to better
understand how such agents attain this performance, to enable their safe
deployment, identify limitations, and reveal potential means of improving them.
In this paper we take a step back from performance-focused multiagent learning,
and instead turn our attention towards agent behavior analysis. We introduce a
model-agnostic method for discovery of behavior clusters in multiagent domains,
using variational inference to learn a hierarchy of behaviors at the joint and
local agent levels. Our framework makes no assumption about agents' underlying
learning algorithms, does not require access to their latent states or
policies, and is trained using only offline observational data. We illustrate
the effectiveness of our method for enabling the coupled understanding of
behaviors at the joint and local agent level, detecting behavior
changepoints throughout training, and discovering core behavioral concepts.
We also demonstrate the approach's scalability to a high-dimensional multiagent
MuJoCo control domain, and illustrate that the approach can disentangle
previously trained policies in OpenAI's hide-and-seek domain
CLEAN learning to improve coordination and scalability in multiagent systems
Recent advances in multiagent learning have led to exciting new capabilities spanning fields as diverse as planetary exploration, air traffic control, military reconnaissance, and airport security. Such algorithms provide a tangible benefit over traditional control algorithms in that they allow fast responses, adapt to dynamic environments, and generally scale well. Unfortunately, because many existing multiagent learning methods are extensions of single agent approaches, they are inhibited by three key issues: i) they treat the actions of other agents as "environmental noise" in an attempt to simplify the problem complexity, ii) they are slow to converge in large systems as the joint action space grows exponentially in the number of agents, and iii) they frequently rely upon the presence of an accurate system model being readily available. This work addresses these three issues sequentially. First, we improve overall learning performance compared to existing state-of-the-art techniques in the field by embracing the exploration in learning rather than ignoring it or approximating it away. Within multiagent systems, exploration by individual agents significantly alters the dynamics of the environment in which all agents learn. To address this, we introduce the concept of "private" exploration, which enables each agent to present a stationary baseline policy to other agents in order to allow other agents in the system to learn more efficiently. In particular, we introduce Coordinated Learning without Exploratory Action Noise (CLEAN) rewards which improve coordination and performance by utilizing the concept of private exploration in order to remove the negative impact of traditional "public" exploration strategies from learning in multiagent systems. 
Next, we leverage the fundamental properties of CLEAN rewards that enable private exploration to allow agents to explore multiple potential actions concurrently in a "batch mode" in order to significantly improve learning speed over the state-of-the-art. Finally, we improve the real-world applicability of the proposed techniques by reducing their requirements. Specifically, the CLEAN rewards developed require an accurate partial model (i.e., an accurate model of the system objective) of the system in order to be computed. Unfortunately, many real-world systems are too complex to be modeled or are not known in advance, so an accurate system model is not available a priori. We address this shortcoming by employing model-based reinforcement learning techniques to enable agents to construct their own approximate model of the system objective based upon their observations and use this approximate model to calculate their CLEAN rewards.
Keywords: Multiagent Coordination, Multiagent Learning, UAV Communication Network, Fractionated Satellites, UAV Swarms, Distributed Control, Multiagent Scalability, Learning based control, Reward Shaping, Cubesats, Multiagent systems, Solar Power UAVs, Satellite Constellation
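The abstract does not give the exact form of the CLEAN reward, so the following is only one plausible reading of "private" exploration. Every agent publicly follows its greedy action, and each agent privately scores an exploratory action by swapping it into the joint action and evaluating a model of the system objective. The function names, the random choice of exploratory action, and the toy coverage model are assumptions for illustration.

```python
import random


def private_exploration_update(values, greedy_joint, agent, system_model,
                               rng, alpha=0.1):
    """Score a privately explored action without ever executing it.

    values: the agent's value estimates, one per action.
    greedy_joint: joint action in which every agent acts greedily (public).
    system_model: callable approximating the system objective (assumed known).
    rng: random.Random instance used to pick the exploratory action.
    """
    explored = rng.randrange(len(values))
    counterfactual = list(greedy_joint)
    counterfactual[agent] = explored
    # Reward is the modeled change in the system objective from the swap.
    reward = system_model(counterfactual) - system_model(greedy_joint)
    values[explored] += alpha * (reward - values[explored])
    return explored, reward


def coverage(joint_action):
    """Toy model of the system objective: distinct resources chosen."""
    return len(set(joint_action))


public_joint = [0, 0]
agent_values = [0.0, 0.0]
explored, reward = private_exploration_update(
    agent_values, public_joint, agent=0, system_model=coverage,
    rng=random.Random(0), alpha=0.5)
```

Because the exploratory action is only evaluated through the model, the joint action that other agents observe stays unchanged, which is the stationarity property the abstract attributes to private exploration.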
Modeling multidisciplinary design with multiagent learning
Complex engineered systems design is a collaborative activity. To design a system, experts from the relevant disciplines must work together to create the best overall system from their individual components. This situation is analogous to a multiagent system in which agents solve individual parts of a larger problem in a coordinated way. Current multiagent models of design teams, however, do not capture this distributed aspect of design teams; instead, they either represent designers as agents that control all variables, measure organizational outcomes instead of design outcomes, or represent different aspects of distributed design, such as negotiation. This paper presents a new model that captures the distributed nature of complex systems design by decomposing the ability to control design variables among individual computational designers acting on a problem with shared constraints. These designers are represented as a multiagent learning system, which is shown to perform similarly to a centralized optimization algorithm on the same domain. When used as a model, this multiagent system is shown to perform better when the level of designer exploration is not decayed but is instead controlled based on the increase of design knowledge, suggesting that designers in multidisciplinary teams should not simply reduce the scope of design exploration over time, but should adapt based on changes in their collective knowledge of the design space. This multiagent system is further shown to produce better-performing designs when computational designers design collaboratively as opposed to independently, confirming the importance of collaboration in complex systems design