
Opponent modeling and exploitation in poker using evolved recurrent neural networks

Author: Li Xun, Ph.D. in computer science
Publication date: 03/10/2018
Field of study: Computer Science

As a classic example of imperfect-information games, poker, and in particular Heads-Up No-Limit Texas Hold'em (HUNL), has been studied extensively in recent years. A number of computer poker agents of increasingly high quality have been built. While agents based on approximate Nash equilibria have been successful, they lack the ability to exploit their opponents effectively. In addition, the performance of equilibrium strategies cannot be guaranteed in games with more than two players and multiple Nash equilibria. This dissertation focuses on devising an evolutionary method to discover opponent models based on recurrent neural networks. A series of computer poker agents called Adaptive System for Hold’Em (ASHE) were evolved for HUNL. ASHE models the opponent explicitly using Pattern Recognition Trees (PRTs) and LSTM estimators. The default and board-texture-based PRTs maintain statistical data on the opponent's strategies at different game states. The Opponent Action Rate Estimator predicts the opponent's moves, and the Hand Range Estimator evaluates the showdown value of ASHE's hand. Recursive Utility Estimation is used to evaluate the expected utility (reward) of each available action. Experimental results show that (1) ASHE exploits opponents with high to moderate levels of exploitability more effectively than Nash-equilibrium-based agents, and (2) ASHE can defeat top-ranking equilibrium-based poker agents. Thus, the dissertation introduces an effective new method for building high-performance computer agents for poker and other imperfect-information games. It also provides a promising direction for future research in imperfect-information games beyond the equilibrium-based approach.

Texas ScholarWorks
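The opponent statistics and per-action utility evaluation described in this abstract can be illustrated with a much simpler stand-in. The sketch below is hypothetical and is not ASHE's implementation: it replaces the PRTs and LSTM estimators with a smoothed frequency table keyed by a made-up game-state string (the `ActionRateTable` class and its state keys are assumptions), and it replaces Recursive Utility Estimation with a one-step expected value of a bet in which any non-fold response is treated as a call.

```python
from collections import defaultdict

ACTIONS = ("fold", "call", "raise")


class ActionRateTable:
    """Hypothetical stand-in for opponent statistics: per-state action counts."""

    def __init__(self, prior_count: float = 1.0):
        self.prior = prior_count  # pseudo-count added to every action (Laplace smoothing)
        self.counts = defaultdict(lambda: dict.fromkeys(ACTIONS, 0.0))

    def observe(self, state: str, action: str) -> None:
        self.counts[state][action] += 1.0

    def action_rates(self, state: str) -> dict:
        counts = self.counts[state]
        total = sum(counts.values()) + self.prior * len(ACTIONS)
        return {a: (counts[a] + self.prior) / total for a in ACTIONS}


def bet_expected_value(pot: float, bet: float, fold_rate: float, equity: float) -> float:
    """One-step EV of betting `bet` into `pot`; non-fold responses count as calls."""
    win_now = fold_rate * pot  # opponent folds: we take the current pot
    # Opponent calls: the pot grows by 2 * bet; we win it with probability
    # `equity` (the estimated showdown value), having put in `bet` ourselves.
    called = (1.0 - fold_rate) * (equity * (pot + 2.0 * bet) - bet)
    return win_now + called


table = ActionRateTable()
for action in ("fold", "fold", "call"):
    table.observe("flop:bet", action)
rates = table.action_rates("flop:bet")
ev = bet_expected_value(pot=100.0, bet=50.0, fold_rate=rates["fold"], equity=0.4)
```

With two folds and one call observed and one pseudo-count per action, the smoothed fold rate is 0.5 and the bet's one-step value is 65 chips. Comparing such values across the available actions is the general idea; a fuller estimator would also model raises and later betting rounds.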

Operational Decision Making under Uncertainty: Inferential, Sequential, and Adversarial Approaches

Author: Keith Andrew J.
Publication venue: AFIT Scholar
Publication date: 22/08/2019

Modern security threats are characterized by a stochastic, dynamic, partially observable, and ambiguous operational environment. This dissertation addresses such complex security threats using operations research techniques for decision making under uncertainty in operations planning, analysis, and assessment. First, this research develops a new method for robust queue inference with partially observable, stochastic arrival and departure times, motivated by cybersecurity and terrorism applications. In the dynamic setting, this work develops a new variant of Markov decision processes and an algorithm for robust information collection in dynamic, partially observable, and ambiguous environments, with an application to a cybersecurity detection problem. In the adversarial setting, this work presents a new application of counterfactual regret minimization and robust optimization to a multi-domain cyber and air defense problem in a partially observable environment.

AFIT Scholar (Air Force Institute of Technology)
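Counterfactual regret minimization, mentioned in the adversarial setting above, applies regret matching at every information set. As a minimal sketch, assuming a one-shot two-player zero-sum matrix game (a single information set per player, so counterfactual regret reduces to ordinary regret), the code below runs regret matching in self-play. The biased rock-paper-scissors payoffs are made up for illustration; it is the average strategies, not the final ones, that approach an equilibrium.

```python
import numpy as np


def regret_matching(regret_sum: np.ndarray) -> np.ndarray:
    """Play actions in proportion to positive cumulative regret (uniform if none)."""
    positive = np.maximum(regret_sum, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    return np.full(regret_sum.shape, 1.0 / regret_sum.size)


def self_play(payoff: np.ndarray, iterations: int = 10_000):
    """Regret-matching self-play; payoff[i, j] is the row player's gain (zero-sum)."""
    n_rows, n_cols = payoff.shape
    regret_row, regret_col = np.zeros(n_rows), np.zeros(n_cols)
    sum_row, sum_col = np.zeros(n_rows), np.zeros(n_cols)
    for _ in range(iterations):
        x, y = regret_matching(regret_row), regret_matching(regret_col)
        sum_row += x
        sum_col += y
        row_values = payoff @ y      # row player's expected payoff for each pure action
        col_values = -(x @ payoff)   # column player's payoff is the negation
        regret_row += row_values - x @ row_values
        regret_col += col_values - y @ col_values
    return sum_row / iterations, sum_col / iterations


# Hypothetical biased rock-paper-scissors: rock beating scissors pays 2.
payoff = np.array([[0.0, -1.0, 2.0],
                   [1.0, 0.0, -1.0],
                   [-2.0, 1.0, 0.0]])
x_bar, y_bar = self_play(payoff)
# Duality gap: total amount the two players could gain by deviating from the averages.
gap = (payoff @ y_bar).max() - (x_bar @ payoff).min()
```

For this payoff matrix the unique equilibrium is (1/4, 1/2, 1/4) for both players, and the averages should approach it as the gap shrinks. CFR proper repeats the same update at every information set of an extensive-form game, weighting regrets by counterfactual reach probabilities.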

Probabilistic movement modeling for intention inference in human-robot interaction

Author: Anderson R
Billingsley J
Bishop C
Deisenroth M
Friesen A
Fässler H
Khan M
Lawrence N
Quiñonero-Candela J
Ramanantsoa M
Rao R
Rasmussen C
Schölkopf B
Simon M
Turner R
van der Maaten L
Wang Z
Williams A
Ziebart B
Publication venue: SAGE Publications
Publication date: 01/01/2013

Intention inference can be an essential step toward efficient human-robot interaction. For this purpose, we propose the Intention-Driven Dynamics Model (IDDM) to probabilistically model the generative process of movements that are directed by the intention. The IDDM allows the intention to be inferred from observed movements using Bayes' theorem. The IDDM simultaneously finds a latent state representation of noisy and high-dimensional observations, and models the intention-driven dynamics in the latent states. As most robotics applications are subject to real-time constraints, we develop an efficient online algorithm that allows for real-time intention inference. Two human-robot interaction scenarios, i.e., target prediction for robot table tennis and action recognition for interactive humanoid robots, are used to evaluate the performance of our inference algorithm. In both intention inference tasks, the proposed algorithm achieves substantial improvements over support vector machines and Gaussian processes.

CiteSeerX

TUbiblio

Crossref

Publikationsserver der Universität Tübingen

Spiral - Imperial College Digital Repository

MPG.PuRe
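Inferring an intention from observed movements with Bayes' theorem can be sketched with a deliberately simple dynamics model. The sketch assumes a one-dimensional position that moves a fixed fraction (`gain`) of the way toward the intended target at each step, with Gaussian noise; the IDDM instead learns intention-driven dynamics in a latent space with Gaussian processes, so `gain`, `noise_std`, and the candidate targets are illustrative assumptions. The posterior over intentions is the prior times the likelihood of the observed transitions, normalized.

```python
import numpy as np


def intention_posterior(trajectory, targets, prior, gain=0.2, noise_std=0.05):
    """Posterior over candidate targets given an observed 1-D trajectory.

    Assumed (hypothetical) dynamics: x[t+1] = x[t] + gain * (target - x[t]) + noise.
    """
    x = np.asarray(trajectory, dtype=float)
    log_post = np.log(np.asarray(prior, dtype=float))
    for i, target in enumerate(targets):
        predicted = x[:-1] + gain * (target - x[:-1])
        residual = x[1:] - predicted
        # Gaussian log-likelihood of each observed transition under this intention.
        log_post[i] += np.sum(-0.5 * (residual / noise_std) ** 2
                              - np.log(noise_std * np.sqrt(2.0 * np.pi)))
    log_post -= log_post.max()  # stabilize before exponentiating
    post = np.exp(log_post)
    return post / post.sum()


# A noise-free movement heading toward +1.0, for illustration.
trajectory = [0.0]
for _ in range(5):
    trajectory.append(trajectory[-1] + 0.2 * (1.0 - trajectory[-1]))
posterior = intention_posterior(trajectory, targets=[-1.0, 0.5, 1.0],
                                prior=[1 / 3, 1 / 3, 1 / 3])
```

Because each new observation multiplies in one more likelihood term, the same computation can be run incrementally, which is what makes online, real-time inference feasible in principle.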

Cyber Deception for Critical Infrastructure Resiliency

Author: Al Amin Md Ali Reza
Publication venue: ODU Digital Commons
Publication date: 01/08/2022

The high connectivity of modern cyber networks and devices has brought many improvements to the functionality and efficiency of networked systems. Unfortunately, these benefits have come with many new entry points for attackers, making systems much more vulnerable to intrusions. Thus, it is critically important to protect cyber infrastructure against cyber attacks. The static nature of cyber infrastructure allows adversaries to perform reconnaissance activities and identify potential threats. Threats related to software vulnerabilities can be mitigated by discovering a vulnerability, then developing and releasing a patch that removes it. Unfortunately, the period between discovering a vulnerability and applying a patch is long, often five months or more. These delays pose significant risks to the organization while many cyber networks are operational. This concern necessitates the development of an active defense system capable of thwarting cyber reconnaissance missions and mitigating the attacker's progression through the network. Thus, my research investigates how to develop an efficient defense system to address these challenges. First, we proposed a framework showing how the defender can use a network of decoys alongside the real network to introduce mistrust. However, another research problem needs to be addressed: in a resource-constrained system, the defender must choose whether to save resources or spend more of them (i.e., deploy more decoys). We developed a Dynamic Deception System (DDS) that can assess various attacker types based on the attacker's knowledge, aggression, and stealthiness level to decide whether the defender should spend or save resources. In our DDS, we leveraged Software Defined Networking (SDN) to differentiate malicious traffic from benign traffic, deter the cyber reconnaissance mission, and redirect malicious traffic to the deception server.
Experiments conducted on the prototype implementation of our DDS confirmed that the defender could decide whether to spend or save resources based on the attacker type and thwart cyber reconnaissance missions. Next, we addressed the challenge of efficiently placing network decoys by predicting the most likely attack path in Multi-Stage Attacks (MSAs). MSAs are cyber security threats in which the attack campaign is carried out through several stages, and adversarial lateral movement is one of the critical stages. Adversaries can move laterally through the network without raising an alert. To prevent lateral movement, we proposed an approach that combines reactive (graph analysis) and proactive (cyber deception technology) defense. The approach is realized in two phases. The first phase predicts the most likely attack path from Intrusion Detection System (IDS) alerts and network traces, using the transition probabilities of a Hidden Markov Model. The second phase uses the predicted attack path to determine the optimal deployment of decoy nodes along it. The evaluation results show that our approach can predict the most likely attack paths and thwart adversarial lateral movement.

Old Dominion University
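Predicting the most likely attack path from IDS alerts with a Hidden Markov Model can be illustrated with the Viterbi algorithm, the standard way to recover the most probable hidden-state sequence from HMM transition and emission probabilities. This is a sketch under assumptions: hidden states are hosts the attacker may occupy, observations are IDS alert types, and all host names and probabilities are made up; the dissertation's exact model and how its probabilities are estimated are not described in this abstract.

```python
import numpy as np


def viterbi(observations, start, trans, emit):
    """Most likely hidden-state path for an HMM, computed in log space."""
    log_start = np.log(np.asarray(start, dtype=float))
    log_trans = np.log(np.asarray(trans, dtype=float))
    log_emit = np.log(np.asarray(emit, dtype=float))
    n_steps, n_states = len(observations), len(log_start)
    log_delta = log_start + log_emit[:, observations[0]]
    backptr = np.zeros((n_steps, n_states), dtype=int)
    for t in range(1, n_steps):
        # scores[i, j]: best log-probability of being in state i at t-1 and moving to j.
        scores = log_delta[:, None] + log_trans
        backptr[t] = scores.argmax(axis=0)
        log_delta = scores.max(axis=0) + log_emit[:, observations[t]]
    path = [int(log_delta.argmax())]
    for t in range(n_steps - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]


# Hypothetical network: hidden states are hosts, observations are alert types.
hosts = ["web-server", "app-server", "database"]
alerts = {"scan": 0, "exploit": 1, "exfiltration": 2}
start = [0.8, 0.15, 0.05]
trans = [[0.3, 0.6, 0.1],
         [0.1, 0.3, 0.6],
         [0.05, 0.15, 0.8]]
emit = [[0.7, 0.2, 0.1],
        [0.2, 0.6, 0.2],
        [0.1, 0.2, 0.7]]
observed = [alerts["scan"], alerts["exploit"], alerts["exfiltration"]]
predicted_path = [hosts[s] for s in viterbi(observed, start, trans, emit)]
```

Here the predicted path runs from the web server through the application server to the database; a defender following the second phase would then consider placing decoys on or next to hosts along that path.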

Intention Inference and Decision Making with Hierarchical Gaussian Process Dynamics Models

Author: Darmstadt D
Publication date: 01/01/2013

Anticipation is crucial for fluent human-robot interaction, as it allows a robot to independently coordinate its actions with human beings in joint activities. An anticipatory robot relies on a predictive model of its human partners and selects its own actions according to the model's predictions. Intention inference and decision making are key elements of such anticipatory robots. In this thesis, we present a machine-learning approach to intention inference and decision making based on Hierarchical Gaussian Process Dynamics Models (H-GPDMs). We first introduce the H-GPDM, a class of generic latent-variable dynamics models. The H-GPDM represents the generative process of complex human movements that are directed by exogenous driving factors. By incorporating the exogenous variables into the dynamics model, the H-GPDM achieves improved interpretation, analysis, and prediction of human movements. Since exact inference of the exogenous variables and the latent states is intractable, we introduce an approximate method based on variational Bayesian inference and demonstrate the merits of the H-GPDM in three different applications of human movement analysis. The H-GPDM lays the foundation for the subsequent studies on intention inference and decision making. Intention inference is an essential step towards anticipatory robots. For this purpose, we consider a special case of the H-GPDM, the Intention-Driven Dynamics Model (IDDM), which treats the human partner's intention as the exogenous driving factor. The IDDM supports intention inference from observed movements using Bayes' theorem, with the latent state variables marginalized out. As most robotics applications are subject to real-time constraints, we introduce an efficient online algorithm that allows for real-time intention inference.
We show that the IDDM achieves state-of-the-art performance in intention inference in two human-robot interaction scenarios, i.e., target prediction for robot table tennis and action recognition for interactive robots. Decision making based on a time series of predictions allows a robot to be proactive in its action selection, which involves a trade-off between the accuracy and confidence of the prediction and the time available for executing the selected action. To address the problem of action selection and the optimal timing for initiating the movement, we formulate anticipatory action selection as a Partially Observable Markov Decision Process (POMDP), where the H-GPDM is adopted to update the belief state and to estimate the transition model. We present two approaches to policy learning and decision making and show their effectiveness in human-robot table tennis. In addition, we consider decision making based solely on the preferences of the human partners, for cases where observations are not sufficient for reliable intention inference. We formulate this as a repeated game and present an approach to learning safe strategies that exploit the humans' preferences. The learned strategy enables action selection when reliable intention inference is not available due to insufficient observations, e.g., for a robot returning balls served by a human table tennis player. Throughout this thesis, we use human-robot table tennis as a running example, where a key bottleneck is the limited amount of time for executing a hitting movement. Movement initiation usually requires an early decision on the type of action, such as a forehand or backhand hitting movement, at least 80 ms before the opponent has hit the ball. The robot therefore needs to anticipate the opponent's intended target and act proactively. Using the proposed methods, the robot can predict the intended target of the opponent and initiate an appropriate hitting movement according to the prediction.
Experimental results show that the proposed intention inference and decision-making methods can substantially enhance the capability of the robot table tennis player, both in a physically realistic simulation and on a real Barrett WAM robot arm with seven degrees of freedom.
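The POMDP formulation above keeps a belief, a probability distribution over hidden states such as the opponent's intended target, and updates it after each action and observation. The thesis uses the H-GPDM for this update; the sketch below instead shows the generic discrete Bayes-filter belief update for a POMDP with tabular transition and observation models, and the two-state table-tennis numbers are hypothetical.

```python
import numpy as np


def belief_update(belief, action, observation, trans, obs_model):
    """Discrete POMDP belief update.

    trans[action][s, s_next]     = P(s_next | s, action)
    obs_model[action][s_next, o] = P(o | s_next, action)
    """
    predicted = np.asarray(belief, dtype=float) @ trans[action]  # prediction step
    unnormalized = predicted * obs_model[action][:, observation]  # correction step
    total = unnormalized.sum()
    if total == 0.0:
        raise ValueError("observation has zero probability under the current belief")
    return unnormalized / total


# Hypothetical example: hidden state = opponent aims at the robot's forehand (0)
# or backhand (1) side; the intention does not change while the robot waits (action 0).
trans = {0: np.eye(2)}
# Observation 0 = "opponent's racket angled toward the forehand side" (made-up likelihoods).
obs_model = {0: np.array([[0.8, 0.2],
                          [0.3, 0.7]])}
belief = belief_update([0.5, 0.5], action=0, observation=0,
                       trans=trans, obs_model=obs_model)
```

Starting from an even belief, one forehand-leaning observation shifts it to roughly 0.73 versus 0.27; a policy over beliefs then weighs acting now against waiting for a more confident prediction.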

Many-agent Reinforcement Learning

Author: Yang Yaodong
Publication venue: UCL (University College London)
Publication date: 28/03/2021

Multi-agent reinforcement learning (RL) solves the problem of how each agent should behave optimally in a stochastic environment in which multiple agents are learning simultaneously. It is an interdisciplinary domain with a long history that lies at the intersection of psychology, control theory, game theory, reinforcement learning, and deep learning. Following the remarkable success of the AlphaGo series in single-agent RL, 2019 was a booming year that witnessed significant advances in multi-agent RL techniques; impressive breakthroughs have been made in developing AIs that outperform humans on many challenging tasks, especially multi-player video games. Nonetheless, one of the key challenges for multi-agent RL techniques is scalability; it is still non-trivial to design efficient learning algorithms that can solve tasks involving far more than two agents ($N \gg 2$), which I call many-agent reinforcement learning (MARL) problems. (I use "MARL" to denote multi-agent reinforcement learning with a particular focus on the case of many agents; otherwise, it is written as "Multi-Agent RL".) In this thesis, I contribute to tackling MARL problems from four aspects. Firstly, I offer a self-contained overview of multi-agent RL techniques from a game-theoretical perspective. This overview fills a research gap: most existing work either fails to cover the advances since 2010 or does not pay adequate attention to game theory, which I believe is the cornerstone of solving many-agent learning problems. Secondly, I develop a tractable policy evaluation algorithm, $\alpha^\alpha$-Rank, for many-agent systems. The critical advantage of $\alpha^\alpha$-Rank is that it can compute the solution concept of $\alpha$-Rank tractably in multi-player general-sum games without storing the entire payoff matrix. This is in contrast to classic solution concepts such as Nash equilibrium, which is known to be PPAD-hard even in two-player cases. $\alpha^\alpha$-Rank allows us, for the first time, to practically conduct large-scale multi-agent evaluations. Thirdly, I introduce a scalable policy learning algorithm, mean-field MARL, for many-agent systems. The mean-field MARL method takes advantage of the mean-field approximation from physics, and it is the first provably convergent algorithm that tries to break the curse of dimensionality for MARL tasks. With the proposed algorithm, I report the first result of solving the Ising model and multi-agent battle games through a MARL approach. Fourthly, I investigate the many-agent learning problem in open-ended meta-games (i.e., the game of a game in the policy space). Specifically, I focus on modelling behavioural diversity in meta-games and developing algorithms that are guaranteed to enlarge diversity during training. The proposed metric, based on determinantal point processes, serves as the first mathematically rigorous definition of diversity. Importantly, the diversity-aware learning algorithms beat the existing state-of-the-art game solvers in terms of exploitability by a large margin. On top of the algorithmic developments, I also contribute two real-world applications of MARL techniques. Specifically, I demonstrate the great potential of applying MARL to study the emergent population dynamics in nature and to model diverse and realistic interactions in autonomous driving. Both applications embody the prospect that MARL techniques could achieve huge impacts in the real physical world, beyond video games.

UCL Discovery
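A determinantal-point-process diversity metric of the kind mentioned above can be sketched under the following assumptions: each policy is represented by its vector of payoffs against a fixed set of opponents, the DPP kernel is the Gram matrix $L = M M^\top$ of those vectors, and diversity is the expected cardinality of a sample from the DPP, $\mathbb{E}[|Y|] = \operatorname{Tr}(I - (L + I)^{-1})$. This is an illustrative reading, not necessarily the thesis's exact definition, and the payoff matrices are toy inputs.

```python
import numpy as np


def dpp_diversity(payoffs: np.ndarray) -> float:
    """Expected cardinality of the DPP whose L-kernel is the payoff Gram matrix.

    payoffs[i] is policy i's (hypothetical) vector of payoffs against fixed opponents.
    E|Y| = Tr(I - (L + I)^{-1}) = sum_k lambda_k / (1 + lambda_k).
    """
    L = payoffs @ payoffs.T
    identity = np.eye(L.shape[0])
    return float(np.trace(identity - np.linalg.inv(L + identity)))


distinct = np.array([[1.0, 0.0],
                     [0.0, 1.0]])    # two behaviourally different policies
duplicate = np.array([[1.0, 0.0],
                      [1.0, 0.0]])   # the same policy twice
```

Two orthogonal policies score 1.0, while a population containing the same policy twice scores 2/3: the duplicate enlarges the kernel but not its rank, so it adds less expected cardinality than a genuinely different behaviour would.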

Modeling Mutual Influence in Multi-Agent Reinforcement Learning

Author: Wen Ying
Publication venue: UCL (University College London)
Publication date: 28/08/2020

In multi-agent systems (MAS), agents rarely act in isolation but tend to achieve their goals through interactions with other agents. To achieve their ultimate goals, individual agents should actively evaluate the impacts of other agents' behaviors on themselves before deciding which actions to take. The impacts are reciprocal, and it is of great interest to model the mutual influence among agents as they observe the environment or take actions in it. In this thesis, assuming that the agents are aware of each other's existence and their potential impact on themselves, I develop novel multi-agent reinforcement learning (MARL) methods that measure the mutual influence between agents to shape learning. The first part of this thesis outlines the framework of recursive reasoning in deep multi-agent reinforcement learning. I hypothesize that it is beneficial for each agent to consider how other agents react to its behavior. I start from Probabilistic Recursive Reasoning (PR2), which uses level-1 reasoning and adopts variational Bayes methods to approximate the opponents' conditional policies. Each agent shapes its individual Q-value by marginalizing the conditional policies in the joint Q-value and finds the best response to improve its policy. I further extend PR2 to Generalized Recursive Reasoning (GR2) with different hierarchical levels of rationality. GR2 enables agents to possess various levels of thinking ability, thereby allowing higher-level agents to best respond to less sophisticated learners. The first part of the thesis shows that reducing the joint Q-value to an individual Q-value via explicit recursive reasoning benefits learning. In the second part of the thesis, conversely, I measure the mutual influence by approximating the joint Q-value based on the individual Q-values.
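The idea of graded levels of rationality can be illustrated, in a much simpler setting than GR2, with classic level-k reasoning in a one-shot two-player matrix game: a level-0 agent plays uniformly at random, and a level-k agent best responds to its opponent's level-(k-1) strategy. This is a deterministic, tabular sketch, not GR2 itself, which works with soft policies and learned Q-functions; the payoff matrix is made up.

```python
import numpy as np


def level_k_strategies(payoff_row: np.ndarray, payoff_col: np.ndarray, k: int):
    """Level-0..k strategies for both players of a bimatrix game."""
    n_rows, n_cols = payoff_row.shape
    row = [np.full(n_rows, 1.0 / n_rows)]  # level 0: uniform random
    col = [np.full(n_cols, 1.0 / n_cols)]
    for level in range(1, k + 1):
        # A level-k player best responds to the opponent's level-(k-1) strategy.
        row_values = payoff_row @ col[level - 1]
        col_values = row[level - 1] @ payoff_col
        row.append(np.eye(n_rows)[row_values.argmax()])
        col.append(np.eye(n_cols)[col_values.argmax()])
    return row, col


# Hypothetical zero-sum game: the row player wants to match, the column player to mismatch.
payoff_row = np.array([[3.0, -1.0],
                       [-1.0, 1.0]])
row, col = level_k_strategies(payoff_row, -payoff_row, k=2)
```

The level-1 row player picks action 0 because it looks best against a random opponent, but the level-2 row player anticipates that a level-1 column player will mismatch and switches to action 1, which is the sense in which a higher-level agent best responds to a less sophisticated one.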
I establish Q-DPP, an extension of the Determinantal Point Process (DPP) with partition constraints, and apply it to multi-agent learning as a function approximator for the centralized value function. An attractive property of Q-DPP is that, when it reaches the optimum value, it offers a natural factorization of the centralized value function that represents both quality (maximizing reward) and diversity (different behaviors). In the third part of the thesis, I depart from action-level mutual influence and build a policy-space meta-game to analyze the relationships among agents' adaptive policies. I present a Multi-Agent Trust Region Learning (MATRL) algorithm that augments single-agent trust region policy optimization with a weak stable fixed point approximated by the policy-space meta-game. The algorithm aims to find a game-theoretic mechanism for adjusting the policy optimization steps so as to drive the learning of all agents toward the stable point.

UCL Discovery
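The quality/diversity factorization attributed to Q-DPP can be sketched with the standard DPP decomposition $L_{ij} = q_i\, \phi_i^\top \phi_j\, q_j$, where each item is one agent's (observation, action) pair, $q_i > 0$ is its quality, and $\phi_i$ is a unit-norm feature vector. Assuming the joint value is the log-determinant of the kernel restricted to the items the agents actually chose (one item per agent, which is the partition constraint), it splits into a quality term $2\sum_i \log q_i$ and a diversity term $\log\det(\Phi_Y \Phi_Y^\top)$. Item qualities and features below are hypothetical.

```python
import numpy as np


def qdpp_joint_value(quality: np.ndarray, features: np.ndarray, chosen: list) -> float:
    """Log-determinant value of the chosen items, one item per agent.

    quality[i] > 0 and features[i] (unit norm) describe item i, an (observation, action) pair.
    """
    q = quality[chosen]
    phi = features[chosen]
    kernel = np.outer(q, q) * (phi @ phi.T)   # L_Y = diag(q) S_Y diag(q)
    sign, logdet = np.linalg.slogdet(kernel)
    return logdet if sign > 0 else -np.inf    # singular kernel: no diversity


quality = np.array([2.0, 3.0, 3.0])
features = np.array([[1.0, 0.0],     # agent 1's item
                     [0.0, 1.0],     # agent 2's item, orthogonal to agent 1's
                     [0.6, 0.8]])    # agent 2's alternative, overlapping with agent 1's
orthogonal = qdpp_joint_value(quality, features, chosen=[0, 1])
overlapping = qdpp_joint_value(quality, features, chosen=[0, 2])
```

Both choices have the same quality term, log 36, but the overlapping pair loses log(1/0.64) to the diversity term, so the joint value rewards agents for both high quality and behaving differently.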

Autonomous Agents Modelling Other Agents: A Comprehensive Survey and Open Problems

Author: Stefano V. Albrecht
Peter Stone
Publication venue: Elsevier BV
Publication date: 01/01/2018

Much research in artificial intelligence is concerned with the development of autonomous agents that can interact effectively with other agents. An important aspect of such agents is the ability to reason about the behaviours of other agents by constructing models that make predictions about various properties of interest (such as actions, goals, and beliefs) of the modelled agents. A variety of modelling approaches now exist, which vary widely in their methodology and underlying assumptions, catering to the needs of the different sub-communities within which they were developed and reflecting the different practical uses for which they are intended. The purpose of the present article is to provide a comprehensive survey of the salient modelling methods found in the literature. The article concludes with a discussion of open problems that may form the basis for fruitful future research.

Comment: Final manuscript (46 pages), published in Artificial Intelligence Journal. The arXiv version also contains a table of contents after the abstract, but is otherwise identical to the AIJ version. Keywords: autonomous agents, multiagent systems, modelling other agents, opponent modelling

arXiv.org e-Print Archive

Crossref

Edinburgh Research Explorer

An Approach to Guide Users Towards Less Revealing Internet Browsers

Author: Mohammad Lena
Mohsen Fadi
Naser Riham
Shtayyeh Adel
Struijk Marten
Publication venue: Springer Science and Business Media LLC
Publication date: 06/07/2022

When browsing the Internet, HTTP headers enable both clients and servers to send extra data in their requests or responses, such as the User-Agent string. This string contains information related to the sender's device, browser, and operating system. Previous research has shown that there are numerous privacy and security risks resulting from exposing sensitive information in the User-Agent string. For example, it enables device and browser fingerprinting as well as user tracking and identification. Our large-scale analysis of thousands of User-Agent strings shows that browsers differ tremendously in the amount of information they include in their User-Agent strings. As such, our work aims at guiding users towards less revealing browsers. To do so, we propose to assign an exposure score to browsers based on the information they expose and their vulnerability records. Our contribution in this work is twofold: first, we provide a full implementation that is ready to be deployed and used; second, we conduct a user study to identify the effectiveness and limitations of the proposed approach. Our implementation is based on more than 52 thousand unique browsers. Our performance and validation analysis shows that our solution is accurate and efficient. The source code and data set are publicly available, and the solution has been deployed.

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen
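As a hypothetical illustration of scoring a browser by what its User-Agent string exposes plus its vulnerability record, the sketch below checks a User-Agent string for three kinds of detail with regular expressions and adds a term proportional to a count of known vulnerabilities. The fields, regular expressions, weights, and `vuln_weight` are all assumptions made for illustration, not the paper's scoring scheme, and real User-Agent parsing needs many more cases.

```python
import re

# Hypothetical weights: how much each exposed detail is assumed to aid fingerprinting.
FIELD_WEIGHTS = {"os_version": 2.0, "device_model": 3.0, "browser_build": 1.0}

# Hypothetical detectors for each detail (far from complete).
PATTERNS = {
    "os_version": re.compile(r"(Android|Windows NT|Mac OS X|iPhone OS) [\d._]+"),
    "device_model": re.compile(r"Android [\d.]+; ([^;)]+)\)"),
    "browser_build": re.compile(r"(Chrome|Firefox|Version)/\d+\.\d+\.\d+"),
}


def exposed_fields(user_agent: str) -> set:
    """Names of the details that the User-Agent string appears to reveal."""
    return {name for name, pattern in PATTERNS.items() if pattern.search(user_agent)}


def exposure_score(user_agent: str, known_vulnerabilities: int, vuln_weight: float = 0.5) -> float:
    """Weighted sum of exposed details plus a term for the vulnerability record."""
    field_score = sum(FIELD_WEIGHTS[name] for name in exposed_fields(user_agent))
    return field_score + vuln_weight * known_vulnerabilities


mobile_chrome = ("Mozilla/5.0 (Linux; Android 10; SM-G973F) AppleWebKit/537.36 "
                 "(KHTML, like Gecko) Chrome/83.0.4103.106 Mobile Safari/537.36")
desktop_firefox = "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"
```

Under these made-up weights, the Android Chrome string reveals the OS version, device model, and full browser build and scores 6.0 before any vulnerability term, while the desktop Firefox string reveals none of the three; a lower score corresponds to a less revealing browser.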

Economics of Conflict and Terrorism

Publication venue: MDPI AG
Publication date: 21/06/2022

This book contributes to the literature on conflict and terrorism through a selection of articles that deal with theoretical, methodological, and empirical issues related to the topic. The papers study important problems, are original in their approach, and are innovative in the techniques they use. The book will be useful for researchers in the fields of game theory, economics, and political science.

Directory of Open Access Books (DOAB)