Learning Probably Approximately Correct Maximin Strategies in Simulation-Based Games with Infinite Strategy Spaces
We tackle the problem of learning equilibria in simulation-based games. In
such games, the players' utility functions cannot be described analytically, as
they are given through a black-box simulator that can be queried to obtain
noisy estimates of the utilities. This is the case in many real-world games in
which a complete description of the elements involved is not available upfront,
such as complex military settings and online auctions. In these situations, one
usually needs to run costly simulation processes to get an accurate estimate of
the game outcome. As a result, solving these games begets the challenge of
designing learning algorithms that can find (approximate) equilibria with high
confidence, using as few simulator queries as possible. Moreover, since running
the simulator during the game is unfeasible, the algorithms must first perform
a pure exploration learning phase and, then, use the (approximate) equilibrium
learned this way to play the game. In this work, we focus on two-player
zero-sum games with infinite strategy spaces. Drawing from the best arm
identification literature, we design two algorithms with theoretical guarantees
to learn maximin strategies in these games. The first one works in the
fixed-confidence setting, guaranteeing the desired confidence level while
minimizing the number of queries. The second, in contrast, targets the
fixed-budget setting, maximizing the confidence without exceeding the given
maximum number of queries. First, we formally prove {\delta}-PAC theoretical
guarantees for our algorithms under some regularity assumptions, which are
encoded by letting the utility functions be drawn from a Gaussian process.
Then, we experimentally evaluate our techniques on a testbed made of randomly
generated games and instances representing simple real-world security settings.
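The fixed-confidence idea above can be sketched in a toy form. The sketch below replaces the paper's Gaussian-process machinery with simple sub-Gaussian confidence bounds on a small finite strategy grid, and uses uniform query allocation rather than an adaptive rule; the payoff matrix, noise level, and stopping rule are all illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical black-box simulator: the true payoff matrix is hidden from
# the learner, and every query returns only a noisy utility estimate.
TRUE_U = np.array([[0.3, 0.7],
                   [0.6, 0.4],
                   [0.2, 0.9]])
SIGMA = 0.1  # assumed noise scale of the simulator

def simulate(i, j):
    """One costly, noisy simulator query for the strategy pair (i, j)."""
    return TRUE_U[i, j] + rng.normal(0.0, SIGMA)

def fixed_confidence_maximin(delta=0.05, max_rounds=100_000):
    """Return the maximin row with probability >= 1 - delta, stopping as
    soon as the confidence bounds separate it from every rival row."""
    n, m = TRUE_U.shape
    counts = np.zeros((n, m))
    sums = np.zeros((n, m))
    for t in range(1, max_rounds + 1):
        for i in range(n):              # uniform allocation, for simplicity
            for j in range(m):
                sums[i, j] += simulate(i, j)
                counts[i, j] += 1
        means = sums / counts
        # Sub-Gaussian confidence radius with a crude union bound over
        # all entries and all rounds.
        rad = SIGMA * np.sqrt(2.0 * np.log(4 * n * m * t * t / delta) / counts)
        row_lo = (means - rad).min(axis=1)  # lower bound on min_j u(i, j)
        row_hi = (means + rad).min(axis=1)  # upper bound on min_j u(i, j)
        best = int(np.argmax(row_lo))
        if row_lo[best] >= np.delete(row_hi, best).max():
            return best, t              # delta-PAC recommendation
    return best, max_rounds

best_row, rounds = fixed_confidence_maximin()
```

The fixed-budget variant would instead run for exactly the allotted number of rounds and recommend the row with the highest empirical worst-case value at the end.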
Towards Optimal Algorithms For Online Decision Making Under Practical Constraints
Artificial Intelligence is increasingly being used in real-life applications such as driving autonomous cars, making deliveries with autonomous drones, providing customer support with chat-bots, and acting as personal assistants through smart speakers, among others. An Artificial Intelligent agent (AI) can be trained to become expert at a task through a system of rewards and punishments, an approach well known as Reinforcement Learning (RL). However, since the AI will deal with human beings, it also has to follow some moral rules to accomplish any task. For example, the AI should be fair to the other agents and not destroy the environment. Moreover, the AI should preserve the privacy of the users' data it processes. Those rules represent significant challenges in designing AI that we tackle in this thesis through mathematically rigorous solutions.

More precisely, we start by considering the basic RL problem modeled as a discrete Markov Decision Process. We propose three simple algorithms (UCRL-V, BUCRL and TSUCRL) using two different paradigms: frequentist (UCRL-V) and Bayesian (BUCRL and TSUCRL). Through a unified theoretical analysis, we show that all three algorithms are near-optimal. Our experiments confirm the superiority of our methods over existing techniques. Afterwards, we address the issue of fairness in the stateless version of reinforcement learning, also known as the multi-armed bandit. To concentrate our effort on the key challenges, we focus on the two-agent multi-armed bandit. We propose a novel objective that has been shown to be connected to fairness and justice, and we derive an algorithm, UCRG, that solves this objective and is shown theoretically to be near-optimal. Next, we tackle the issue of privacy using the recently introduced notion of Differential Privacy, designing multi-armed bandit algorithms that preserve differential privacy. Theoretical analyses show that, for the same level of privacy, our newly developed algorithms achieve better performance than existing techniques.
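To make the privacy-preserving bandit idea concrete, here is a deliberately naive sketch: a UCB index policy whose empirical reward sums are only ever released through Laplace noise of scale 1/eps. This is an illustration of the general recipe, not the thesis's algorithms (which, like other serious DP bandit methods, manage the cumulative privacy budget with more careful mechanisms such as tree-based aggregation); the arm means, horizon, and bonus terms are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

def private_ucb(means, horizon=20_000, eps=1.0):
    """UCB over Laplace-perturbed reward sums: a toy differentially
    private bandit. The extra log(t)/(eps * n) bonus compensates for
    the Laplace noise added to each arm's sum."""
    k = len(means)
    counts = np.zeros(k)
    sums = np.zeros(k)
    for t in range(horizon):
        if t < k:
            arm = t                                   # pull every arm once
        else:
            noisy_mean = (sums + rng.laplace(0.0, 1.0 / eps, k)) / counts
            bonus = np.sqrt(2.0 * np.log(t + 1) / counts)  # exploration term
            bonus += np.log(t + 1) / (eps * counts)        # privacy-noise term
            arm = int(np.argmax(noisy_mean + bonus))
        reward = float(rng.random() < means[arm])          # Bernoulli reward
        counts[arm] += 1
        sums[arm] += reward
    return counts

pull_counts = private_ucb([0.2, 0.7, 0.4])
```

Over a long horizon the best arm still collects the vast majority of pulls, showing that privacy noise inflates but does not destroy the regret guarantee.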
Is Nash Equilibrium Approximator Learnable?
In this paper, we investigate the learnability of the function approximator
that approximates Nash equilibrium (NE) for games generated from a
distribution. First, we offer a generalization bound using the Probably
Approximately Correct (PAC) learning model. The bound describes the gap between
the expected loss and empirical loss of the NE approximator. Afterward, we
prove the agnostic PAC learnability of the Nash approximator. In addition to
theoretical analysis, we demonstrate an application of NE approximator in
experiments. The trained NE approximator can be used to warm-start and
accelerate classical NE solvers. Together, our results show the practicability
of approximating NE through function approximation. Comment: Accepted by AAMAS 202
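The "gap" between an approximator's output and a true equilibrium can be quantified by exploitability (NashConv), which is the natural loss for judging an NE approximator and for deciding how much refinement a warm-started solver still needs. The sketch below computes it for a zero-sum bimatrix game; the matching-pennies example and the biased prediction are illustrative, not taken from the paper.

```python
import numpy as np

def exploitability(A, x, y):
    """NashConv-style gap for the zero-sum game with row-player payoff A:
    the total amount the two players could gain by unilaterally best
    responding. It is zero exactly at a Nash equilibrium, so it measures
    the quality of an approximator's predicted profile (x, y)."""
    v = x @ A @ y
    gain_row = np.max(A @ y) - v   # row player best-responds to y
    gain_col = v - np.min(x @ A)   # column player minimizes x @ A
    return gain_row + gain_col

# Matching pennies: uniform play is the exact NE, so the gap vanishes.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
uniform = np.array([0.5, 0.5])
gap_exact = exploitability(A, uniform, uniform)

# A biased prediction has a positive gap; a classical solver warm-started
# from it would still need a few refinement steps to close that gap.
biased = np.array([0.8, 0.2])
gap_biased = exploitability(A, biased, uniform)
```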
A learning-based approach to multi-agent decision-making
We propose a learning-based methodology to reconstruct private information
held by a population of interacting agents in order to predict an exact outcome
of the underlying multi-agent interaction process, here identified as a
stationary action profile. We envision a scenario where an external observer,
endowed with a learning procedure, is allowed to make queries and observe the
agents' reactions through private action-reaction mappings, whose collective
fixed point corresponds to a stationary profile. By adopting a smart query
process to iteratively collect informative data and update parametric estimates,
we establish sufficient conditions to assess the asymptotic properties of the
proposed learning-based methodology so that, if convergence happens, it can
only be towards a stationary action profile. This fact yields two main
consequences: i) learning locally-exact surrogates of the action-reaction
mappings allows the external observer to succeed in its prediction task, and
ii) since we work under assumptions so general that a stationary profile is not
even guaranteed to exist, the established sufficient conditions also act as
certificates for the existence of such a desirable profile. Extensive numerical
simulations involving typical competitive multi-agent control and decision
making problems illustrate the practical effectiveness of the proposed
learning-based approach.
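The query-then-predict loop can be illustrated with a minimal sketch: two agents with private affine reaction mappings, an observer that probes them and fits least-squares surrogates, and an iteration of the learned surrogates that, if it converges, can only land on a stationary action profile. The mappings, probe range, and affine model are all illustrative assumptions, not the paper's setting.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical private action-reaction mappings of two agents; the
# external observer can only query them, never read their coefficients.
def react1(a2):
    return 0.5 * a2 + 1.0

def react2(a1):
    return -0.25 * a1 + 2.0

# Query phase: probe each mapping at a few actions and fit affine
# surrogates by least squares (exact here, since the data is noiseless).
probes = rng.uniform(-3.0, 3.0, size=20)
X = np.column_stack([probes, np.ones_like(probes)])
w1, *_ = np.linalg.lstsq(X, react1(probes), rcond=None)
w2, *_ = np.linalg.lstsq(X, react2(probes), rcond=None)

# Prediction phase: iterate the learned surrogates. If the iteration
# converges, its limit is a collective fixed point of the reaction
# mappings, i.e. a stationary action profile.
a1 = a2 = 0.0
for _ in range(200):
    a1 = w1[0] * a2 + w1[1]
    a2 = w2[0] * a1 + w2[1]
```

Because the surrogates are locally exact, the limit of the iteration satisfies both true reaction mappings, which is precisely the prediction-task success described in the abstract.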
A survey of random processes with reinforcement
The models surveyed include generalized P\'{o}lya urns, reinforced random
walks, interacting urn models, and continuous reinforced processes. Emphasis is
on methods and results, with sketches provided of some proofs. Applications are
discussed in statistics, biology, economics and a number of other areas. Comment: Published at http://dx.doi.org/10.1214/07-PS094 in the Probability
Surveys (http://www.i-journals.org/ps/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
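The simplest reinforced process in the survey's scope, the classical Pólya urn, is easy to simulate, and the simulation exhibits the two facts that drive much of the theory: the fraction of one colour is a bounded martingale (so its expectation is preserved), and individual runs converge to random, run-dependent limits. The parameters below are illustrative.

```python
import random

def polya_urn(steps, red=1, black=1, seed=0):
    """Classic Pólya urn: repeatedly draw a ball uniformly at random and
    return it together with one extra ball of the same colour. The
    fraction of red balls is a bounded martingale, hence converges almost
    surely; for the (1, 1) start the limit is Uniform(0, 1)."""
    rng = random.Random(seed)
    for _ in range(steps):
        if rng.random() < red / (red + black):
            red += 1
        else:
            black += 1
    return red / (red + black)

# Martingale property, empirically: averaged over many independent runs,
# the final red fraction matches the initial fraction (1/2 here), even
# though any single run can wander toward any limit in (0, 1).
avg = sum(polya_urn(200, seed=s) for s in range(2000)) / 2000
```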
Generalized asset integrity games
Generalized assets represent a class of multi-scale adaptive state-transition systems with domain-oblivious performance criteria. The governance of such assets must proceed without exact specifications, objectives, or constraints. Decision making must rapidly scale in the presence of uncertainty, complexity, and intelligent adversaries.
This thesis formulates an architecture for generalized asset planning. Assets are modelled as dynamical graph structures which admit topological performance indicators, such as dependability, resilience, and efficiency. These metrics are used to construct robust model configurations. A normalized compression distance (NCD) is computed between a given active/live asset model and a reference configuration to produce an integrity score. The utility derived from the asset increases monotonically with this integrity score, which represents the proximity to ideal conditions. The present work considers the situation between an asset manager and an intelligent adversary, who act within a stochastic environment to control the integrity state of the asset. A generalized asset integrity game engine (GAIGE) is developed, which implements anytime algorithms to solve a stochastically perturbed two-player zero-sum game. The resulting planning strategies seek to stabilize deviations from minimax trajectories of the integrity score.
Results demonstrate the performance and scalability of the GAIGE. This approach represents a first step towards domain-oblivious architectures for complex asset governance and anytime planning.
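The NCD-based integrity score can be sketched directly with a standard compressor. The sketch below uses zlib as the compressor and toy byte-string serializations of asset configurations; the serialization format and the choice of compressor are illustrative assumptions, not the thesis's implementation.

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance:
    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    with C given here by zlib at maximum compression level. Values near 0
    mean the inputs share most of their structure; near 1, almost none."""
    cx = len(zlib.compress(x, 9))
    cy = len(zlib.compress(y, 9))
    cxy = len(zlib.compress(x + y, 9))
    return (cxy - min(cx, cy)) / max(cx, cy)

# Hypothetical serializations of asset graph configurations: a reference,
# a healthy live copy, and a degraded one.
reference = b"edge:a-b;edge:b-c;edge:c-d;edge:d-a;" * 40
healthy   = b"edge:a-b;edge:b-c;edge:c-d;edge:d-a;" * 40
degraded  = b"edge:a-x;node:q;edge:y-z;edge:d-f;"  * 40

# Integrity score: proximity to the ideal configuration, higher is better.
score_healthy  = 1.0 - ncd(reference, healthy)
score_degraded = 1.0 - ncd(reference, degraded)
```

A planner in the manager-versus-adversary game would then aim to keep this score on, or return it to, its minimax trajectory.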