Towards Optimal Algorithms For Online Decision Making Under Practical Constraints
Artificial Intelligence is increasingly being used in real-life applications such as driving with autonomous cars, deliveries with autonomous drones, customer support with chatbots, and personal assistance with smart speakers. An Artificial Intelligent agent (AI) can be trained to become an expert at a task through a system of rewards and punishments, a paradigm known as Reinforcement Learning (RL). However, since the AI will deal with human beings, it must also follow certain moral rules to accomplish any task. For example, the AI should be fair to other agents and should not destroy the environment. Moreover, the AI should not leak the private data of the users it serves. These rules represent significant challenges in designing AI, which we tackle in this thesis through mathematically rigorous solutions. More precisely, we start by considering the basic RL problem modeled as a discrete Markov Decision Process. We propose three simple algorithms (UCRL-V, BUCRL and TSUCRL) using two different paradigms: Frequentist (UCRL-V) and Bayesian (BUCRL and TSUCRL). Through a unified theoretical analysis, we show that all three algorithms are near-optimal. Experiments confirm the superiority of our methods over existing techniques. Afterwards, we address the issue of fairness in the stateless version of reinforcement learning, also known as the multi-armed bandit. To concentrate our effort on the key challenges, we focus on the two-agent multi-armed bandit. We propose a novel objective that has been shown to be connected to fairness and justice. We derive an algorithm, UCRG, to solve this novel objective and show theoretically that it is near-optimal. Next, we tackle the issue of privacy by using the recently introduced notion of Differential Privacy. We design multi-armed bandit algorithms that preserve differential privacy. Theoretical analyses show that, for the same level of privacy, our newly developed algorithms achieve better performance than existing techniques.
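As background for the frequentist bandit algorithms the abstract mentions, the "optimism in the face of uncertainty" principle can be illustrated with the classic UCB1 rule for a simple stochastic multi-armed bandit. This is a generic sketch, not the thesis's UCRL-V or its private variants; the arm means and horizon below are illustrative choices.

```python
import math
import random

def ucb1(reward_fns, horizon):
    """Minimal UCB1 for a stochastic multi-armed bandit.

    reward_fns: zero-argument callables returning rewards in [0, 1].
    Returns per-arm pull counts after `horizon` total pulls.
    """
    k = len(reward_fns)
    counts, means = [0] * k, [0.0] * k
    for arm in range(k):                      # pull each arm once to initialize
        means[arm], counts[arm] = reward_fns[arm](), 1
    for t in range(k, horizon):
        # Optimism: pick the arm whose empirical mean plus
        # confidence radius is largest.
        arm = max(range(k), key=lambda a: means[a]
                  + math.sqrt(2 * math.log(t + 1) / counts[a]))
        r = reward_fns[arm]()
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]   # incremental mean update
    return counts

# Two Bernoulli arms with means 0.3 and 0.7: UCB1 should favor the second.
rng = random.Random(1)
counts = ucb1([lambda: float(rng.random() < 0.3),
               lambda: float(rng.random() < 0.7)], horizon=2000)
```

The confidence radius shrinks as an arm is pulled more often, so under-explored arms are revisited occasionally while the empirically best arm receives most pulls.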
Game of Thrones: Fully Distributed Learning for Multi-Player Bandits
We consider a multi-armed bandit game where N players compete for M arms over T turns. Each player has different expected rewards for the arms, and the instantaneous rewards are independent and identically distributed or Markovian. When two or more players choose the same arm, they all receive zero reward. Performance is measured using the expected sum of regrets, compared to the optimal assignment of arms to players. We assume that each player only knows her own actions and the reward she received each turn. Players cannot observe the actions of other players, and no communication between players is possible. We present a distributed algorithm and prove that it achieves an expected sum of regrets of near-O(log T). This is the first algorithm to achieve near order-optimal regret in this fully distributed scenario. All other works have assumed either that all players have the same vector of expected rewards or that communication between players is possible.
Comment: A preliminary version was accepted to NIPS 2018. This extended paper, currently under review (submitted in September 2019), improves the regret bound to near-log(T), generalizes to unbounded and Markovian rewards, and has a much better convergence rate.
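The collision rule above (two or more players on the same arm all receive zero reward) can be sketched as a one-round simulation. This is an illustrative toy model of the problem setting, not the paper's distributed algorithm; the per-player mean matrix below is made up.

```python
import random
from collections import Counter

def play_round(choices, arm_means, rng):
    """One round of the multi-player bandit with collisions.

    choices: list mapping player index -> chosen arm index.
    arm_means: per-player list of expected (Bernoulli) rewards; players
    may value the same arm differently.
    A player earns a reward only if no other player chose her arm.
    """
    load = Counter(choices)                   # how many players sit on each arm
    rewards = []
    for player, arm in enumerate(choices):
        if load[arm] > 1:                     # collision: everyone on this arm gets zero
            rewards.append(0.0)
        else:
            rewards.append(float(rng.random() < arm_means[player][arm]))
    return rewards

rng = random.Random(0)
means = [[0.9, 0.1], [0.1, 0.9]]              # player 0 prefers arm 0, player 1 arm 1
collide = play_round([0, 0], means, rng)      # both pick arm 0 -> both get zero
split = play_round([0, 1], means, rng)        # orthogonal assignment, no collision
```

The regret benchmark in the abstract compares the realized rewards against the best collision-free assignment of arms to players, which here is the `[0, 1]` assignment.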
Learning to cooperate without awareness in multiplayer minimal social situations
Experimental and Monte Carlo methods were used to test theoretical predictions about adaptive learning of cooperative responses without awareness in minimal social situations: games in which the payoffs to players depend not on their own actions but exclusively on the actions of other group members. In Experiment 1, learning occurred slowly over 200 rounds in a dyadic minimal social situation but not in multiplayer groups. In Experiments 2-4, learning occurred rarely in multiplayer groups, even when players were informed that they were interacting strategically and were allowed to communicate with one another, though they were not aware of the game's payoff structure. Monte Carlo simulation suggested that players approach minimal social situations using a noisy version of the win-stay, lose-shift decision rule, deviating from the deterministic rule less frequently after rewarding rounds than after unrewarding ones.
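The noisy win-stay, lose-shift rule described above lends itself to a short Monte Carlo sketch. The dyadic payoff structure below (each player's action rewards or punishes only the partner) and the noise level are illustrative assumptions, not the paper's exact experimental parameters.

```python
import random

def simulate_dyad(rounds, noise, seed=0):
    """Noisy win-stay, lose-shift in a dyadic minimal social situation.

    Each player's payoff depends only on the partner's action: action 1
    ("cooperate") rewards the partner, action 0 punishes her. After a
    rewarding round a player repeats her action (win-stay); after an
    unrewarding round she switches (lose-shift). `noise` is the chance
    of deviating from the deterministic rule. Returns the fraction of
    rounds with mutual cooperation.
    """
    rng = random.Random(seed)
    acts = [rng.randrange(2), rng.randrange(2)]
    coop_rounds = 0
    for _ in range(rounds):
        rewards = [acts[1], acts[0]]          # each payoff comes from the partner
        if acts == [1, 1]:
            coop_rounds += 1
        for p in range(2):
            follow = rng.random() >= noise    # deviate from the rule w.p. noise
            if rewards[p]:                    # win-stay
                acts[p] = acts[p] if follow else 1 - acts[p]
            else:                             # lose-shift
                acts[p] = 1 - acts[p] if follow else acts[p]
    return coop_rounds / rounds

rate = simulate_dyad(rounds=2000, noise=0.05)
```

With the deterministic rule (noise 0) the dyad reaches mutual cooperation within two rounds from any start and stays there, which is consistent with learning emerging in dyads; small noise only occasionally knocks the pair out of the cooperative state.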
Cooperative Coevolution for Non-Separable Large-Scale Black-Box Optimization: Convergence Analyses and Distributed Accelerations
Given the ubiquity of non-separable optimization problems in the real world, in this paper we analyze and extend the large-scale version of the well-known cooperative coevolution (CC), a divide-and-conquer optimization framework, on non-separable functions. First, we reveal empirical reasons why decomposition-based methods are or are not preferred in practice on some non-separable large-scale problems, which have not been clearly pointed out in many previous CC papers. Then, we formalize CC as a continuous game model via simplification, but without losing its essential property. Unlike previous evolutionary game theory for CC, our new model provides a much simpler yet useful viewpoint for analyzing its convergence, since only the pure Nash equilibrium concept is needed and more general fitness landscapes can be explicitly considered. Based on the convergence analyses, we propose a hierarchical decomposition strategy for better generalization, since for any decomposition there is a risk of getting trapped in a suboptimal Nash equilibrium. Finally, we use powerful distributed computing to accelerate it under the multi-level learning framework, which combines the fine-tuning ability of decomposition with the invariance property of CMA-ES. Experiments on a set of high-dimensional functions validate both its search performance and its scalability (w.r.t. CPU cores) on a cluster computing platform with 400 CPU cores.
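The divide-and-conquer idea behind cooperative coevolution can be sketched as a loop that optimizes each group of variables in turn while freezing the rest, sharing a common context vector. This toy uses crude random search per subproblem rather than CMA-ES, and the test function and decomposition are illustrative assumptions.

```python
import random

def cc_minimize(f, dim, groups, iters, seed=0):
    """Toy cooperative-coevolution loop: optimize each variable group in
    turn while freezing the others (divide and conquer), keeping a shared
    context vector of the best solution found so far."""
    rng = random.Random(seed)
    x = [rng.uniform(-5, 5) for _ in range(dim)]
    best = f(x)
    for _ in range(iters):
        for group in groups:                  # cycle over the decomposition
            for _ in range(20):               # crude random search per subproblem
                cand = list(x)
                for i in group:
                    cand[i] = x[i] + rng.gauss(0, 0.5)
                fc = f(cand)
                if fc < best:                 # keep improvements in the context vector
                    best, x = fc, cand
    return best, x

# Separable sphere function; decomposition {0,1} / {2,3} matches its structure.
sphere = lambda v: sum(t * t for t in v)
best, x = cc_minimize(sphere, dim=4, groups=[[0, 1], [2, 3]], iters=50)
```

On a non-separable function a fixed decomposition can stall at a point where no single group can improve alone, which is exactly the suboptimal pure Nash equilibrium the paper's hierarchical decomposition strategy is designed to escape.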
Quantum inspired algorithms for learning and control of stochastic systems
Motivated by the limitations of current reinforcement learning and optimal control techniques, this dissertation proposes quantum theory inspired algorithms for learning and control of both single-agent and multi-agent stochastic systems.
A common problem encountered in traditional reinforcement learning techniques is the exploration-exploitation trade-off. To address this issue, an action selection procedure inspired by a quantum search algorithm called Grover's iteration is developed. This procedure does not require an explicit design parameter to specify the relative frequency of explorative and exploitative actions.
The second part of this dissertation extends the powerful adaptive critic design methodology to solve finite horizon stochastic optimal control problems. To numerically solve the stochastic Hamilton-Jacobi-Bellman equation, which characterizes the optimal expected cost function, a large number of trajectory samples is required. The proposed methodology overcomes this difficulty by using the path integral control formulation to adaptively sample trajectories of importance.
The third part of this dissertation presents two quantum inspired coordination models to dynamically assign targets to agents operating in a stochastic environment. The first approach uses a quantum decision theory model that explains irrational action choices in human decision making. The second approach uses a quantum game theory model that exploits the quantum mechanical phenomenon of 'entanglement' to increase individual pay-off in multi-player games. The efficiency and scalability of the proposed coordination models are demonstrated through simulations of a large scale multi-agent system --Abstract, page iii