Deep Q-Learning for Nash Equilibria: Nash-DQN
Model-free learning for multi-agent stochastic games is an active area of
research. Existing reinforcement learning algorithms, however, are often
restricted to zero-sum games, and are applicable only in small state-action
spaces or other simplified settings. Here, we develop a new data-efficient
deep Q-learning methodology for model-free learning of Nash equilibria in
general-sum stochastic games. The algorithm uses a local linear-quadratic
expansion of the stochastic game, which leads to analytically solvable optimal
actions. The expansion is parametrized by deep neural networks to give it
sufficient flexibility to learn the environment without the need to experience
all state-action pairs. We study symmetry properties of the algorithm stemming
from label-invariant stochastic games and as a proof of concept, apply our
algorithm to learning optimal trading strategies in competitive electronic
markets.
Comment: 16 pages, 4 figures
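As a concrete illustration of why a linear-quadratic stage game is "analytically solvable": with quadratic payoffs, each player's first-order condition is linear in the joint action, so the Nash equilibrium is the solution of a linear system. A minimal sketch with made-up coefficients, not the paper's model:

```python
import numpy as np

# Hypothetical two-player stage game with quadratic payoffs:
#   u_i(a_i, a_j) = -0.5 * q_i * a_i**2 + b_i * a_i + c_i * a_i * a_j
# Each player's first-order condition d u_i / d a_i = 0 reads
#   q_i * a_i - c_i * a_j = b_i,
# so the joint Nash action solves a 2x2 linear system.
q = np.array([2.0, 3.0])   # curvature of each player's payoff (illustrative)
b = np.array([1.0, 0.5])   # linear payoff terms (illustrative)
c = np.array([0.5, -1.0])  # cross terms coupling the players (illustrative)

A = np.array([[q[0], -c[0]],
              [-c[1], q[1]]])
a_star = np.linalg.solve(A, b)  # joint Nash action

def payoff(i, ai, aj):
    # Player i's stage payoff at joint action (ai, aj).
    return -0.5 * q[i] * ai**2 + b[i] * ai + c[i] * ai * aj
```

At `a_star`, neither player can improve by deviating unilaterally, which is the defining property of a Nash equilibrium; the deep network in Nash-DQN parametrizes such local expansions state by state.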
Stochastic Answer Networks for Machine Reading Comprehension
We propose a simple yet robust stochastic answer network (SAN) that simulates
multi-step reasoning in machine reading comprehension. Compared to previous
work such as ReasoNet which used reinforcement learning to determine the number
of steps, the unique feature is the use of a kind of stochastic prediction
dropout on the answer module (final layer) of the neural network during
training. We show that this simple trick improves robustness and achieves
results competitive to the state-of-the-art on the Stanford Question Answering
Dataset (SQuAD), the Adversarial SQuAD, and the Microsoft MAchine Reading
COmprehension Dataset (MS MARCO).
Comment: 11 pages, 5 figures, Accepted to ACL 201
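The stochastic prediction dropout idea can be sketched as follows: during training the answer module produces one prediction per reasoning step, and entire steps are randomly dropped before the predictions are averaged. A minimal stand-in, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_prediction_dropout(step_predictions, keep_prob=0.8, rng=rng):
    """Average multi-step answer predictions, randomly dropping whole steps.

    `step_predictions` is a (T, n_classes) array of per-step probability
    vectors, an illustrative stand-in for a multi-step answer module.
    """
    T = step_predictions.shape[0]
    mask = rng.random(T) < keep_prob
    if not mask.any():                 # keep at least one step
        mask[rng.integers(T)] = True
    return step_predictions[mask].mean(axis=0)

# Five reasoning steps over three answer classes (made-up numbers).
steps = np.array([[0.7, 0.2, 0.1],
                  [0.6, 0.3, 0.1],
                  [0.8, 0.1, 0.1],
                  [0.5, 0.4, 0.1],
                  [0.7, 0.1, 0.2]])
avg = stochastic_prediction_dropout(steps)
```

Averaging over random subsets of steps prevents any single step's prediction from dominating, which is the robustness effect the abstract describes.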
RSA: Byzantine-Robust Stochastic Aggregation Methods for Distributed Learning from Heterogeneous Datasets
In this paper, we propose a class of robust stochastic subgradient methods
for distributed learning from heterogeneous datasets in the presence of an unknown
number of Byzantine workers. The Byzantine workers, during the learning
process, may send arbitrary incorrect messages to the master due to data
corruptions, communication failures or malicious attacks, and consequently bias
the learned model. The key to the proposed methods is a regularization term
incorporated with the objective function so as to robustify the learning task
and mitigate the negative effects of Byzantine attacks. The resultant
subgradient-based algorithms are termed Byzantine-Robust Stochastic Aggregation
methods, justifying our acronym RSA used henceforth. In contrast to most of the
existing algorithms, RSA does not rely on the assumption that the data are
independent and identically distributed (i.i.d.) across the workers, and hence
fits a wider class of applications. Theoretically, we show that: i) RSA
converges to a near-optimal solution with the learning error dependent on the
number of Byzantine workers; and ii) the convergence rate of RSA under
Byzantine attacks is the same as that of stochastic gradient descent free of
Byzantine attacks. Numerically, experiments on real datasets corroborate the
competitive performance of RSA and a complexity reduction compared to
state-of-the-art alternatives.
Comment: To appear in AAAI 201
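The mechanism behind RSA's robustness can be sketched numerically: an l1 regularization term lam * ||x_i - x_0||_1 turns the master's subgradient update into a sum of sign() terms, so even an arbitrarily corrupted message moves the master by at most a bounded amount per step. An illustrative scalar simulation; the constants and setup are invented, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy distributed mean estimation: each honest worker i holds data with
# mean mu_i; Byzantine workers send arbitrarily bad messages. Because the
# master's subgradient of lam * sum_i |x0 - x_i| is lam * sum_i sign(...),
# a Byzantine message of size 1e9 has no more influence than an honest one.
n_honest, n_byz = 8, 2
mu = rng.normal(5.0, 0.1, size=n_honest)   # honest workers' local optima

lam, lr, steps = 0.1, 0.05, 2000
x0 = 0.0                                   # master model
x = np.zeros(n_honest)                     # honest workers' models
for _ in range(steps):
    byz_msgs = np.full(n_byz, 1e9)         # arbitrarily corrupted messages
    msgs = np.concatenate([x, byz_msgs])
    # Master: bounded sign() aggregation from the l1 penalty.
    x0 -= lr * lam * np.sign(x0 - msgs).sum()
    # Honest workers: local gradient (x_i - mu_i) plus the penalty term.
    x -= lr * ((x - mu) + lam * np.sign(x - x0))
```

A naive average of the final messages would be on the order of 1e8, whereas the sign-aggregated master settles near the honest workers' consensus value.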
A Parameter-Free Learning Automaton Scheme
For a learning automaton (LA), a proper configuration of its learning
parameters, which are crucial for the automaton's performance, is relatively
difficult because the parameters must be tuned manually before real
application. Ensuring stable and reliable performance in stochastic
environments therefore makes parameter tuning a time-consuming and
interaction-expensive procedure, which is a serious limitation for LA-based
applications in which interactions with the environment are costly.
In this paper, we propose a parameter-free learning automaton scheme to avoid
parameter tuning by a Bayesian inference method. In contrast to existing
schemes where the parameters should be carefully tuned according to the
environment, the performance of this scheme is not sensitive to external
environments because a single set of parameters can be applied consistently
across various environments, which dramatically reduces the difficulty of
applying a learning automaton to an unknown stochastic environment. A rigorous
proof of ε-optimality for the proposed scheme is provided, and numerical
experiments are carried out on benchmark environments to verify its
effectiveness. The results show that, without any parameter-tuning cost, the
proposed parameter-free learning automaton (PFLA) achieves performance
competitive with other well-tuned schemes and outperforms untuned schemes in
consistency of performance.
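One way a learning automaton can be made parameter-free via Bayesian inference is to maintain a posterior over each action's reward probability and select actions by posterior sampling, so no learning rate needs tuning. The sketch below is an illustrative stand-in in this spirit (Thompson sampling with Beta posteriors), not the paper's exact PFLA scheme:

```python
import numpy as np

rng = np.random.default_rng(2)

# Bayesian action selection for a stationary stochastic environment:
# keep a Beta posterior per action and sample from the posteriors to act.
# The uniform Beta(1, 1) priors mean nothing is tuned to the environment.
p_true = np.array([0.35, 0.55, 0.80])      # unknown environment (illustrative)
wins = np.ones(3)                          # Beta(1, 1) prior successes
losses = np.ones(3)                        # Beta(1, 1) prior failures

for _ in range(5000):
    theta = rng.beta(wins, losses)         # one posterior sample per action
    a = int(theta.argmax())                # act greedily on the sample
    reward = rng.random() < p_true[a]      # stochastic environment feedback
    wins[a] += reward
    losses[a] += 1 - reward

best = int((wins / (wins + losses)).argmax())
pulls = wins + losses - 2                  # interactions spent per action
```

The same prior works across environments because the posterior widths, not a hand-tuned step size, control how quickly exploration gives way to exploitation.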
Extended Distributed Learning Automata: A New Method for Solving Stochastic Graph Optimization Problems
In this paper, a new structure of cooperative learning automata, called
extended distributed learning automata (eDLA), is introduced. Based on the
proposed structure, a new iterative randomized heuristic algorithm for finding
an optimal sub-graph in a stochastic edge-weighted graph through sampling is
proposed. It is shown that the proposed algorithm, based on the new networked
structure, can solve optimization problems on stochastic graphs with fewer
samples than standard sampling. Stochastic graphs are graphs in which the edge
weights follow unknown probability distributions. The proposed algorithm uses
an eDLA to find a policy that leads to an induced sub-graph satisfying
restrictions such as minimum or maximum weight (length). At each stage of the
proposed algorithm, the eDLA determines which edges should be sampled. This
eDLA-based sampling method may reduce unnecessary samples and hence the time
the algorithm requires to find the optimal sub-graph. It is shown that the
proposed method converges to the optimal solution, and that the probability of
this convergence can be made arbitrarily close to 1 by using a sufficiently
small learning rate. A new variance-aware threshold value is proposed that can
significantly improve the convergence rate of the proposed eDLA-based
algorithm. It is shown that the proposed algorithm is competitive in terms of
the quality of the solution.
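The sampling idea can be illustrated on a toy stochastic graph: candidate paths have random weights with unknown means, the expected weights are estimated from repeated samples, and the minimum-mean candidate is kept. This sketch shows only the sampling step, with invented paths and distributions, not the eDLA structure itself:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy stochastic graph: three candidate s-t paths whose total weights are
# random draws with unknown means (hidden from the learner).
true_means = {"s-a-t": 4.0, "s-b-t": 3.0, "s-c-t": 5.0}

def sample_path_length(path):
    # One noisy observation of the path's (unknown) expected total weight.
    return rng.normal(true_means[path], 0.5)

estimates = {}
for path in true_means:
    draws = [sample_path_length(path) for _ in range(200)]
    estimates[path] = float(np.mean(draws))

best_path = min(estimates, key=estimates.get)  # minimum estimated weight
```

An eDLA-style method improves on this uniform scheme by letting the automata decide which edges still need sampling, so effort is not wasted on clearly suboptimal candidates.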
Learning Rate Adaptation for Federated and Differentially Private Learning
We propose an algorithm for the adaptation of the learning rate for
stochastic gradient descent (SGD) that avoids the need for a validation set.
The idea for the adaptiveness comes from the technique of extrapolation: to
estimate the error against the gradient flow that underlies SGD, we compare
the result obtained by one full step with that obtained by two half-steps. The algorithm
is applied in two separate frameworks: federated and differentially private
learning. Using examples of deep neural networks we empirically show that the
adaptive algorithm is competitive with manually tuned commonly used
optimisation methods for differentially private training. We also show that,
unlike commonly used optimisation methods, it works robustly in the case of
federated learning.
Comment: 17 pages, 9 figures
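The extrapolation idea can be sketched on a deterministic quadratic objective: one full step is compared with two half-steps, their difference estimates the local error against the gradient flow, and the learning rate is adjusted to hold that error near a tolerance, as in ODE step-size control. The objective, tolerance, and clipping factors below are illustrative choices, not the paper's:

```python
import numpy as np

A = np.diag([1.0, 2.0])            # illustrative quadratic: f(x) = 0.5 x^T A x
grad = lambda x: A @ x
f = lambda x: 0.5 * x @ A @ x

def local_error(x, lr):
    """Compare one full gradient step with two half-steps of the same total size."""
    full = x - lr * grad(x)
    half = x - 0.5 * lr * grad(x)
    half = half - 0.5 * lr * grad(half)
    return np.linalg.norm(full - half), half

def adaptive_sgd(x, lr=0.5, tol=1e-3, iters=50):
    for _ in range(iters):
        err, half = local_error(x, lr)
        # ODE-style step-size control: grow lr while the estimated local
        # error is below tolerance, shrink it when above (factors clipped).
        lr *= float(np.clip(np.sqrt(tol / (err + 1e-12)), 0.5, 1.5))
        lr = min(lr, 0.9)          # illustrative safety cap
        x = half                   # keep the more accurate iterate
    return x, lr

x_start = np.array([1.0, 1.0])
x_final, lr_final = adaptive_sgd(x_start)
```

No validation set appears anywhere: the step-size signal comes entirely from the discrepancy between the two discretizations of the same gradient flow.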
Fast Quantization of Stochastic Volatility Models
Recursive Marginal Quantization (RMQ) allows fast approximation of solutions
to stochastic differential equations in one dimension. When applied to
two-factor models, RMQ is inefficient because the optimization problem is
usually solved using stochastic methods, e.g., Lloyd's algorithm or
Competitive Learning Vector Quantization. In this paper, a new algorithm is
proposed that allows RMQ to be applied to two-factor stochastic volatility
models, which retains the efficiency of gradient-descent techniques. By
marginalizing over potential realizations of the volatility process, a significant
decrease in computational effort is achieved when compared to current
quantization methods. Additionally, techniques for modelling the correct
zero-boundary behaviour are used to allow the new algorithm to be applied to
cases where the previous methods would fail. The proposed technique is
illustrated for European options on the Heston and Stein-Stein models, while a
more thorough application is considered in the case of the popular SABR model,
where various exotic options are also priced.
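For context, the stochastic optimizer that RMQ traditionally relies on can be sketched in one dimension: Lloyd's algorithm alternates between assigning samples to their nearest codeword and moving each codeword to the mean of its cell. An illustrative sketch quantizing a standard normal; RMQ itself applies such quantization recursively across the time steps of an SDE:

```python
import numpy as np

rng = np.random.default_rng(4)

samples = rng.normal(0.0, 1.0, size=20_000)  # stand-in marginal distribution
codebook = np.linspace(-2.0, 2.0, 8)         # initial 8-point quantization grid

for _ in range(30):
    # Lloyd iteration: assign each sample to its nearest codeword, then
    # move each codeword to the mean of its assigned cell.
    cells = np.abs(samples[:, None] - codebook[None, :]).argmin(axis=1)
    for k in range(codebook.size):
        members = samples[cells == k]
        if members.size:
            codebook[k] = members.mean()

# Distortion: mean squared error between samples and their codewords.
cells = np.abs(samples[:, None] - codebook[None, :]).argmin(axis=1)
distortion = np.mean((samples - codebook[cells]) ** 2)
```

Each Lloyd pass revisits every sample, which is exactly the cost the paper's gradient-descent formulation avoids in the two-factor setting.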
Conditional Generative Moment-Matching Networks
Maximum mean discrepancy (MMD) has been successfully applied to learn deep
generative models for characterizing a joint distribution of variables via
kernel mean embedding. In this paper, we present conditional generative moment-
matching networks (CGMMN), which learn a conditional distribution given some
input variables based on a conditional maximum mean discrepancy (CMMD)
criterion. The learning is performed by stochastic gradient descent with the
gradient calculated by back-propagation. We evaluate CGMMN on a wide range of
tasks, including predictive modeling, contextual generation, and Bayesian dark
knowledge, which distills knowledge from a Bayesian model by learning a
relatively small CGMMN student network. Our results demonstrate competitive
performance in all the tasks.
Comment: 12 pages
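The maximum mean discrepancy underlying this family of models can be sketched directly: with an RBF kernel, the (biased) squared-MMD estimate between two sample sets is a sum of three kernel means, near zero when the sets come from the same distribution and clearly positive when they do not. A minimal sketch with an illustrative bandwidth and data:

```python
import numpy as np

rng = np.random.default_rng(5)

def mmd2(X, Y, gamma=0.5):
    """Biased squared-MMD estimate with RBF kernel k(a, b) = exp(-gamma ||a-b||^2).

    A minimal sketch of the criterion MMD-based generative models build on;
    CMMD extends it to conditional distributions via kernel mean embeddings.
    """
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return k(X, X).mean() + k(Y, Y).mean() - 2.0 * k(X, Y).mean()

X = rng.normal(0.0, 1.0, size=(300, 2))
Y_same = rng.normal(0.0, 1.0, size=(300, 2))   # same distribution as X
Y_shift = rng.normal(2.0, 1.0, size=(300, 2))  # shifted distribution

m_same = mmd2(X, Y_same)
m_shift = mmd2(X, Y_shift)
```

Because the estimate is differentiable in the samples, its gradient can be back-propagated into a generator network, which is how training by SGD proceeds.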
Projective simulation for classical learning agents: a comprehensive investigation
We study the model of projective simulation (PS), a novel approach to
artificial intelligence based on stochastic processing of episodic memory which
was recently introduced [H.J. Briegel and G. De las Cuevas. Sci. Rep. 2, 400,
(2012)]. Here we provide a detailed analysis of the model and examine its
performance, including its achievable efficiency, its learning times and the
way both properties scale with the problems' dimension. In addition, we situate
the PS agent in different learning scenarios, and study its learning abilities.
A variety of new scenarios are considered, demonstrating the model's
flexibility. Furthermore, to put the PS scheme in context, we compare
its performance with those of Q-learning and learning classifier systems, two
popular models in the field of reinforcement learning. It is shown that PS is a
competitive artificial intelligence model of unique properties and strengths.
Comment: Accepted for publication in New Generation Computing. 23 pages, 23 figures
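The core PS mechanics can be sketched for a two-layer clip network: percept clips connect to action clips with h-values, actions are sampled with probability proportional to h, and rewarded edges are reinforced while a damping term pulls all h-values back toward their initial value. The toy task and constants below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(6)

# Two-layer projective-simulation agent on an invented toy task:
# percept 0 should map to action 1 and percept 1 to action 0.
n_percepts, n_actions = 2, 2
h = np.ones((n_percepts, n_actions))       # initial h-values
gamma, reward = 0.01, 1.0                  # damping and reward (illustrative)

for _ in range(3000):
    s = rng.integers(n_percepts)           # random percept from environment
    probs = h[s] / h[s].sum()              # hop probabilities proportional to h
    a = rng.choice(n_actions, p=probs)
    r = reward if a == 1 - s else 0.0      # correct action for this toy task
    h = h - gamma * (h - 1.0)              # damping toward the initial value
    h[s, a] += r                           # reinforce the rewarded edge

policy = h / h.sum(axis=1, keepdims=True)
```

The damping term gives the agent a forgetting timescale, one of the properties that distinguishes PS from the tabular Q-learning baseline it is compared against.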
Sample Complexity of Estimating the Policy Gradient for Nearly Deterministic Dynamical Systems
Reinforcement learning is a promising approach to learning robot controllers.
It has recently been shown that algorithms based on finite-difference estimates
of the policy gradient are competitive with algorithms based on the policy
gradient theorem. We propose a theoretical framework for understanding this
phenomenon. Our key insight is that many dynamical systems (especially those of
interest in robot control tasks) are nearly deterministic, i.e., they
can be modeled as a deterministic system with a small stochastic perturbation.
We show that for such systems, finite-difference estimates of the policy
gradient can have substantially lower variance than estimates based on the
policy gradient theorem. We interpret these results in the context of
counterfactual estimation. Finally, we empirically evaluate our insights in an
experiment on the inverted pendulum.
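The finite-difference estimator studied here can be sketched on a nearly deterministic scalar system: perturb the policy parameter in both directions, roll out with common random numbers, and difference the returns. All dynamics and constants below are illustrative, not the paper's experiment:

```python
import numpy as np

# Nearly deterministic scalar system x' = x + u + eps with tiny noise,
# linear policy u = -theta * x, quadratic cost x^2 + u^2 over a short horizon.
def rollout(theta, noise_scale=1e-3, horizon=20, seed=0):
    rng = np.random.default_rng(seed)
    x, cost = 1.0, 0.0
    for _ in range(horizon):
        u = -theta * x
        cost += x ** 2 + u ** 2
        x = x + u + noise_scale * rng.standard_normal()
    return cost

def fd_gradient(theta, delta=1e-2, seed=0):
    # Central difference with common random numbers (same seed for both
    # rollouts): in near-deterministic systems this keeps the estimator's
    # variance low, which is the phenomenon analyzed above.
    return (rollout(theta + delta, seed=seed)
            - rollout(theta - delta, seed=seed)) / (2.0 * delta)

theta = 0.3
for i in range(200):                       # plain gradient descent on theta
    theta -= 0.01 * fd_gradient(theta, seed=i)
# For this LQR-like problem the optimal gain is roughly 0.618.
```

Because the shared noise realization nearly cancels in the central difference, each gradient estimate is close to its deterministic value, so even this naive descent converges reliably.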