707,582 research outputs found
Asymptotic Convergence and Performance of Multi-Agent Q-Learning Dynamics
Achieving convergence of multiple learning agents in general -player games
is imperative for the development of safe and reliable machine learning (ML)
algorithms and their application to autonomous systems. Yet it is known that,
outside the bounds of simple two-player games, convergence cannot be taken for
granted.
To make progress in resolving this problem, we study the dynamics of smooth
Q-Learning, a popular reinforcement learning algorithm which quantifies the
tendency for learning agents to explore their state space or exploit their
payoffs. We show a sufficient condition on the rate of exploration such that
the Q-Learning dynamics is guaranteed to converge to a unique equilibrium in
any game. We connect this result to games for which Q-Learning is known to
converge with arbitrary exploration rates, including weighted Potential games
and weighted zero sum polymatrix games.
Finally, we examine the performance of the Q-Learning dynamic as measured by
the Time Averaged Social Welfare, and comparing this with the Social Welfare
achieved by the equilibrium. We provide a sufficient condition whereby the
Q-Learning dynamic will outperform the equilibrium even if the dynamics do not
converge.Comment: Accepted in AAMAS 202
ViZDoom: DRQN with Prioritized Experience Replay, Double-Q Learning, & Snapshot Ensembling
ViZDoom is a robust, first-person shooter reinforcement learning environment,
characterized by a significant degree of latent state information. In this
paper, double-Q learning and prioritized experience replay methods are tested
under a certain ViZDoom combat scenario using a competitive deep recurrent
Q-network (DRQN) architecture. In addition, an ensembling technique known as
snapshot ensembling is employed using a specific annealed learning rate to
observe differences in ensembling efficacy under these two methods. Annealed
learning rates are important in general to the training of deep neural network
models, as they shake up the status-quo and counter a model's tending towards
local optima. While both variants show performance exceeding those of built-in
AI agents of the game, the known stabilizing effects of double-Q learning are
illustrated, and priority experience replay is again validated in its
usefulness by showing immediate results early on in agent development, with the
caveat that value overestimation is accelerated in this case. In addition, some
unique behaviors are observed to develop for priority experience replay (PER)
and double-Q (DDQ) variants, and snapshot ensembling of both PER and DDQ proves
a valuable method for improving performance of the ViZDoom Marine.Comment: 9 pages, 5 figure
Using Machine Teaching to Investigate Human Assumptions when Teaching Reinforcement Learners
Successful teaching requires an assumption of how the learner learns - how
the learner uses experiences from the world to update their internal states. We
investigate what expectations people have about a learner when they teach them
in an online manner using rewards and punishment. We focus on a common
reinforcement learning method, Q-learning, and examine what assumptions people
have using a behavioral experiment. To do so, we first establish a normative
standard, by formulating the problem as a machine teaching optimization
problem. To solve the machine teaching optimization problem, we use a deep
learning approximation method which simulates learners in the environment and
learns to predict how feedback affects the learner's internal states. What do
people assume about a learner's learning and discount rates when they teach
them an idealized exploration-exploitation task? In a behavioral experiment, we
find that people can teach the task to Q-learners in a relatively efficient and
effective manner when the learner uses a small value for its discounting rate
and a large value for its learning rate. However, they still are suboptimal. We
also find that providing people with real-time updates of how possible feedback
would affect the Q-learner's internal states weakly helps them teach. Our
results reveal how people teach using evaluative feedback and provide guidance
for how engineers should design machine agents in a manner that is intuitive
for people.Comment: 21 pages, 4 figure
Inverse Density as an Inverse Problem: The Fredholm Equation Approach
In this paper we address the problem of estimating the ratio
where is a density function and is another density, or, more generally
an arbitrary function. Knowing or approximating this ratio is needed in various
problems of inference and integration, in particular, when one needs to average
a function with respect to one probability distribution, given a sample from
another. It is often referred as {\it importance sampling} in statistical
inference and is also closely related to the problem of {\it covariate shift}
in transfer learning as well as to various MCMC methods. It may also be useful
for separating the underlying geometry of a space, say a manifold, from the
density function defined on it.
Our approach is based on reformulating the problem of estimating
as an inverse problem in terms of an integral operator
corresponding to a kernel, and thus reducing it to an integral equation, known
as the Fredholm problem of the first kind. This formulation, combined with the
techniques of regularization and kernel methods, leads to a principled
kernel-based framework for constructing algorithms and for analyzing them
theoretically.
The resulting family of algorithms (FIRE, for Fredholm Inverse Regularized
Estimator) is flexible, simple and easy to implement.
We provide detailed theoretical analysis including concentration bounds and
convergence rates for the Gaussian kernel in the case of densities defined on
, compact domains in and smooth -dimensional sub-manifolds of
the Euclidean space.
We also show experimental results including applications to classification
and semi-supervised learning within the covariate shift framework and
demonstrate some encouraging experimental comparisons. We also show how the
parameters of our algorithms can be chosen in a completely unsupervised manner.Comment: Fixing a few typos in last versio
- …