Representation Policy Iteration
This paper addresses a fundamental issue central to approximation methods for
solving large Markov decision processes (MDPs): how to automatically learn the
underlying representation for value function approximation? A novel
theoretically rigorous framework is proposed that automatically generates
geometrically customized orthonormal sets of basis functions, which can be used
with any approximate MDP solver like least squares policy iteration (LSPI). The
key innovation is a coordinate-free representation of value functions, using
the theory of smooth functions on a Riemannian manifold. Hodge theory yields a
constructive method for generating basis functions for approximating value
functions based on the eigenfunctions of the self-adjoint (Laplace-Beltrami)
operator on manifolds. In effect, this approach performs a global Fourier
analysis on the state space graph to approximate value functions, where the
basis functions reflect the large-scale topology of the underlying state space.
A new class of algorithms called Representation Policy Iteration (RPI) is
presented that automatically learns both basis functions and approximately
optimal policies. Illustrative experiments compare the performance of RPI with
that of LSPI using two hand-coded basis functions (RBF and polynomial state
encodings).
Comment: Appears in Proceedings of the Twenty-First Conference on Uncertainty
in Artificial Intelligence (UAI 2005).
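The Laplacian basis construction described above has a simple discrete analogue: on a state-space graph, the eigenvectors of the normalized graph Laplacian, ordered by eigenvalue, form an orthonormal set of smooth basis functions. A minimal NumPy sketch under that reading, using a toy chain MDP; the function name `laplacian_basis` is illustrative, not from the paper:

```python
import numpy as np

def laplacian_basis(adjacency, k):
    """Return the k smoothest eigenvectors of the normalized graph
    Laplacian (one row per state, one column per basis function)."""
    A = np.asarray(adjacency, dtype=float)
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(A)) - D_inv_sqrt @ A @ D_inv_sqrt  # normalized Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)              # ascending eigenvalues
    return eigvecs[:, :k]

# Toy example: the state-space graph of a 5-state chain MDP.
A = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
Phi = laplacian_basis(A, k=3)
print(Phi.shape)  # (5, 3): 5 states, 3 basis functions
```

Any linear value-function approximator (e.g. LSPI) can then fit weights over the columns of `Phi` instead of hand-coded RBF or polynomial features.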
Universal Imitation Games
Alan Turing proposed in 1950 a framework called an imitation game to decide
if a machine could think. Using mathematics developed largely after Turing --
category theory -- we analyze a broader class of universal imitation games
(UIGs), which includes static, dynamic, and evolutionary games. In static
games, the participants are in a steady state. In dynamic UIGs, "learner"
participants are trying to imitate "teacher" participants over the long run. In
evolutionary UIGs, the participants are competing against each other in an
evolutionary game, and participants can go extinct and be replaced by others
with higher fitness. We use the framework of category theory -- in particular,
two influential results by Yoneda -- to characterize each type of imitation
game. Universal properties in categories are defined by initial and final
objects. We characterize dynamic UIGs where participants are learning by
inductive inference as initial algebras over well-founded sets, and contrast
them with participants learning by coinductive inference over the final
coalgebra of non-well-founded sets. We briefly discuss the extension of our
categorical framework for UIGs to imitation games on quantum computers.
Comment: 98 pages. arXiv admin note: substantial text overlap with
arXiv:2402.1873
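The inductive/coinductive contrast above has a concrete programming analogue: a fold (catamorphism) consumes a finite, well-founded structure bottom-up, while an unfold (anamorphism) produces a possibly infinite structure that is only ever observed finitely. A hedged Python sketch of that analogy, not of the paper's categorical formalism:

```python
from itertools import islice

# Inductive (initial-algebra) view: a fold over the naturals.
# The input must be finite and well-founded for the loop to terminate.
def fold_nat(n, step, base):
    acc = base
    for _ in range(n):
        acc = step(acc)
    return acc

# Coinductive (final-coalgebra) view: an unfold into a stream.
# The stream may be infinite; observation is always finite.
def unfold(seed, next_state, observe):
    while True:
        yield observe(seed)
        seed = next_state(seed)

doubled = fold_nat(5, lambda x: x + x, 1)            # 1 -> 2 -> ... -> 32
evens = unfold(0, lambda s: s + 1, lambda s: 2 * s)  # infinite stream
first_four = list(islice(evens, 4))                  # [0, 2, 4, 6]
```

A "learner" imitating by inductive inference corresponds to the fold side (well-founded data); one imitating by coinductive inference corresponds to the unfold side (non-well-founded, stream-like behavior).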
Manifold Alignment using Procrustes Analysis
In this paper we introduce a novel approach to manifold alignment, based on Procrustes analysis. Our approach differs from semi-supervised alignment in that it results in a mapping that is defined everywhere -- when used with a suitable dimensionality reduction method -- rather than just on the training data points. We describe and evaluate our approach both theoretically and experimentally, providing results showing useful knowledge transfer from one domain to another. Novel applications of our method, including cross-lingual information retrieval and transfer learning in Markov decision processes, are presented.
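The core Procrustes step can be sketched as follows: given two embeddings of corresponding points in the same dimension, the orthogonal map minimizing the Frobenius distance between them is obtained from an SVD. A minimal NumPy illustration, assuming the two point sets are already reduced and in correspondence; `procrustes_align` is an illustrative name:

```python
import numpy as np

def procrustes_align(X, Y):
    """Orthogonal Procrustes: find orthogonal Q minimizing ||X Q - Y||_F,
    mapping the embedding X into Y's coordinate frame."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# Sanity check: Y is a rotated copy of X, so alignment recovers it exactly.
rng = np.random.default_rng(0)
X = rng.standard_normal((10, 3))
theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta), 0],
              [np.sin(theta),  np.cos(theta), 0],
              [0.0,            0.0,           1]])
Y = X @ R
Q = procrustes_align(X, Y)
print(np.allclose(X @ Q, Y))  # True
```

Because `Q` is a single linear map, it is defined on the whole reduced space, not only on the training points -- which is the property the abstract emphasizes.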
Randomized and Deterministic Attention Sparsification Algorithms for Over-parameterized Feature Dimension
Large language models (LLMs) have shown their power in different areas.
Attention computation, an important subroutine of LLMs, has also attracted
interest in theory. Recently, the static computation and dynamic maintenance
of the attention matrix have been studied by [Alman and Song 2023] and [Brand,
Song and Zhou 2023] from both the algorithmic and the hardness perspectives. In this
work, we consider the sparsification of the attention problem. We make one
simplifying assumption: the logit matrix is symmetric. Let $n$ denote the
length of the sentence and let $d$ denote the embedding dimension. Given a
matrix $X \in \mathbb{R}^{n \times d}$, suppose $d \gg n$ and
$\| X X^\top \|_{\infty} \leq r$; then we aim to find $Y \in \mathbb{R}^{n \times m}$
(where $m \ll d$) such that
\begin{align*}
\| D(Y)^{-1} \exp( Y Y^\top ) - D(X)^{-1} \exp( X X^\top ) \|_{\infty} \leq O(r)
\end{align*}
We provide two results for this problem.
Our first result is a randomized algorithm. It succeeds with high
probability, chooses a reduced dimension $m$ nearly linear in $n$, and has a
running time governed by $\mathrm{nnz}(X)$ and $n^{\omega}$. Here
$\mathrm{nnz}(X)$ denotes the number of non-zero entries in $X$. We use
$\omega$ to denote the exponent of matrix multiplication. Currently
$\omega \approx 2.37$.
Our second result is a deterministic algorithm. It likewise chooses a reduced
dimension $m$ nearly linear in $n$. Here $x_i$ denotes the $i$-th column of
matrix $X$.
Our main findings have the following implication for applied LLM tasks: any
very large feature dimension can be reduced to a size nearly linear in the
length of the sentence.
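The dimension-reduction goal can be illustrated with a plain Gaussian random projection, which approximately preserves $X X^\top$ and hence the attention matrix $D(X)^{-1} \exp(X X^\top)$. The paper's actual randomized and deterministic constructions are more refined than this sketch; the names and constants here are illustrative only:

```python
import numpy as np

def attention(Z):
    """Row-normalized attention matrix D(Z)^{-1} exp(Z Z^T)."""
    A = np.exp(Z @ Z.T)
    return A / A.sum(axis=1, keepdims=True)

def sparsify_features(X, m, rng):
    """Random-projection sketch Y = X S / sqrt(m) with Gaussian S,
    so that Y Y^T is an unbiased estimate of X X^T (JL-style).
    A stand-in for the paper's randomized algorithm, not its method."""
    n, d = X.shape
    S = rng.standard_normal((d, m)) / np.sqrt(m)
    return X @ S

rng = np.random.default_rng(1)
n, d, m = 8, 4096, 512                         # over-parameterized: d >> n
X = rng.standard_normal((n, d)) / np.sqrt(d)   # keeps ||X X^T||_inf modest
Y = sparsify_features(X, m, rng)
err = np.abs(attention(Y) - attention(X)).max()
print(Y.shape)  # feature dimension drops from 4096 to 512
```

With `m` on the order of the sentence length (up to log factors), `err` stays small even though the feature dimension shrinks by roughly a factor of `d / m`.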