
    Representation Policy Iteration

    This paper addresses a fundamental issue central to approximation methods for solving large Markov decision processes (MDPs): how to automatically learn the underlying representation for value function approximation. A novel, theoretically rigorous framework is proposed that automatically generates geometrically customized orthonormal sets of basis functions, which can be used with any approximate MDP solver, such as least-squares policy iteration (LSPI). The key innovation is a coordinate-free representation of value functions, using the theory of smooth functions on a Riemannian manifold. Hodge theory yields a constructive method for generating basis functions for approximating value functions, based on the eigenfunctions of the self-adjoint (Laplace-Beltrami) operator on manifolds. In effect, this approach performs a global Fourier analysis on the state space graph to approximate value functions, where the basis functions reflect the large-scale topology of the underlying state space. A new class of algorithms called Representation Policy Iteration (RPI) is presented that automatically learns both basis functions and approximately optimal policies. Illustrative experiments compare the performance of RPI with that of LSPI using two hand-coded basis functions (RBF and polynomial state encodings).
    Comment: Appears in Proceedings of the Twenty-First Conference on Uncertainty in Artificial Intelligence (UAI 2005)
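    As a rough illustration of the basis-construction step -- not the paper's reference implementation -- the following Python sketch builds basis functions from the low-order eigenvectors of a graph Laplacian on a chain-shaped state space and fits a sample value function to them by least squares. The chain MDP, the number of basis functions k, and the target values are all invented for this example.

```python
# Minimal sketch of a Laplacian eigenfunction basis (proto-value-function
# style), assuming a simple chain-graph state space. Illustrative only.
import numpy as np

def laplacian_basis(adjacency, k):
    """Return the k smoothest eigenvectors of the graph Laplacian,
    usable as basis functions for value function approximation."""
    degree = np.diag(adjacency.sum(axis=1))
    laplacian = degree - adjacency            # combinatorial Laplacian L = D - A
    eigvals, eigvecs = np.linalg.eigh(laplacian)  # ascending eigenvalues
    return eigvecs[:, :k]                     # columns = basis functions over states

# 20-state chain MDP: each state connected to its neighbors.
n = 20
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0

Phi = laplacian_basis(A, k=5)                 # n x 5 feature matrix

# Least-squares projection of a sample value function onto the learned
# basis, as an approximate MDP solver such as LSPI would use it.
v = np.linspace(0.0, 1.0, n) ** 2             # placeholder value function
w, *_ = np.linalg.lstsq(Phi, v, rcond=None)
print("approximation error:", np.linalg.norm(Phi @ w - v))
```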

    Universal Imitation Games

    Alan Turing proposed in 1950 a framework called an imitation game to decide if a machine could think. Using mathematics developed largely after Turing -- category theory -- we analyze a broader class of universal imitation games (UIGs), which includes static, dynamic, and evolutionary games. In static games, the participants are in a steady state. In dynamic UIGs, "learner" participants try to imitate "teacher" participants over the long run. In evolutionary UIGs, participants compete against each other in an evolutionary game, and participants can go extinct and be replaced by others with higher fitness. We use the framework of category theory -- in particular, two influential results by Yoneda -- to characterize each type of imitation game. Universal properties in categories are defined by initial and final objects. We characterize dynamic UIGs where participants learn by inductive inference as initial algebras over well-founded sets, and contrast them with participants learning by coinductive inference over the final coalgebra of non-well-founded sets. We briefly discuss the extension of our categorical framework for UIGs to imitation games on quantum computers.
    Comment: 98 pages. arXiv admin note: substantial text overlap with arXiv:2402.1873
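    To make the inductive/coinductive contrast concrete, here is a small Python sketch (purely illustrative, not from the paper): a fold consumes finite, well-founded data in initial-algebra style, while an unfold generates an unbounded stream in final-coalgebra style. The helper names fold and unfold and the toy data are assumptions of this example.

```python
# Toy contrast between inductive and coinductive inference, using Python
# stand-ins: a fold over well-founded (finite) data vs. an unfold producing
# a non-well-founded (infinite) stream. Illustrative only.
from itertools import islice

def fold(step, acc, xs):
    """Catamorphism: consume finite data bottom-up (initial-algebra style)."""
    for x in xs:
        acc = step(acc, x)
    return acc

def unfold(next_state, seed):
    """Anamorphism: generate observations forever (final-coalgebra style)."""
    state = seed
    while True:
        obs, state = next_state(state)
        yield obs

# Inductive learner: summarize a finite history of observations.
history = [1, 2, 3, 4]
total = fold(lambda acc, x: acc + x, 0, history)

# Coinductive process: an unbounded stream of teacher behavior,
# of which any observer can only ever inspect a finite prefix.
stream = unfold(lambda s: (s, s + 1), seed=0)
print(total, list(islice(stream, 5)))   # 10 [0, 1, 2, 3, 4]
```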

    Manifold Alignment using Procrustes Analysis

    In this paper we introduce a novel approach to manifold alignment, based on Procrustes analysis. Our approach differs from semi-supervised alignment in that it results in a mapping that is defined everywhere -- when used with a suitable dimensionality reduction method -- rather than just on the training data points. We describe and evaluate our approach both theoretically and experimentally, providing results showing useful knowledge transfer from one domain to another. Novel applications of our method, including cross-lingual information retrieval and transfer learning in Markov decision processes, are presented.
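    A minimal sketch of the core Procrustes step, assuming both datasets have already been embedded in a common low-dimensional space (e.g. by a spectral method) and correspondences between training points are known. The function name procrustes_align and the synthetic rotated data are invented for this illustration.

```python
# Orthogonal Procrustes with scaling: align point set Y to point set X.
import numpy as np

def procrustes_align(X, Y):
    """Find rotation Q and scale k minimizing ||Xc - k * Yc @ Q||_F
    on mean-centered data, so mapped Y-points line up with X."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    U, s, Vt = np.linalg.svd(Yc.T @ Xc)
    Q = U @ Vt                              # optimal orthogonal rotation
    k = s.sum() / (Yc ** 2).sum()           # optimal isotropic scale
    return k, Q

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
theta = np.pi / 6                           # hidden 30-degree rotation
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
Y = 2.0 * X @ R + 0.01 * rng.normal(size=X.shape)  # rotated, scaled, noisy copy

k, Q = procrustes_align(X, Y)
print("residual:", np.linalg.norm((X - X.mean(0)) - k * (Y - Y.mean(0)) @ Q))
```

    Because the recovered map is a global rotation and scale, it applies to any new point in the embedded space, not just the training correspondences -- the "defined everywhere" property noted in the abstract.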

    Randomized and Deterministic Attention Sparsification Algorithms for Over-parameterized Feature Dimension

    Large language models (LLMs) have shown their power in many areas. Attention computation, an important subroutine of LLMs, has also attracted interest in theory. Recently, the static computation and dynamic maintenance of the attention matrix have been studied by [Alman and Song 2023] and [Brand, Song and Zhou 2023] from both the algorithmic and the hardness perspective. In this work, we consider sparsification of the attention problem. We make one simplifying assumption: the logit matrix is symmetric. Let $n$ denote the length of the sentence and let $d$ denote the embedding dimension. Given a matrix $X \in \mathbb{R}^{n \times d}$, suppose $d \gg n$ and $\| X X^\top \|_{\infty} < r$ with $r \in (0,0.1)$; then we aim to find $Y \in \mathbb{R}^{n \times m}$ (where $m \ll d$) such that \begin{align*} \| D(Y)^{-1} \exp( Y Y^\top ) - D(X)^{-1} \exp( X X^\top ) \|_{\infty} \leq O(r). \end{align*} We provide two results for this problem.
    $\bullet$ Our first result is a randomized algorithm. It runs in $\widetilde{O}(\mathrm{nnz}(X) + n^{\omega})$ time, succeeds with probability $1-\delta$, and chooses $m = O(n \log(n/\delta))$. Here $\mathrm{nnz}(X)$ denotes the number of non-zero entries in $X$, and $\omega$ denotes the exponent of matrix multiplication; currently $\omega \approx 2.373$.
    $\bullet$ Our second result is a deterministic algorithm. It runs in $\widetilde{O}(\min\{\sum_{i\in[d]}\mathrm{nnz}(X_i)^2, dn^{\omega-1}\} + n^{\omega+1})$ time and chooses $m = O(n)$. Here $X_i$ denotes the $i$-th column of matrix $X$.
    Our main findings have the following implication for applied LLM tasks: for any very large feature dimension, we can reduce it to a size nearly linear in the length of the sentence.
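    The paper's algorithms are not reproduced here, but the goal can be illustrated with a simple Python random-projection sketch: replace $X$ with a narrower $Y = XS$ for a Gaussian sketch matrix $S$, then check that the row-normalized attention matrix is approximately preserved. The dimensions, the rescaling that enforces $\|XX^\top\|_\infty < r$, and the choice $m \approx n \log n$ are assumptions of this toy example, not the paper's construction.

```python
# Illustrative sketch (not the paper's algorithm): reduce an over-parameterized
# feature dimension d to m ~ n log n with a random Gaussian projection and
# check that the attention matrix D(X)^{-1} exp(X X^T) is roughly preserved.
# The 1/sqrt(m) scaling makes E[Y Y^T] = X X^T.
import numpy as np

def attention(X):
    """Row-softmax of the symmetric logit matrix X X^T."""
    A = np.exp(X @ X.T)
    return A / A.sum(axis=1, keepdims=True)   # D(X)^{-1} exp(X X^T)

rng = np.random.default_rng(0)
n, d = 32, 4096                               # sentence length n << feature dim d
X = rng.normal(size=(n, d))
X *= np.sqrt(0.09 / np.abs(X @ X.T).max())    # enforce ||X X^T||_inf < r = 0.1

m = int(n * np.log(n))                        # target dimension m = O(n log n)
S = rng.normal(size=(d, m)) / np.sqrt(m)      # random sketch matrix
Y = X @ S                                     # n x m, with m << d

err = np.abs(attention(Y) - attention(X)).max()
print(f"d={d} -> m={m}, max entrywise attention error: {err:.4f}")
```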