
    Grounding Language for Transfer in Deep Reinforcement Learning

    In this paper, we explore the use of natural language to drive transfer for reinforcement learning (RL). Despite the widespread application of deep RL techniques, learning generalized policy representations that work across domains remains a challenging problem. We demonstrate that textual descriptions of environments provide a compact intermediate channel that facilitates effective policy transfer. Specifically, by learning to ground the meaning of text in the dynamics of the environment, such as transitions and rewards, an autonomous agent can effectively bootstrap policy learning on a new domain given its description. We employ a model-based RL approach consisting of a differentiable planning module, a model-free component, and a factorized state representation to effectively use entity descriptions. Our model outperforms prior work in both transfer and multi-task scenarios across a variety of environments. For instance, we achieve up to 14% and 11.5% absolute improvement over previously existing models in terms of average and initial rewards, respectively. Comment: JAIR 201
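    As a rough illustration of the factorized, text-grounded state representation described in the abstract, the sketch below (Python/PyTorch) sums per-entity value contributions computed from each entity's description embedding and state features. The module names, layer sizes, and bag-of-words text encoder are our own assumptions for exposition, not the authors' released model.

        # Minimal sketch, assuming a bag-of-words encoder for entity descriptions
        # and a factorized (per-entity, summed) value estimate. Illustrative only.
        import torch
        import torch.nn as nn

        class EntityGroundedValue(nn.Module):
            def __init__(self, vocab_size, text_dim=32, state_dim=8, hidden=64):
                super().__init__()
                # Mean-pooled embedding of an entity's textual description.
                self.text_embed = nn.EmbeddingBag(vocab_size, text_dim, mode="mean")
                # Per-entity network: description embedding + entity state features
                # -> scalar contribution to the state value.
                self.per_entity = nn.Sequential(
                    nn.Linear(text_dim + state_dim, hidden),
                    nn.ReLU(),
                    nn.Linear(hidden, 1),
                )

            def forward(self, desc_tokens, desc_offsets, entity_feats):
                # desc_tokens / desc_offsets: flattened token ids per entity
                # (EmbeddingBag format); entity_feats: (num_entities, state_dim).
                text = self.text_embed(desc_tokens, desc_offsets)
                scores = self.per_entity(torch.cat([text, entity_feats], dim=-1))
                # Factorized value: sum of per-entity contributions.
                return scores.sum()

    Because the value decomposes over entities, swapping in a new domain only requires new descriptions and entity features, which is the intuition behind using text as the transfer channel.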

    Online Multitask Learning with Long-Term Memory

    We introduce a novel online multitask setting. In this setting, each task is partitioned into a sequence of segments that is unknown to the learner, and a hypothesis from some hypothesis class is associated with each segment. We give algorithms designed to exploit the scenario where there are many such segments but significantly fewer associated hypotheses. We prove regret bounds that hold for any segmentation of the tasks and any association of hypotheses to the segments. In the single-task setting, this is equivalent to switching with long-term memory in the sense of [Bousquet and Warmuth; 2003]. We provide an algorithm that predicts on each trial in time linear in the number of hypotheses when the hypothesis class is finite. We also consider infinite hypothesis classes from reproducing kernel Hilbert spaces, for which we give an algorithm whose per-trial time complexity is cubic in the number of cumulative trials. In the single-task special case, this is the first example of an efficient regret-bounded switching algorithm with long-term memory for a non-parametric hypothesis class.
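    For intuition about "switching with long-term memory", the sketch below implements a simple exponential-weights learner that mixes back past posteriors, in the style of [Bousquet and Warmuth; 2003]. It is a single-task, finite-class illustration only; the learning rate eta, switching rate alpha, and the linear mixture loss are assumed parameters, not the algorithms or bounds of this paper.

        # Minimal sketch of switching with long-term memory via mixing past
        # posteriors over a finite set of experts. Illustrative assumptions only.
        import numpy as np

        def switching_with_memory(losses, eta=1.0, alpha=0.05):
            """losses: (num_trials, num_experts) array of per-trial expert losses."""
            T, n = losses.shape
            w = np.full(n, 1.0 / n)      # current posterior over experts
            past = [w.copy()]            # stored past posteriors (long-term memory)
            total_loss = 0.0
            for t in range(T):
                # Suffer the loss of the weighted mixture of experts.
                total_loss += w @ losses[t]
                # Exponential-weights (loss) update.
                v = w * np.exp(-eta * losses[t])
                v /= v.sum()
                # Mix back a fraction of the average past posterior, so weight on
                # previously good experts can be recovered quickly after a switch.
                w = (1 - alpha) * v + alpha * np.mean(past, axis=0)
                past.append(v.copy())
            return total_loss

    The key point is that the mixing step lets the learner return to a hypothesis it tracked long ago at low cost, which is what yields bounds in terms of the (small) number of distinct hypotheses rather than the number of segments.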

    Online Matrix Completion with Side Information

    This thesis considers the problem of binary matrix completion with side information in the online setting, and its applications. The side information provides additional information about the rows and columns and can yield improved results compared to when such information is not available. We present efficient and general algorithms in both transductive and inductive models. The performance guarantees that we prove are with respect to the matrix complexity measures of max-norm and margin complexity. We apply our bounds to the hypothesis class of biclustered matrices: matrices whose rows and columns can be permuted into homogeneous latent blocks. This class is a natural choice for our problem, since the margin complexity and max-norm of these matrices have an upper bound that is easy to interpret in terms of the latent dimensions. We also apply our algorithms to a novel online multitask setting with RKHS hypothesis classes. In this setting, each task is partitioned into a sequence of segments, where a hypothesis is associated with each segment. Our algorithms are designed to exploit the scenario where the number of associated hypotheses is much smaller than the number of segments. We prove performance guarantees that hold for any segmentation of the tasks and any association of hypotheses to the segments. In the single-task setting, this is analogous to switching with long-term memory in the sense of [Bousquet and Warmuth; 2003].
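    To make the biclustered hypothesis class concrete, the sketch below generates an m x n sign matrix whose rows and columns fall into k and l latent clusters, so it can be permuted into homogeneous blocks. The construction and parameter names are illustrative assumptions, not code from the thesis.

        # Minimal sketch: sample a (k, l)-biclustered {-1, +1} matrix.
        import numpy as np

        def biclustered_matrix(m, n, k, l, seed=None):
            """m x n sign matrix with k latent row clusters and l latent column clusters."""
            rng = np.random.default_rng(seed)
            row_cluster = rng.integers(0, k, size=m)        # latent row assignments
            col_cluster = rng.integers(0, l, size=n)        # latent column assignments
            block_sign = rng.choice([-1, 1], size=(k, l))   # one label per latent block
            # Entry (i, j) is the label of the block its row and column clusters select.
            return block_sign[np.ix_(row_cluster, col_cluster)]

    Matrices of this form are exactly the class the bounds above target: as the abstract notes, their max-norm and margin complexity admit upper bounds that are interpretable in terms of the latent dimensions k and l.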