Grounding Language for Transfer in Deep Reinforcement Learning
In this paper, we explore the use of natural language to drive
transfer for reinforcement learning (RL). Despite the widespread application
of deep RL techniques, learning generalized policy representations that work
across domains remains a challenging problem. We demonstrate that textual
descriptions of environments provide a compact intermediate channel to
facilitate effective policy transfer. Specifically, by learning to ground the
meaning of text to the dynamics of the environment such as transitions and
rewards, an autonomous agent can effectively bootstrap policy learning on a new
domain given its description. We employ a model-based RL approach consisting of
a differentiable planning module, a model-free component and a factorized state
representation to effectively use entity descriptions. Our model outperforms
prior work on both transfer and multi-task scenarios in a variety of different
environments. For instance, we achieve up to 14% and 11.5% absolute improvement
over previously existing models in terms of average and initial rewards,
respectively.
Comment: JAIR 201
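The differentiable planning module described above can be illustrated with a minimal value-iteration sketch. This is not the paper's model: in the paper, the transition and reward tensors would be predicted from grounded entity descriptions, whereas here they are fixed toy inputs, and all names and dimensions are invented for illustration.

```python
import numpy as np

def value_iteration(T, R, gamma=0.9, iters=50):
    """Toy planning backup. T: (A, S, S) transition probabilities;
    R: (S,) per-state rewards. Returns a vector of state values."""
    A, S, _ = T.shape
    V = np.zeros(S)
    for _ in range(iters):
        # Q[a, s] = R[s] + gamma * sum_s' T[a, s, s'] * V[s']
        Q = R[None, :] + gamma * (T @ V)
        V = Q.max(axis=0)  # greedy backup over actions
    return V

# Toy 2-state, 2-action MDP: action 1 reliably reaches rewarding state 1.
T = np.array([[[1.0, 0.0], [1.0, 0.0]],
              [[0.0, 1.0], [0.0, 1.0]]])
R = np.array([0.0, 1.0])
V = value_iteration(T, R)
```

In a learned-model setting, replacing the fixed `T` and `R` with differentiable predictors is what makes the whole planning loop trainable end-to-end.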
Online Multitask Learning with Long-Term Memory
We introduce a novel online multitask setting. In this setting each task is
partitioned into a sequence of segments that is unknown to the learner.
Associated with each segment is a hypothesis from some hypothesis class. We
give algorithms that are designed to exploit the scenario where there are many
such segments but significantly fewer associated hypotheses. We prove regret
bounds that hold for any segmentation of the tasks and any association of
hypotheses to the segments. In the single-task setting this is equivalent to
switching with long-term memory in the sense of [Bousquet and Warmuth; 2003].
We provide an algorithm that predicts on each trial in time linear in the
number of hypotheses when the hypothesis class is finite. We also consider
infinite hypothesis classes from reproducing kernel Hilbert spaces for which we
give an algorithm whose per trial time complexity is cubic in the number of
cumulative trials. In the single-task special case this is the first example of
an efficient regret-bounded switching algorithm with long-term memory for a
non-parametric hypothesis class.
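The long-term-memory mechanism in the sense of [Bousquet and Warmuth; 2003] can be sketched with a "mixing past posteriors" update over a finite set of experts. This is a conceptual toy, not the algorithm of the abstract above; the learning rate, mixing weight, and loss sequence below are invented for illustration.

```python
import numpy as np

def mix_past_posteriors(losses, eta=1.0, alpha=0.05):
    """losses: (T, N) per-trial losses of N experts.
    Returns the final weight vector over experts."""
    T, N = losses.shape
    w = np.full(N, 1.0 / N)
    past = [w.copy()]  # record of past posteriors
    for t in range(T):
        v = w * np.exp(-eta * losses[t])  # multiplicative loss update
        v /= v.sum()
        # Mixing with the average of past posteriors lets weight flow back
        # quickly to a previously good expert: the "long-term memory".
        w = (1 - alpha) * v + alpha * np.mean(past, axis=0)
        past.append(w.copy())
    return w

# Expert 0 is good on the first half of trials, expert 1 on the second.
L = np.zeros((40, 2))
L[:20, 1] = 1.0
L[20:, 0] = 1.0
w = mix_past_posteriors(L)
```

The point of mixing with past posteriors rather than with the uniform distribution (plain fixed share) is that recovering a hypothesis that was good on an earlier segment costs far less regret than relearning it from scratch.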
Online Matrix Completion with Side Information
This thesis considers the problem of binary matrix completion with side information in the online setting and the applications thereof. The side information provides additional knowledge about the rows and columns and can yield improved results compared to when such information is not available. We present efficient and general algorithms in transductive and inductive models. The performance guarantees that we prove are with respect to the matrix complexity measures of the max-norm and the margin complexity. We apply our bounds to the hypothesis class of biclustered matrices. Such matrices can be permuted along their rows and columns into homogeneous latent blocks. This class is a natural choice for our problem since the margin complexity and max-norm of these matrices have an upper bound that is easy to interpret in terms of the latent dimensions. We also apply our algorithms to a novel online multitask setting with RKHS hypothesis classes. In this setting, each task is partitioned into a sequence of segments, where a hypothesis is associated with each segment. Our algorithms are designed to exploit the scenario where the number of associated hypotheses is much smaller than the number of segments. We prove performance guarantees that hold for any segmentation of the tasks and any association of hypotheses to the segments. In the single-task setting, this is analogous to switching with long-term memory in the sense of [Bousquet and Warmuth; 2003].
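The biclustered matrix class mentioned above can be made concrete with a small construction: every entry is determined by its row's latent cluster, its column's latent cluster, and a small sign-pattern block matrix. The cluster assignments and block matrix below are invented toy values, not data from the thesis.

```python
import numpy as np

def biclustered(row_cluster, col_cluster, block):
    """Build M with M[i, j] = block[row_cluster[i], col_cluster[j]],
    i.e., a matrix whose entries depend only on latent (row, col) clusters."""
    return block[np.ix_(row_cluster, col_cluster)]

rows = np.array([0, 0, 1, 1, 0])   # 5 rows in 2 latent row clusters
cols = np.array([1, 0, 1, 0])      # 4 columns in 2 latent column clusters
B = np.array([[1, -1],
              [-1, 1]])            # 2x2 block of +/-1 signs
M = biclustered(rows, cols, B)     # 5x4 binary (+/-1) matrix
```

Sorting the rows and columns by their cluster labels permutes `M` into homogeneous blocks, which is why the max-norm and margin complexity of this class scale with the small latent dimensions (here 2x2) rather than with the full matrix size.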