
    Online Multi-task Learning with Hard Constraints

    We discuss multi-task online learning when a decision maker has to deal simultaneously with M tasks. The tasks are related, which is modeled by requiring the M-tuple of actions taken by the decision maker to satisfy certain constraints. We give natural examples of such restrictions and then discuss a general class of tractable constraints, for which we introduce computationally efficient ways of selecting actions, essentially by reducing the problem to an online shortest path problem. We briefly discuss "tracking" and "bandit" versions of the problem and extend the model in various ways, including non-additive global losses and uncountably infinite sets of tasks.
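The abstract does not spell out the reduction, so here is a minimal sketch of the general idea under a hypothetical constraint: each of the M tasks picks an action in {0, 1}, at most K tasks may pick action 1, and losses are additive across tasks. Feasible M-tuples then correspond to paths in a layered graph whose states are (task index, number of ones so far), and an exponentially weighted forecaster over all feasible tuples can be sampled with one backward and one forward dynamic-programming pass instead of enumerating exponentially many tuples. The constraint, the function names `path_weights` and `sample_actions`, and the learning rate `eta` are illustrative assumptions, not the paper's construction.

```python
import math
import random


def path_weights(cum_loss, K, eta):
    """Backward pass over the layered graph (hypothetical 'at most K ones' constraint).

    cum_loss[i][a] is the cumulative loss of action a in {0, 1} on task i.
    W[i][c] is the total exponential weight of all feasible action choices
    for tasks i..M-1, given that c earlier tasks already chose action 1.
    """
    M = len(cum_loss)
    W = [[0.0] * (K + 1) for _ in range(M + 1)]
    W[M] = [1.0] * (K + 1)  # the empty completion has weight 1
    for i in range(M - 1, -1, -1):
        for c in range(K + 1):
            # Action 0 is always feasible.
            W[i][c] = math.exp(-eta * cum_loss[i][0]) * W[i + 1][c]
            # Action 1 is feasible only while fewer than K ones have been used.
            if c + 1 <= K:
                W[i][c] += math.exp(-eta * cum_loss[i][1]) * W[i + 1][c + 1]
    return W


def sample_actions(cum_loss, K, eta, rng):
    """Forward pass: draw an M-tuple from the exponentially weighted
    distribution restricted to tuples with at most K ones."""
    W = path_weights(cum_loss, K, eta)
    c, actions = 0, []
    for i in range(len(cum_loss)):
        p_one = 0.0
        if c + 1 <= K:
            # Conditional probability of action 1 given the prefix chosen so far.
            p_one = math.exp(-eta * cum_loss[i][1]) * W[i + 1][c + 1] / W[i][c]
        a = 1 if rng.random() < p_one else 0
        actions.append(a)
        c += a
    return actions
```

Each draw costs O(MK) operations. In an online loop, `cum_loss` would be updated with the observed per-task losses before the next draw; over long horizons the weights should be kept in log space to avoid underflow.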

    Multitask Online Mirror Descent

    We introduce and analyze MT-OMD, a multitask generalization of Online Mirror Descent (OMD) that operates by sharing updates between tasks. We prove that the regret of MT-OMD is of order $\sqrt{1 + \sigma^2(N-1)}\sqrt{T}$, where $\sigma^2$ is the task variance according to the geometry induced by the regularizer, $N$ is the number of tasks, and $T$ is the time horizon. Whenever tasks are similar, that is $\sigma^2 \le 1$, our method improves upon the $\sqrt{NT}$ bound obtained by running independent OMDs on each task. We further provide a matching lower bound, and show that our multitask extensions of Online Gradient Descent and Exponentiated Gradient, two major instances of OMD, enjoy closed-form updates, making them easy to use in practice. Finally, we present experiments on both synthetic and real-world datasets supporting our findings.
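The condition $\sigma^2 \le 1$ follows from comparing the two regret orders directly while ignoring constants. The short derivation below is our own reading of the stated rates, not a step quoted from the paper:

```latex
\sqrt{1+\sigma^2(N-1)}\,\sqrt{T} \le \sqrt{NT}
\iff 1+\sigma^2(N-1) \le N
\iff \sigma^2 (N-1) \le N-1
\iff \sigma^2 \le 1 \qquad (N > 1).
```

At the extremes, $\sigma^2 = 0$ (no task variance) gives order $\sqrt{T}$, independent of $N$, while $\sigma^2 = 1$ recovers the $\sqrt{NT}$ rate of independent OMDs.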

    On the Sample Complexity of Representation Learning in Multi-task Bandits with Global and Local structure

    We investigate the sample complexity of learning the optimal arm for multi-task bandit problems. Arms consist of two components: one that is shared across tasks (that we call representation) and one that is task-specific (that we call predictor). The objective is to learn the optimal (representation, predictor)-pair for each task, under the assumption that the optimal representation is common to all tasks. Within this framework, efficient learning algorithms should transfer knowledge across tasks. We consider the best-arm identification problem in the fixed-confidence setting, where, in each round, the learner actively selects both a task and an arm, and observes the corresponding reward. We derive instance-specific sample complexity lower bounds satisfied by any $(\delta_G,\delta_H)$-PAC algorithm (such an algorithm identifies the best representation with probability at least $1-\delta_G$, and the best predictor for a task with probability at least $1-\delta_H$). We devise an algorithm, OSRL-SC, whose sample complexity approaches the lower bound and scales at most as $H(G\log(1/\delta_G) + X\log(1/\delta_H))$, with $X$, $G$, $H$ being, respectively, the number of tasks, representations, and predictors. This scaling is significantly better than that of the classical best-arm identification algorithm, which scales as $HGX\log(1/\delta)$.
    Comment: Accepted at the Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI-23).
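To get a feel for the size of the gap, the two scalings can be compared numerically. The sketch below drops all constants and instance-dependent factors, so it compares orders of growth only, not actual sample counts; the function names and example values are hypothetical. When $\delta_G = \delta_H = \delta$, the ratio of the classical scaling to the OSRL-SC scaling simplifies to $GX/(G+X)$.

```python
import math


def osrl_sc_scaling(X, G, H, delta_G, delta_H):
    """Order-of-growth term H * (G log(1/delta_G) + X log(1/delta_H)).

    X: number of tasks, G: number of representations, H: number of predictors.
    Constants and instance-dependent factors are deliberately dropped.
    """
    return H * (G * math.log(1 / delta_G) + X * math.log(1 / delta_H))


def classical_scaling(X, G, H, delta):
    """Order-of-growth term H * G * X * log(1/delta) quoted for the classical
    best-arm identification algorithm (constants dropped)."""
    return H * G * X * math.log(1 / delta)


if __name__ == "__main__":
    X, G, H, delta = 10, 20, 5, 0.05
    ours = osrl_sc_scaling(X, G, H, delta, delta)
    classic = classical_scaling(X, G, H, delta)
    print(f"ratio classical / OSRL-SC: {classic / ours:.2f}")
```

For example, with $X = 10$ tasks, $G = 20$ representations, $H = 5$ predictors and $\delta = 0.05$, the ratio is $200/30 \approx 6.7$, and it grows roughly linearly in $\min(G, X)$.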

    Online Multitask Learning with Long-Term Memory

    We introduce a novel online multitask setting. In this setting, each task is partitioned into a sequence of segments that is unknown to the learner. Associated with each segment is a hypothesis from some hypothesis class. We give algorithms designed to exploit the scenario where there are many such segments but significantly fewer associated hypotheses. We prove regret bounds that hold for any segmentation of the tasks and any association of hypotheses to the segments. In the single-task setting this is equivalent to switching with long-term memory in the sense of [Bousquet and Warmuth; 2003]. We provide an algorithm that predicts on each trial in time linear in the number of hypotheses when the hypothesis class is finite. We also consider infinite hypothesis classes from reproducing kernel Hilbert spaces, for which we give an algorithm whose per-trial time complexity is cubic in the number of cumulative trials. In the single-task special case, this is the first example of an efficient regret-bounded switching algorithm with long-term memory for a non-parametric hypothesis class.
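For the finite-class, single-task special case, the abstract points to switching with long-term memory in the sense of Bousquet and Warmuth (2003). As a rough illustration of that single-task baseline (not the multitask algorithm of this paper), the sketch below runs exponential weights followed by a "mixing past posteriors" step with a uniform-past scheme, as we understand it: keep weight 1 − α on the latest posterior and spread α uniformly over all earlier posteriors, so weight can flow back quickly to a hypothesis that was good in an earlier segment. Keeping a running sum of past posteriors makes each trial linear in the number of hypotheses n. The class name `MixingPastPosteriors` and the parameters `eta` and `alpha` are hypothetical choices for this sketch.

```python
import math


class MixingPastPosteriors:
    """Hypothetical sketch: exponential weights plus a 'uniform past' mixing step
    over n hypotheses (single-task switching with long-term memory)."""

    def __init__(self, n, eta, alpha):
        self.eta = eta
        self.alpha = alpha
        self.w = [1.0 / n] * n          # current prediction weights w_t
        self.past_sum = [1.0 / n] * n   # running sum of past posteriors v_0..v_{t-1}
        self.num_past = 1               # v_0 is the uniform start vector

    def weights(self):
        return list(self.w)

    def update(self, losses):
        # Loss update: v_t is proportional to w_t * exp(-eta * loss), then normalized.
        v = [wi * math.exp(-self.eta * li) for wi, li in zip(self.w, losses)]
        z = sum(v)
        v = [vi / z for vi in v]
        # Mixing update: (1 - alpha) on v_t, alpha spread uniformly over v_0..v_{t-1}.
        self.w = [
            (1 - self.alpha) * vi + self.alpha * si / self.num_past
            for vi, si in zip(v, self.past_sum)
        ]
        # v_t joins the pool of past posteriors used in later trials.
        self.past_sum = [si + vi for si, vi in zip(self.past_sum, v)]
        self.num_past += 1
```

Each `update` call touches every hypothesis a constant number of times, so a trial costs O(n) time and memory stays O(n) because only the sum of past posteriors is stored.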