2 research outputs found

    A Gang of Adversarial Bandits

    We consider running multiple instances of multi-armed bandit (MAB) problems in parallel. A main motivation for this study is online recommendation systems, in which each of N users is associated with a MAB problem and the goal is to exploit users' similarity in order to learn users' preferences for K items more efficiently. We consider the adversarial MAB setting, whereby an adversary is free to choose which user and which loss to present to the learner during the learning process. Users are in a social network and the learner is aided by a priori knowledge of the strengths of the social links between all pairs of users. It is assumed that if the social link between two users is strong then they tend to share the same action. The regret is measured relative to an arbitrary function which maps users to actions. The smoothness of the function is captured by a resistance-based dispersion measure Ψ. We present two learning algorithms, GABA-I and GABA-II, which exploit the network structure to bias towards functions of low Ψ values. We show that GABA-I has an expected regret bound of O(√(ln(NK/Ψ) ΨKT)) and per-trial time complexity of O(K ln(N)), whilst GABA-II has a weaker O(√(ln(N/Ψ) ln(NK/Ψ) ΨKT)) regret, but a better O(ln(K) ln(N)) per-trial time complexity. We highlight improvements of both algorithms over running independent standard MABs across users.
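The baseline that the abstract compares against, independent standard MABs across users, can be sketched as one Exp3 learner per user. This is not GABA-I or GABA-II, and it ignores the social network entirely. The names `Exp3` and `run_independent_baseline`, the trial format `(user, losses)`, and the learning-rate tuning are assumptions made for illustration.

```python
import math
import random


class Exp3:
    """Exp3 for a single user: K arms, adversarial losses in [0, 1]."""

    def __init__(self, n_arms, learning_rate, rng):
        self.n_arms = n_arms
        self.learning_rate = learning_rate
        self.rng = rng
        # Importance-weighted cumulative loss estimate for each arm.
        self.cum_loss_estimates = [0.0] * n_arms

    def probabilities(self):
        # Subtract the minimum so every exponent is <= 0 (numerical stability).
        low = min(self.cum_loss_estimates)
        weights = [
            math.exp(-self.learning_rate * (c - low))
            for c in self.cum_loss_estimates
        ]
        total = sum(weights)
        return [w / total for w in weights]

    def choose(self):
        probs = self.probabilities()
        arm = self.rng.choices(range(self.n_arms), weights=probs)[0]
        return arm, probs[arm]

    def update(self, arm, prob, loss):
        # Only the played arm's loss is observed; dividing by its
        # probability keeps the loss estimate unbiased.
        self.cum_loss_estimates[arm] += loss / prob


def run_independent_baseline(n_users, n_arms, trials, seed=0):
    """Run one Exp3 per user; `trials` is a list of (user, losses) pairs.

    `losses` holds the full loss vector so the simulation can score the
    chosen arm; the learner itself only sees the loss of the arm it played.
    """
    rng = random.Random(seed)
    horizon = max(len(trials), 1)
    # Assumed tuning: a common choice for Exp3 with losses in [0, 1].
    eta = math.sqrt(2.0 * math.log(n_arms) / (n_arms * horizon))
    learners = [Exp3(n_arms, eta, rng) for _ in range(n_users)]
    total_loss = 0.0
    for user, losses in trials:
        arm, prob = learners[user].choose()
        total_loss += losses[arm]
        learners[user].update(arm, prob, losses[arm])
    return total_loss, learners
```

Each user's learner here starts from scratch, so nothing learned for one user transfers to a strongly linked neighbour; that is the inefficiency the network-aware algorithms in the abstract aim to remove.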

    Adaptive Online Learning

    The research that constitutes this thesis was driven by two related goals. The first one was to develop new efficient online learning algorithms and to study their properties and theoretical guarantees. The second one was to study real-world data and find algorithms appropriate for particular real-world problems. This thesis studies online prediction with few assumptions about the nature of the data. This is important for real-world applications of machine learning, as complex assumptions about the data are rarely justified. We consider two frameworks: conformal prediction, which is based on the randomness assumption, and prediction with expert advice, where no assumptions about the data are made at all. Conformal predictors are set predictors, that is, a set of possible labels is issued by Learner at each trial. After the prediction is made, the real label is revealed and Learner's prediction is evaluated. In the case of classification the label space is finite, so Learner makes an error if the true label is not in the set produced by Learner. Conformal prediction was originally developed for the supervised learning task and was proved to be valid in the sense of making errors with a prespecified probability. We will study possible ways of extending this approach to the semi-supervised case and build a valid algorithm for this task. Also, we will apply the conformal prediction technique to the problem of diagnosing tuberculosis in cattle. Whereas conformal prediction relies on just the randomness assumption, prediction with expert advice drops this one as well. One may wonder whether it is possible to make good predictions under these circumstances. However, Learner is provided with predictions of a certain class of experts (or prediction strategies) and may base his prediction on them. The goal then is to perform not much worse than the best strategy in the class. This is achieved by carefully mixing (aggregating) predictions of the base experts.
However, often the nature of data changes over time, such that there is a region where one expert is good, followed by a region where another is good and so on. This leads to algorithms that we call adaptive: they take into account this structure of the data. We explore the possibilities offered by the framework of specialist experts to build adaptive algorithms. This line of thought allows us then to provide an intuitive explanation for the mysterious Mixing Past Posteriors algorithm and build a new algorithm with sharp bounds for Online Multitask Learning.
    EThOS - Electronic Theses Online Service, GB, United Kingdom
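The idea of aggregating expert predictions, and of adapting when the best expert changes, can be illustrated with a minimal sketch. It uses exponential weights plus the uniform-mixing variant of Fixed Share, a standard tracking method, rather than the specialist-experts construction developed in the thesis. The function name `aggregate` and the parameters `learning_rate` and `share` are assumptions made for illustration.

```python
import math


def aggregate(expert_losses, learning_rate, share=0.0):
    """Exponential-weights aggregation with optional Fixed Share mixing.

    expert_losses: one list per trial holding each expert's loss in [0, 1].
    share: fraction of weight redistributed uniformly after every trial;
           share=0.0 gives plain exponential weights.
    Returns the learner's weighted-average loss per trial and the final weights.
    """
    n_experts = len(expert_losses[0])
    weights = [1.0 / n_experts] * n_experts
    learner_losses = []
    for losses in expert_losses:
        # The learner suffers the weight-averaged loss of the experts.
        learner_losses.append(sum(w * l for w, l in zip(weights, losses)))
        # Loss update: experts with larger loss lose weight exponentially fast.
        updated = [
            w * math.exp(-learning_rate * l) for w, l in zip(weights, losses)
        ]
        total = sum(updated)
        updated = [u / total for u in updated]
        # Share update: keep every expert above share / n_experts so that
        # a previously poor expert can recover quickly after a switch.
        weights = [(1.0 - share) * u + share / n_experts for u in updated]
    return learner_losses, weights


# Expert 0 is perfect for 50 trials, then expert 1 takes over.
sequence = [[0.0, 1.0]] * 50 + [[1.0, 0.0]] * 50
plain_losses, plain_weights = aggregate(sequence, learning_rate=1.0)
adaptive_losses, adaptive_weights = aggregate(sequence, learning_rate=1.0, share=0.05)
print(sum(plain_losses), sum(adaptive_losses))
```

On this switching sequence the plain version keeps trusting expert 0 long after it has turned bad, while the sharing version recovers within a few trials, which is the kind of behaviour the abstract calls adaptive.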