    Teaching an Active Learner with Contrastive Examples

    We study the problem of active learning with the added twist that the learner is assisted by a helpful teacher. We consider the following natural interaction protocol: at each round, the learner proposes a query asking for the label of an instance $x^q$, and the teacher provides the requested label $\{x^q, y^q\}$ along with explanatory information to guide the learning process. In this paper, we view this information in the form of an additional contrastive example ($\{x^c, y^c\}$) where $x^c$ is picked from a set constrained by $x^q$ (e.g., dissimilar instances with the same label). Our focus is to design a teaching algorithm that can provide an informative sequence of contrastive examples to the learner to speed up the learning process. We show that this leads to a challenging sequence optimization problem where the algorithm's choices at a given round depend on the history of interactions. We investigate an efficient teaching algorithm that adaptively picks these contrastive examples. We derive strong performance guarantees for our algorithm based on two problem-dependent parameters and further show that for specific types of active learners (e.g., a generalized binary search learner), the proposed teaching algorithm exhibits strong approximation guarantees. Finally, we illustrate our bounds and demonstrate the effectiveness of our teaching framework via two numerical case studies.

    Comment: Fixed the illustrative example
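The interaction protocol above can be sketched in a few lines. This is a toy illustration under illustrative assumptions (a 1-D dataset, absolute distance as the dissimilarity measure, and a "most dissimilar instance with the same label" constraint set), not the paper's actual teaching algorithm:

```python
# Hypothetical sketch of the query / contrastive-example protocol.
# The teacher answers a label query and adds a contrastive example:
# a dissimilar instance (here: farthest in absolute distance) that
# shares the queried label.

def teacher_respond(x_q, dataset, labels):
    """Return (queried example, contrastive example)."""
    y_q = labels[x_q]
    # Candidate pool constrained by x_q: same label, different instance.
    pool = [x for x in dataset if labels[x] == y_q and x != x_q]
    # Pick the most dissimilar candidate from the pool.
    x_c = max(pool, key=lambda x: abs(x - x_q))
    return (x_q, y_q), (x_c, labels[x_c])

# Toy 1-D dataset: label is the sign of the instance.
data = [-3, -1, 2, 5, 9]
labs = {x: int(x > 0) for x in data}

query, contrastive = teacher_respond(2, data, labs)
# query is (2, 1); the contrastive example is (9, 1), the positive
# instance farthest from the query.
```

An adaptive teacher would choose `x_c` based on the full interaction history rather than a fixed distance rule, which is where the sequence optimization problem described in the abstract arises.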

    Secure-UCB: Saving Stochastic Bandits from Poisoning Attacks via Limited Data Verification

    This paper studies bandit algorithms under data poisoning attacks in a bounded-reward setting. We consider a strong attacker model in which the attacker can observe both the selected actions and their corresponding rewards, and can contaminate the rewards with additive noise. We show that \emph{any} bandit algorithm with regret $O(\log T)$ can be forced to suffer regret $\Omega(T)$ with an expected amount of contamination $O(\log T)$. This amount of contamination is also necessary, as we prove that there exists an $O(\log T)$-regret bandit algorithm, specifically the classical UCB, that requires $\Omega(\log T)$ contamination to suffer regret $\Omega(T)$. To combat such poisoning attacks, our second main contribution is to propose a novel algorithm, Secure-UCB, which uses limited \emph{verification} to access a limited number of uncontaminated rewards. We show that with an $O(\log T)$ expected number of verifications, Secure-UCB can restore the order-optimal $O(\log T)$ regret \emph{irrespective of the amount of contamination} used by the attacker. Finally, we prove that for any bandit algorithm, this $O(\log T)$ number of verifications is necessary to recover the order-optimal regret. We conclude that Secure-UCB is order-optimal in terms of both the expected regret and the expected number of verifications, and can save stochastic bandits from any data poisoning attack.
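The idea of a UCB learner that spends a limited verification budget on uncontaminated rewards can be sketched as follows. This is a simplified illustration in the spirit of the abstract, not the paper's Secure-UCB: the arm model (Bernoulli), the periodic verification schedule, and all names are assumptions made for the sketch:

```python
import math
import random

def ucb_with_verification(true_means, horizon, verify_budget, rng):
    """Standard UCB1 index policy, except that a bounded number of
    pulls are 'verified', i.e. replaced by a reward drawn from the
    clean (uncontaminated) channel."""
    k = len(true_means)
    counts = [0] * k      # pulls per arm
    sums = [0.0] * k      # accumulated rewards per arm
    verifications = 0
    for t in range(horizon):
        if t < k:
            arm = t  # initialization: pull each arm once
        else:
            # UCB1 index: empirical mean + exploration bonus
            arm = max(range(k), key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2.0 * math.log(t) / counts[a]))
        # Observed reward; in the attack model this channel may be
        # contaminated (not modeled here).
        reward = float(rng.random() < true_means[arm])
        if verifications < verify_budget and counts[arm] % 10 == 0:
            # Spend one verification to get an uncontaminated reward.
            reward = float(rng.random() < true_means[arm])
            verifications += 1
        counts[arm] += 1
        sums[arm] += reward
    return counts, verifications

rng = random.Random(0)
counts, used = ucb_with_verification([0.2, 0.8], 500, 20, rng)
# The better arm (index 1) accumulates the large majority of pulls,
# while at most 20 verifications are spent.
```

The key point from the abstract is that a verification budget of order $O(\log T)$ suffices regardless of how much the attacker contaminates; the fixed schedule above is only a stand-in for however the actual algorithm allocates its verifications.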