829 research outputs found

    Unimodal Thompson Sampling for Graph-Structured Arms

    Full text link
    We study, to the best of our knowledge, the first Bayesian algorithm for unimodal Multi-Armed Bandit (MAB) problems with graph structure. In this setting, each arm corresponds to a node of a graph and each edge provides a relationship, unknown to the learner, between two nodes in terms of expected reward. Furthermore, for any node of the graph there is a path leading to the unique node providing the maximum expected reward, along which the expected reward is monotonically increasing. Previous results on this setting describe the behavior of frequentist MAB algorithms. In our paper, we design a Thompson Sampling-based algorithm whose asymptotic pseudo-regret matches the lower bound for the considered setting. We show that -as it happens in a wide number of scenarios- Bayesian MAB algorithms dramatically outperform frequentist ones. In particular, we provide a thorough experimental evaluation of the performance of our and state-of-the-art algorithms as the properties of the graph vary

    Seeding with Costly Network Information

    Full text link
    We study the task of selecting kk nodes in a social network of size nn, to seed a diffusion with maximum expected spread size, under the independent cascade model with cascade probability pp. Most of the previous work on this problem (known as influence maximization) focuses on efficient algorithms to approximate the optimal seed set with provable guarantees, given the knowledge of the entire network. However, in practice, obtaining full knowledge of the network is very costly. To address this gap, we first study the achievable guarantees using o(n)o(n) influence samples. We provide an approximation algorithm with a tight (1-1/e){\mbox{OPT}}-\epsilon n guarantee, using Oϵ(k2logn)O_{\epsilon}(k^2\log n) influence samples and show that this dependence on kk is asymptotically optimal. We then propose a probing algorithm that queries Oϵ(pn2log4n+kpn1.5log5.5n+knlog3.5n){O}_{\epsilon}(p n^2\log^4 n + \sqrt{k p} n^{1.5}\log^{5.5} n + k n\log^{3.5}{n}) edges from the graph and use them to find a seed set with the same almost tight approximation guarantee. We also provide a matching (up to logarithmic factors) lower-bound on the required number of edges. To address the dependence of our probing algorithm on the independent cascade probability pp, we show that it is impossible to maintain the same approximation guarantees by controlling the discrepancy between the probing and seeding cascade probabilities. Instead, we propose to down-sample the probed edges to match the seeding cascade probability, provided that it does not exceed that of probing. Finally, we test our algorithms on real world data to quantify the trade-off between the cost of obtaining more refined network information and the benefit of the added information for guiding improved seeding strategies

    Noisy Submodular Maximization via Adaptive Sampling with Applications to Crowdsourced Image Collection Summarization

    Full text link
    We address the problem of maximizing an unknown submodular function that can only be accessed via noisy evaluations. Our work is motivated by the task of summarizing content, e.g., image collections, by leveraging users' feedback in form of clicks or ratings. For summarization tasks with the goal of maximizing coverage and diversity, submodular set functions are a natural choice. When the underlying submodular function is unknown, users' feedback can provide noisy evaluations of the function that we seek to maximize. We provide a generic algorithm -- \submM{} -- for maximizing an unknown submodular function under cardinality constraints. This algorithm makes use of a novel exploration module -- \blbox{} -- that proposes good elements based on adaptively sampling noisy function evaluations. \blbox{} is able to accommodate different kinds of observation models such as value queries and pairwise comparisons. We provide PAC-style guarantees on the quality and sampling cost of the solution obtained by \submM{}. We demonstrate the effectiveness of our approach in an interactive, crowdsourced image collection summarization application.Comment: Extended version of AAAI'16 pape

    Pyramid: Enhancing Selectivity in Big Data Protection with Count Featurization

    Full text link
    Protecting vast quantities of data poses a daunting challenge for the growing number of organizations that collect, stockpile, and monetize it. The ability to distinguish data that is actually needed from data collected "just in case" would help these organizations to limit the latter's exposure to attack. A natural approach might be to monitor data use and retain only the working-set of in-use data in accessible storage; unused data can be evicted to a highly protected store. However, many of today's big data applications rely on machine learning (ML) workloads that are periodically retrained by accessing, and thus exposing to attack, the entire data store. Training set minimization methods, such as count featurization, are often used to limit the data needed to train ML workloads to improve performance or scalability. We present Pyramid, a limited-exposure data management system that builds upon count featurization to enhance data protection. As such, Pyramid uniquely introduces both the idea and proof-of-concept for leveraging training set minimization methods to instill rigor and selectivity into big data management. We integrated Pyramid into Spark Velox, a framework for ML-based targeting and personalization. We evaluate it on three applications and show that Pyramid approaches state-of-the-art models while training on less than 1% of the raw data
    corecore