303 research outputs found

    Lipschitz Adaptivity with Multiple Learning Rates in Online Learning

    We aim to design adaptive online learning algorithms that take advantage of any special structure that might be present in the learning task at hand, with as little manual tuning by the user as possible. A fundamental obstacle that comes up in the design of such adaptive algorithms is to calibrate a so-called step-size or learning rate hyperparameter depending on variance, gradient norms, etc. A recent technique promises to overcome this difficulty by maintaining multiple learning rates in parallel. This technique has been applied in the MetaGrad algorithm for online convex optimization and the Squint algorithm for prediction with expert advice. However, in both cases the user still has to provide in advance a Lipschitz hyperparameter that bounds the norm of the gradients. Although this hyperparameter is typically not available in advance, tuning it correctly is crucial: if it is set too small, the methods may fail completely; but if it is taken too large, performance deteriorates significantly. In the present work we remove this Lipschitz hyperparameter by designing new versions of MetaGrad and Squint that adapt to its optimal value automatically. We achieve this by dynamically updating the set of active learning rates. For MetaGrad, we further improve the computational efficiency of handling constraints on the domain of prediction, and we remove the need to specify the number of rounds in advance. Comment: 22 pages. To appear in COLT 201

    A Survey of Quantum Learning Theory

    This paper surveys quantum learning theory: the theoretical aspects of machine learning using quantum computers. We describe the main results known for three models of learning: exact learning from membership queries, and Probably Approximately Correct (PAC) and agnostic learning from classical or quantum examples. Comment: 26 pages LaTeX. v2: many small changes to improve the presentation. This version will appear as Complexity Theory Column in SIGACT News in June 2017. v3: fixed a small ambiguity in the definition of gamma(C) and updated a referenc

    Confidence regions and minimax rates in outlier-robust estimation on the probability simplex

    We consider the problem of estimating the mean of a distribution supported by the $k$-dimensional probability simplex in the setting where an $\varepsilon$ fraction of observations are subject to adversarial corruption. A simple particular example is the problem of estimating the distribution of a discrete random variable. Assuming that the discrete variable takes $k$ values, the unknown parameter $\boldsymbol\theta$ is a $k$-dimensional vector belonging to the probability simplex. We first describe various settings of contamination and discuss the relation between these settings. We then establish minimax rates when the quality of estimation is measured by the total-variation distance, the Hellinger distance, or the $\mathbb L^2$-distance between two probability measures. We also provide confidence regions for the unknown mean that shrink at the minimax rate. Our analysis reveals that the minimax rates associated to these three distances are all different, but they are all attained by the sample average. Furthermore, we show that the latter is adaptive to the possible sparsity of the unknown vector. Some numerical experiments illustrating our theoretical findings are reported.
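As a rough illustration of the quantities named in this abstract, the sketch below computes the sample average of observations on the simplex and the three distances between two probability vectors. The function names are hypothetical, and the Hellinger distance uses the convention with a factor of 1/2 inside the square root, which is an assumption since the abstract does not state its normalization.

```python
import math


def sample_average(observations):
    """Coordinate-wise mean of a list of points on the probability simplex."""
    n = len(observations)
    k = len(observations[0])
    return [sum(x[j] for x in observations) / n for j in range(k)]


def total_variation(p, q):
    """Total-variation distance: half the L1 distance between p and q."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def hellinger(p, q):
    """Hellinger distance (assumed convention: sqrt of half the squared gap of roots)."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def l2_distance(p, q):
    """Euclidean distance between the two probability vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Eight clean one-hot draws plus two corrupted draws placed on the third category.
clean = [[1, 0, 0]] * 4 + [[0, 1, 0]] * 4
corrupted = [[0, 0, 1]] * 2
estimate = sample_average(clean + corrupted)
truth = [0.5, 0.5, 0.0]
print(estimate, total_variation(estimate, truth))
```

In this toy example a fraction 0.2 of the observations is corrupted, and the sample average moves away from the true vector by the same amount in total variation.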

    Tight Regret Bounds for Single-pass Streaming Multi-armed Bandits

    Regret minimization in streaming multi-armed bandits (MABs) has been studied extensively in recent years. In the single-pass setting with $K$ arms and $T$ trials, a regret lower bound of $\Omega(T^{2/3})$ has been proved for any algorithm with $o(K)$ memory (Maiti et al. [NeurIPS'21]; Agarwal et al. [COLT'22]). However, the previous best regret upper bound is still $O(K^{1/3} T^{2/3}\log^{1/3}(T))$, which is achieved by the streaming implementation of simple uniform exploration. The $O(K^{1/3}\log^{1/3}(T))$ gap leaves open the question of the tight regret bound in single-pass MABs with sublinear arm memory. In this paper, we answer this open problem and complete the picture of regret minimization in single-pass streaming MABs. We first improve the regret lower bound to $\Omega(K^{1/3}T^{2/3})$ for algorithms with $o(K)$ memory, which matches the uniform exploration regret up to a logarithmic factor in $T$. We then show that the $\log^{1/3}(T)$ factor is not necessary, and we can achieve $O(K^{1/3}T^{2/3})$ regret by finding an $\varepsilon$-best arm and committing to it in the rest of the trials. For regret minimization with high constant probability, we can apply the single-memory $\varepsilon$-best arm algorithms in Jin et al. [ICML'21] to obtain the optimal bound. Furthermore, for the expected regret minimization, we design an algorithm with a single-arm memory that achieves $O(K^{1/3} T^{2/3}\log(K))$ regret, and an algorithm with $O(\log^{*}(n))$-memory with the optimal $O(K^{1/3} T^{2/3})$ regret following the $\varepsilon$-best arm algorithm in Assadi and Wang [STOC'20]. We further tested the empirical performance of our algorithms. The simulation results show that the proposed algorithms consistently outperform the benchmark uniform exploration algorithm by a large margin, and on occasion, reduce the regret by up to 70%. Comment: ICML 202
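The uniform exploration baseline mentioned in this abstract can be sketched as follows: each arriving arm is pulled a fixed number of times, only the empirically best arm so far is kept in memory, and the stored arm is played for all remaining trials. This is a minimal sketch under stated assumptions, not the paper's algorithm: the function name, the Bernoulli reward model, and the per-arm pull count (a textbook explore-then-commit tuning) are all assumptions.

```python
import math
import random


def streaming_uniform_exploration(arm_means, horizon, seed=0):
    """One pass over the arms with a single stored arm.

    Returns (index of the committed arm, pseudo-regret over `horizon` trials).
    Rewards are simulated as Bernoulli draws (an assumption for illustration).
    """
    k = len(arm_means)
    if horizon < k:
        raise ValueError("horizon must be at least the number of arms")
    rng = random.Random(seed)
    # Assumed tuning: about (T/K)^(2/3) * log^(1/3)(T) pulls per arm,
    # capped so that exploration never exceeds the horizon.
    pulls = max(1, int((horizon / k) ** (2 / 3) * math.log(horizon) ** (1 / 3)))
    pulls = min(pulls, horizon // k)

    best_mean = max(arm_means)
    stored_index, stored_estimate = None, -1.0
    regret, used = 0.0, 0
    for index, mean in enumerate(arm_means):  # arms arrive one by one
        successes = sum(rng.random() < mean for _ in range(pulls))
        estimate = successes / pulls
        regret += pulls * (best_mean - mean)
        used += pulls
        if estimate > stored_estimate:  # keep only the empirically best arm
            stored_index, stored_estimate = index, estimate
    # Commit to the stored arm for the rest of the trials.
    regret += (horizon - used) * (best_mean - arm_means[stored_index])
    return stored_index, regret


index, regret = streaming_uniform_exploration([0.1, 0.9, 0.2], horizon=3000)
print(index, regret)
```

An arm that is discarded can never be pulled again, which is why the sketch compares each new arm only against the single stored estimate.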

    Exploration with Limited Memory: Streaming Algorithms for Coin Tossing, Noisy Comparisons, and Multi-Armed Bandits

    Consider the following abstract coin tossing problem: Given a set of $n$ coins with unknown biases, find the most biased coin using a minimal number of coin tosses. This is a common abstraction of various exploration problems in theoretical computer science and machine learning and has been studied extensively over the years. In particular, algorithms with optimal sample complexity (number of coin tosses) have been known for this problem for quite some time. Motivated by applications to processing massive datasets, we study the space complexity of solving this problem with optimal number of coin tosses in the streaming model. In this model, the coins are arriving one by one and the algorithm is only allowed to store a limited number of coins at any point -- any coin not present in the memory is lost and can no longer be tossed or compared to arriving coins. Prior algorithms for the coin tossing problem with optimal sample complexity are based on iterative elimination of coins, which inherently requires storing all the coins, leading to memory-inefficient streaming algorithms. We remedy this state of affairs by presenting a series of improved streaming algorithms for this problem: we start with a simple algorithm which requires storing only $O(\log n)$ coins and then iteratively refine it further and further, leading to algorithms with $O(\log\log(n))$ memory, $O(\log^{*}(n))$ memory, and finally one that only stores a single extra coin in memory -- the same exact space needed to just store the best coin throughout the stream. Furthermore, we extend our algorithms to the problem of finding the $k$ most biased coins as well as other exploration problems such as finding top-$k$ elements using noisy comparisons or finding an $\epsilon$-best arm in stochastic multi-armed bandits, and obtain efficient streaming algorithms for these problems.

    Text Assisted Insight Ranking Using Context-Aware Memory Network

    Extracting valuable facts or informative summaries from multi-dimensional tables, i.e. insight mining, is an important task in data analysis and business intelligence. However, ranking the importance of insights remains a challenging and unexplored task. The main challenge is that explicitly scoring an insight or giving it a rank requires a thorough understanding of the tables and considerable manual effort, which leads to a lack of available training data for the insight ranking problem. In this paper, we propose an insight ranking model that consists of two parts: a neural ranking model explores the data characteristics, such as the header semantics and the statistical features of the data, and a memory network model introduces table structure and context information into the ranking process. We also build a dataset with text assistance. Experimental results show that our approach largely improves the ranking precision as reported in multiple evaluation metrics. Comment: Accepted to AAAI 201