31 research outputs found

    LIPIcs, Volume 251, ITCS 2023, Complete Volume

    Get PDF
    LIPIcs, Volume 251, ITCS 2023, Complete Volume

    A Characterization of Multioutput Learnability

    Full text link
    We consider the problem of learning multioutput function classes in batch and online settings. In both settings, we show that a multioutput function class is learnable if and only if each single-output restriction of the function class is learnable. This provides a complete characterization of the learnability of multilabel classification and multioutput regression in both batch and online settings. As an extension, we also consider multilabel learnability in the bandit feedback setting and show a similar characterization as in the full-feedback setting.

    Multiclass Online Learnability under Bandit Feedback

    Full text link
    We study online multiclass classification under bandit feedback. We extend the results of Daniely and Helbertal [2013] by showing that the finiteness of the Bandit Littlestone dimension is necessary and sufficient for bandit online multiclass learnability even when the label space is unbounded. Moreover, we show that, unlike the full-information setting, sequential uniform convergence is necessary but not sufficient for bandit online learnability. Our result complements the recent work by Hanneke, Moran, Raman, Subedi, and Tewari [2023] who show that the Littlestone dimension characterizes online multiclass learnability in the full-information setting even when the label space is unbounded.

    Guarantees for Efficient and Adaptive Online Learning

    Get PDF
    In this thesis, we study the problem of adaptive online learning in several different settings. We first study the problem of predicting graph labelings online which are assumed to change over time. We develop the machinery of cluster specialists which probabilistically exploit any cluster structure in the graph. We give a mistake-bounded algorithm that surprisingly requires only O(log n) time per trial for an n-vertex graph, an exponential improvement over existing methods. We then consider the model of non-stationary prediction with expert advice with long-term memory guarantees in the sense of Bousquet and Warmuth, in which we learn a small pool of experts. We consider relative entropy projection-based algorithms, giving a linear-time algorithm that improves on the best known regret bound. We show that such projection updates may be advantageous over previous "weight-sharing" approaches when weight updates come with implicit costs, as in portfolio optimization. We give an algorithm to compute the relative entropy projection onto the simplex with non-uniform (lower) box constraints in linear time, which may be of independent interest. We finally extend the model of long-term memory by introducing a new model of adaptive long-term memory. Here the small pool is assumed to change over time, with the trial sequence being partitioned into epochs and a small pool associated with each epoch. We give an efficient linear-time regret-bounded algorithm for this setting and present results for contextual bandits.
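    The relative entropy projection mentioned above has a simple closed form: the solution clamps some coordinates to their lower bounds and rescales the rest, p_i = max(l_i, c·q_i) for a single scalar c. The following is a minimal O(n log n) sorting-based sketch of that form (the thesis gives a linear-time algorithm; all names here are illustrative, and it assumes q > 0 and sum(l) < 1):

```python
import numpy as np

def kl_projection_lower_box(q, l):
    """Project q onto {p : sum(p) = 1, p >= l} in relative entropy.
    KKT conditions give p_i = max(l_i, c * q_i) for a scalar c.
    Assumes q > 0 elementwise and sum(l) < 1 (feasibility)."""
    q = np.asarray(q, float)
    l = np.asarray(l, float)
    r = l / q                      # coordinate i is clamped to l_i iff c*q_i <= l_i
    order = np.argsort(-r)         # consider clamping in decreasing ratio order
    clamped_l, free_q = 0.0, q.sum()
    for k in order:
        c = (1.0 - clamped_l) / free_q   # scale if k (and the rest) stay free
        if c > r[k]:                     # k is free, so all later ones are too
            break
        clamped_l += l[k]                # clamp coordinate k at its lower bound
        free_q -= q[k]
    c = (1.0 - clamped_l) / free_q
    return np.maximum(l, c * q)
```

For example, projecting q = (0.7, 0.2, 0.1) with lower bounds (0, 0, 0.3) clamps the last coordinate to 0.3 and rescales the first two by a common factor so the result sums to one.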

    Trading-off payments and accuracy in online classification with paid stochastic experts

    Get PDF
    We investigate online classification with paid stochastic experts. Here, before making their prediction, each expert must be paid. The amount that we pay each expert directly influences the accuracy of their prediction through some unknown Lipschitz “productivity” function. In each round, the learner must decide how much to pay each expert and then make a prediction. They incur a cost equal to a weighted sum of the prediction error and upfront payments for all experts. We introduce an online learning algorithm whose total cost after T rounds exceeds that of a predictor which knows the productivity of all experts in advance by at most O(K^2 (ln T) √T), where K is the number of experts. In order to achieve this result, we combine Lipschitz bandits and online classification with surrogate losses. These tools allow us to improve upon the bound of order T^(2/3) one would obtain in the standard Lipschitz bandit setting. Our algorithm is empirically evaluated on synthetic data.
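    The per-round protocol can be sketched in a few lines. The productivity curve and the majority-vote aggregation below are illustrative assumptions, not the paper's algorithm:

```python
import random

def productivity(pay):
    # hypothetical 1-Lipschitz productivity curve: paying more buys accuracy
    return min(1.0, 0.5 + 0.4 * pay)

def play_round(true_label, payments, rng=random):
    """One round with binary labels: pay each expert, collect noisy votes,
    predict by majority, and incur prediction error plus total payments."""
    votes = [true_label if rng.random() < productivity(c) else 1 - true_label
             for c in payments]
    prediction = int(sum(votes) >= len(votes) / 2)   # majority vote
    cost = float(prediction != true_label) + sum(payments)
    return prediction, cost
```

With payments high enough that productivity saturates at 1, every expert votes correctly and the cost reduces to the total payment, which illustrates the accuracy/payment trade-off the abstract describes.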

    De l'apprentissage faiblement supervisé au catalogage en ligne (From weakly supervised learning to online cataloguing)

    Get PDF
    Applied mathematics and machine computation have raised a lot of hope since the recent successes of supervised learning. Many practitioners in industry have been trying to switch from their old paradigms to machine learning. Interestingly, those data scientists spend more time scraping, annotating and cleaning data than fine-tuning models. This thesis is motivated by the following question: can we derive a more generic framework than supervised learning in order to learn from cluttered data? This question is approached through the lens of weakly supervised learning, assuming that the bottleneck of data collection lies in annotation. We model weak supervision as giving, rather than a unique target, a set of target candidates. We argue that one should look for an “optimistic” function that matches most of the observations. This allows us to derive a principle to disambiguate partial labels. We also discuss the advantage of incorporating unsupervised learning techniques into our framework, in particular manifold regularization approached through diffusion techniques, for which we derive a new algorithm that scales better with input dimension than the baseline method. Finally, we switch from passive to active weakly supervised learning, introducing the “active labeling” framework, in which a practitioner can query weak information about chosen data. Among other things, we leverage the fact that one does not need full information to access stochastic gradients and perform stochastic gradient descent.
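    The “optimistic” principle above can be made concrete: under weak supervision each example comes with a set of candidate labels, and the learner scores a prediction by its best-matching candidate (an infimum over the candidate set). The following is a hypothetical sketch of such an empirical risk, not the thesis's actual estimator:

```python
import numpy as np

def infimum_loss_risk(scores, candidate_sets):
    """'Optimistic' empirical risk for partial labels: each example is charged
    the loss of its best-matching candidate label.
    scores: list of logit vectors; candidate_sets: list of sets of label indices."""
    total = 0.0
    for s, S in zip(scores, candidate_sets):
        p = np.exp(s - s.max())
        p /= p.sum()                               # softmax over labels
        total += min(-np.log(p[y]) for y in S)     # cross-entropy vs best candidate
    return total / len(scores)
```

A predictor confident in label 0 incurs almost no risk when 0 is among the candidates, but a large risk when the candidate set forces a label it considers implausible, which is exactly the disambiguation pressure the principle exploits.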

    Two studies in resource-efficient inference: structural testing of networks, and selective classification

    Get PDF
    Inference systems suffer costs arising from information acquisition, and from communication and computational costs of executing complex models. This dissertation proposes, in two distinct themes, systems-level methods to reduce these costs without affecting the accuracy of inference by using ancillary low-cost methods to cheaply address most queries, while only using resource-heavy methods on 'difficult' instances. The first theme concerns testing methods in structural inference of networks and graphical models, the proposal being that one first cheaply tests whether the structure underlying a dataset differs from a reference structure, and only estimates the new structure if this difference is large. This study focuses on theoretically establishing separations between the costs of testing and learning to determine when a strategy such as the above has benefits. For two canonical models---the Ising model, and the stochastic block model---fundamental limits are derived on the costs of one- and two-sample goodness-of-fit tests by determining information-theoretic lower bounds, and developing matching tests. A biphasic behaviour in the costs of testing is demonstrated: there is a critical size scale such that detection of differences smaller than this size is nearly as expensive as recovering the structure, while detection of larger differences has vanishing costs relative to recovery. The second theme concerns using selective classification (SC), or classification with an option to abstain, to control inference-time costs in the machine learning framework. The proposal is to learn a low-complexity selective classifier that only abstains on hard instances, and to execute more expensive methods upon abstention. Herein, a novel SC formulation with a focus on high accuracy is developed, and used to obtain both theoretical characterisations, and a scheme for learning selective classifiers based on optimising a collection of class-wise decoupled one-sided risks.
This scheme attains strong empirical performance, and admits efficient implementation, leading to an effective SC methodology. Finally, SC is studied in the online learning setting with feedback only provided upon abstention, modelling the practical lack of reliable labels without expensive feature collection, and a Pareto-optimal low-error scheme is described.
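    The cascade idea behind the second theme admits a very small illustration: a cheap selective classifier answers when confident and defers to an expensive model when it abstains. The names and the confidence-threshold rule below are illustrative placeholders, not the dissertation's learned selector:

```python
def cascade_predict(x, cheap, expensive, threshold=0.9):
    """Selective-classification cascade: `cheap` returns (label, confidence);
    it answers when confidence clears the threshold, otherwise it abstains
    and the resource-heavy `expensive` model is invoked."""
    label, confidence = cheap(x)
    if confidence >= threshold:
        return label
    return expensive(x)       # only paid for on 'difficult' instances
```

Since most queries are easy, the expensive model runs rarely, which is the systems-level cost saving the abstract describes.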

    Search and optimization with randomness in computational economics: equilibria, pricing, and decisions

    Get PDF
    In this thesis we study search and optimization problems from computational economics with primarily stochastic inputs. The results are grouped into two categories: First, we address the smoothed analysis of Nash equilibrium computation. Second, we address two pricing problems in mechanism design, and solve two economically motivated stochastic optimization problems. Computing Nash equilibria is a central question in the game-theoretic study of economic systems of agent interactions. The worst-case analysis of this problem has been studied in depth, but little was known beyond the worst case. We study this problem in the framework of smoothed analysis, where adversarial inputs are randomly perturbed. We show that computing Nash equilibria is hard for 2-player games even when input perturbations are large. This is despite the existence of approximation algorithms in a similar regime. In doing so, our result disproves a conjecture relating approximation schemes to smoothed analysis. Despite the hardness results in general, we also present a special case of co-operative games, where we show that the natural greedy algorithm for finding equilibria has polynomial smoothed complexity. We also develop reductions which preserve smoothed analysis. In the second part of the thesis, we consider optimization problems which are motivated by economic applications. We address two stochastic optimization problems. We begin by developing optimal methods to determine the best among binary classifiers, when the objective function is known only through pairwise comparisons, e.g. when the objective function is the subjective opinion of a client. Finally, we extend known algorithms in the Pandora's box problem --- a classic optimal search problem --- to an order-constrained setting which allows for richer modelling. The remaining chapters address two pricing problems from mechanism design. 
First, we provide an approximately revenue-optimal pricing scheme for the problem of selling time on a server to jobs whose parameters are sampled i.i.d. from an unknown distribution. We then tackle the problem of fairly dividing chores among a collection of economic agents via a competitive equilibrium, which balances assigned tasks with payouts. We give efficient algorithms to compute such an equilibrium.
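    For context on the order-constrained extension, the classic (unconstrained) Pandora's box problem is solved by Weitzman's reservation values: sigma_i solves E[(V_i - sigma_i)^+] = c_i, boxes are opened in decreasing sigma_i, and search stops once the best value seen beats every remaining index. A sketch for discrete value distributions (the bisection bounds assume sigma lies in [0, 1e6]):

```python
def reservation_value(values_probs, cost, lo=0.0, hi=1e6):
    """Solve E[(V - sigma)^+] = cost for sigma by bisection.
    values_probs: list of (value, probability) pairs; assumes sigma in [lo, hi]."""
    def excess(sigma):
        return sum(p * max(v - sigma, 0.0) for v, p in values_probs)
    for _ in range(100):
        mid = (lo + hi) / 2
        if excess(mid) > cost:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def pandora(boxes, draw):
    """Weitzman's rule. boxes: list of (values_probs, opening_cost);
    draw(i) samples box i's realized value. Returns best value minus costs paid."""
    sigmas = [reservation_value(vp, c) for vp, c in boxes]
    order = sorted(range(len(boxes)), key=lambda i: -sigmas[i])
    best, paid = float("-inf"), 0.0
    for i in order:
        if best >= sigmas[i]:   # no remaining box is worth its opening cost
            break
        paid += boxes[i][1]
        best = max(best, draw(i))
    return best - paid
```

A deterministic box worth 10 with opening cost 1 has reservation value 9; once it is opened, a second box with a lower index is never opened, matching the stopping rule.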

    Adaptivity in Online and Statistical Learning

    Get PDF
    Many modern machine learning algorithms, though successful, are still based on heuristics. In a typical application, such heuristics may manifest in the choice of a specific neural network structure, its number of parameters, or the learning rate during training. Relying on these heuristics is not ideal from a computational perspective (often involving multiple runs of the algorithm), and can also lead to over-fitting in some cases. This motivates the following question: for which machine learning tasks/settings do there exist efficient algorithms that automatically adapt to the best parameters? Characterizing the settings where this is the case and designing corresponding (parameter-free) algorithms within the online learning framework constitutes one of this thesis' primary goals. Towards this end, we develop algorithms for constrained and unconstrained online convex optimization that can automatically adapt to various parameters of interest such as the Lipschitz constant, the curvature of the sequence of losses, and the norm of the comparator. We also derive new performance lower bounds characterizing the limits of adaptivity for algorithms in these settings. Part of systematizing the choice of machine learning methods also involves having “certificates” for the performance of algorithms. In the statistical learning setting, this translates to having (tight) generalization bounds. Adaptivity can manifest here through data-dependent bounds that become small whenever the problem is “easy”. In this thesis, we provide such data-dependent bounds for the expected loss (the standard risk measure) and other risk measures. We also explore how such bounds can be used in the context of risk-monotonicity.
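    A standard, well-known instance of the kind of adaptivity discussed above is AdaGrad-style online gradient descent, whose step size self-tunes to the observed gradient magnitudes so that no Lipschitz constant need be known in advance. This is a generic sketch of that classical scheme, not one of the thesis's algorithms:

```python
import numpy as np

def adagrad(grads, x0, radius=1.0, eps=1e-8):
    """Projected online gradient descent with AdaGrad step sizes.
    grads: sequence of gradient vectors observed online; x0: starting point;
    iterates are kept inside the L2 ball of the given radius."""
    x = np.asarray(x0, float).copy()
    g2 = np.zeros_like(x)
    iterates = []
    for g in grads:
        iterates.append(x.copy())          # play x, then observe gradient g
        g2 += g * g
        x = x - g / (np.sqrt(g2) + eps)    # per-coordinate self-tuned step
        x = x * min(1.0, radius / (np.linalg.norm(x) + eps))  # project to ball
    return iterates
```

The first step has size 1 regardless of the gradient scale, and subsequent steps shrink as squared gradients accumulate, which is how the method adapts without a tuned learning rate.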

    Stability in Online Learning: From Random Perturbations in Bandit Problems to Differential Privacy

    Full text link
    Online learning is an area of machine learning that studies algorithms that make sequential predictions on data arriving incrementally. In this thesis, we investigate the stability of online learning algorithms in two different settings. First, we examine random perturbation methods as a source of stability in bandit problems. Second, we study stability as a key concept connecting online learning and differential privacy. The first two chapters study the statistical properties of the perturbation technique in both stochastic and adversarial multi-armed bandit problems. We provide the first general analysis of perturbations for the stochastic multi-armed bandit problem. We also show that the open problem regarding minimax optimal perturbations for adversarial bandits cannot be solved in two ways that might seem very natural. The next two chapters consider stationary and non-stationary stochastic linear bandits respectively. We develop two randomized exploration strategies: (1) replacing optimism with a simple randomization when deciding a confidence level in optimism-based algorithms, or (2) directly injecting random perturbations into current estimates to overcome the conservatism that optimism-based algorithms generally suffer from. Furthermore, we study the statistical and computational aspects of both of these strategies. While at first glance it may seem that online learning and differential privacy have little in common, there is a strong connection between them via the notion of stability, since the definition of differential privacy is, at its core, a form of stability. The final chapter investigates whether the recently established equivalence between online and private learnability in binary classification extends to multiclass classification and regression.
    PhD thesis, Statistics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/169709/1/baekjin_1.pd
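    As a toy illustration of perturbation-based exploration (a generic stand-in, not the thesis's analysed schemes), each round one can play the arm whose empirical mean plus a shrinking random perturbation is largest:

```python
import math
import random

def perturbed_exploration(arms, horizon, scale=1.0, rng=random):
    """Randomized exploration via perturbation: play the arm maximizing
    empirical mean + scaled Gumbel noise that shrinks with the pull count.
    arms: list of callables taking an rng and returning a reward."""
    n = [0] * len(arms)
    mean = [0.0] * len(arms)
    total = 0.0
    for _ in range(horizon):
        def index(i):
            gumbel = -math.log(-math.log(rng.random()))   # random perturbation
            return mean[i] + scale * gumbel / math.sqrt(n[i] + 1)
        a = max(range(len(arms)), key=index)
        r = arms[a](rng)                    # pull arm a, observe reward
        n[a] += 1
        mean[a] += (r - mean[a]) / n[a]     # running mean update
        total += r
    return total, n
```

On two Bernoulli arms with a large reward gap, the perturbation shrinks as pulls accumulate, so play concentrates on the better arm while under-explored arms keep a chance of being tried.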