
    Transitions, Losses, and Re-parameterizations: Elements of Prediction Games

    This thesis presents some geometric insights into three different types of two-player prediction games, namely the general learning task, prediction with expert advice, and online convex optimization. These games differ in the nature of the opponent (stochastic, adversarial, or intermediate), the order of the players' moves, and the utility function. The insights shed light on the intrinsic barriers of these prediction problems and on the design of computationally efficient learning algorithms with strong theoretical guarantees (such as generalizability, statistical consistency, and constant regret). The main contributions of the thesis are:
    • Leveraging concepts from statistical decision theory, we develop the toolkit needed to formalize the prediction games mentioned above and to quantify their objectives.
    • We investigate the cost-sensitive classification problem, an instantiation of the general learning task, and demonstrate its hardness by deriving lower bounds on its minimax risk. We then analyse the impact of imposing constraints (such as corruption level and privacy requirements) on the general learning task. This naturally leads us to a further investigation of strong data processing inequalities, a fundamental concept in information theory. Furthermore, by extending the hypothesis-testing interpretation of standard privacy definitions, we propose an asymmetric (prioritized) privacy definition.
    • We study efficient merging schemes for the prediction with expert advice problem and the geometric properties of loss functions (mixability and exp-concavity) that guarantee constant regret bounds; the standard definitions of these properties are recalled below. As a result of our study, we construct two types of link functions (one via a calculus approach and another via a geometric approach) that can re-parameterize any binary mixable loss into an exp-concave loss.
    • We focus on some recent algorithms for online convex optimization which exploit easy instances of the data (such as sparsity, predictable sequences, and curved losses) in order to achieve better regret bounds while retaining protection against the worst case. We unify some of these existing techniques to obtain new update rules for the cases in which these easy instances occur together, and analyse their regret bounds.
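    For context, the two geometric properties named in the third contribution are usually defined as follows; this is a standard formulation, and the thesis may use a different prediction space or normalization. A loss \ell is \beta-exp-concave if, for every outcome y, the map
        p \;\mapsto\; \exp\bigl(-\beta\,\ell(p,y)\bigr)
    is concave, and it is \beta-mixable if for every distribution \pi over a finite set of predictions \{p_i\} there exists a single prediction p^{\ast} with
        \ell(p^{\ast},y) \;\le\; -\tfrac{1}{\beta}\,\log \sum_i \pi_i \exp\bigl(-\beta\,\ell(p_i,y)\bigr) \qquad \text{for all } y.
    Exp-concavity implies mixability with the same \beta; the link-function constructions mentioned above address the converse direction by re-parameterizing a mixable loss so that it becomes exp-concave.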

    Fast rates in statistical and online learning

    The speed with which a learning algorithm converges as it is presented with more data is a central problem in machine learning: a fast rate of convergence means that less data is needed for the same level of performance. The pursuit of fast rates in online and statistical learning has led to the discovery of many conditions in learning theory under which fast learning is possible. We show that most of these conditions are special cases of a single, unifying condition that comes in two forms: the central condition for 'proper' learning algorithms that always output a hypothesis in the given model, and stochastic mixability for online algorithms that may make predictions outside of the model. We show that under surprisingly weak assumptions both conditions are, in a certain sense, equivalent. The central condition has a re-interpretation in terms of convexity of a set of pseudoprobabilities, linking it to density estimation under misspecification. For bounded losses, we show how the central condition enables a direct proof of fast rates, and we prove its equivalence to the Bernstein condition, itself a generalization of the Tsybakov margin condition, both of which have played a central role in obtaining fast rates in statistical learning. Yet, while the Bernstein condition is two-sided, the central condition is one-sided, making it more suitable for dealing with unbounded losses. In its stochastic mixability form, our condition generalizes both a stochastic exp-concavity condition identified by Juditsky, Rigollet and Tsybakov and Vovk's notion of mixability. Our unifying conditions thus provide a substantial step towards a characterization of fast rates in statistical learning, similar to how classical mixability characterizes constant regret in the sequential prediction with expert advice setting.
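    The two conditions compared in the abstract are commonly written as follows; the notation is generic, and the paper's \eta-central condition and Bernstein condition carry additional technical qualifications that are omitted here. For a model \mathcal{F}, loss \ell, data distribution P, and comparator f^{\ast} \in \mathcal{F},
        \text{(central condition)}\qquad \mathbb{E}_{Z\sim P}\Bigl[\exp\bigl(\eta\,(\ell(f^{\ast},Z)-\ell(f,Z))\bigr)\Bigr] \;\le\; 1 \quad \text{for all } f \in \mathcal{F},
        \text{(Bernstein condition, exponent 1)}\qquad \mathbb{E}\bigl[(\ell(f,Z)-\ell(f^{\ast},Z))^{2}\bigr] \;\le\; B\,\mathbb{E}\bigl[\ell(f,Z)-\ell(f^{\ast},Z)\bigr] \quad \text{for all } f \in \mathcal{F}.
    The first bounds an exponential moment of the excess loss and so constrains only one tail, while the second controls the variance of the excess loss and hence both tails; this is the one-sided versus two-sided distinction drawn in the abstract.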

    Composite multiclass losses

    We consider loss functions for multiclass prediction problems. We show when a multiclass loss can be expressed as a “proper composite loss”, which is the composition of a proper loss and a link function. We extend existing results for binary losses to multiclass losses. We subsume results on “classification calibration” by relating it to properness. We determine the stationarity condition, Bregman representation, order-sensitivity, and quasi-convexity of multiclass proper losses. We then characterise the existence and uniqueness of the composite representation for multiclass losses. We show how the composite representation is related to other core properties of a loss: mixability, admissibility and (strong) convexity of multiclass losses, which we characterise in terms of the Hessian of the Bayes risk. We show that the simple integral representation for binary proper losses cannot be extended to multiclass losses, but we offer concrete guidance on how to design different loss functions. The conclusion drawn from these results is that the proper composite representation is a natural and convenient tool for the design of multiclass loss functions.
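    For reference, the central objects are usually set up as follows; this is the standard construction, with the paper's exact regularity conditions omitted. Writing \Delta^n for the probability simplex over n classes, a loss \lambda : \Delta^n \times [n] \to \mathbb{R}_{+} is proper if reporting the true class-probability vector minimizes expected loss,
        \mathbb{E}_{Y\sim p}\,\lambda(p,Y) \;\le\; \mathbb{E}_{Y\sim p}\,\lambda(q,Y) \qquad \text{for all } p, q \in \Delta^n,
    and a proper composite loss is \ell(v,y) = \lambda(\psi^{-1}(v), y) for an invertible link \psi : \Delta^n \to \mathcal{V} mapping class probabilities into a prediction space (the multiclass logit link is the familiar example). The associated conditional Bayes risk is \underline{L}(p) = \mathbb{E}_{Y\sim p}\,\lambda(p,Y), and it is the Hessian of \underline{L} that carries the convexity and mixability information referred to above.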

    Exp-Concavity of Proper Composite Losses

    The goal of online prediction with expert advice is to find a decision strategy which will perform almost as well as the best expert in a given pool of experts, on any sequence of outcomes. This problem has been widely studied, and O(√T) and O(log T) regret bounds can be achieved for convex losses and for strictly convex losses with bounded first and second derivatives, respectively. In special cases like the Aggregating Algorithm with mixable losses and the Weighted Average Algorithm with exp-concave losses, it is possible to achieve O(1) regret bounds. But mixability and exp-concavity are roughly equivalent under certain conditions. Thus, by understanding the underlying relationship between these two notions, we can gain the best of both algorithms (the strong theoretical performance guarantees of the Aggregating Algorithm and the computational efficiency of the Weighted Average Algorithm). In this paper we provide a complete characterization of the exp-concavity of any proper composite loss. Using this characterization and the mixability condition of proper losses, we show that it is possible to transform (re-parameterize) any β-mixable binary proper loss into a β-exp-concave composite loss with the same β. In the multi-class case, we propose an approximation approach for this transformation.
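    To make the computational contrast concrete, below is a minimal sketch of the Weighted Average Algorithm mentioned above, assuming scalar expert predictions and a loss(p, y) that is β-exp-concave in its first argument; the function name and interface are illustrative rather than taken from the paper. Predicting the weight-averaged expert forecast and updating the weights exponentially with learning rate β gives regret at most (ln n)/β against the best of n experts, i.e. the O(1) bound quoted above.

        import numpy as np

        def weighted_average_forecaster(expert_preds, outcomes, loss, beta):
            """Sketch of the Weighted Average Algorithm for a beta-exp-concave loss.

            expert_preds: (T, n) array of scalar expert predictions
            outcomes:     length-T array of observed outcomes
            loss(p, y):   per-round loss, assumed beta-exp-concave in p
            beta:         exp-concavity parameter, used as the learning rate
            """
            T, n = expert_preds.shape
            log_w = np.zeros(n)                   # uniform prior over experts (log-weights)
            learner_loss = 0.0
            for t in range(T):
                w = np.exp(log_w - log_w.max())   # normalize in log-space for stability
                w /= w.sum()
                p_t = float(w @ expert_preds[t])  # predict the weighted mean of expert forecasts
                learner_loss += loss(p_t, outcomes[t])
                # multiplicative update: experts that suffered larger loss lose weight
                log_w -= beta * np.array([loss(expert_preds[t, i], outcomes[t]) for i in range(n)])
            return learner_loss

    For example, with the binary log loss, loss = lambda p, y: -np.log(p) if y == 1 else -np.log(1.0 - p), which is 1-exp-concave, running this sketch with beta = 1 keeps the learner's cumulative loss within ln n of the best expert's.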