12 research outputs found

    Learning with Symmetric Label Noise: The Importance of Being Unhinged

    Convex potential minimisation is the de facto approach to binary classification. However, Long and Servedio [2008] proved that under symmetric label noise (SLN), minimisation of any convex potential over a linear function class can result in classification performance equivalent to random guessing. This ostensibly shows that convex losses are not SLN-robust. In this paper, we propose a convex, classification-calibrated loss and prove that it is SLN-robust. The loss avoids the Long and Servedio [2008] result by virtue of being negatively unbounded. The loss is a modification of the hinge loss, where one does not clamp at zero; hence, we call it the unhinged loss. We show that the optimal unhinged solution is equivalent to that of a strongly regularised SVM, and is the limiting solution for any convex potential; this implies that strong ℓ2 regularisation makes most standard learners SLN-robust. Experiments confirm the unhinged loss's SLN-robustness.
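    The regularised unhinged risk has a closed-form minimiser, which makes the robustness claim easy to check numerically. The sketch below is a toy simulation, not the paper's experiments: the Gaussian data, noise rate, and regularisation strength are illustrative assumptions. It shows that symmetric label flips rescale the minimiser but leave its direction (and hence the induced classifier) unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two classes with (hypothetical) mean difference mu, unit noise.
n, mu = 5000, np.array([1.0, 0.5])
y = rng.choice([-1.0, 1.0], size=n)
X = y[:, None] * mu + rng.normal(size=(n, 2))

def unhinged_minimiser(X, y, lam=1.0):
    # The regularised unhinged risk  mean(1 - y * (X @ w)) + lam * ||w||^2
    # is linear in w plus a quadratic regulariser, so its minimiser has the
    # closed form  w* = mean(y_i * x_i) / (2 * lam).
    return (y[:, None] * X).mean(axis=0) / (2 * lam)

w_clean = unhinged_minimiser(X, y)

# Symmetric label noise: flip each label independently with probability rho.
rho = 0.3
flip = rng.random(n) < rho
w_noisy = unhinged_minimiser(X, np.where(flip, -y, y))

# Since E[y_noisy * x] = (1 - 2*rho) * E[y * x], the noise shrinks the
# minimiser's norm but preserves its direction.
cos = w_clean @ w_noisy / (np.linalg.norm(w_clean) * np.linalg.norm(w_noisy))
print(cos > 0.95)
```

With 5000 samples the two estimated directions agree almost exactly, while the noisy minimiser's norm shrinks by roughly the factor (1 - 2ρ).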

    Statistical Query Algorithms for Mean Vector Estimation and Stochastic Convex Optimization

    Stochastic convex optimization, in which the objective is the expectation of a random convex function, is an important and widely used method with numerous applications in machine learning, statistics, operations research, and other areas. We study the complexity of stochastic convex optimization given only statistical query (SQ) access to the objective function. We show that well-known and popular first-order iterative methods can be implemented using only statistical queries. For many cases of interest, we derive nearly matching upper and lower bounds on the estimation (sample) complexity, including linear optimization in the most general setting. We then present several consequences for machine learning, differential privacy, and proving concrete lower bounds on the power of convex optimization–based methods. The key ingredients of our work are SQ algorithms and lower bounds for estimating the mean vector of a distribution over vectors supported on a convex body in R^d. This natural problem has not been previously studied, and we show that our solutions can be used to get substantially improved SQ versions of Perceptron and other online algorithms for learning halfspaces.
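    In the SQ model, an algorithm never sees samples; it asks for expectations E[q(x)] of bounded query functions and receives answers correct up to a tolerance τ. A minimal sketch of coordinate-wise mean estimation under this access model follows; the oracle is simulated, and the distribution, dimension, and tolerance are hypothetical choices for illustration (a real oracle would answer from the underlying distribution, and the paper's algorithms are considerably more refined).

```python
import numpy as np

rng = np.random.default_rng(1)

d, tau = 4, 0.05
# Hypothetical true mean of a distribution supported on the unit ball of R^d.
true_mean = np.array([0.3, -0.1, 0.2, 0.0])

def sq_oracle(q, tau):
    # Simulated statistical-query oracle: returns E[q(x)] up to additive
    # error tau.  Each query below is linear, so E[q(x)] = q(E[x]) and we can
    # answer directly from the true mean; a real oracle would estimate the
    # expectation from the distribution itself.
    return q(true_mean) + rng.uniform(-tau, tau)

# Mean estimation with d statistical queries: one coordinate projection each.
estimate = np.array([sq_oracle(lambda x, i=i: x[i], tau) for i in range(d)])
err = np.max(np.abs(estimate - true_mean))
print(err <= tau)
```

By construction every coordinate of the estimate is within τ of the truth, so the ℓ∞ error is at most τ using exactly d queries.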

    Learning with Online Constraints: Shifting Concepts and Active Learning

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006. Includes bibliographical references (p. 99-102). Many practical problems, such as forecasting, real-time decision making, streaming data applications, and resource-constrained learning, can be modeled as learning with online constraints. This thesis is concerned with analyzing and designing algorithms for learning under the following online constraints: i) the algorithm has only sequential, or one-at-a-time, access to data; ii) the time and space complexity of the algorithm must not scale with the number of observations. We analyze learning with online constraints in a variety of settings, including active learning. The active learning model is applicable to any domain in which unlabeled data is easy to come by and there exists a (potentially difficult or expensive) mechanism by which to attain labels. First, we analyze a supervised learning framework in which no statistical assumptions are made about the sequence of observations, and algorithms are evaluated based on their regret, i.e. their relative prediction loss with respect to the hindsight-optimal algorithm in a comparator class. We derive a lower bound on regret for a class of online learning algorithms designed to track shifting concepts in this framework. We apply an algorithm from our previous work that avoids this lower bound to an energy-management problem in wireless networks, and demonstrate this application in a network simulation. Second, we analyze a supervised learning framework in which the observations are assumed to be iid, and algorithms are compared by the number of prediction mistakes made in reaching a target generalization error. We provide a lower bound on mistakes for Perceptron, a standard online learning algorithm, in this framework.
    We introduce a modification to Perceptron and show that it avoids this lower bound, and in fact attains the optimal mistake complexity for this setting. Third, we motivate and analyze an online active learning framework. The observations are assumed to be iid, and algorithms are judged by the number of label queries needed to reach a target generalization error. Our lower bound applies to the active learning setting as well, as a lower bound on labels for Perceptron paired with any active learning rule. We provide a new online active learning algorithm that avoids the lower bound, and we upper bound its label complexity. The upper bound is optimal and also bounds the algorithm's total errors (labeled and unlabeled). We analyze the algorithm further, yielding a label-complexity bound under relaxed assumptions. Using optical character recognition data, we empirically compare the new algorithm to an online active learning algorithm with data-dependent performance guarantees, as well as to the combined variants of these two algorithms. By Claire E. Monteleoni. Ph.D.
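    The general shape of online active learning with a margin-based query rule can be sketched as follows. This is a generic illustration in the spirit of Perceptron-plus-querying, not the thesis's exact algorithm: the data stream, target hyperplane, and fixed query threshold b are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical separable stream: the true label is the sign of x[0].
d, T, b = 5, 2000, 0.5   # b is an illustrative margin threshold for querying
w_star = np.eye(d)[0]    # hypothetical target hyperplane
w = np.zeros(d)
queries = mistakes = 0

for _ in range(T):
    x = rng.normal(size=d)
    x /= np.linalg.norm(x)           # points on the unit sphere
    margin = w @ x
    if abs(margin) <= b:             # uncertain prediction: query the label
        queries += 1
        y = 1.0 if w_star @ x >= 0 else -1.0
        if margin * y <= 0:          # mistake on the queried point
            mistakes += 1
            w = w + y * x            # standard Perceptron update

print(queries, mistakes)
```

Labels are requested only when the current hypothesis is uncertain (small |w·x|), and updates happen only on queried mistakes, so the label count, not the stream length, is the resource being bounded.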
