723 research outputs found

    A Two-stage Classification Method for High-dimensional Data and Point Clouds

    Full text link
    High-dimensional data classification is a fundamental task in machine learning and imaging science. In this paper, we propose a two-stage multiphase semi-supervised classification method for classifying high-dimensional data and unstructured point clouds. To begin with, a fuzzy classification method such as the standard support vector machine is used to generate a warm initialization. We then apply a two-stage approach named SaT (smoothing and thresholding) to improve the classification. In the first stage, an unconstraint convex variational model is implemented to purify and smooth the initialization, followed by the second stage which is to project the smoothed partition obtained at stage one to a binary partition. These two stages can be repeated, with the latest result as a new initialization, to keep improving the classification quality. We show that the convex model of the smoothing stage has a unique solution and can be solved by a specifically designed primal-dual algorithm whose convergence is guaranteed. We test our method and compare it with the state-of-the-art methods on several benchmark data sets. The experimental results demonstrate clearly that our method is superior in both the classification accuracy and computation speed for high-dimensional data and point clouds.Comment: 21 pages, 4 figure

    Dimensionality reduction of clustered data sets

    Get PDF
    We present a novel probabilistic latent variable model to perform linear dimensionality reduction on data sets which contain clusters. We prove that the maximum likelihood solution of the model is an unsupervised generalisation of linear discriminant analysis. This provides a completely new approach to one of the most established and widely used classification algorithms. The performance of the model is then demonstrated on a number of real and artificial data sets

    Deep Learning is Provably Robust to Symmetric Label Noise

    Full text link
    Deep neural networks (DNNs) are capable of perfectly fitting the training data, including memorizing noisy data. It is commonly believed that memorization hurts generalization. Therefore, many recent works propose mitigation strategies to avoid noisy data or correct memorization. In this work, we step back and ask the question: Can deep learning be robust against massive label noise without any mitigation? We provide an affirmative answer for the case of symmetric label noise: We find that certain DNNs, including under-parameterized and over-parameterized models, can tolerate massive symmetric label noise up to the information-theoretic threshold. By appealing to classical statistical theory and universal consistency of DNNs, we prove that for multiclass classification, L1L_1-consistent DNN classifiers trained under symmetric label noise can achieve Bayes optimality asymptotically if the label noise probability is less than K−1K\frac{K-1}{K}, where K≥2K \ge 2 is the number of classes. Our results show that for symmetric label noise, no mitigation is necessary for L1L_1-consistent estimators. We conjecture that for general label noise, mitigation strategies that make use of the noisy data will outperform those that ignore the noisy data

    Tracking the risk of a deployed model and detecting harmful distribution shifts

    Full text link
    When deployed in the real world, machine learning models inevitably encounter changes in the data distribution, and certain -- but not all -- distribution shifts could result in significant performance degradation. In practice, it may make sense to ignore benign shifts, under which the performance of a deployed model does not degrade substantially, making interventions by a human expert (or model retraining) unnecessary. While several works have developed tests for distribution shifts, these typically either use non-sequential methods, or detect arbitrary shifts (benign or harmful), or both. We argue that a sensible method for firing off a warning has to both (a) detect harmful shifts while ignoring benign ones, and (b) allow continuous monitoring of model performance without increasing the false alarm rate. In this work, we design simple sequential tools for testing if the difference between source (training) and target (test) distributions leads to a significant increase in a risk function of interest, like accuracy or calibration. Recent advances in constructing time-uniform confidence sequences allow efficient aggregation of statistical evidence accumulated during the tracking process. The designed framework is applicable in settings where (some) true labels are revealed after the prediction is performed, or when batches of labels become available in a delayed fashion. We demonstrate the efficacy of the proposed framework through an extensive empirical study on a collection of simulated and real datasets.Comment: Accepted as a conference paper at ICLR 202

    One-Bit Quantization and Sparsification for Multiclass Linear Classification via Regularized Regression

    Full text link
    We study the use of linear regression for multiclass classification in the over-parametrized regime where some of the training data is mislabeled. In such scenarios it is necessary to add an explicit regularization term, λf(w)\lambda f(w), for some convex function f(⋅)f(\cdot), to avoid overfitting the mislabeled data. In our analysis, we assume that the data is sampled from a Gaussian Mixture Model with equal class sizes, and that a proportion cc of the training labels is corrupted for each class. Under these assumptions, we prove that the best classification performance is achieved when f(⋅)=∥⋅∥22f(\cdot) = \|\cdot\|^2_2 and λ→∞\lambda \to \infty. We then proceed to analyze the classification errors for f(⋅)=∥⋅∥1f(\cdot) = \|\cdot\|_1 and f(⋅)=∥⋅∥∞f(\cdot) = \|\cdot\|_\infty in the large λ\lambda regime and notice that it is often possible to find sparse and one-bit solutions, respectively, that perform almost as well as the one corresponding to f(⋅)=∥⋅∥22f(\cdot) = \|\cdot\|_2^2

    Confusion-Based Online Learning and a Passive-Aggressive Scheme

    No full text
    International audienceThis paper provides the first ---to the best of our knowledge--- analysis of online learning algorithms for multiclass problems when the {\em confusion} matrix is taken as a performance measure. The work builds upon recent and elegant results on noncommutative concentration inequalities, i.e. concentration inequalities that apply to matrices, and, more precisely, to matrix martingales. We do establish generalization bounds for online learning algorithms and show how the theoretical study motivates the proposition of a new confusion-friendly learning procedure. This learning algorithm, called \copa (for COnfusion Passive-Aggressive) is a passive-aggressive learning algorithm; it is shown that the update equations for \copa can be computed analytically and, henceforth, there is no need to recourse to any optimization package to implement it
    • …
    corecore