1,561 research outputs found

    Learning Geometric Concepts with Nasty Noise

    Full text link
We study the efficient learnability of geometric concept classes - specifically, low-degree polynomial threshold functions (PTFs) and intersections of halfspaces - when a fraction of the data is adversarially corrupted. We give the first polynomial-time PAC learning algorithms for these concept classes with dimension-independent error guarantees in the presence of nasty noise under the Gaussian distribution. In the nasty noise model, an omniscient adversary can arbitrarily corrupt a small fraction of both the unlabeled data points and their labels. This model generalizes well-studied noise models, including the malicious noise model and the agnostic (adversarial label noise) model. Prior to our work, the only concept class for which efficient malicious learning algorithms were known was the class of origin-centered halfspaces. Specifically, our robust learning algorithm for low-degree PTFs succeeds under a number of tame distributions -- including the Gaussian distribution and, more generally, any log-concave distribution with (approximately) known low-degree moments. For LTFs under the Gaussian distribution, we give a polynomial-time algorithm that achieves error $O(\epsilon)$, where $\epsilon$ is the noise rate. At the core of our PAC learning results is an efficient algorithm to approximate the low-degree Chow parameters of any bounded function in the presence of nasty noise. To achieve this, we employ an iterative spectral method for outlier detection and removal, inspired by recent work in robust unsupervised learning. Our aforementioned algorithm succeeds for a range of distributions satisfying mild concentration bounds and moment assumptions. The correctness of our robust learning algorithm for intersections of halfspaces makes essential use of a novel robust inverse independence lemma that may be of broader interest.
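
    The iterative spectral filtering step can be illustrated with a minimal, hedged sketch (all function names, thresholds, and the stopping rule below are illustrative assumptions, not the paper's exact procedure): repeatedly estimate the mean of a low-degree feature map over the sample, find the direction of largest excess variance, and discard the points that project farthest along it, since a small set of corrupted points must account for most of that excess.

```python
import numpy as np

def spectral_filter_mean(features, eps, sigma_bound=1.0, max_iter=50):
    """Illustrative robust mean estimation via iterative spectral filtering.

    features : (n, d) array of low-degree feature vectors (e.g. Hermite
               monomials of the inputs), an eps-fraction of which may be
               adversarially corrupted.
    Returns an estimate of the mean of the uncorrupted feature vectors.
    """
    X = np.asarray(features, dtype=float).copy()
    for _ in range(max_iter):
        mu = X.mean(axis=0)
        centered = X - mu
        cov = centered.T @ centered / len(X)
        eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
        lam, v = eigvals[-1], eigvecs[:, -1]       # top eigenpair
        # If no direction has variance much above what the clean distribution
        # allows, the remaining outliers can no longer shift the mean by much.
        if lam <= sigma_bound * (1 + 10 * eps):
            break
        # Otherwise remove the points with the largest squared projection onto
        # the top direction; corrupted points account for most of the excess
        # variance along it.
        scores = (centered @ v) ** 2
        cutoff = np.quantile(scores, 1 - eps)
        X = X[scores <= cutoff]
    return X.mean(axis=0)
```

    In the paper this kind of filter is applied to estimate the low-degree Chow parameters, with the stopping condition driven by the distribution's concentration and moment bounds rather than a fixed constant.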

    Attribute-Efficient PAC Learning of Low-Degree Polynomial Threshold Functions with Nasty Noise

    Full text link
The concept class of low-degree polynomial threshold functions (PTFs) plays a fundamental role in machine learning. In this paper, we study PAC learning of $K$-sparse degree-$d$ PTFs on $\mathbb{R}^n$, where any such concept depends only on $K$ out of $n$ attributes of the input. Our main contribution is a new algorithm that runs in time $(nd/\epsilon)^{O(d)}$ and, under the Gaussian marginal distribution, PAC learns the class up to error rate $\epsilon$ with $O(\frac{K^{4d}}{\epsilon^{2d}} \cdot \log^{5d} n)$ samples, even when an $\eta \leq O(\epsilon^d)$ fraction of them are corrupted by the nasty noise of Bshouty et al. (2002), possibly the strongest corruption model. Prior to this work, attribute-efficient robust algorithms had been established only for the special case of sparse homogeneous halfspaces. Our key ingredients are: 1) a structural result that translates the attribute sparsity to a sparsity pattern of the Chow vector under the basis of Hermite polynomials, and 2) a novel attribute-efficient robust Chow vector estimation algorithm which uses exclusively a restricted Frobenius norm either to certify a good approximation or to validate a sparsity-induced degree-$2d$ polynomial as a filter to detect corrupted samples. Comment: ICML 202
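
    As a rough illustration of the Chow-vector object being estimated (not the paper's robust estimator): under a standard Gaussian marginal, the degree-1 Chow parameters are the Hermite coefficients $E[y \cdot He_1(x_i)] = E[y \cdot x_i]$, and attribute sparsity of the concept induces sparsity of this vector, so an estimate can be hard-thresholded to its $K$ largest entries. The sampling and thresholding below are illustrative assumptions.

```python
import numpy as np

def sparse_chow_estimate(X, y, K):
    """Illustrative K-sparse estimate of the degree-1 Chow vector under a
    standard Gaussian marginal: chow[i] = E[y * x_i], i.e. the coefficient
    of the first Hermite polynomial He_1(x_i) = x_i.
    """
    chow = (y[:, None] * X).mean(axis=0)        # empirical E[y * x_i]
    keep = np.argsort(np.abs(chow))[-K:]        # K largest-magnitude entries
    sparse = np.zeros_like(chow)
    sparse[keep] = chow[keep]
    return sparse
```

    The paper's estimator replaces this naive empirical average with a robust one that either certifies a good approximation via a restricted Frobenius norm or produces a degree-$2d$ polynomial filter for removing corrupted samples.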

    Learning predictive models from massive, semantically disparate data

    Get PDF
Machine learning approaches offer some of the most successful techniques for constructing predictive models from data. However, applying such techniques in practice requires overcoming several challenges: infeasibility of centralized access to the data because the massive size of some data sets exceeds the memory available to the learner, the distributed nature of data, access restrictions, data fragmentation, semantic disparities between the data sources, and data sources that evolve spatially or temporally (e.g., data streams and genomic data sources to which new data is being submitted continuously). Learning using statistical queries and semantic correspondences that present a unified view of disparate data sources to the learner offers a powerful general framework for addressing some of these challenges. Against this background, this thesis describes (1) approaches to deal with missing values in statistical-query-based algorithms for building predictors (Naïve Bayes and decision trees) and techniques to minimize the number of required queries in such a setting; (2) sufficient-statistics-based algorithms for constructing and updating sequence classifiers; (3) the reduction of several aspects of learning from semantically disparate data sources (such as (a) how errors in mappings affect the accuracy of the learned model and (b) how to choose an optimal mapping from among a set of alternative expert-supplied or automatically generated mappings) to the well-studied problems of domain adaptation and learning in the presence of noise; and (4) software for learning predictive models from semantically disparate data.
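
    A minimal sketch of the statistical-query view of Naïve Bayes mentioned above, with one simple way of handling missing values (restricting each count query to records where the attribute is observed); the names and the Laplace smoothing are illustrative assumptions, not the thesis's exact algorithms.

```python
from collections import defaultdict

def naive_bayes_from_counts(records, missing=None):
    """Fit a discrete Naive Bayes model using only count queries.

    records : iterable of (attributes, label), where attributes is a tuple and
              a value equal to `missing` means the attribute is unobserved.
    Returns class priors and a conditional-probability function.
    """
    class_counts = defaultdict(int)
    cond_counts = defaultdict(int)     # (attr_index, value, label) -> count
    attr_totals = defaultdict(int)     # (attr_index, label) -> observed count
    for attrs, label in records:
        class_counts[label] += 1
        for i, v in enumerate(attrs):
            if v == missing:
                continue               # missing value: skip this count query
            cond_counts[(i, v, label)] += 1
            attr_totals[(i, label)] += 1
    total = sum(class_counts.values())
    priors = {c: n / total for c, n in class_counts.items()}

    def cond_prob(i, v, label, smoothing=1.0, n_values=2):
        # Laplace-smoothed estimate of P(x_i = v | label), computed only over
        # records where attribute i was actually observed.
        return (cond_counts[(i, v, label)] + smoothing) / \
               (attr_totals[(i, label)] + smoothing * n_values)

    return priors, cond_prob
```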

    Learning Stochastic Decision Trees

    Get PDF

    Fairness-aware PAC learning from corrupted data

    Get PDF
Addressing fairness concerns about machine learning models is a crucial step towards their long-term adoption in real-world automated systems. While many approaches have been developed for training fair models from data, little is known about the robustness of these methods to data corruption. In this work we consider fairness-aware learning under worst-case data manipulations. We show that an adversary can in some situations force any learner to return an overly biased classifier, regardless of the sample size and with or without degrading accuracy, and that the strength of the excess bias increases for learning problems with underrepresented protected groups in the data. We also prove that our hardness results are tight up to constant factors. To this end, we study two natural learning algorithms that optimize for both accuracy and fairness and show that these algorithms enjoy guarantees that are order-optimal in terms of the corruption ratio and the protected groups' frequencies in the large-data limit.
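
    A hedged sketch of the kind of learner studied here: an empirical risk minimizer over a finite hypothesis class that jointly penalizes error and a fairness violation. The demographic-parity criterion and the penalty weight are illustrative choices, not necessarily the two algorithms analyzed in the paper.

```python
import numpy as np

def fair_erm(hypotheses, X, y, groups, lam=1.0):
    """Pick the hypothesis minimizing empirical error plus a fairness penalty.

    hypotheses : list of callables h(X) -> predictions in {0, 1}
    groups     : 0/1 array marking membership in the protected group
                 (assumes both groups appear in the sample)
    lam        : trade-off weight between accuracy and fairness
    """
    best, best_score = None, np.inf
    for h in hypotheses:
        pred = h(X)
        err = np.mean(pred != y)
        # Demographic-parity gap: difference in positive rates between groups.
        gap = abs(pred[groups == 1].mean() - pred[groups == 0].mean())
        score = err + lam * gap
        if score < best_score:
            best, best_score = h, score
    return best
```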

    Preface

    Get PDF

Multi-party Poisoning through Generalized $p$-Tampering

    Get PDF
In a poisoning attack against a learning algorithm, an adversary tampers with a fraction of the training data $T$ with the goal of increasing the classification error of the constructed hypothesis/model over the final test distribution. In the distributed setting, $T$ might be gathered gradually from $m$ data providers $P_1,\dots,P_m$ who generate and submit their shares of $T$ in an online way. In this work, we initiate a formal study of $(k,p)$-poisoning attacks in which an adversary controls $k\in[m]$ of the parties, and even for each corrupted party $P_i$, the adversary submits some poisoned data $T'_i$ on behalf of $P_i$ that is still "$(1-p)$-close" to the correct data $T_i$ (e.g., a $1-p$ fraction of $T'_i$ is still honestly generated). For $k=m$, this model becomes the traditional notion of poisoning, and for $p=1$ it coincides with the standard notion of corruption in multi-party computation. We prove that if there is an initial constant error for the generated hypothesis $h$, there is always a $(k,p)$-poisoning attacker who can decrease the confidence of $h$ (to have a small error), or alternatively increase the error of $h$, by $\Omega(p \cdot k/m)$. Our attacks can be implemented in polynomial time given samples from the correct data, and they use no wrong labels if the original distributions are not noisy. At a technical level, we prove a general lemma about biasing bounded functions $f(x_1,\dots,x_n)\in[0,1]$ through an attack model in which each block $x_i$ might be controlled by an adversary with marginal probability $p$ in an online way. When the probabilities are independent, this coincides with the model of $p$-tampering attacks, thus we call our model generalized $p$-tampering. We prove the power of such attacks by incorporating ideas from the context of coin-flipping attacks into the $p$-tampering model and generalize the results in both of these areas.
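
    A minimal sketch of the (generalized) $p$-tampering idea for biasing a bounded function upward: each block is adversarially controlled with marginal probability $p$, and when in control the attacker picks the candidate that maximizes a Monte Carlo estimate of the conditional expectation of $f$ given the prefix. The candidate-pool size, the estimator, and all names are illustrative assumptions rather than the paper's construction.

```python
import random

def estimate_conditional(f, prefix, sampler, n_remaining, trials=200):
    """Monte Carlo estimate of E[f(prefix, X_{i+1..n})] under honest sampling."""
    total = 0.0
    for _ in range(trials):
        suffix = [sampler() for _ in range(n_remaining)]
        total += f(prefix + suffix)
    return total / trials

def p_tampering_attack(f, sampler, n, p, candidates=5):
    """Generate x_1..x_n, tampering each block independently with probability p
    so as to bias E[f] upward; f maps a length-n list into [0, 1]."""
    xs = []
    for i in range(n):
        if random.random() < p:
            # Adversarial block: among a few honest-looking candidates, pick the
            # one with the highest estimated conditional expectation of f.
            pool = [sampler() for _ in range(candidates)]
            x = max(pool, key=lambda c: estimate_conditional(
                f, xs + [c], sampler, n - i - 1))
        else:
            x = sampler()            # honest block
        xs.append(x)
    return xs
```

    For example, with sampler() drawing uniform bits and f returning the fraction of ones, this attack shifts the expectation above 1/2 by an amount proportional to p, in the spirit of the $\Omega(p \cdot k/m)$ bias stated above for the multi-party setting.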