4 research outputs found

    A Confidence-Based Approach for Balancing Fairness and Accuracy

    We study three classical machine learning algorithms in the context of algorithmic fairness: adaptive boosting, support vector machines, and logistic regression. Our goal is to maintain the high accuracy of these learning algorithms while reducing the degree to which they discriminate against individuals because of their membership in a protected group. Our first contribution is a method for achieving fairness by shifting the decision boundary for the protected group. The method is based on the theory of margins for boosting. Our method performs comparably to or outperforms previous algorithms in the fairness literature in terms of accuracy and low discrimination, while simultaneously allowing for a fast and transparent quantification of the trade-off between bias and error. Our second contribution addresses the shortcomings of the bias-error trade-off studied in most of the algorithmic fairness literature. We demonstrate that even hopelessly naive modifications of a biased algorithm, which cannot be reasonably said to be fair, can still achieve low bias and high accuracy. To help distinguish between these naive algorithms and more sensible ones, we propose a new measure of fairness, called resilience to random bias (RRB). We demonstrate that RRB distinguishes well between our naive and sensible fairness algorithms. RRB, together with bias and accuracy, provides a more complete picture of the fairness of an algorithm.
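The idea of shifting the decision boundary for the protected group can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the bias measure (difference in positive-prediction rates between groups), and the grid search over candidate shifts are all assumptions made for illustration. The scores stand in for a signed confidence such as a boosting margin.

```python
import numpy as np

def statistical_bias(pred, protected):
    """Assumed bias measure: positive rate of the unprotected group
    minus positive rate of the protected group."""
    return pred[~protected].mean() - pred[protected].mean()

def shifted_predict(scores, protected, shift):
    """Predict 1 when the score exceeds the threshold. The threshold is 0
    for the unprotected group and -shift for the protected group."""
    threshold = np.where(protected, -shift, 0.0)
    return (scores > threshold).astype(int)

def find_shift(scores, protected, candidates):
    """Return the candidate shift with the smallest absolute bias
    (hypothetical selection rule; ties go to the earliest candidate)."""
    biases = [
        abs(statistical_bias(shifted_predict(scores, protected, s), protected))
        for s in candidates
    ]
    return candidates[int(np.argmin(biases))]

# Toy data (made up): signed confidence scores and group membership.
scores = np.array([0.5, 1.0, -0.5, 2.0, -0.2, -0.4, 0.3, -1.0])
protected = np.array([False, False, False, False, True, True, True, True])
candidates = np.array([0.0, 0.3, 0.5, 1.5])

best = find_shift(scores, protected, candidates)
print("chosen shift:", best)
print("bias before:", statistical_bias(shifted_predict(scores, protected, 0.0), protected))
print("bias after:", statistical_bias(shifted_predict(scores, protected, best), protected))
```

Because only a single scalar is tuned, the bias at each candidate shift can be read off directly, which is one way to picture the "fast and transparent" trade-off the abstract mentions. Tracking the error at each candidate as well would require true labels, which this sketch omits.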

    Algorithms and Complexity Results for Learning and Big Data

    This thesis focuses on problems in the theory and practice of machine learning and big data. We will explore the complexity-theoretic properties of MapReduce, one of the most ubiquitous distributed computing frameworks for big data, give new algorithms and prove computational hardness results for a model of clustering, and study fairness in machine learning applications. In our study of MapReduce, we address some of the central questions that computational complexity theory asks about models of computation. After giving a detailed and precise formalization of MapReduce as a model of computation, based on the work of Karloff et al., we compare it to classical Turing machines, and show that languages which can be decided by a Turing machine using sublogarithmic space can also be decided by a constant-round MapReduce computation. In the second half of the chapter, we turn our attention to the question of whether an increased number of rounds or an increased amount of computation time per processor leads to strictly more computational power. We answer this question in the affirmative, proving a hierarchy theorem for MapReduce computations, conditioned on the Exponential Time Hypothesis. We will also study an interactive model of clustering introduced by Balcan and Blum. In this framework of clustering, we give algorithms for clustering linear functionals and hyperplanes, and give computational hardness results showing that other concept classes, including deterministic finite automata, constant-depth threshold circuits, and Boolean formulas, cannot be clustered efficiently if standard cryptographic assumptions hold. Finally, we address the issue of fairness in machine learning. We propose a novel approach for modifying three popular machine learning algorithms, AdaBoost, logistic regression, and support vector machines, to eliminate bias against a protected group. We empirically compare our method to previous approaches in the literature, as well as to several baseline algorithms, by evaluating them on various real-world datasets, and we also give theoretical justification for its performance. We also propose a new measure of fairness for machine learning classifiers, and demonstrate that it can help distinguish between naive and more sophisticated approaches even in cases where measuring error and bias is not sufficient.