
    Exploring Discrimination: A User-centric Evaluation of Discrimination-Aware Data Mining

    Discrimination and Class Imbalance Aware Online Naive Bayes

    Fairness-aware mining of massive data streams is a growing and challenging concern in contemporary machine learning. Many stream learning algorithms are used to replace humans at critical decision-making points, e.g., when hiring staff or assessing credit risk. This calls for handling massive incoming information with minimal response delay while ensuring fair, high-quality decisions. Recent discrimination-aware learning methods are optimized for overall accuracy. Overall accuracy, however, is biased in favor of the majority class, so state-of-the-art methods mainly diminish discrimination by partially or completely ignoring the minority class. In this context, we propose a novel adaptation of Naïve Bayes that mitigates discrimination embedded in the streams while maintaining high predictive performance for both the majority and minority classes. Our proposed algorithm is simple, fast, and attains multi-objective optimization goals. To handle class imbalance and concept drift, we propose a dynamic instance weighting module that gives more importance to recent instances and less importance to obsolete instances, based on their membership in the minority or majority class. We conducted experiments on a range of streaming and static datasets and found that the proposed methodology outperforms existing state-of-the-art fairness-aware methods in terms of both discrimination score and balanced accuracy.
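    The weighting idea in this abstract can be illustrated compactly. The following is a minimal sketch, not the authors' implementation: an online Naive Bayes over categorical features whose sufficient statistics decay over time (drift handling) and whose updates give extra weight to minority-class arrivals (imbalance handling). The class name and the decay/minority_boost parameters are illustrative assumptions, not the paper's API.

    from collections import defaultdict
    import math

    class WeightedOnlineNB:
        # Hypothetical sketch: online Naive Bayes with recency and
        # minority-class instance weighting (not the paper's code).
        def __init__(self, decay=0.999, minority_boost=2.0):
            self.decay = decay            # < 1: fades obsolete instances (drift)
            self.boost = minority_boost   # extra weight for minority arrivals
            self.class_w = defaultdict(float)  # weighted class counts
            self.feat_w = defaultdict(float)   # weighted (class, feat, value) counts

        def _minority(self):
            return min(self.class_w, key=self.class_w.get) if self.class_w else None

        def update(self, x, y):
            # fade all existing statistics, then add the new weighted instance
            for k in self.class_w: self.class_w[k] *= self.decay
            for k in self.feat_w:  self.feat_w[k]  *= self.decay
            w = self.boost if y == self._minority() else 1.0
            self.class_w[y] += w
            for i, v in enumerate(x):
                self.feat_w[(y, i, v)] += w

        def predict(self, x):
            total = sum(self.class_w.values())
            def log_post(c):
                lp = math.log(self.class_w[c] / total)
                for i, v in enumerate(x):      # Laplace-smoothed likelihoods
                    lp += math.log((self.feat_w[(c, i, v)] + 1.0)
                                   / (self.class_w[c] + 2.0))
                return lp
            return max(self.class_w, key=log_post)

    The discrimination-mitigation component would sit on top of this (e.g., adjusting decisions per sensitive group), which the sketch deliberately omits.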

    Discrimination-aware classification

    Classifier construction is one of the most researched topics within the data mining and machine learning communities, and literally thousands of algorithms have been proposed. The quality of the learned models, however, depends critically on the quality of the training data: no matter which classifier inducer is applied, incorrect training data yields poor models. In this thesis, we study cases in which the input data is discriminatory and we are supposed to learn a classifier that optimizes accuracy but does not discriminate in its predictions. Such situations occur naturally as artifacts of the data collection process, when the training data is collected from different sources with different labeling criteria, when the data is generated by a biased decision process, or when the sensitive attribute, e.g., gender, serves as a proxy for unobserved features. In many situations, a classifier that detects and uses racial or gender discrimination is undesirable for legal reasons.

    The concept of discrimination is illustrated by the following example. Over the years, an employment bureau recorded various parameters of job candidates. Based on these parameters, the company wants to learn a model for partially automating the matchmaking between a job and a job candidate; a match is labeled as successful if the company hires the applicant. It turns out, however, that the historical data is biased: for higher board functions, Caucasian males have systematically been favored. A model learned directly on this data will learn this discriminatory behavior and apply it to future predictions, which is of course unacceptable from an ethical and legal point of view.

    Our proposed solutions to the discrimination problem fall into two broad categories. First, we propose pre-processing methods that remove the discrimination from the training dataset. Second, we address the problem by directly pushing non-discrimination constraints into classification models and by post-processing built models. We further study the discrimination-aware classification paradigm in the presence of explanatory attributes that are correlated with the sensitive attribute; for example, low income may be explained by a low education level. In such a case, as we show, not all discrimination can be considered bad. We therefore introduce a new way of measuring discrimination that explicitly splits it into explainable and bad discrimination, and we propose methods that remove the bad discrimination only.

    We evaluated our discrimination-aware methods on real-world datasets. In our experiments, the methods show promising results and clearly outperform traditional classification models w.r.t. the accuracy-discrimination trade-off. To conclude, we believe that discrimination-aware classification is a new and exciting area of research addressing a societally relevant problem.
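    The split into explainable and bad discrimination lends itself to a short illustration. The sketch below follows the stratification idea the abstract describes, under stated assumptions: a binary 0/1 label, exactly two sensitive groups, overall discrimination measured as the acceptance-rate gap, and the explainable part as the gap that would remain if acceptance depended only on the explanatory attribute (e.g., education). The function and column names are hypothetical.

    import pandas as pd

    def discrimination_split(df, sensitive, explanatory, label):
        # overall discrimination: acceptance-rate gap between the two groups
        g0, g1 = sorted(df[sensitive].unique())
        d_all = (df[df[sensitive] == g1][label].mean()
                 - df[df[sensitive] == g0][label].mean())
        # explainable part: gap predicted by group composition across strata
        p_g1, p_g0 = (df[sensitive] == g1).mean(), (df[sensitive] == g0).mean()
        d_expl = 0.0
        for e, stratum in df.groupby(explanatory):
            p_star = stratum[label].mean()          # stratum acceptance rate
            pe_g1 = ((df[sensitive] == g1) & (df[explanatory] == e)).mean() / p_g1
            pe_g0 = ((df[sensitive] == g0) & (df[explanatory] == e)).mean() / p_g0
            d_expl += p_star * (pe_g1 - pe_g0)
        return d_all, d_expl, d_all - d_expl        # bad = overall - explainable

    Under this view, only the last component, d_all - d_expl, would be targeted for removal.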

    FAC-fed: Federated adaptation for fairness and concept drift aware stream classification

    Federated learning is an emerging collaborative machine learning paradigm involving distributed and heterogeneous clients. Enormous collections of continuously arriving heterogeneous data residing on distributed clients require federated adaptations of efficient mining algorithms that enable fair, high-quality predictions with privacy guarantees and minimal response delay. In this context, we propose FAC-Fed, a federated adaptation that mitigates discrimination embedded in streaming data while handling concept drift. We present a novel adaptive data augmentation method that mitigates client-side discrimination embedded in the data during optimization, resulting in an optimized and fair centralized server. Extensive experiments on a set of publicly available streaming and static datasets confirm the effectiveness of the proposed method. To the best of our knowledge, this work is the first attempt at fairness-aware federated adaptation for stream classification; to demonstrate the superiority of the proposed method over the state of the art, we therefore compare its centralized version with three centralized stream classification baselines (FABBOO, FAHT, CSMOTE). The experimental results show that our method outperforms current methods in terms of both discrimination mitigation and predictive performance.
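    Since FAC-Fed's exact augmentation procedure is not given in this abstract, the following is only a generic FedAvg-style skeleton of the setup it describes: each client rebalances its local data across (sensitive group, label) cells as a crude stand-in for the adaptive augmentation step, fits locally, and the server averages parameters. All names and the logistic-regression choice are assumptions, not the paper's method.

    import numpy as np

    def rebalance_by_group(X, y, s, rng=np.random.default_rng(0)):
        # naive stand-in for the paper's adaptive augmentation: oversample
        # every (sensitive group, label) cell to the size of the largest cell
        cells = [np.where((s == g) & (y == c))[0]
                 for g in np.unique(s) for c in np.unique(y)]
        m = max(len(c) for c in cells)
        idx = np.concatenate([rng.choice(c, size=m, replace=True)
                              for c in cells if len(c)])
        return X[idx], y[idx]

    def client_update(global_w, X, y, s, lr=0.1, epochs=5):
        X, y = rebalance_by_group(X, y, s)   # local discrimination mitigation
        w = global_w.copy()
        for _ in range(epochs):              # plain logistic-regression SGD
            p = 1.0 / (1.0 + np.exp(-X @ w))
            w -= lr * X.T @ (p - y) / len(y)
        return w

    def server_round(global_w, clients):
        # FedAvg aggregation: average the clients' locally adapted weights
        return np.mean([client_update(global_w, X, y, s)
                        for X, y, s in clients], axis=0)

    Raw data never leaves a client in this loop; only model weights are exchanged, which is what gives the privacy guarantee the abstract mentions.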

    Is it ethical to avoid error analysis?

    Machine learning algorithms tend to create more accurate models with the availability of large datasets. In some cases, highly accurate models can hide the presence of bias in the data. Several published studies tackle the development of discrimination-aware machine learning algorithms. We center on the further evaluation of machine learning models through error analysis, to understand under what conditions a model does not work as expected. In particular, we focus on the ethical implications of avoiding error analysis, from a falsification-of-results and discrimination perspective. Finally, we show different ways to approach error analysis in non-interpretable machine learning algorithms such as deep learning.

    Comment: Presented as a poster at the 2017 Workshop on Fairness, Accountability, and Transparency in Machine Learning (FAT/ML 2017).
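    The disaggregated error analysis argued for here is easy to sketch. Assuming a pandas DataFrame with binary 0/1 labels and predictions (column names are illustrative), the report below breaks a single accuracy figure down per subgroup and per error type, which is where hidden bias typically shows up.

    import pandas as pd

    def error_report(df, y_true="label", y_pred="pred", group="gender"):
        rows = []
        for g, part in df.groupby(group):
            tp = ((part[y_pred] == 1) & (part[y_true] == 1)).sum()
            fp = ((part[y_pred] == 1) & (part[y_true] == 0)).sum()
            fn = ((part[y_pred] == 0) & (part[y_true] == 1)).sum()
            tn = ((part[y_pred] == 0) & (part[y_true] == 0)).sum()
            rows.append({group: g, "n": len(part),
                         "accuracy": (tp + tn) / len(part),
                         "false_pos_rate": fp / max(fp + tn, 1),
                         "false_neg_rate": fn / max(fn + tp, 1)})
        return pd.DataFrame(rows)

    A large gap in false_neg_rate between groups, for instance, can coexist with high overall accuracy, which is exactly the failure mode that skipping error analysis conceals.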