
    Reconciling modern machine learning practice and the bias-variance trade-off

    Breakthroughs in machine learning are rapidly changing science and society, yet our fundamental understanding of this technology has lagged far behind. Indeed, one of the central tenets of the field, the bias-variance trade-off, appears to be at odds with the observed behavior of methods used in modern machine learning practice. The bias-variance trade-off implies that a model should balance under-fitting and over-fitting: rich enough to express the underlying structure in the data, yet simple enough to avoid fitting spurious patterns. In modern practice, however, very rich models such as neural networks are trained to exactly fit (i.e., interpolate) the data. Classically, such models would be considered over-fit, and yet they often obtain high accuracy on test data. This apparent contradiction has raised questions about the mathematical foundations of machine learning and their relevance to practitioners. In this paper, we reconcile the classical understanding and the modern practice within a unified performance curve. This "double descent" curve subsumes the textbook U-shaped bias-variance trade-off curve by showing how increasing model capacity beyond the point of interpolation results in improved performance. We provide evidence for the existence and ubiquity of double descent across a wide spectrum of models and datasets, and we posit a mechanism for its emergence. This connection between the performance and the structure of machine learning models delineates the limits of classical analyses and has implications for both the theory and practice of machine learning.

    A survey on bias in machine learning research

    Current research on bias in machine learning often focuses on fairness while overlooking the roots or causes of bias. Bias, however, was originally defined as a "systematic error," often introduced by humans at different stages of the research process. This article aims to bridge the gap with past literature on bias in research by providing a taxonomy of potential sources of bias and error in data and models, focusing on bias in machine learning pipelines. The survey analyzes over forty potential sources of bias in the machine learning (ML) pipeline, providing clear examples for each. By understanding the sources and consequences of bias in machine learning, better methods can be developed for detecting and mitigating it, leading to fairer, more transparent, and more accurate ML models.
    Comment: Submitted to journal. arXiv admin note: substantial text overlap with arXiv:2308.0946

    Machine Learning with Multi-class Regression and Neural Networks: Analysis and Visualization of Crime Data in Seattle

    This article examines the implications of machine learning algorithms and models, and the significance of their construction, when investigating criminal data. It uses machine learning tools to store, clean, and analyze the data that is fed into a machine learning model. This model is then compared against a second model to test for accuracy, biases, and patterns detected between the experiments. The data was collected from data.seattle.gov, published by the City of Seattle Data Portal, and accessed on September 17, 2018. This research examines how machine learning models can be used to generate predictions and how data management introduces an unavoidable bias. This bias is discussed, along with the importance of understanding it for sensitive data such as this crime data.

    Uncovering Bias: Exploring Machine Learning Techniques for Detecting and Mitigating Bias in Data – A Literature Review

    The presence of bias in models developed using machine learning algorithms has emerged as a critical issue. This literature review explores the topic of uncovering bias in data and the application of techniques for detecting and mitigating it. The review provides a comprehensive analysis of the existing literature, focusing on pre-processing techniques, post-processing techniques, and fairness constraints employed to uncover and address bias in machine learning models. The effectiveness, limitations, and trade-offs of these techniques are examined, highlighting their impact on promoting fairness and equity in decision-making processes. The methodology consists of two key steps: data preparation and bias analysis, followed by machine learning model development and evaluation. In the data preparation phase, the dataset is analyzed for biases and pre-processed using techniques such as reweighting or relabeling to reduce bias. In the model development phase, suitable algorithms are selected, and fairness metrics are defined and optimized during training. The models are then evaluated using performance and fairness measures, and the best-performing model is chosen. This methodology ensures a systematic exploration of machine learning techniques to detect and mitigate bias, leading to more equitable decision-making. The review begins by examining pre-processing techniques, which involve cleaning the data, selecting features, feature engineering, and sampling. These techniques play an important role in preparing the data to reduce bias and promote fairness in machine learning models. The analysis highlights various studies that have explored the effectiveness of these techniques in uncovering and mitigating bias in data, contributing to the development of more equitable and unbiased machine learning models. 
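The reweighting step mentioned in the data preparation phase can be sketched in the style of the well-known reweighing scheme of Kamiran and Calders: each (group, label) cell receives the weight P(group) * P(label) / P(group, label), making group and label statistically independent under the weighted distribution. The toy records below are purely illustrative assumptions.

```python
from collections import Counter

# Hypothetical toy data: each record is (protected_group, label).
records = [("a", 1), ("a", 1), ("a", 0), ("b", 1), ("b", 0), ("b", 0)]

n = len(records)
p_group = Counter(g for g, _ in records)   # marginal counts per group
p_label = Counter(y for _, y in records)   # marginal counts per label
p_joint = Counter(records)                 # joint counts per (group, label)

# Weight = P(group) * P(label) / P(group, label) for each record's cell.
weights = [
    (p_group[g] / n) * (p_label[y] / n) / (p_joint[(g, y)] / n)
    for g, y in records
]
print(weights)  # → [0.75, 0.75, 1.5, 1.5, 0.75, 0.75]
```

Under-represented (group, label) combinations get weights above 1 and over-represented ones below 1, so a learner trained with these sample weights sees a distribution in which the label no longer correlates with group membership.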
Next, the review delves into post-processing techniques that focus on detecting and mitigating bias after the initial data preparation steps. These include bias detection methods that assess disparate impact or disparate treatment in model predictions, as well as bias mitigation techniques that modify model outputs to achieve fairness across different groups. The evaluation of these techniques, their performance metrics, and the potential trade-offs between fairness and accuracy are discussed, providing insights into the challenges and advances in bias mitigation. Lastly, the review examines fairness constraints, which impose rules or guidelines on machine learning algorithms to ensure fairness in predictions or decision-making processes. The analysis explores different fairness constraints, such as demographic parity, equalized odds, and predictive parity, and their effectiveness in reducing bias and promoting fairness in machine learning models. Overall, this literature review provides a comprehensive understanding of the techniques employed to uncover and mitigate bias in machine learning models across pre-processing, adversarial learning, fairness constraints, and post-processing approaches, fostering equity and ethical decision-making in various domains.
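The fairness constraints named above can be made concrete as gap metrics computed from a model's predictions: demographic parity compares positive-prediction rates across groups, and equalized odds compares true- and false-positive rates. The groups, labels, and predictions below are hypothetical.

```python
import numpy as np

# Hypothetical predictions and ground truth for two groups.
group = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])
y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
y_pred = np.array([1, 1, 1, 0, 1, 0, 0, 0])

def rate(mask):
    # Fraction of positive predictions within the masked subset.
    return y_pred[mask].mean()

# Demographic parity: P(pred=1 | group) should be equal across groups.
dp_gap = abs(rate(group == "a") - rate(group == "b"))

# Equalized odds: true-positive and false-positive rates should be
# equal across groups; report the larger of the two gaps.
tpr = lambda g: rate((group == g) & (y_true == 1))
fpr = lambda g: rate((group == g) & (y_true == 0))
eo_gap = max(abs(tpr("a") - tpr("b")), abs(fpr("a") - fpr("b")))

print(f"demographic parity gap: {dp_gap:.2f}")  # → 0.50
print(f"equalized odds gap:     {eo_gap:.2f}")  # → 0.50
```

A gap of zero would satisfy the corresponding constraint exactly; in practice these gaps are traded off against accuracy, as the review discusses.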