Reconciling modern machine learning practice and the bias-variance trade-off
Breakthroughs in machine learning are rapidly changing science and society,
yet our fundamental understanding of this technology has lagged far behind.
Indeed, one of the central tenets of the field, the bias-variance trade-off,
appears to be at odds with the observed behavior of methods used in the modern
machine learning practice. The bias-variance trade-off implies that a model
should balance under-fitting and over-fitting: rich enough to express
underlying structure in data, simple enough to avoid fitting spurious patterns.
However, in the modern practice, very rich models such as neural networks are
trained to exactly fit (i.e., interpolate) the data. Classically, such models
would be considered over-fit, and yet they often obtain high accuracy on test
data. This apparent contradiction has raised questions about the mathematical
foundations of machine learning and their relevance to practitioners.
In this paper, we reconcile the classical understanding and the modern
practice within a unified performance curve. This "double descent" curve
subsumes the textbook U-shaped bias-variance trade-off curve by showing how
increasing model capacity beyond the point of interpolation results in improved
performance. We provide evidence for the existence and ubiquity of double
descent for a wide spectrum of models and datasets, and we posit a mechanism
for its emergence. This connection between the performance and the structure of
machine learning models delineates the limits of classical analyses, and has
implications for both the theory and practice of machine learning.
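The interpolation regime described in this abstract can be illustrated with a small sketch. It is not the authors' experimental setup: the random ReLU feature map, the synthetic linear data, the noise level, and all function names below are assumptions chosen for illustration. It fits minimum-norm least squares with a growing number of features and reports training and test error, so a reader can check that the training error reaches zero once the feature count exceeds the number of training points. Whether the test error shows a clear second descent depends on the seed and settings.

```python
import numpy as np


def min_norm_fit(features, targets):
    """Return minimum-L2-norm least-squares coefficients via the pseudoinverse."""
    return np.linalg.pinv(features) @ targets


def random_relu_features(x, weights):
    """Map inputs of shape (n, d) to ReLU random features of shape (n, p)."""
    return np.maximum(x @ weights, 0.0)


def train_test_mse(n_features, seed=0, n_train=40, n_test=200, dim=5):
    """Fit a random-feature model and return (train MSE, test MSE).

    The data-generating process is hypothetical: a noisy linear target.
    """
    rng = np.random.default_rng(seed)
    x_train = rng.normal(size=(n_train, dim))
    x_test = rng.normal(size=(n_test, dim))
    true_w = rng.normal(size=dim)
    y_train = x_train @ true_w + 0.1 * rng.normal(size=n_train)
    y_test = x_test @ true_w

    # Fixed random first layer; only the linear read-out is fitted.
    weights = rng.normal(size=(dim, n_features))
    phi_train = random_relu_features(x_train, weights)
    phi_test = random_relu_features(x_test, weights)

    coef = min_norm_fit(phi_train, y_train)
    train_mse = float(np.mean((phi_train @ coef - y_train) ** 2))
    test_mse = float(np.mean((phi_test @ coef - y_test) ** 2))
    return train_mse, test_mse


if __name__ == "__main__":
    # Sweep capacity through the interpolation threshold (p = n_train = 40).
    for p in (5, 20, 40, 80, 400):
        train_mse, test_mse = train_test_mse(p)
        print(f"p={p:4d}  train MSE={train_mse:.3e}  test MSE={test_mse:.3e}")
```

The pseudoinverse is used because, once there are more features than training points, many coefficient vectors fit the data exactly; it selects the one with the smallest norm.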
A survey on bias in machine learning research
Current research on bias in machine learning often focuses on fairness, while
overlooking the roots or causes of bias. However, bias was originally defined
as a "systematic error," often caused by humans at different stages of the
research process. This article aims to bridge the gap with past literature on
bias in research by providing a taxonomy of potential sources of bias and
errors in data and models. The paper focuses on bias in machine learning
pipelines. The survey analyses over forty potential sources of bias in the
machine learning (ML) pipeline, providing clear examples for each. By
understanding the sources and consequences of bias in machine learning, better
methods can be developed for detecting and mitigating it, leading to fairer,
more transparent, and more accurate ML models.
Comment: Submitted to journal. arXiv admin note: substantial text overlap with
arXiv:2308.0946
Machine Learning with Multi-class Regression and Neural Networks: Analysis and Visualization of Crime Data in Seattle
This article examines the implications of machine learning algorithms and models, and the significance of their construction when investigating criminal data. It uses machine learning tools to store, clean and analyze data that is then fed into a machine learning model. This model is compared to another model to test for accuracy, biases and patterns detected across the experiments. The data was published by the City of Seattle Data Portal at data.seattle.gov and was accessed on September 17, 2018. This research looks into how machine learning models can be used to generate predictions and how data management introduces a bias that is unavoidable. This bias is discussed, as well as the importance of understanding it for sensitive data such as this crime data.
Uncovering Bias: Exploring Machine Learning Techniques for Detecting and Mitigating Bias in Data – A Literature Review
The presence of bias in models developed using machine learning algorithms has emerged as a critical issue. This literature review explores how bias in data can be uncovered and which techniques can be applied to detect and mitigate it. The review provides a comprehensive analysis of the existing literature, focusing on pre-processing techniques, post-pre-processing techniques, and fairness constraints employed to uncover and address bias in machine learning models. The effectiveness, limitations, and trade-offs of these techniques are examined, highlighting their impact on promoting fairness and equity in decision-making processes.
The methodology consists of two key steps: data preparation and bias analysis, followed by machine learning model development and evaluation. In the data preparation phase, the dataset is analyzed for biases and pre-processed using techniques like reweighting or relabeling to reduce bias. In the model development phase, suitable algorithms are selected, and fairness metrics are defined and optimized during the training process. The models are then evaluated using performance and fairness measures and the best-performing model is chosen. The methodology ensures a systematic exploration of machine learning techniques to detect and mitigate bias, leading to more equitable decision-making.
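The reweighting step mentioned above can be sketched as follows. The abstract does not say which scheme the reviewed studies use, so this is an assumption: one common choice gives each sample the weight P(group) · P(label) / P(group, label), which makes group membership and label statistically independent under the weighted distribution. The function names and the toy data are hypothetical.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Return one weight per sample: P(group) * P(label) / P(group, label).

    Probabilities are estimated from counts in the given data.
    """
    n = len(groups)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        (group_counts[g] * label_counts[y]) / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


def weighted_positive_rate(groups, labels, weights, group):
    """Weighted share of positive labels (label == 1) within one group."""
    total = sum(w for g, w in zip(groups, weights) if g == group)
    positive = sum(
        w for g, y, w in zip(groups, labels, weights) if g == group and y == 1
    )
    return positive / total


if __name__ == "__main__":
    # Toy data: group "a" has a 75% positive rate, group "b" only 25%.
    groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
    labels = [1, 1, 1, 0, 1, 0, 0, 0]
    weights = reweighing_weights(groups, labels)
    for g in ("a", "b"):
        rate = weighted_positive_rate(groups, labels, weights, g)
        print(g, round(rate, 3))
```

After reweighting, both groups in the toy data have the same weighted positive rate. The weights would typically be passed to a learner that accepts per-sample weights.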
The review begins by examining the techniques of pre-processing, which involve cleaning the data, selecting the features, feature engineering, and sampling. These techniques play an important role in preparing the data to reduce bias and promote fairness in machine learning models. The analysis highlights various studies that have explored the effectiveness of these techniques in uncovering and mitigating bias in data, contributing to the development of more equitable and unbiased machine learning models.
Next, the review delves into post-pre-processing techniques that focus on detecting and mitigating bias after the initial data preparation steps. These techniques include bias detection methods that assess the disparate impact or disparate treatment in model predictions, as well as bias mitigation techniques that modify model outputs to achieve fairness across different groups. The evaluation of these techniques, their performance metrics, and potential trade-offs between fairness and accuracy are discussed, providing insights into the challenges and advancements in bias mitigation.
Lastly, the review examines fairness constraints, which involve the imposition of rules or guidelines on machine learning algorithms to ensure fairness in predictions or decision-making processes. The analysis explores different fairness constraints, such as demographic parity, equalized odds, and predictive parity, and their effectiveness in reducing bias and promoting fairness in machine learning models.
Overall, this literature review provides a comprehensive understanding of the techniques employed to uncover and mitigate bias in machine learning models. By examining pre-processing techniques, post-pre-processing techniques, and fairness constraints, the review contributes to the development of fairer and less biased machine learning models, fostering equity and ethical decision-making in various domains.
By examining relevant studies, this review provides insights into the effectiveness and limitations of various techniques for bias detection and mitigation, including pre-processing, adversarial learning, fairness constraints, and post-processing.
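The three fairness criteria named in this abstract (demographic parity, equalized odds, and predictive parity) can each be checked by comparing a per-group rate. The sketch below is an illustration under assumptions, not the review's own method: it uses binary labels and predictions, reports each criterion as the largest gap between groups, and all function names and the toy arrays are hypothetical.

```python
import numpy as np


def _safe_mean(values):
    """Mean of an array, or NaN when the array is empty."""
    return float(values.mean()) if values.size else float("nan")


def group_rates(y_true, y_pred, mask):
    """Selection rate, TPR, FPR and PPV for the samples selected by mask."""
    yt, yp = y_true[mask], y_pred[mask]
    return {
        "selection_rate": _safe_mean(yp),  # P(pred = 1)
        "tpr": _safe_mean(yp[yt == 1]),    # P(pred = 1 | true = 1)
        "fpr": _safe_mean(yp[yt == 0]),    # P(pred = 1 | true = 0)
        "ppv": _safe_mean(yt[yp == 1]),    # P(true = 1 | pred = 1)
    }


def fairness_gaps(y_true, y_pred, group):
    """Largest between-group gap for each of the three criteria."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    rates = [group_rates(y_true, y_pred, group == g) for g in np.unique(group)]

    def gap(key):
        values = [r[key] for r in rates]
        return max(values) - min(values)

    return {
        "demographic_parity_gap": gap("selection_rate"),
        # Equalized odds asks for equal TPR and equal FPR; report the worse gap.
        "equalized_odds_gap": max(gap("tpr"), gap("fpr")),
        "predictive_parity_gap": gap("ppv"),
    }


if __name__ == "__main__":
    group = [0, 0, 0, 0, 1, 1, 1, 1]
    y_true = [1, 1, 0, 0, 1, 1, 0, 0]
    y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
    for name, value in fairness_gaps(y_true, y_pred, group).items():
        print(f"{name}: {value:.3f}")
```

A gap of zero means the criterion holds exactly on the given data. In general the three criteria cannot all be satisfied at once, which is one source of the trade-offs the review discusses.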
- …