Reconciling modern machine learning practice and the bias-variance trade-off
Breakthroughs in machine learning are rapidly changing science and society,
yet our fundamental understanding of this technology has lagged far behind.
Indeed, one of the central tenets of the field, the bias-variance trade-off,
appears to be at odds with the observed behavior of methods used in modern
machine learning practice. The bias-variance trade-off implies that a model
should balance under-fitting and over-fitting: rich enough to express
underlying structure in data, simple enough to avoid fitting spurious patterns.
However, in modern practice, very rich models such as neural networks are
trained to exactly fit (i.e., interpolate) the data. Classically, such models
would be considered over-fit, and yet they often obtain high accuracy on test
data. This apparent contradiction has raised questions about the mathematical
foundations of machine learning and their relevance to practitioners.
In this paper, we reconcile the classical understanding and the modern
practice within a unified performance curve. This "double descent" curve
subsumes the textbook U-shaped bias-variance trade-off curve by showing how
increasing model capacity beyond the point of interpolation results in improved
performance. We provide evidence for the existence and ubiquity of double
descent for a wide spectrum of models and datasets, and we posit a mechanism
for its emergence. This connection between the performance and the structure of
machine learning models delineates the limits of classical analyses, and has
implications for both the theory and practice of machine learning.
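The mechanism is easiest to see in a linear setting. Below is a minimal
illustration in R (our sketch, not the paper's code) of double descent for
minimum-norm least squares on random Fourier features; the feature counts and
noise level are illustrative choices.

    # A minimal sketch (not the paper's code): double descent for
    # minimum-norm least squares on random Fourier features.
    library(MASS)                              # for ginv(), the pseudoinverse
    set.seed(1)
    n <- 40                                    # training points
    f <- function(x) sin(2 * pi * x)           # ground-truth function
    x_tr <- runif(n); y_tr <- f(x_tr) + rnorm(n, sd = 0.1)
    x_te <- seq(0, 1, length.out = 200); y_te <- f(x_te)

    # Random Fourier feature map: cos(w * x + b) for random w, b.
    rff <- function(x, w, b) cos(outer(x, w) + rep(b, each = length(x)))

    test_err <- sapply(c(5, 10, 20, 40, 80, 160, 320), function(p) {
      w <- rnorm(p, sd = 10); b <- runif(p, 0, 2 * pi)
      beta <- ginv(rff(x_tr, w, b)) %*% y_tr   # min-norm (interpolating) fit
      mean((rff(x_te, w, b) %*% beta - y_te)^2)
    })
    # test_err typically peaks near p = n (the interpolation threshold)
    # and then falls again as p grows past it: the double descent curve.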
A Modern Take on the Bias-Variance Tradeoff in Neural Networks
The bias-variance tradeoff tells us that as model complexity increases, bias
falls and variance increases, leading to a U-shaped test error curve. However,
recent empirical results with over-parameterized neural networks are marked by
a striking absence of the classic U-shaped test error curve: test error keeps
decreasing in wider networks. This suggests that there might not be a
bias-variance tradeoff in neural networks with respect to network width,
contrary to what was originally claimed by, e.g., Geman et al. (1992).
Motivated by the shaky
evidence used to support this claim in neural networks, we measure bias and
variance in the modern setting. We find that both bias and variance can
decrease as the number of parameters grows. To better understand this, we
introduce a new decomposition of the variance to disentangle the effects of
optimization and data sampling. We also provide theoretical analysis in a
simplified setting that is consistent with our empirical findings.
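One natural way to make such a split precise (a sketch consistent with the
description above, not necessarily the paper's exact notation) is the law of
total variance, conditioning on the training set S and averaging over the
optimization randomness O:

    \mathrm{Var}\!\left(h_{S,O}(x)\right)
      = \underbrace{\mathbb{E}_{S}\!\left[\mathrm{Var}_{O}\!\left(h_{S,O}(x)\,\middle|\,S\right)\right]}_{\text{variance from optimization}}
      \;+\; \underbrace{\mathrm{Var}_{S}\!\left(\mathbb{E}_{O}\!\left[h_{S,O}(x)\,\middle|\,S\right]\right)}_{\text{variance from data sampling}},

where h_{S,O}(x) denotes the prediction at input x of a network trained on
dataset S with optimization randomness O (e.g., the random initialization).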
iFair: Learning Individually Fair Data Representations for Algorithmic Decision Making
People are increasingly rated and ranked for algorithmic decision making, typically by machine learning models. Research on how to incorporate fairness into such tasks has prevalently pursued the paradigm of group fairness: ensuring that each ethnic or social group receives its fair share in the outcome of classifiers and rankings. In contrast, the alternative paradigm of individual fairness has received relatively little attention. This paper introduces a method for probabilistically clustering user records into a low-rank representation that captures individual fairness yet also achieves high accuracy in classification and regression models. Our notion of individual fairness requires that users who are similar in all task-relevant attributes (such as job qualification), disregarding all potentially discriminating attributes (such as gender), should have similar outcomes. Since the case for fairness is ubiquitous across many tasks, we aim to learn general representations that can be applied to arbitrary downstream use cases. We demonstrate the versatility of our method by applying it to classification and learning-to-rank tasks on two real-world datasets. Our experiments show substantial improvements over the best prior work for this setting.
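As a toy illustration of this notion (our sketch, not the iFair method
itself), individual fairness can be read as a Lipschitz-style condition in
which the distance between users is computed only on task-relevant attributes:

    # Toy check of individual fairness (a sketch, not the iFair method):
    # outcomes for any pair of users should differ by no more than L times
    # their distance on task-relevant attributes; protected attributes
    # (e.g., gender) are simply excluded from the distance computation.
    fairness_violation_rate <- function(X_relevant, scores, L = 1) {
      d   <- as.matrix(dist(scale(X_relevant)))   # pairwise user distances
      gap <- abs(outer(scores, scores, "-"))      # pairwise outcome gaps
      mean(gap > L * d)                           # fraction of violating pairs
    }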
cito: An R package for training neural networks using torch
Deep Neural Networks (DNNs) have become a central method for regression and
classification tasks. Some packages exist that allow DNNs to be fit directly
in R, but those are rather limited in their functionality. Most current deep learning
applications rely on one of the major deep learning frameworks, in particular
PyTorch or TensorFlow, to build and train DNNs. Using these frameworks,
however, requires substantially more training and time than typical regression
or machine learning functions in the R environment. Here, we present 'cito', a
user-friendly R package for deep learning that allows users to specify deep neural
networks in the familiar formula syntax used in many R packages. To fit the
models, 'cito' uses 'torch', taking advantage of the numerically optimized
torch library, including the ability to switch between training models on CPUs
or GPUs. Moreover, 'cito' includes many user-friendly functions for model
plotting and analysis, including optional bootstrap-based confidence intervals
(CIs) for predictions, as well as explainable AI (xAI) metrics for effect
sizes and variable importance with CIs and p-values. To showcase a typical
analysis pipeline using 'cito', including its built-in xAI features to explore
the trained DNN, we build a species distribution model of the African elephant.
We hope that by providing a user-friendly R framework to specify, deploy and
interpret deep neural networks, 'cito' will make this interesting model class
more accessible to ecological data analysis. A stable version of 'cito' can be
installed from the Comprehensive R Archive Network (CRAN).
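For orientation, a minimal usage sketch of the formula interface described
above follows; the argument names ('hidden', 'loss', 'epochs', 'device') are
assumptions drawn from the package documentation as we recall it and should be
checked against the current CRAN version.

    # Minimal sketch of fitting and inspecting a DNN with 'cito';
    # argument names are assumptions to verify against the package docs.
    library(cito)

    # lm()-style formula syntax; 'torch' performs the actual training.
    m <- dnn(Sepal.Length ~ ., data = iris, hidden = c(20L, 20L),
             loss = "mse", epochs = 100, device = "cpu")

    summary(m)        # effect sizes and variable importance (xAI output)
    predict(m, iris)  # predictions from the trained network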