OpenBox: A Python Toolkit for Generalized Black-box Optimization
Black-box optimization (BBO) has a broad range of applications, including
automatic machine learning, experimental design, and database knob tuning.
However, when applying BBO methods to their problems with existing software
packages, users still face challenges in terms of applicability,
performance, and efficiency. This paper presents OpenBox, an open-source BBO
toolkit with improved usability. It implements user-friendly interfaces and
visualization for users to define and manage their tasks. The modular design
behind OpenBox facilitates its flexible deployment in existing systems.
Experimental results demonstrate the effectiveness and efficiency of OpenBox
over existing systems. The source code of OpenBox is available at
https://github.com/PKU-DAIR/open-box
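To make the black-box setting concrete: the optimizer may only query the objective and observe its value, with no gradients or closed form. The sketch below is a minimal random-search loop in plain Python. It does not use OpenBox's API, and the objective `quadratic`, its bounds, and the name `random_search` are hypothetical. Toolkits such as OpenBox replace the uniform sampling step with more sample-efficient strategies such as Bayesian optimization.

```python
import random
from typing import Callable, Dict, Optional, Tuple

Config = Dict[str, float]


def random_search(
    objective: Callable[[Config], float],
    bounds: Dict[str, Tuple[float, float]],
    n_trials: int = 50,
    seed: int = 0,
) -> Tuple[Optional[Config], float]:
    """Minimize a black-box objective by uniform random sampling.

    The optimizer only sees objective values, never gradients or the formula.
    """
    rng = random.Random(seed)
    best_config: Optional[Config] = None
    best_value = float("inf")
    for _ in range(n_trials):
        # Propose a configuration: each parameter drawn uniformly within its bounds.
        config = {name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}
        value = objective(config)  # one (possibly expensive) black-box evaluation
        if value < best_value:
            best_config, best_value = config, value
    return best_config, best_value


def quadratic(cfg: Config) -> float:
    """Hypothetical objective with its minimum (0.0) at x1=1, x2=-2."""
    return (cfg["x1"] - 1.0) ** 2 + (cfg["x2"] + 2.0) ** 2


best_config, best_value = random_search(
    quadratic, {"x1": (-5.0, 5.0), "x2": (-5.0, 5.0)}, n_trials=200
)
print(best_config, best_value)
```

Fixing the seed makes the search reproducible, which matters when comparing optimizers on the same budget of evaluations.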
FairPilot: An Explorative System for Hyperparameter Tuning through the Lens of Fairness
Despite the potential benefits of machine learning (ML) in high-risk
decision-making domains, ML deployment is often inaccessible to
practitioners and carries a risk of discrimination. To establish trust in and
acceptance of ML in such domains, democratizing ML tools and accounting for
fairness are crucial. In this paper, we introduce FairPilot, an
interactive system designed to promote the responsible development of ML models
by exploring a combination of various models, different hyperparameters, and a
wide range of fairness definitions. We emphasize the challenge of selecting the
"best" ML model and demonstrate how FairPilot allows users to select a set of
evaluation criteria and then displays the Pareto frontier of models and
hyperparameters as an interactive map. FairPilot is the first system to combine
these features, offering users a unique opportunity to choose their
models responsibly.
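As a minimal sketch of what a Pareto frontier over evaluation criteria is (not FairPilot's implementation), the following keeps the model/hyperparameter configurations that no other configuration beats on every criterion. The candidate scores are hypothetical, and the "fairness score" is assumed to be higher-is-better (for example, one minus a disparity measure).

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is at least as good as `b` on every criterion and strictly better on one.

    All criteria are assumed to be "higher is better".
    """
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: Sequence[Sequence[float]]) -> List[int]:
    """Indices of the points that no other point dominates."""
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Hypothetical (accuracy, fairness score) pairs for four model/hyperparameter choices.
candidates = [(0.91, 0.70), (0.88, 0.85), (0.86, 0.80), (0.80, 0.90)]
print(pareto_front(candidates))  # → [0, 1, 3]
```

Candidate 2 drops out because candidate 1 is better on both criteria; the remaining three each represent a different accuracy/fairness trade-off, and choosing among them is left to the user. The pairwise check is quadratic in the number of candidates, which is acceptable for modest candidate counts.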
Fairer and More Accurate Tabular Models Through NAS
Making models algorithmically fairer on tabular data has long been studied,
with techniques typically oriented towards fixes that take a neural
model with an undesirable outcome and change how the data are
ingested, what the model weights are, or how outputs are processed. We employ
an emergent and different strategy where we consider updating the model's
architecture and training hyperparameters to find an entirely new model with
better outcomes from the beginning of the debiasing procedure. In this work, we
propose using multi-objective Neural Architecture Search (NAS) and
Hyperparameter Optimization (HPO), the first application of this combination
to the very challenging domain of tabular data. We conduct an extensive exploration of
architectural and hyperparameter spaces (MLP, ResNet, and FT-Transformer)
across diverse datasets, demonstrating the dependence of accuracy and fairness
metrics of model predictions on hyperparameter combinations. We show that
models optimized solely for accuracy with NAS often fail to inherently address
fairness concerns. We propose a novel approach that jointly optimizes
architectural and training hyperparameters under a multi-objective formulation
of both accuracy and fairness. We produce architectures that consistently Pareto
dominate state-of-the-art bias mitigation methods in fairness, accuracy,
or both, while also being Pareto-optimal with respect to the hyperparameters
found through single-objective (accuracy) optimization runs. This research
underscores the promise of automating fairness and accuracy optimization in
deep learning models.
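A multi-objective search of this kind scores each candidate architecture on at least two axes. The abstract does not say which fairness metric is used, so the sketch below assumes demographic parity difference (the gap in positive-prediction rates between two groups) as the fairness objective alongside error rate, both to be minimized. The function name `objectives` and the data are hypothetical.

```python
from typing import Sequence, Tuple


def objectives(
    y_true: Sequence[int], y_pred: Sequence[int], group: Sequence[int]
) -> Tuple[float, float]:
    """Return (error rate, demographic parity difference); lower is better for both.

    Assumes binary labels/predictions (0/1) and a binary protected attribute (0/1).
    """
    n = len(y_true)
    error = sum(t != p for t, p in zip(y_true, y_pred)) / n

    def positive_rate(g: int) -> float:
        # Share of members of group g that the model predicts as positive.
        preds = [p for p, s in zip(y_pred, group) if s == g]
        return sum(preds) / len(preds)

    dp_difference = abs(positive_rate(0) - positive_rate(1))
    return error, dp_difference


y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]
print(objectives(y_true, y_pred, group))  # → (0.25, 0.5)
```

A multi-objective NAS/HPO loop would compute this pair for every trained candidate and retain the non-dominated ones, rather than ranking candidates by error alone, which is the accuracy-only setting the abstract reports as failing to address fairness.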
An Empirical Investigation into Benchmarking Model Multiplicity for Trustworthy Machine Learning: A Case Study on Image Classification
Deep learning models have proven to be highly successful. Yet, their
over-parameterization gives rise to model multiplicity, a phenomenon in which
multiple models achieve similar performance but exhibit distinct underlying
behaviours. This multiplicity presents a significant challenge and necessitates
additional specifications in model selection to prevent unexpected failures
during deployment. While prior studies have examined these concerns, they focus
on individual metrics in isolation, making it difficult to obtain a
comprehensive view of multiplicity in trustworthy machine learning. Our work
stands out by offering a one-stop empirical benchmark of multiplicity across
various dimensions of model design and its impact on a diverse set of
trustworthy metrics. In this work, we establish a consistent language for
studying model multiplicity by translating several trustworthy metrics into
accuracy under appropriate interventions. We also develop a framework, which we
call multiplicity sheets, to benchmark multiplicity in various scenarios. We
demonstrate the advantages of our setup through a case study in image
classification and provide actionable insights into the impact and trends of
different hyperparameters on model multiplicity. Finally, we show that
multiplicity persists in deep learning models even after enforcing additional
specifications during model selection, highlighting the severity of
over-parameterization. The concerns of under-specification thus remain, and we
seek to promote a more comprehensive discussion of multiplicity in trustworthy
machine learning.
Comment: Accepted at WACV 202
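Multiplicity can be made concrete with a simple diagnostic: models that tie on accuracy may still disagree on many individual predictions. The sketch below computes per-model accuracy and pairwise prediction disagreement. It is an illustrative measure, not the paper's multiplicity sheets, and the labels and the three "seed" models are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def disagreement(pred_a: Sequence[int], pred_b: Sequence[int]) -> float:
    """Fraction of inputs on which two models predict different labels."""
    return sum(a != b for a, b in zip(pred_a, pred_b)) / len(pred_a)


# Hypothetical test-set labels and predictions from three training runs.
y_true = [0, 1, 2, 1, 0, 2, 1, 0]
models: Dict[str, List[int]] = {
    "seed_0": [0, 1, 2, 1, 0, 2, 0, 1],
    "seed_1": [0, 1, 1, 2, 0, 2, 1, 0],
    "seed_2": [1, 0, 2, 1, 0, 2, 1, 0],
}

accuracies = {name: accuracy(y_true, p) for name, p in models.items()}
pairwise: Dict[Tuple[str, str], float] = {
    (a, b): disagreement(models[a], models[b]) for a, b in combinations(models, 2)
}
print(accuracies)  # every run scores 0.75 ...
print(pairwise)    # ... yet each pair disagrees on half the inputs (0.5)
```

Accuracy alone cannot distinguish these three models, which is why selecting among them requires the additional specifications the abstract describes.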
FairAutoML: Embracing Unfairness Mitigation in AutoML
In this work, we propose an Automated Machine Learning (AutoML) system to
search for models that are not only accurate but also fair. We
first investigate the necessity and impact of unfairness mitigation in the
AutoML context. We establish the FairAutoML framework. The framework provides a
novel design based on pragmatic abstractions, which makes it convenient to
incorporate existing fairness definitions, unfairness mitigation techniques,
and hyperparameter search methods into the model search and evaluation process.
Following this framework, we develop a fair AutoML system based on an existing
AutoML system. The augmented system includes a resource allocation strategy to
dynamically decide when and on which models to conduct unfairness mitigation
according to the prediction accuracy, fairness, and resource consumption on the
fly. Extensive empirical evaluation shows that our system can achieve good
'fair accuracy' and high resource efficiency.
Comment: 18 pages (including 6 pages of appendixes)
Linking convolutional kernel size to generalization bias in face analysis CNNs
Training dataset biases are by far the most scrutinized factors when
explaining algorithmic biases of neural networks. In contrast, hyperparameters
related to the neural network architecture have largely been ignored even
though different network parameterizations are known to induce different
implicit biases over learned features. For example, convolutional kernel size
is known to affect the frequency content of features learned in CNNs. In this
work, we present a causal framework for linking an architectural hyperparameter
to out-of-distribution algorithmic bias. Our framework is experimental, in that
we train several versions of a network with an intervention to a specific
hyperparameter, and measure the resulting causal effect of this choice on
performance bias when a particular out-of-distribution image perturbation is
applied. In our experiments, we focused on measuring the causal relationship
between convolutional kernel size and face analysis classification bias across
different subpopulations (race/gender), with respect to high-frequency image
details. We show that modifying kernel size, even in one layer of a CNN,
changes the frequency content of learned features significantly across data
subgroups leading to biased generalization performance even in the presence of
a balanced dataset.
Comment: WACV 202
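The experimental design described above (train several versions of the network, each with a different value of one hyperparameter, then compare performance bias under a fixed image perturbation) reduces to a difference in means across interventions. The sketch below assumes performance bias is the accuracy gap between the best- and worst-served subgroups and that the effect is the change in mean bias relative to a baseline kernel size. The subgroup names and accuracy values are hypothetical, and the paper's exact estimator may differ.

```python
from statistics import mean
from typing import Dict, List

# Accuracy per subgroup, measured on perturbed test images, for one training run.
GroupAccuracy = Dict[str, float]


def performance_bias(acc: GroupAccuracy) -> float:
    """Gap between the best- and worst-served subgroup (assumed bias measure)."""
    return max(acc.values()) - min(acc.values())


def effect_of_intervention(
    runs: Dict[int, List[GroupAccuracy]], baseline: int
) -> Dict[int, float]:
    """Mean bias under each kernel size minus mean bias under the baseline size.

    Each kernel size is trained several times (e.g. different seeds); averaging
    over runs reduces the influence of run-to-run noise on the estimate.
    """
    base = mean(performance_bias(r) for r in runs[baseline])
    return {
        k: mean(performance_bias(r) for r in rs) - base
        for k, rs in runs.items()
        if k != baseline
    }


# Hypothetical subgroup accuracies under a high-frequency perturbation, two seeds per setting.
runs = {
    3: [{"group_a": 0.90, "group_b": 0.88}, {"group_a": 0.91, "group_b": 0.88}],
    7: [{"group_a": 0.90, "group_b": 0.82}, {"group_a": 0.89, "group_b": 0.83}],
}
effects = effect_of_intervention(runs, baseline=3)
print(effects)  # roughly {7: 0.045}: the larger kernel widens the subgroup gap here
```

Because only the kernel size changes between settings, a nonzero difference can be attributed to that intervention, provided enough runs are averaged to separate it from seed variance.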
Promoting Fairness through Hyperparameter Optimization
Considerable research effort has been directed towards algorithmic fairness, but
real-world adoption of bias reduction techniques is still scarce. Existing
methods are either metric- or model-specific, require access to sensitive
attributes at inference time, or carry high development or deployment costs.
This work explores the unfairness that emerges when optimizing ML models solely
for predictive performance, and how to mitigate it with a simple and easily
deployed intervention: fairness-aware hyperparameter optimization (HO). We
propose and evaluate fairness-aware variants of three popular HO algorithms:
Fair Random Search, Fair TPE, and Fairband. We validate our approach on a
real-world bank account opening fraud case-study, as well as on three datasets
from the fairness literature. Results show that, without extra training cost,
it is feasible to find models with a 111% mean increase in fairness and only a
6% decrease in performance compared with fairness-blind HO.
Comment: arXiv admin note: substantial text overlap with arXiv:2010.0366
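The abstract names fairness-aware variants of three HO algorithms but does not state their objective here. The sketch below assumes the simplest variant: random search that ranks configurations by a weighted sum of a performance metric and a fairness metric, with weight `alpha`. The weighting scheme, the one-parameter search space, and the trade-off function are all assumptions for illustration. Setting `alpha=1.0` gives the fairness-blind baseline the abstract compares against.

```python
import random
from typing import Callable, Dict, Optional, Tuple

Config = Dict[str, float]


def fair_random_search(
    evaluate: Callable[[Config], Tuple[float, float]],
    sample: Callable[[random.Random], Config],
    alpha: float = 0.5,
    n_trials: int = 30,
    seed: int = 0,
) -> Tuple[Optional[Config], float]:
    """Random search that ranks configs by alpha * performance + (1 - alpha) * fairness.

    alpha = 1.0 recovers ordinary, fairness-blind random search.
    """
    rng = random.Random(seed)
    best_cfg: Optional[Config] = None
    best_score = float("-inf")
    for _ in range(n_trials):
        cfg = sample(rng)
        performance, fairness = evaluate(cfg)  # both assumed in [0, 1], higher is better
        score = alpha * performance + (1 - alpha) * fairness
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


def sample(rng: random.Random) -> Config:
    # Hypothetical search space with a single regularization hyperparameter.
    return {"reg": rng.uniform(0.0, 1.0)}


def evaluate(cfg: Config) -> Tuple[float, float]:
    # Hypothetical trade-off: more regularization costs a little performance
    # but improves fairness.
    return 0.9 - 0.1 * cfg["reg"], 0.5 + 0.4 * cfg["reg"]


blind_cfg, _ = fair_random_search(evaluate, sample, alpha=1.0)
fair_cfg, _ = fair_random_search(evaluate, sample, alpha=0.5)
print(blind_cfg, fair_cfg)
```

Because both runs use the same seed, they evaluate the same 30 configurations; only the selection criterion differs. The fairness-aware run therefore picks a more regularized, fairer model without training anything extra.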