Fairness in software systems aims to provide algorithms that operate in a nondiscriminatory manner with respect to protected attributes such as gender, race, or age. Ensuring fairness is a crucial non-functional property of data-driven Machine Learning systems. Several approaches (i.e., bias mitigation methods) have been proposed in the literature to reduce the bias of Machine Learning systems. However, debiasing often comes hand in hand with a deterioration in predictive performance. Therefore, this thesis addresses the trade-offs that practitioners face when debiasing Machine Learning systems.
First, we perform a literature review to investigate the current state of the art in debiasing Machine Learning systems. This includes an overview of existing debiasing techniques and of how they are evaluated (e.g., how bias is measured).
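To make the notion of measuring bias concrete, one widely used metric is the statistical parity difference, which compares positive-prediction rates between a privileged and an unprivileged group. The following is a minimal illustrative sketch (the function name, the group encoding, and the toy data are our own assumptions, not a definition taken from this thesis):

```python
import numpy as np

def statistical_parity_difference(y_pred, protected):
    """Difference in positive-prediction rates between the unprivileged
    (protected == 1) and privileged (protected == 0) groups.
    A value of 0 indicates parity; the sign shows which group is favored."""
    y_pred = np.asarray(y_pred)
    protected = np.asarray(protected)
    rate_unprivileged = y_pred[protected == 1].mean()
    rate_privileged = y_pred[protected == 0].mean()
    return rate_unprivileged - rate_privileged

# Hypothetical binary predictions for six individuals.
preds = [1, 0, 1, 1, 0, 0]
groups = [0, 0, 0, 1, 1, 1]
print(statistical_parity_difference(preds, groups))  # negative: privileged group favored
```

Other metrics (e.g., equal opportunity difference) condition on the true label instead, which is one reason a survey of evaluation practices is needed.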
As a second contribution, we propose a benchmarking approach that allows for the evaluation and comparison of bias mitigation methods and their trade-offs (i.e., how much performance is sacrificed to improve fairness).
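Such a trade-off comparison can be sketched very simply: for each mitigation method, contrast the accuracy lost with the bias removed. The method names and numbers below are entirely hypothetical placeholders, not benchmark results from this thesis:

```python
# Hypothetical benchmark results: (before, after) pairs for accuracy
# and for the absolute value of a bias metric, per mitigation method.
methods = {
    "reweighing":   {"acc": (0.85, 0.83), "bias": (0.20, 0.08)},
    "thresholding": {"acc": (0.85, 0.80), "bias": (0.20, 0.03)},
}

for name, result in methods.items():
    acc_cost = result["acc"][0] - result["acc"][1]     # accuracy sacrificed
    bias_gain = result["bias"][0] - result["bias"][1]  # bias removed
    print(f"{name}: accuracy -{acc_cost:.2f}, bias -{bias_gain:.2f}")
```

Reporting both deltas side by side, rather than fairness alone, is what lets practitioners judge whether a method's fairness gain justifies its performance cost.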
Afterwards, we propose a debiasing method of our own, which modifies already trained Machine Learning models with the goal of improving both their fairness and their accuracy.
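As one example of the general family of post-processing approaches that adjust a trained model's outputs rather than retraining it (this is not the method proposed in this thesis, only an illustration of the family), group-specific decision thresholds can be chosen to equalize positive-prediction rates; all names and data here are hypothetical:

```python
import numpy as np

def group_thresholds(scores, protected, target_rate):
    """For each group, pick the score threshold whose resulting
    positive-prediction rate is closest to target_rate, leaving the
    trained model itself untouched (a post-processing adjustment)."""
    thresholds = {}
    for g in np.unique(protected):
        group_scores = np.sort(scores[protected == g])[::-1]
        # Take roughly target_rate of the group as positive.
        k = max(1, int(round(target_rate * len(group_scores))))
        thresholds[g] = group_scores[k - 1]
    return thresholds

scores = np.array([0.9, 0.8, 0.2, 0.1, 0.7, 0.6, 0.4, 0.3])
protected = np.array([0, 0, 0, 0, 1, 1, 1, 1])
th = group_thresholds(scores, protected, target_rate=0.5)
preds = np.array([s >= th[g] for s, g in zip(scores, protected)])
print(th)  # per-group thresholds yielding equal positive rates
```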
Moreover, this thesis addresses the challenge of how to deal with fairness with regard to age. This question is answered with an empirical evaluation on real-world datasets.