Democratizing machine learning
Machine learning artifacts are increasingly embedded in society, often in the form of automated decision-making processes. One major reason for this, along with methodological improvements, is the increasing accessibility of data, but also of machine learning toolkits that enable access to machine learning methodology for non-experts. The core focus of this thesis is exactly this: democratizing access to machine learning in order to enable a wider audience to benefit from its potential.
Contributions in this manuscript stem from several different areas within this broader field. A major section is dedicated to the field of automated machine learning (AutoML), with the goal of abstracting away the tedious task of obtaining an optimal predictive model for a given dataset. This process mostly consists of finding said optimal model, often through hyperparameter optimization, while the user in turn only selects the appropriate performance metric(s) and validates the resulting models. This process can be improved or sped up by learning from previous experiments.
This thesis presents three such methods: one aims to obtain a fixed set of hyperparameter configurations that likely contains good solutions for any new dataset, and two use dataset characteristics to propose new configurations.
It furthermore presents a collection of the required experiment metadata and shows how such metadata can be used for the development of, and as a test bed for, new hyperparameter optimization methods. The pervasion of models derived from ML in many aspects of society simultaneously calls for increased scrutiny with respect to how such models shape society and the biases they may exhibit. Therefore, this thesis presents an AutoML tool that allows incorporating fairness considerations into the search for an optimal model. This requirement for fairness simultaneously poses the question of whether we can reliably estimate a model's fairness, which is studied in a further contribution in this thesis. Since access to machine learning methods also heavily depends on access to software and toolboxes, several contributions in the form of software are part of this thesis. The mlr3pipelines R package allows for embedding models in so-called machine learning pipelines that include pre- and postprocessing steps often required in machine learning and AutoML. The mlr3fairness R package, on the other hand, enables users to audit models for potential biases as well as reduce those biases through different debiasing techniques. One such technique, multi-calibration, is published as a separate software package, mcboost.
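To illustrate what embedding a model in such a pipeline looks like in practice, the sketch below shows an analogous construction in Python with scikit-learn; it is a minimal, generic example of the pipeline idea, not the mlr3pipelines API, and the column names are hypothetical.

```python
# Minimal sketch of an ML pipeline that embeds a model among pre-processing
# steps, analogous in spirit to what mlr3pipelines enables in R.
# This is NOT the mlr3pipelines API; names and steps are illustrative.
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric_cols = ["age", "income"]        # assumed example columns
categorical_cols = ["occupation"]

preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer()), ("scale", StandardScaler())]), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

# The full pipeline can be tuned and resampled as a single estimator,
# which is the core idea behind embedding models in pipelines.
model = Pipeline([("preprocess", preprocess), ("classifier", LogisticRegression(max_iter=1000))])
```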
Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features
Since most machine learning (ML) algorithms are designed for numerical inputs, efficiently encoding categorical variables is a crucial aspect of data analysis. A common problem is high-cardinality features, i.e. unordered categorical predictor variables with a high number of levels. We study techniques that yield numeric representations of categorical variables which can then be used in subsequent ML applications. We focus on the impact of these techniques on a subsequent algorithm's predictive performance and, if possible, derive best practices on when to use which technique. We conducted a large-scale benchmark experiment, where we compared different encoding strategies together with five ML algorithms (lasso, random forest, gradient boosting, k-nearest neighbors, support vector machine) using datasets from regression, binary- and multiclass-classification settings. In our study, regularized versions of target encoding (i.e. using target predictions based on the feature levels in the training set as a new numerical feature) consistently provided the best results. Traditionally widely used encodings that make unreasonable assumptions to map levels to integers (e.g. integer encoding) or to reduce the number of levels (possibly based on target information, e.g. leaf encoding) before creating binary indicator variables (one-hot or dummy encoding) were not as effective in comparison.
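A common form of regularized target encoding combines cross-fitting with smoothing toward the global mean. The sketch below illustrates that general idea in Python; it is a minimal, assumed implementation, not necessarily the exact procedure benchmarked in the study.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

def cv_target_encode(cat: pd.Series, y: pd.Series, n_splits: int = 5, smoothing: float = 10.0) -> pd.Series:
    """Cross-fitted, smoothed target encoding (illustrative sketch).

    Each observation is encoded using target statistics computed on the
    other folds only, which regularizes against target leakage.
    """
    global_mean = y.mean()
    encoded = pd.Series(np.nan, index=cat.index, dtype=float)
    for train_idx, val_idx in KFold(n_splits=n_splits, shuffle=True, random_state=0).split(cat):
        stats = y.iloc[train_idx].groupby(cat.iloc[train_idx]).agg(["mean", "count"])
        # Shrink level means toward the global mean; rare levels are shrunk more.
        shrunk = (stats["count"] * stats["mean"] + smoothing * global_mean) / (stats["count"] + smoothing)
        # Unseen levels fall back to the global mean.
        encoded.iloc[val_idx] = cat.iloc[val_idx].map(shrunk).fillna(global_mean).to_numpy()
    return encoded
```

At prediction time, level means estimated on the full training set with the same smoothing would typically be used for new data.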
Everything, Everywhere All in One Evaluation: Using Multiverse Analysis to Evaluate the Influence of Model Design Decisions on Algorithmic Fairness
A vast number of systems across the world use algorithmic decision making
(ADM) to (partially) automate decisions that have previously been made by
humans. When designed well, these systems promise more objective decisions
while saving large amounts of resources and freeing up human time. However,
when ADM systems are not designed well, they can lead to unfair decisions which
discriminate against societal groups. The downstream effects of ADMs critically
depend on the decisions made during the systems' design and implementation, as
biases in data can be mitigated or reinforced along the modeling pipeline. Many
of these design decisions are made implicitly, without knowing exactly how they
will influence the final system. It is therefore important to make explicit the
decisions made during the design of ADM systems and understand how these
decisions affect the fairness of the resulting system.
To study this issue, we draw on insights from the field of psychology and
introduce the method of multiverse analysis for algorithmic fairness. In our
proposed method, we turn implicit design decisions into explicit ones and
demonstrate their fairness implications. By combining decisions, we create a
grid of all possible "universes" of decision combinations. For each of these
universes, we compute metrics of fairness and performance. Using the resulting
dataset, one can see how and which decisions impact fairness. We demonstrate
how multiverse analyses can be used to better understand variability and
robustness of algorithmic fairness using an exemplary case study of predicting
public health coverage of vulnerable populations for potential interventions.
Our results illustrate how decisions during the design of a machine learning
system can have surprising effects on its fairness and how to detect these
effects using multiverse analysis.
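In practice, a multiverse analysis amounts to a full factorial grid over modeling decisions, as sketched below; the decision names and metrics are hypothetical placeholders rather than the ones used in the paper.

```python
import itertools
import pandas as pd

# Hypothetical design decisions; a real analysis would enumerate the pipeline's
# actual choices (preprocessing, model class, decision threshold, ...).
decisions = {
    "imputation": ["mean", "median"],
    "model": ["logreg", "random_forest"],
    "threshold": [0.4, 0.5, 0.6],
}

def fit_and_evaluate(universe: dict) -> dict:
    # Placeholder: a real analysis would train the pipeline specified by
    # `universe` and compute performance and fairness metrics on held-out data.
    return {"accuracy": float("nan"), "fairness_gap": float("nan")}

rows = []
for combo in itertools.product(*decisions.values()):
    universe = dict(zip(decisions.keys(), combo))
    rows.append({**universe, **fit_and_evaluate(universe)})

# One row per "universe"; this table can then be analyzed to see which
# decisions drive variability in fairness and performance.
multiverse = pd.DataFrame(rows)
```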
Learning Multiple Defaults for Machine Learning Algorithms
The performance of modern machine learning methods highly depends on their
hyperparameter configurations. One simple way of selecting a configuration is
to use default settings, often proposed along with the publication and
implementation of a new algorithm. Those default values are usually chosen in
an ad hoc manner to work well enough on a wide variety of datasets. To address
this problem, different automatic hyperparameter configuration algorithms have
been proposed, which select an optimal configuration per dataset. This
principled approach usually improves performance, but adds additional
algorithmic complexity and computational costs to the training procedure. As an
alternative to this, we propose learning a set of complementary default values
from a large database of prior empirical results. Selecting an appropriate
configuration on a new dataset then requires only a simple, efficient and
embarrassingly parallel search over this set. We demonstrate the effectiveness
and efficiency of the approach we propose in comparison to random search and
Bayesian Optimization.
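One plausible way to learn such a complementary set is greedy forward selection over a matrix of prior results, here minimizing the average loss each dataset achieves with its best configuration from the set; the sketch below illustrates this generic idea and is not necessarily the exact objective or algorithm used in the paper.

```python
import numpy as np

def greedy_default_set(loss: np.ndarray, n_defaults: int) -> list[int]:
    """Greedily pick configurations (columns) of a datasets x configurations
    loss matrix such that the average per-dataset minimum over the chosen
    set is as small as possible. Illustrative sketch only."""
    chosen: list[int] = []
    best_so_far = np.full(loss.shape[0], np.inf)
    for _ in range(n_defaults):
        # Loss each dataset would see if candidate column j were added to the set.
        candidate_best = np.minimum(best_so_far[:, None], loss)
        scores = candidate_best.mean(axis=0)
        scores[chosen] = np.inf          # avoid re-selecting a configuration
        j = int(np.argmin(scores))
        chosen.append(j)
        best_so_far = candidate_best[:, j]
    return chosen

# Example: 100 datasets x 500 candidate configurations of (synthetic) prior results.
rng = np.random.default_rng(0)
prior_losses = rng.uniform(size=(100, 500))
defaults = greedy_default_set(prior_losses, n_defaults=8)
# On a new dataset, these defaults are simply evaluated in parallel and the
# best-performing configuration is kept.
```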
YAHPO Gym -- An Efficient Multi-Objective Multi-Fidelity Benchmark for Hyperparameter Optimization
When developing and analyzing new hyperparameter optimization methods, it is
vital to empirically evaluate and compare them on well-curated benchmark
suites. In this work, we propose a new set of challenging and relevant
benchmark problems motivated by desirable properties and requirements for such
benchmarks. Our new surrogate-based benchmark collection consists of 14
scenarios that in total constitute over 700 multi-fidelity hyperparameter
optimization problems, which all enable multi-objective hyperparameter
optimization. Furthermore, we empirically compare surrogate-based benchmarks to
the more widely-used tabular benchmarks, and demonstrate that the latter may
produce unfaithful results regarding the performance ranking of HPO methods. We
examine and compare our benchmark collection with respect to defined
requirements and propose a single-objective as well as a multi-objective
benchmark suite on which we compare 7 single-objective and 7 multi-objective
optimizers in a benchmark experiment. Our software is available at
[https://github.com/slds-lmu/yahpo_gym].
Comment: Accepted at the First Conference on Automated Machine Learning (Main
Track). 39 pages, 12 tables, 10 figures, 1 listing.
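The difference between surrogate-based and tabular benchmarks can be illustrated generically: a surrogate fitted on prior evaluations can be queried at arbitrary configurations, while a table can only return the configurations it happens to contain. The sketch below uses a synthetic result table and scikit-learn and is not the YAHPO Gym API.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical prior HPO results: configurations (learning rate, depth) and validation errors.
configs = rng.uniform([1e-4, 1], [1e-1, 12], size=(500, 2))
errors = 0.3 - 0.1 * np.log10(configs[:, 0] / 1e-4) / 3 + 0.01 * configs[:, 1] + rng.normal(0, 0.01, 500)

# A tabular benchmark can only look up these 500 rows; a surrogate benchmark
# fits a model to them and can then be evaluated at any configuration.
surrogate = RandomForestRegressor(n_estimators=200, random_state=0).fit(configs, errors)

new_config = np.array([[3e-3, 6.0]])     # arbitrary point, not contained in the table
predicted_error = surrogate.predict(new_config)[0]
```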
High Dimensional Restrictive Federated Model Selection with multi-objective Bayesian Optimization over shifted distributions
A novel machine learning optimization process, coined Restrictive Federated Model Selection (RFMS), is proposed for scenarios in which, for example, data from healthcare units cannot leave the site where it resides and running training algorithms on remote data sites is forbidden due to technical, privacy, or trust concerns. To carry out clinical research under this scenario, an analyst can train a machine learning model only on the local data site, but it is still possible to execute a statistical query at a certain cost: a trained model is sent to some of the remote data sites, which return performance measures as feedback, since prediction is usually much cheaper than training. Compared to federated learning, which optimizes model parameters directly by training across all data sites, RFMS trains model parameters only on one local data site but jointly optimizes hyperparameters across the other data sites, since hyperparameters play an important role in machine learning performance. The aim is to obtain a Pareto-optimal model with respect to both local and remote unseen prediction losses, which should generalize well across data sites. In this work, we specifically consider high-dimensional data with shifted distributions across data sites. As an initial investigation, Bayesian Optimization, in particular multi-objective Bayesian Optimization, is used to guide an adaptive hyperparameter optimization process to select models under the RFMS scenario. Empirical results show that tuning hyperparameters using only the local data site generalizes poorly across data sites compared to methods that utilize both local and remote performances. Furthermore, in terms of dominated hypervolume, multi-objective Bayesian Optimization algorithms outperform the other candidates across multiple data sites.
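The multi-objective view behind RFMS can be made concrete with a small Pareto-filtering sketch over (local loss, remote loss) pairs; the candidate losses below are hypothetical, and this is a generic illustration rather than the paper's implementation.

```python
import numpy as np

def pareto_front(losses: np.ndarray) -> np.ndarray:
    """Boolean mask of non-dominated rows for a (n_models, n_objectives)
    array of losses to be minimized. Generic sketch."""
    n = losses.shape[0]
    nondominated = np.ones(n, dtype=bool)
    for i in range(n):
        if not nondominated[i]:
            continue
        # A point dominates i if it is no worse in all objectives and strictly better in one.
        dominates_i = np.all(losses <= losses[i], axis=1) & np.any(losses < losses[i], axis=1)
        if dominates_i.any():
            nondominated[i] = False
    return nondominated

# Hypothetical candidate models evaluated on the local site and (averaged) remote sites.
losses = np.array([[0.12, 0.30], [0.15, 0.22], [0.20, 0.21], [0.14, 0.35]])
print(losses[pareto_front(losses)])
```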
Mind the Gap: Measuring Generalization Performance Across Multiple Objectives
Modern machine learning models are often constructed taking into account
multiple objectives, e.g., minimizing inference time while also maximizing
accuracy. Multi-objective hyperparameter optimization (MHPO) algorithms return
such candidate models, and the approximation of the Pareto front is used to
assess their performance. In practice, we also want to measure generalization
when moving from the validation to the test set. However, some of the models
might no longer be Pareto-optimal which makes it unclear how to quantify the
performance of the MHPO method when evaluated on the test set. To resolve this,
we provide a novel evaluation protocol that allows measuring the generalization
performance of MHPO methods and studying its capabilities for comparing two
optimization experiments.
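A simplified way to quantify such a validation-to-test gap is to compare the dominated hypervolume of the validation Pareto front with the hypervolume that the same models achieve on the test set; the reference point and losses below are hypothetical, and the paper's actual protocol is more involved.

```python
import numpy as np

def hypervolume_2d(losses: np.ndarray, reference: np.ndarray) -> float:
    """Dominated hypervolume for 2-D minimization w.r.t. a reference point.
    Assumes all points lie below the reference point. Generic sketch."""
    pts = losses[np.argsort(losses[:, 0])]   # sweep in order of the first objective
    hv, best_y = 0.0, reference[1]
    for x, y in pts:
        if y < best_y:                       # only non-dominated points add area
            hv += (reference[0] - x) * (best_y - y)
            best_y = y
    return hv

reference = np.array([1.0, 1.0])             # hypothetical reference point (losses in [0, 1])

# Hypothetical candidate models: validation losses and the same models' test losses.
val = np.array([[0.10, 0.40], [0.20, 0.25], [0.35, 0.15]])
test = np.array([[0.14, 0.42], [0.22, 0.30], [0.33, 0.20]])

gap = hypervolume_2d(val, reference) - hypervolume_2d(test, reference)
print(f"validation-to-test hypervolume gap: {gap:.3f}")
```

Because the test hypervolume is computed over the same model set, models that lose Pareto-optimality on the test set simply contribute no additional volume in this sketch.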