What is the state of the art? Accounting for multiplicity in machine learning benchmark performance
Machine learning methods are commonly evaluated and compared by their
performance on data sets from public repositories. This allows for multiple
methods, oftentimes several thousand, to be evaluated under identical
conditions and across time. The highest ranked performance on a problem is
referred to as state-of-the-art (SOTA) performance, and is used, among other
things, as a reference point for the publication of new methods. The
highest-ranked performance, however, is a biased estimator of SOTA, giving
overly optimistic results. The mechanisms at play are those of multiplicity,
a topic well studied in the context of multiple comparisons and multiple
testing but, as far as the authors are aware, nearly absent from the
discussion of SOTA estimates. The optimistic state-of-the-art estimate is
used as a standard for evaluating new methods, so methods with substantially
inferior results are easily overlooked. In this article, we provide a
probability distribution for the case of multiple classifiers so that known
analysis methods can be applied and a better SOTA estimate can be provided.
We demonstrate the impact of multiplicity through a
simulated example with independent classifiers. We show how classifier
dependency impacts the variance, but also that the impact is limited when the
accuracy is high. Finally, we discuss a real-world example: a Kaggle
competition from 2020.
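The optimistic bias of reporting the maximum observed performance can be
sketched with a small simulation in the spirit of the abstract's independent-
classifier example. All parameters below (number of classifiers, test-set
size, true accuracy) are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed setup: K independent classifiers, all with the SAME true accuracy
# P_TRUE, evaluated on one shared test set of N examples.
K, N, P_TRUE = 1000, 10_000, 0.90
REPS = 200  # repeat the whole benchmark to estimate the bias

sota = np.empty(REPS)
for r in range(REPS):
    # Each observed accuracy is Binomial(N, P_TRUE) / N.
    observed = rng.binomial(N, P_TRUE, size=K) / N
    # The benchmark's "state of the art" is the best observed accuracy.
    sota[r] = observed.max()

print(f"true accuracy:      {P_TRUE:.4f}")
print(f"mean SOTA estimate: {sota.mean():.4f}")  # systematically above P_TRUE
```

Even though no classifier is actually better than any other, taking the
maximum over many noisy evaluations yields an estimate that sits well above
the true accuracy, which is exactly the multiplicity effect the abstract
describes.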
High Dimensional Restrictive Federated Model Selection with multi-objective Bayesian Optimization over shifted distributions
A novel machine learning optimization process, coined Restrictive Federated
Model Selection (RFMS), is proposed for scenarios in which, for example, data
from healthcare units cannot leave the site where it resides and running
training algorithms on remote data sites is forbidden due to technical,
privacy, or trust concerns. To carry out clinical research under this
scenario, an analyst can train a machine learning model only on the local
data site, but can still execute a statistical query at a certain cost:
sending the model to some of the remote data sites and receiving performance
measures as feedback, since prediction is usually much cheaper than training.
Compared to federated learning, which optimizes
the model parameters directly by carrying out training across all data sites,
RFMS trains model parameters only on one local data site but optimizes
hyper-parameters across other data sites jointly since hyper-parameters play an
important role in machine learning performance. The aim is to obtain a
Pareto-optimal model with respect to both local and remote unseen prediction
losses, i.e. one that generalizes well across data sites. In this work, we
specifically consider high dimensional data with shifted distributions over
data sites. As an initial investigation, Bayesian Optimization, in particular
multi-objective Bayesian Optimization, is used to guide an adaptive
hyper-parameter optimization process to select models under the RFMS scenario.
Empirical results show that solely using the local data site to tune
hyper-parameters generalizes poorly across data sites, compared to methods that
utilize the local and remote performances. Furthermore, in terms of dominated
hypervolume, multi-objective Bayesian Optimization algorithms show improved
performance across multiple data sites compared with the other candidates.
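The two notions the abstract relies on, Pareto optimality over (local,
remote) losses and the dominated hypervolume used to compare algorithms, can
be illustrated with a minimal sketch. The candidate loss values and the
reference point below are made up for illustration, not taken from the paper:

```python
import numpy as np

# Hypothetical candidate models under the RFMS setting, each scored by
# (local loss, remote loss); both objectives are minimized.
losses = np.array([
    [0.20, 0.60],
    [0.30, 0.40],
    [0.50, 0.30],
    [0.35, 0.45],  # dominated by [0.30, 0.40] in both objectives
])

def pareto_front(pts):
    """Keep points not dominated by any other point (minimization)."""
    keep = [
        p for i, p in enumerate(pts)
        if not any(np.all(q <= p) and np.any(q < p)
                   for j, q in enumerate(pts) if j != i)
    ]
    # Sort by the first objective so the front is easy to sweep.
    return np.array(sorted(keep, key=lambda p: p[0]))

def dominated_hypervolume(front, ref):
    """Area dominated by the front up to a reference point (2D, minimization)."""
    hv, prev_y = 0.0, ref[1]
    for x, y in front:  # front sorted by first objective, y is decreasing
        hv += (ref[0] - x) * (prev_y - y)
        prev_y = y
    return hv

front = pareto_front(losses)
hv = dominated_hypervolume(front, ref=(1.0, 1.0))
print(front)         # three non-dominated (local, remote) loss pairs
print(round(hv, 2))  # 0.51
```

A larger dominated hypervolume means the candidate set pushes the Pareto
front further toward low losses on both objectives, which is how the
multi-objective Bayesian Optimization variants are compared above.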
A governance framework for algorithmic accountability and transparency
Algorithmic systems are increasingly being used as part of decision-making processes in both the public and private sectors, with potentially significant consequences for individuals, organisations and societies as a whole. Algorithmic systems in this context refer to the combination of algorithms, data and the interface process that together determine the outcomes that affect end users. Many types of decisions can be made faster and more efficiently using algorithms. A significant factor in the adoption of algorithmic systems for decision-making is their capacity to process large amounts of varied data sets (i.e. big data), which can be paired with machine learning methods in order to infer statistical models directly from the data. The same properties of scale, complexity and autonomous model inference, however, are linked to increasing concerns that many of these systems are opaque to the people affected by their use and lack clear explanations for the decisions they make. This lack of transparency risks undermining meaningful scrutiny and accountability, which is a significant concern when these systems are applied as part of decision-making processes that can have a considerable impact on people's human rights (e.g. critical safety decisions in autonomous vehicles; allocation of health and social service resources, etc.). This study develops policy options for the governance of algorithmic transparency and accountability, based on an analysis of the social, technical and regulatory challenges posed by algorithmic systems. Based on a review and analysis of existing proposals for governance of algorithmic systems, a set of four policy options is proposed, each of which addresses a different aspect of algorithmic transparency and accountability: 1. awareness raising: education, watchdogs and whistleblowers; 2. accountability in public-sector use of algorithmic decision-making; 3. regulatory oversight and legal liability; and 4. global coordination for algorithmic governance.