100 research outputs found
Shapley Values with Uncertain Value Functions
We propose a novel definition of Shapley values with uncertain value
functions based on first principles using probability theory. Such uncertain
value functions can arise in the context of explainable machine learning as a
result of non-deterministic algorithms. We show that random effects can in fact
be absorbed into a Shapley value with a noiseless but shifted value function.
Hence, Shapley values with uncertain value functions can be used analogously to
regular Shapley values; however, their reliable evaluation typically requires
more computational effort.
Comment: 12 pages, 1 figure, 1 table
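The absorption result lends itself to a small numerical illustration: averaging repeated evaluations of a noisy value function behaves like a noiseless value function, so the resulting Shapley values approach those of the underlying game. A minimal Python sketch, assuming an additive three-player game with Gaussian noise (both the game and the noise model are illustrative, not from the paper):

```python
import itertools
import math
import random

def shapley(value, players):
    """Exact Shapley values via the permutation formula."""
    n = len(players)
    phi = {p: 0.0 for p in players}
    for perm in itertools.permutations(players):
        coalition = []
        for p in perm:
            before = value(frozenset(coalition))
            coalition.append(p)
            phi[p] += value(frozenset(coalition)) - before
    return {p: phi[p] / math.factorial(n) for p in players}

random.seed(0)

# Illustrative additive game: v(S) is the sum of the members' weights,
# so the exact Shapley value of each player equals its weight.
weights = {"a": 1.0, "b": 2.0, "c": 3.0}

def v_true(S):
    return sum(weights[p] for p in S)

# Uncertain value function: the same game observed with zero-mean noise.
def v_noisy(S):
    return v_true(S) + random.gauss(0.0, 0.5)

# Averaging many noisy evaluations approximates a noiseless value
# function, so the Shapley values converge to those of v_true.
def v_avg(S, k=2000):
    return sum(v_noisy(S) for _ in range(k)) / k

print(shapley(v_true, ["a", "b", "c"]))
print(shapley(v_avg, ["a", "b", "c"]))
```

The extra factor of `k` value-function calls per evaluation is one concrete form of the additional computational effort the abstract refers to.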
Explaining Drift using Shapley Values
Machine learning models often deteriorate in performance when they are
used to predict outcomes on data on which they were not trained. These
scenarios often arise in the real world when the distribution of data changes
gradually or abruptly due to major events such as a pandemic. There have been many
attempts in machine learning research to develop techniques that are
resilient to such concept drift. However, there is no principled framework to
identify the drivers behind the drift in model performance. In this paper, we
propose a novel framework, DBShap, that uses Shapley values to identify the
main contributors to the drift and quantify their respective contributions. The
proposed framework not only quantifies the importance of individual features in
driving the drift but also includes the change in the underlying relation
between the input and output as a possible driver. The explanation provided by
DBShap can be used to understand the root cause of the drift and to make the
model resilient to it.
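As a toy picture of this kind of attribution (the abstract does not specify DBShap's actual value function, so the game construction and loss numbers below are hypothetical), one can treat the two candidate drivers as players in a cooperative game whose value is the model's loss when that subset of drivers has drifted:

```python
from itertools import permutations
from math import factorial

# Hypothetical measured losses for each combination of drifted
# components: "X" = input distribution shifted, "f" = input-output
# relation changed. Illustrative numbers, not from the paper.
loss = {
    frozenset(): 0.10,           # reference data, reference relation
    frozenset({"X"}): 0.18,      # only the input distribution drifted
    frozenset({"f"}): 0.22,      # only the relation drifted
    frozenset({"X", "f"}): 0.34  # both drifted (current production data)
}

def shapley_drift(loss, players=("X", "f")):
    """Attribute the total loss increase to each drift driver."""
    phi = {p: 0.0 for p in players}
    for perm in permutations(players):
        S = frozenset()
        for p in perm:
            phi[p] += loss[S | {p}] - loss[S]
            S = S | {p}
    return {p: phi[p] / factorial(len(players)) for p in players}

contrib = shapley_drift(loss)
# By efficiency, the contributions sum to the total drift in loss,
# here 0.34 - 0.10 = 0.24.
print(contrib)
```

The same game extends to more players by adding one player per feature whose marginal distribution can drift, which is how per-feature drift importances would arise in this toy setting.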
Socio-economic disparities and COVID-19 in the USA
COVID-19 is not a universal killer. We study the spread of COVID-19 at the
county level for the United States up to August 15, 2020. We
show that the prevalence of the disease and the death rate are correlated with
local socio-economic conditions, often beyond what local population density
alone explains, especially in rural areas. We correlate COVID-19 prevalence
and death rate with data from the US Census Bureau and point out how the
spreading patterns of the disease show distinct asymmetries between urban and
rural areas and how the disease preferentially affects counties where a large
fraction of the population is non-white. Our findings can inform more targeted
policy making and deployment of resources in future occurrences of a pandemic
due to SARS-CoV-2. Our methodology, based on interpretable machine learning and
game theory, can be extended to study the spread of other diseases.
Comment: 10 pages, 5 figures and 1 table
A Baseline for Shapley Values in MLPs: from Missingness to Neutrality
Being able to explain a prediction as well as having a model that performs
well are paramount in many machine learning applications. Deep neural networks
have recently gained momentum on the basis of their accuracy; however, they are
often criticised as black boxes. Many authors have focused on proposing
methods to explain their predictions. Among these explainability methods,
feature attribution methods have been favoured for their strong theoretical
foundation: the Shapley value. A limitation of the Shapley value is the need to
define a baseline (aka reference point) representing the missingness of a
feature. In this paper, we present a method to choose a baseline based on a
neutrality value: a value defined by decision makers such that their choice
depends on whether the model's output falls below or above it. Based on this
concept, we theoretically justify these neutral baselines and
find a way to identify them for MLPs. Then, we experimentally demonstrate that
for a binary classification task, using a synthetic dataset and a dataset
from the financial domain, the proposed baselines outperform standard ways of
choosing them in terms of local explainability power.
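A rough sketch of the idea, with a toy logistic model standing in for an MLP and a simple bisection search standing in for the paper's identification procedure (all assumptions, since the abstract gives no details): find a baseline input at which the model outputs exactly the neutrality value, then compute Shapley attributions with missing features replaced by that baseline. By the efficiency axiom, the attributions sum to the model output minus the neutrality value, so each attribution is read relative to the decision threshold.

```python
import itertools
import math

# Toy scoring model standing in for a trained MLP (illustrative only).
def f(x):
    z = 0.8 * x[0] + 1.5 * x[1] - 0.5
    return 1.0 / (1.0 + math.exp(-z))

NEUTRAL = 0.5  # decision threshold chosen by the decision maker

def neutral_baseline(lo=(-5.0, -5.0), hi=(5.0, 5.0), iters=100):
    """Bisect along a segment to find a baseline b with f(b) == NEUTRAL."""
    mid = lo
    for _ in range(iters):
        mid = tuple((a + b) / 2.0 for a, b in zip(lo, hi))
        if f(mid) < NEUTRAL:
            lo = mid
        else:
            hi = mid
    return mid

def shapley_attributions(x, baseline):
    """Exact Shapley values; 'missing' features take the baseline value."""
    n = len(x)
    phi = [0.0] * n

    def v(S):
        return f(tuple(x[i] if i in S else baseline[i] for i in range(n)))

    for perm in itertools.permutations(range(n)):
        S = set()
        for i in perm:
            phi[i] += v(S | {i}) - v(S)
            S.add(i)
    return [p / math.factorial(n) for p in phi]

b = neutral_baseline()
x = (1.0, 2.0)
phi = shapley_attributions(x, b)
# Efficiency: attributions sum to f(x) - NEUTRAL, so a positive
# attribution pushes the prediction above the neutral decision point.
print(f(b), phi, sum(phi))
```

The bisection works here because the toy model is monotone along the chosen segment; the paper's method for locating neutral baselines in general MLPs is presumably more involved.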
- …