179 research outputs found
Why Does My Model Fail? Contrastive Local Explanations for Retail Forecasting
In various business settings, there is an interest in using more complex
machine learning techniques for sales forecasting. It is difficult to convince
analysts, along with their superiors, to adopt these techniques since the
models are considered to be "black boxes," even if they perform better than
current models in use. We examine the impact of contrastive explanations about
large errors on users' attitudes towards a "black-box" model. We propose an
algorithm, Monte Carlo Bounds for Reasonable Predictions (MC-BRP). Given a large error,
MC-BRP determines (1) feature values that would result in a reasonable
prediction, and (2) general trends between each feature and the target, both
based on Monte Carlo simulations. We evaluate on a real dataset with real users
by conducting a user study with 75 participants to determine if explanations
generated by MC-BRP help users understand why a prediction results in a large
error, and if this promotes trust in an automatically-learned model. Our study
shows that users are able to answer objective questions about the model's
predictions with overall 81.1% accuracy when provided with these contrastive
explanations. We show that users who saw MC-BRP explanations understand why the
model makes large errors in predictions significantly more than users in the
control group. We also conduct an in-depth analysis on the difference in
attitudes between Practitioners and Researchers, and confirm that our results
hold when conditioning on the users' background.
Comment: To appear in ACM FAT* 202
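The abstract's two outputs of MC-BRP (feature bounds that yield a reasonable prediction, and each feature's general trend with the target) can be illustrated with a minimal Monte Carlo sketch. This is not the authors' exact algorithm; the perturbation scheme, the acceptance interval, and the use of a correlation sign for the trend are all assumptions made for illustration:

```python
# Illustrative sketch of the MC-BRP idea (not the paper's exact algorithm):
# given an observation whose forecast has a large error, sample perturbed
# feature vectors, keep those whose prediction falls in a "reasonable" band,
# and summarise per-feature bounds plus each feature's trend with the output.
import numpy as np

def mc_brp_sketch(model_predict, x, feature_names, reasonable_lo, reasonable_hi,
                  n_samples=10_000, scale=0.25, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    # Monte Carlo perturbations around the observed feature vector
    std = scale * (np.abs(x) + 1e-9)
    samples = x + rng.normal(0.0, std, size=(n_samples, x.size))
    preds = model_predict(samples)
    ok = (preds >= reasonable_lo) & (preds <= reasonable_hi)
    result = {}
    for j, name in enumerate(feature_names):
        # (1) bounds of feature values that led to a reasonable prediction
        lo, hi = samples[ok, j].min(), samples[ok, j].max()
        # (2) general trend between the feature and the model's prediction
        sign = np.corrcoef(samples[:, j], preds)[0, 1]
        result[name] = {"bounds": (lo, hi),
                        "trend": "increasing" if sign > 0 else "decreasing"}
    return result
```

For a toy linear forecaster `preds = 2*price - discount`, the sketch recovers an increasing trend for `price`, a decreasing trend for `discount`, and bounds around the observed values that would have kept the prediction reasonable.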
What Do We Want From Explainable Artificial Intelligence (XAI)? -- A Stakeholder Perspective on XAI and a Conceptual Model Guiding Interdisciplinary XAI Research
Previous research in Explainable Artificial Intelligence (XAI) suggests that
a main aim of explainability approaches is to satisfy specific interests,
goals, expectations, needs, and demands regarding artificial systems (we call
these stakeholders' desiderata) in a variety of contexts. However, the
literature on XAI is vast, spreads out across multiple largely disconnected
disciplines, and it often remains unclear how explainability approaches are
supposed to achieve the goal of satisfying stakeholders' desiderata. This paper
discusses the main classes of stakeholders calling for explainability of
artificial systems and reviews their desiderata. We provide a model that
explicitly spells out the main concepts and relations necessary to consider and
investigate when evaluating, adjusting, choosing, and developing explainability
approaches that aim to satisfy stakeholders' desiderata. This model can serve
researchers from the variety of different disciplines involved in XAI as a
common ground. It emphasizes where there is interdisciplinary potential in the
evaluation and the development of explainability approaches.
Comment: 57 pages, 2 figures, 1 table, to be published in Artificial
Intelligence; Markus Langer, Daniel Oster and Timo Speith share
first-authorship of this paper
AVATAR - Machine Learning Pipeline Evaluation Using Surrogate Model
© 2020, The Author(s). The evaluation of machine learning (ML) pipelines is essential during automatic ML pipeline composition and optimisation. Previous methods, such as the Bayesian-based and genetic-based optimisation implemented in Auto-Weka, Auto-sklearn and TPOT, evaluate pipelines by executing them. As a result, pipeline composition and optimisation with these methods requires a tremendous amount of time, which prevents them from exploring complex pipelines to find better predictive models. To further explore this research challenge, we have conducted experiments showing that many of the generated pipelines are invalid, and that it is unnecessary to execute them to find out whether they are good pipelines. To address this issue, we propose a novel method to evaluate the validity of ML pipelines using a surrogate model (AVATAR). AVATAR accelerates automatic ML pipeline composition and optimisation by quickly discarding invalid pipelines. Our experiments show that AVATAR is more efficient at evaluating complex pipelines than traditional evaluation approaches that require their execution
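The core idea of checking a pipeline's validity without executing it can be sketched as a capability-propagation check. The paper's surrogate model is more elaborate; the step names, capability sets, and propagation rule below are purely illustrative assumptions:

```python
# Illustrative sketch of surrogate pipeline-validity checking (not AVATAR's
# actual surrogate model): describe each step by the data properties it
# accepts and produces, then propagate properties through the pipeline
# instead of executing it. All names and capability sets are hypothetical.

STEP_CAPABILITIES = {
    # step name: (properties the step accepts, properties of its output)
    "imputer":     ({"numeric", "missing_values"}, {"numeric"}),
    "one_hot":     ({"categorical"},               {"numeric"}),
    "scaler":      ({"numeric"},                   {"numeric"}),
    "naive_bayes": ({"numeric"},                   {"predictions"}),
}

def pipeline_is_valid(steps, data_properties):
    """Surrogate validity check: no step is executed, only capabilities."""
    props = set(data_properties)
    for step in steps:
        accepted, produced = STEP_CAPABILITIES[step]
        if not props <= accepted:      # step cannot handle the current data
            return False
        props = set(produced)          # the step's output replaces the input
    return "predictions" in props      # valid only if a model produces output
```

Under this toy model, `["imputer", "scaler", "naive_bayes"]` is valid on data with missing values, while `["scaler", "naive_bayes"]` is rejected immediately because the scaler cannot handle missing values; no pipeline ever has to be run to reach that verdict.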
SIS 2017. Statistics and Data Science: new challenges, new generations
The 2017 SIS Conference aims to highlight the crucial role of Statistics in Data Science. In this new domain of ‘meaning’ extracted from data, the ever-increasing amount of data produced and made available in databases has brought new challenges. These involve different fields: statistics, machine learning, information and computer science, optimization, and pattern recognition. Together, these fields make a considerable contribution to the analysis of ‘Big data’, open data, relational and complex data, structured and unstructured. The aim is to collect contributions from the different domains of Statistics on high-dimensional data quality validation, sampling extraction, dimensionality reduction, pattern selection, data modelling, hypothesis testing, and confirming conclusions drawn from the data
Contrastive Explanations for Large Errors in Retail Forecasting Predictions through Monte Carlo Simulations
At Ahold Delhaize, there is an interest in using more complex machine learning techniques for sales forecasting. It is difficult to convince analysts, along with their superiors, to adopt these techniques since the models are considered to be "black boxes," even if they perform better than current models in use. We aim to explore the impact of contrastive explanations about large errors on users' attitudes towards a "black-box" model. In this work, we make two contributions. The first is an algorithm, Monte Carlo Bounds for Reasonable Predictions (MC-BRP). Given a large error, MC-BRP determines (1) feature values that would result in a reasonable prediction, and (2) general trends between each feature and the target, based on Monte Carlo simulations. The second contribution is the evaluation of MC-BRP along with its outcomes, which has both objective and subjective components. We evaluate on a real dataset with real users from Ahold Delhaize by conducting a user study to determine if explanations generated by MC-BRP help users understand why a prediction results in a large error, and if this promotes trust in an automatically-learned model. The study shows that users are able to answer objective questions about the model's predictions with overall 81.7% accuracy when provided with these contrastive explanations. We also show that users who saw MC-BRP explanations understand why the model makes large errors in predictions significantly more than users in the control group