39,896 research outputs found
Fairness in Algorithmic Decision Making: An Excursion Through the Lens of Causality
As virtually all aspects of our lives are increasingly impacted by
algorithmic decision making systems, it is incumbent upon us as a society to
ensure such systems do not become instruments of unfair discrimination on the
basis of gender, race, ethnicity, religion, etc. We consider the problem of
determining whether the decisions made by such systems are discriminatory,
through the lens of causal models. We introduce two definitions of group
fairness grounded in causality: fair on average causal effect (FACE), and fair
on average causal effect on the treated (FACT). We use the Rubin-Neyman
potential outcomes framework for the analysis of cause-effect relationships to
robustly estimate FACE and FACT. We demonstrate the effectiveness of our
proposed approach on synthetic data. Our analyses of two real-world data sets,
the Adult income data set from the UCI repository (with gender as the protected
attribute), and the NYC Stop and Frisk data set (with race as the protected
attribute), show that the evidence of discrimination obtained by FACE and FACT,
or lack thereof, is often in agreement with the findings from other studies. We
further show that FACT, being somewhat more nuanced compared to FACE, can yield
findings of discrimination that differ from those obtained using FACE.Comment: 7 pages, 2 figures, 2 tables.To appear in Proceedings of the
International Conference on World Wide Web (WWW), 201
Explaining Machine Learning Classifiers through Diverse Counterfactual Explanations
Post-hoc explanations of machine learning models are crucial for people to
understand and act on algorithmic predictions. An intriguing class of
explanations is through counterfactuals, hypothetical examples that show people
how to obtain a different prediction. We posit that effective counterfactual
explanations should satisfy two properties: feasibility of the counterfactual
actions given user context and constraints, and diversity among the
counterfactuals presented. To this end, we propose a framework for generating
and evaluating a diverse set of counterfactual explanations based on
determinantal point processes. To evaluate the actionability of
counterfactuals, we provide metrics that enable comparison of
counterfactual-based methods to other local explanation methods. We further
address necessary tradeoffs and point to causal implications in optimizing for
counterfactuals. Our experiments on four real-world datasets show that our
framework can generate a set of counterfactuals that are diverse and well
approximate local decision boundaries, outperforming prior approaches to
generating diverse counterfactuals. We provide an implementation of the
framework at https://github.com/microsoft/DiCE.Comment: 13 page
Eliminating Latent Discrimination: Train Then Mask
How can we control for latent discrimination in predictive models? How can we
provably remove it? Such questions are at the heart of algorithmic fairness and
its impacts on society. In this paper, we define a new operational fairness
criteria, inspired by the well-understood notion of omitted variable-bias in
statistics and econometrics. Our notion of fairness effectively controls for
sensitive features and provides diagnostics for deviations from fair decision
making. We then establish analytical and algorithmic results about the
existence of a fair classifier in the context of supervised learning. Our
results readily imply a simple, but rather counter-intuitive, strategy for
eliminating latent discrimination. In order to prevent other features proxying
for sensitive features, we need to include sensitive features in the training
phase, but exclude them in the test/evaluation phase while controlling for
their effects. We evaluate the performance of our algorithm on several
real-world datasets and show how fairness for these datasets can be improved
with a very small loss in accuracy
Inferring causal relations from multivariate time series : a fast method for large-scale gene expression data
Various multivariate time series analysis techniques have been developed with the aim of inferring causal relations between time series. Previously, these techniques have proved their effectiveness on economic and neurophysiological data, which normally consist of hundreds of samples. However, in their applications to gene regulatory inference, the small sample size of gene expression time series poses an obstacle. In this paper, we describe some of the most commonly used multivariate inference techniques and show the potential challenge related to gene expression analysis. In response, we propose a directed partial correlation (DPC) algorithm as an efficient and effective solution to causal/regulatory relations inference on small sample gene expression data. Comparative evaluations on the existing techniques and the proposed method are presented. To draw reliable conclusions, a comprehensive benchmarking on data sets of various setups is essential. Three experiments are designed to assess these methods in a coherent manner. Detailed analysis of experimental results not only reveals good accuracy of the proposed DPC method in large-scale prediction, but also gives much insight into all methods under evaluation
Improving Bayesian statistics understanding in the age of Big Data with the bayesvl R package
The exponential growth of social data both in volume and complexity has increasingly exposed many of the shortcomings of the conventional frequentist approach to statistics. The scientific community has called for careful usage of the approach and its inference. Meanwhile, the alternative method, Bayesian statistics, still faces considerable barriers toward a more widespread application. The bayesvl R package is an open program, designed for implementing Bayesian modeling and analysis using the Stan language’s no-U-turn (NUTS) sampler. The package combines the ability to construct Bayesian network models using directed acyclic graphs (DAGs), the Markov chain Monte Carlo (MCMC) simulation technique, and the graphic capability of the ggplot2 package. As a result, it can improve the user experience and intuitive understanding when constructing and analyzing Bayesian network models. A case example is offered to illustrate the usefulness of the package for Big Data analytics and cognitive computing
A recommender system for process discovery
Over the last decade, several algorithms for process discovery and process conformance have been proposed. Still, it is well-accepted that there is no dominant algorithm in any of these two disciplines, and then it is often difficult to apply them successfully. Most of these algorithms need a close-to expert knowledge in order to be applied satisfactorily. In this paper, we present a recommender system that uses portfolio-based algorithm selection strategies to face the following problems: to find the best discovery algorithm for the data at hand, and to allow bridging the gap between general users and process mining algorithms. Experiments performed with the developed tool witness the usefulness of the approach for a variety of instances.Peer ReviewedPostprint (author’s final draft
- …