Taming Technical Bias in Machine Learning Pipelines

Author: Schelter S.
Stoyanovich J.
Publication venue
Publication date: 01/12/2020
Field of study

Machine Learning (ML) is commonly used to automate decisions in domains as varied as credit and lending, medical diagnosis, and hiring. These decisions are consequential, compelling us to carefully balance the benefits of efficiency against the potential risks. Much of the conversation about the risks centers on bias, a term that the technical community uses ever more frequently but that is still poorly understood. In this paper we focus on technical bias, a type of bias that has so far received limited attention and that the data engineering community is well-equipped to address. We discuss dimensions of technical bias that can arise through the ML lifecycle, particularly when it is due to preprocessing decisions or post-deployment issues. We present results of our recent work and discuss future research directions. Our overall goal is to support the development of systems that expose the knobs of responsibility to data scientists, allowing them to detect instances of technical bias and to mitigate it when possible.

International Migration, Integration and Social Cohesion online publications

UvA-DARE

Symmetric Relations and Cardinality-Bounded Multisets in Database Systems

Author: Stoyanovich J.
Ross K.
Publication venue: Elsevier BV
Publication date: 01/01/2007
Field of study

Crossref

Data distribution debugging in machine learning pipelines

Author: Grafberger S.
Groth P.
Schelter S.
Stoyanovich J.
Publication venue
Publication date: 01/09/2022
Field of study

International Migration, Integration and Social Cohesion online publications

UvA-DARE

Fairness-Aware Instrumentation of Preprocessing Pipelines for Machine Learning

Author: Huang B.
Schelter S.
Stoyanovich J.
Yang K.
Publication venue: HILDA
Publication date: 01/01/2020
Field of study

Surfacing and mitigating bias in ML pipelines is a complex topic, with a dire need to provide system-level support to data scientists. Humans should be empowered to debug these pipelines, in order to control for bias and to improve data quality and representativeness. We propose fair-DAGs, an open-source library that extracts directed acyclic graph (DAG) representations of the data flow in preprocessing pipelines for ML. The library subsequently instruments the pipelines with tracing and visualization code to capture changes in data distributions and identify distortions with respect to protected group membership as the data travels through the pipeline. We illustrate the utility of fair-DAGs with experiments on publicly available ML pipelines.
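
The kind of check described in this abstract can be illustrated with a minimal sketch. It is not the fair-DAGs API: the function names, the `group` and `income` columns, and the toy rows are all hypothetical, chosen only to show how one preprocessing step (dropping rows with a missing value) can shift the share of a protected group.

```python
from collections import Counter


def group_shares(rows, key):
    """Return the fraction of rows belonging to each value of `key`."""
    counts = Counter(row[key] for row in rows)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {group: count / total for group, count in counts.items()}


def share_changes(before, after, key):
    """Change in each group's share between two pipeline stages."""
    shares_before = group_shares(before, key)
    shares_after = group_shares(after, key)
    return {
        group: shares_after.get(group, 0.0) - share
        for group, share in shares_before.items()
    }


# Hypothetical toy data: missing income is more common in group B.
rows = [
    {"group": "A", "income": 50},
    {"group": "A", "income": 42},
    {"group": "A", "income": 61},
    {"group": "A", "income": None},
    {"group": "B", "income": 38},
    {"group": "B", "income": None},
    {"group": "B", "income": None},
    {"group": "B", "income": None},
]

# Preprocessing step under inspection: drop rows with missing income.
filtered = [row for row in rows if row["income"] is not None]

changes = share_changes(rows, filtered, "group")
print(changes)
```

In this toy case the filter raises group A's share from one half to three quarters and lowers group B's by the same amount, which is the sort of distortion a pipeline inspection would be expected to flag.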

International Migration, Integration and Social Cohesion online publications

UvA-DARE

MLINSPECT: A Data Distribution Debugger for Machine Learning Pipelines

Author: Grafberger S.
Guha S.
Schelter S.
Stoyanovich J.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2021
Field of study

International Migration, Integration and Social Cohesion online publications

UvA-DARE

Impact Remediation: Optimal Interventions to Reduce Inequality

Author: Bynum Lucius E. J.
Loftus Joshua R.
Stoyanovich Julia
Publication venue
Publication date: 01/07/2021
Field of study

A significant body of research in the data sciences considers unfair discrimination against social categories such as race or gender that could occur or be amplified as a result of algorithmic decisions. Simultaneously, real-world disparities continue to exist, even before algorithmic decisions are made. In this work, we draw on insights from the social sciences and humanistic studies brought into the realm of causal modeling and constrained optimization, and develop a novel algorithmic framework for tackling pre-existing real-world disparities. The purpose of our framework, which we call the "impact remediation framework," is to measure real-world disparities and discover the optimal intervention policies that could help improve equity or access to opportunity for those who are underserved with respect to an outcome of interest. We develop a disaggregated approach to tackling pre-existing disparities that relaxes the typical set of assumptions required for the use of social categories in structural causal models. Our approach flexibly incorporates counterfactuals and is compatible with various ontological assumptions about the nature of social categories. We demonstrate impact remediation with a real-world case study and compare our disaggregated approach to an existing state-of-the-art approach in terms of structure and resulting policy recommendations. In contrast to most work on optimal policy learning, we explore disparity reduction itself as an objective, explicitly focusing the power of algorithms on reducing inequality.

arXiv.org e-Print Archive

AnnotCompute: annotation-based exploration and meta-analysis of genomics experiments

Author: C. J. Stoeckert
E. Manduchi
J. Liu
J. Stoyanovich
J. Zheng
Lukk
Rauch
Publication venue: Oxford University Press
Publication date
Field of study

The ever-increasing scale of biological data sets, particularly those arising in the context of high-throughput technologies, requires the development of rich data exploration tools. In this article, we present AnnotCompute, an information discovery platform for repositories of functional genomics experiments such as ArrayExpress. Our system leverages semantic annotations of functional genomics experiments with controlled vocabulary and ontology terms, such as those from the MGED Ontology, to compute conceptual dissimilarities between pairs of experiments. These dissimilarities are then used to support two types of exploratory analysis: clustering and query-by-example. We show that our proposed dissimilarity measures correspond to a user's intuition about conceptual dissimilarity, and can be used to support effective query-by-example. We also evaluate the quality of clustering based on these measures. While AnnotCompute can support a richer data exploration experience, its effectiveness is limited in some cases by the quality of available annotations. Nonetheless, tools such as AnnotCompute may provide an incentive for richer annotations of experiments. Code is available for download at http://www.cbil.upenn.edu/downloads/AnnotCompute
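
As a rough illustration of query-by-example over annotation-based dissimilarities, the sketch below ranks experiments by Jaccard dissimilarity between their annotation term sets. This is an assumption made for illustration only: the abstract does not specify AnnotCompute's measures, and the experiment identifiers and terms here are hypothetical.

```python
def jaccard_dissimilarity(terms_a, terms_b):
    """1 minus the Jaccard similarity of two sets of annotation terms."""
    union = terms_a | terms_b
    if not union:
        return 0.0
    return 1.0 - len(terms_a & terms_b) / len(union)


def query_by_example(query_id, annotations):
    """Rank all other experiments by dissimilarity to the query experiment."""
    query_terms = annotations[query_id]
    scored = [
        (exp_id, jaccard_dissimilarity(query_terms, terms))
        for exp_id, terms in annotations.items()
        if exp_id != query_id
    ]
    # Most similar (lowest dissimilarity) first.
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical annotations: experiment id -> set of controlled-vocabulary terms.
annotations = {
    "exp1": {"mouse", "liver", "time_series"},
    "exp2": {"mouse", "liver", "dose_response"},
    "exp3": {"human", "brain", "disease_state"},
}

ranking = query_by_example("exp1", annotations)
print(ranking)
```

The same pairwise dissimilarities could also feed a clustering routine; a set-based measure like this ignores ontology structure, which a real system would likely want to exploit.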

Crossref

PubMed Central

AI reflections in 2020

Author: Braren Rickmer
Damasio Antonio
Eshraghian Jason
Hu Yipeng
Jamjoom Aimun A. B.
Jobin Anna
Kaissis Georgios
Luengo Oroz Miguel
Man Kingson
Mittelstadt Brendt
Ruiz Costa-Jussà Marta
Sinibaldi Edoardo
Stoyanovich Julia
Taddeo Mariarosaria
Tzachor Asaf
Van Bavel Jay J.
West Tessa V.
Publication venue: Springer Science and Business Media LLC
Publication date: 19/01/2021
Field of study

We invited authors of selected Comments and Perspectives published in Nature Machine Intelligence in the latter half of 2019 and first half of 2020 to describe how their topic has developed, what their thoughts are about the challenges of 2020, and what they look forward to in 2021. Postprint (author's final draft).

UPCommons. Portal del coneixement obert de la UPC