Explainable machine learning for public policy: Use cases, gaps, and research directions
Explainability is highly desired in machine learning (ML) systems supporting high-stakes policy decisions in areas such as health, criminal justice, education, and employment. While the field of explainable ML has expanded in recent years, much of this work has not taken real-world needs into account. A majority of proposed methods are designed with generic explainability goals, without well-defined use cases or intended end users, and are evaluated on simplified tasks, benchmark problems/datasets, or with proxy users (e.g., Amazon Mechanical Turk). We argue that these simplified evaluation settings do not capture the nuances and complexities of real-world applications. As a result, the applicability and effectiveness of this large body of theoretical and methodological work in real-world applications are unclear. In this work, we take steps toward addressing this gap for the domain of public policy. First, we identify the primary use cases of explainable ML within public policy problems. For each use case, we define the end users of explanations and the specific goals the explanations have to fulfill. Finally, we map existing work in explainable ML to these use cases, identify gaps in established capabilities, and propose research directions to fill those gaps to have a practical societal impact through ML. The contributions are (a) a methodology for explainable ML researchers to identify use cases and develop methods targeted at them, and (b) an application of that methodology to the domain of public policy, providing researchers an example of developing explainable ML methods that result in real-world impact.
Locating and measuring marine aquaculture production from space: a computer vision approach in the French Mediterranean
Aquaculture production, the cultivation of aquatic plants and animals, has grown rapidly since the 1990s, but sparse, self-reported, and aggregate production data limit effective understanding and monitoring of the industry's trends and potential risks. Building on a manual survey of aquaculture production from remote sensing imagery, we train a computer vision model to identify marine aquaculture cages from aerial and satellite imagery, and generate a spatially explicit dataset of finfish production locations in the French Mediterranean from 2000-2021 that includes 4,010 cages (69 m² average cage area). We demonstrate the value of our method as an easily adaptable, cost-effective approach that can improve the speed and reliability of aquaculture surveys and enable downstream analyses relevant to researchers and regulators. We illustrate its use to compute independent estimates of production and develop a flexible framework to quantify uncertainty in these estimates. Overall, our study presents an efficient, scalable, and highly adaptable method for monitoring aquaculture production from remote sensing imagery.
A Transcriptional Logic for Nuclear Reprogramming
Limitations on a differentiated cell's pluripotency can be erased by nuclear transfer or by fusion with embryonic stem cells, but attempts to recapitulate this process of nuclear reprogramming by molecular means have failed. In this issue of Cell, Takahashi and Yamanaka (2006) take a rational approach to identifying a suite of embryonic transcription factors whose overexpression restores pluripotency to adult somatic cells.
An Empirical Comparison of Bias Reduction Methods on Real-World Problems in High-Stakes Policy Settings
Applications of machine learning (ML) to high-stakes policy settings, such as education, criminal justice, healthcare, and social service delivery, have grown rapidly in recent years, sparking important conversations about how to ensure fair outcomes from these systems. The machine learning research community has responded to this challenge with a wide array of proposed fairness-enhancing strategies for ML models, but despite the large number of methods that have been developed, little empirical work exists evaluating these methods in real-world settings. Here, we seek to fill this research gap by investigating the performance of several methods that operate at different points in the ML pipeline across four real-world public policy and social good problems. Across these problems, we find a wide degree of variability and inconsistency in the ability of many of these methods to improve model fairness. However, postprocessing by choosing group-specific score thresholds consistently removes disparities, with important implications for both the ML research community and practitioners deploying machine learning to inform consequential policy decisions.
