A Framework for Understanding Unintended Consequences of Machine Learning
As machine learning increasingly affects people and society, it is important
that we strive for a comprehensive and unified understanding of potential
sources of unwanted consequences. For instance, downstream harms to particular
groups are often blamed on "biased data," but this concept encompass too many
issues to be useful in developing solutions. In this paper, we provide a
framework that partitions sources of downstream harm in machine learning into
six distinct categories spanning the data generation and machine learning
pipeline. We describe how these issues arise, how they are relevant to
particular applications, and how they motivate different solutions. In doing
so, we aim to facilitate the development of solutions that stem from an
understanding of application-specific populations and data generation
processes, rather than relying on general statements about what may or may not
be "fair."Comment: 6 pages, 2 figures; updated with corrected figure
Intuitively Assessing ML Model Reliability through Example-Based Explanations and Editing Model Inputs
Interpretability methods aim to help users build trust in and understand the
capabilities of machine learning models. However, existing approaches often
rely on abstract, complex visualizations that poorly map to the task at hand or
require non-trivial ML expertise to interpret. Here, we present two visual
analytics modules that facilitate an intuitive assessment of model reliability.
To help users better characterize and reason about a model's uncertainty, we
visualize raw and aggregate information about a given input's nearest
neighbors. Using an interactive editor, users can manipulate this input in
semantically-meaningful ways, determine the effect on the output, and compare
against their prior expectations. We evaluate our interface using an
electrocardiogram beat classification case study. Compared to a baseline
feature importance interface, we find that 14 physicians are better able to
align the model's uncertainty with domain-relevant factors and build intuition
about its capabilities and limitations.
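The modules themselves are interactive visualizations, but the nearest-neighbor idea behind the first one is easy to sketch. Below is a minimal, hypothetical Python illustration (random placeholder embeddings and labels, scikit-learn's NearestNeighbors) of retrieving an input's closest training examples and summarizing their labels as a rough uncertainty signal; it is not the paper's actual implementation.

```python
# Minimal, hypothetical sketch of the nearest-neighbor module's core idea:
# fetch a test input's closest training examples and summarize their labels
# as a rough proxy for model uncertainty. Embeddings and labels below are
# random placeholders, not the paper's ECG data or implementation.
import numpy as np
from collections import Counter
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 32))    # placeholder embedded training inputs
y_train = rng.integers(0, 2, size=1000)  # placeholder class labels

nn = NearestNeighbors(n_neighbors=10).fit(X_train)

def neighbor_label_distribution(x):
    """Label distribution among x's nearest training neighbors."""
    _, idx = nn.kneighbors(x.reshape(1, -1))
    labels = y_train[idx[0]]
    # A mixed distribution among close neighbors suggests the model is in
    # an ambiguous region; a pure one suggests a well-supported prediction.
    counts = Counter(labels.tolist())
    return {label: n / len(labels) for label, n in counts.items()}

print(neighbor_label_distribution(rng.normal(size=32)))
```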
Feminicide & machine learning : detecting gender-based violence to strengthen civil sector activism
Although governments have passed legislation criminalizing feminicide, such legislation is unaccompanied by relevant policy or robust data collection. This participatory action research project is designed to help sustain activist efforts to collect feminicide data through partially automated detection using machine learning. As a way to counter the impunity surrounding feminicide, activists have taken it upon themselves to do the work that states have neglected. Partially automating detection supports efforts to systematize and sort data collection across contexts, and helps to inform policy advocacy through standardizing definitions and taxonomies. The ability to prioritize articles by likelihood of feminicide will make this intense research less gruelling.
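The abstract does not name a model; the snippet below is only a hedged sketch of the prioritization idea, with a simple bag-of-words classifier over invented example articles standing in for whatever the project actually uses, ranking incoming texts by predicted likelihood so the most probable feminicide reports surface first.

```python
# Illustrative sketch only: rank incoming articles by a classifier's
# predicted probability of reporting a feminicide so that reviewers can
# prioritize the most likely cases. Training texts, labels, and the model
# choice are hypothetical placeholders, not the project's actual system.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "woman found dead, partner arrested on homicide charge",
    "council approves new transit budget for next year",
]
train_labels = [1, 0]  # 1 = likely feminicide report, 0 = unrelated news

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)

def prioritize(articles):
    """Return articles sorted by predicted feminicide likelihood, highest first."""
    scores = model.predict_proba(articles)[:, 1]
    return sorted(zip(scores, articles), reverse=True)

for score, text in prioritize(["police report a woman killed at home",
                               "local team wins championship"]):
    print(f"{score:.2f}  {text}")
```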
MIMIC-Extract: A Data Extraction, Preprocessing, and Representation Pipeline for MIMIC-III
Robust machine learning relies on access to data that can be used with
standardized frameworks in important tasks and the ability to develop models
whose performance can be reasonably reproduced. In machine learning for
healthcare, the community faces reproducibility challenges due to a lack of
publicly accessible data and a lack of standardized data processing frameworks.
We present MIMIC-Extract, an open-source pipeline for transforming raw
electronic health record (EHR) data for critical care patients contained in the
publicly available MIMIC-III database into dataframes that are directly usable
in common machine learning pipelines. MIMIC-Extract addresses three primary
challenges in making complex health records data accessible to the broader
machine learning community. First, it provides standardized data processing
functions, including unit conversion, outlier detection, and aggregating
semantically equivalent features, thus accounting for duplication and reducing
missingness. Second, it preserves the time series nature of clinical data and
can be easily integrated into clinically actionable prediction tasks in machine
learning for health. Finally, it is highly extensible so that other researchers
with related questions can easily use the same pipeline. We demonstrate the
utility of this pipeline by showcasing several benchmark tasks and baseline
results.
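MIMIC-Extract's real interface lives in its open-source repository; the snippet below is only a hedged pandas illustration of the kinds of steps the abstract lists (unit conversion, merging semantically equivalent features, outlier filtering, and hourly aggregation that preserves the time series), with invented item IDs, values, and plausibility ranges.

```python
# Hedged pandas illustration of steps the abstract describes: unit
# conversion, merging semantically equivalent features, dropping
# implausible outliers, and hourly aggregation that keeps the time series
# shape. Item IDs, values, and ranges are invented; this is NOT
# MIMIC-Extract's actual API.
import pandas as pd

raw = pd.DataFrame({
    "charttime": pd.to_datetime(["2019-01-01 00:10", "2019-01-01 00:40",
                                 "2019-01-01 01:20", "2019-01-01 01:25"]),
    "itemid": ["temp_f", "temp_c", "hr", "heart_rate"],  # duplicated concepts
    "value": [98.6, 37.2, 80.0, 500.0],                  # 500 bpm: outlier
})

# 1) Unit conversion: harmonize Fahrenheit readings to Celsius.
is_f = raw["itemid"] == "temp_f"
raw.loc[is_f, "value"] = (raw.loc[is_f, "value"] - 32) * 5 / 9

# 2) Merge semantically equivalent item IDs under one feature name.
alias = {"temp_f": "temperature", "temp_c": "temperature",
         "hr": "heart_rate", "heart_rate": "heart_rate"}
raw["feature"] = raw["itemid"].map(alias)

# 3) Outlier detection: drop physiologically implausible values.
plausible = {"temperature": (25.0, 45.0), "heart_rate": (20.0, 300.0)}
lo = raw["feature"].map(lambda f: plausible[f][0])
hi = raw["feature"].map(lambda f: plausible[f][1])
clean = raw[(raw["value"] >= lo) & (raw["value"] <= hi)]

# 4) Hourly aggregation preserving the time series structure.
hourly = (clean.set_index("charttime")
               .groupby("feature")["value"]
               .resample("1h").mean()
               .unstack(level=0))
print(hourly)
```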
Non-task expert physicians benefit from correct explainable AI advice when reviewing X-rays
Artificial intelligence (AI)-generated clinical advice is becoming more prevalent in healthcare. However, the impact of AI-generated advice on physicians’ decision-making is underexplored. In this study, physicians received X-rays with correct diagnostic advice and were asked to make a diagnosis, rate the advice’s quality, and judge their own confidence. We manipulated whether the advice came with or without a visual annotation on the X-rays, and whether it was labeled as coming from an AI or a human radiologist. Overall, receiving annotated advice from an AI resulted in the highest diagnostic accuracy. Physicians rated the quality of AI advice higher than human advice. We did not find a strong effect of either manipulation on participants’ confidence. The magnitude of the effects varied between task experts and non-task experts, with the latter benefiting considerably from correct explainable AI advice. These findings raise important considerations for the deployment of diagnostic advice in healthcare.