When Personalization Harms: Reconsidering the Use of Group Attributes in Prediction
Machine learning models are often personalized with categorical attributes
that are protected, sensitive, self-reported, or costly to acquire. In this
work, we show models that are personalized with group attributes can reduce
performance at a group level. We propose formal conditions to ensure the "fair
use" of group attributes in prediction tasks by training one additional model,
i.e., collective preference guarantees that ensure each group that provides
personal data receives a tailored gain in performance in return.
We present sufficient conditions to ensure fair use in empirical risk
minimization and characterize failure modes that lead to fair use violations
due to standard practices in model development and deployment. We present a
comprehensive empirical study of fair use in clinical prediction tasks. Our
results demonstrate the prevalence of fair use violations in practice and
illustrate simple interventions to mitigate their harm.
Comment: ICML 2023 Oral
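As an illustration of what checking for a fair use violation can look like, the
sketch below compares, group by group, the error of a generic model (trained
without group attributes) against a personalized model (trained with them). It
assumes 0-1 error as the performance metric and treats any group whose error
rises under personalization as a violation. The function names and the `tol`
parameter are hypothetical, and the formal conditions in the paper may be
stricter than this single comparison.

```python
from collections import defaultdict


def group_error_rates(y_true, y_pred, groups):
    """Return the 0-1 error rate of y_pred for each group label."""
    errors = defaultdict(int)
    counts = defaultdict(int)
    for yt, yp, g in zip(y_true, y_pred, groups):
        counts[g] += 1
        errors[g] += int(yt != yp)
    return {g: errors[g] / counts[g] for g in counts}


def fair_use_violations(y_true, pred_generic, pred_personalized, groups, tol=0.0):
    """Hypothetical check: flag groups harmed by personalization.

    gain[g] = error of generic model - error of personalized model on group g.
    A group is flagged when its gain is below -tol, i.e. it provided its
    group attribute but the personalized model performs worse for it.
    """
    err_gen = group_error_rates(y_true, pred_generic, groups)
    err_per = group_error_rates(y_true, pred_personalized, groups)
    gains = {g: err_gen[g] - err_per[g] for g in err_gen}
    violations = sorted(g for g, gain in gains.items() if gain < -tol)
    return gains, violations


if __name__ == "__main__":
    y = [1, 0, 1, 0, 1, 0]
    grp = ["A", "A", "A", "B", "B", "B"]
    generic = [1, 0, 0, 0, 1, 0]       # one error in group A
    personalized = [1, 0, 1, 1, 1, 0]  # one error in group B
    gains, flagged = fair_use_violations(y, generic, personalized, grp)
    print(gains, flagged)
```

In this toy example, personalization helps group A but harms group B, so only
B is flagged.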
Deep Metric Learning for the Hemodynamics Inference with Electrocardiogram Signals
Heart failure is a debilitating condition that affects millions of people
worldwide and has a significant impact on their quality of life and mortality
rates. An objective assessment of cardiac pressures remains an important method
for the diagnosis and treatment prognostication for patients with heart
failure. Although cardiac catheterization is the gold standard for estimating
central hemodynamic pressures, it is an invasive procedure that carries
inherent risks, making it a potentially dangerous procedure for some patients.
Approaches that leverage non-invasive signals, such as the electrocardiogram
(ECG), promise to make routine estimation of cardiac pressures feasible
in both inpatient and outpatient settings. Prior models trained in a supervised
fashion to estimate intracardiac pressures (e.g., mean pulmonary capillary
wedge pressure, mPCWP) have shown good discriminatory ability but have been
limited to the labeled dataset from the heart failure cohort. To address this
issue and build a robust representation, we apply deep metric learning (DML)
and propose a novel self-supervised DML with distance-based mining that
improves the performance of a model with limited labels. We use a dataset that
contains over 5.4 million ECGs without concomitant central pressure labels to
pre-train a self-supervised DML model, which showed improved classification of
elevated mPCWP compared to self-supervised contrastive baselines. Additionally,
the supervised DML model, trained on ECGs with 8,172 mPCWP labels,
demonstrated significantly better performance on the mPCWP regression task
compared to the supervised baseline. Moreover, our data suggest that DML yields
models that are performant across patient subgroups, even when some patient
subgroups are under-represented in the dataset. Our code is available at
https://github.com/mandiehyewon/ssldm
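To make "distance-based mining" concrete, the sketch below implements
batch-hard triplet mining, one common distance-based mining strategy in deep
metric learning; it is a stand-in for illustration, not necessarily the
variant used in the linked code. For each anchor, the farthest embedding with
the same label is the hardest positive and the closest embedding with a
different label is the hardest negative. In a self-supervised setting, the
labels could hypothetically be IDs of the source ECG, so that augmented views
of one recording count as positives. The function names and the `margin`
default are assumptions.

```python
import math


def pairwise_distances(embeddings):
    """Euclidean distance matrix between all pairs of embedding vectors."""
    return [[math.dist(a, b) for b in embeddings] for a in embeddings]


def batch_hard_triplet_loss(embeddings, labels, margin=0.2):
    """Batch-hard triplet loss (illustrative stand-in for distance-based mining).

    For every anchor, mine the hardest positive (largest same-label distance)
    and the hardest negative (smallest different-label distance), then apply
    a hinge: max(0, d_pos - d_neg + margin). Returns the mean over anchors
    that have at least one positive and one negative in the batch.
    """
    dist = pairwise_distances(embeddings)
    n = len(embeddings)
    losses = []
    for i in range(n):
        pos = [dist[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        neg = [dist[i][j] for j in range(n) if labels[j] != labels[i]]
        if not pos or not neg:
            continue  # no valid triplet for this anchor in the batch
        hardest_pos = max(pos)
        hardest_neg = min(neg)
        losses.append(max(0.0, hardest_pos - hardest_neg + margin))
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    # Well-separated clusters: every hardest negative is beyond the margin.
    print(batch_hard_triplet_loss([(0, 0), (0, 1), (3, 0), (3, 1)], [0, 0, 1, 1]))
    # Interleaved clusters: each anchor's hardest negative is closer than its positive.
    print(batch_hard_triplet_loss([(0, 0), (2, 0), (1, 0), (3, 0)], [0, 0, 1, 1]))
```

Mining only the hardest pairs focuses the gradient on the triplets the
embedding currently gets most wrong; in practice this is usually computed on
GPU tensors rather than Python lists.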