Disparate Censorship & Undertesting: A Source of Label Bias in Clinical Machine Learning
As machine learning (ML) models gain traction in clinical applications,
understanding the impact of clinician and societal biases on ML models is
increasingly important. While biases can arise in the labels used for model
training, the many sources from which these biases arise are not yet
well-studied. In this paper, we highlight disparate censorship (i.e.,
differences in testing rates across patient groups) as a source of label bias
that clinical ML models may amplify, potentially causing harm. Many patient
risk-stratification models are trained using the results of clinician-ordered
diagnostic and laboratory tests as labels. Patients without test results are
often assigned a negative label, which assumes that untested patients do not
experience the outcome. Since orders are affected by clinical and resource
considerations, testing may not be uniform in patient populations, giving rise
to disparate censorship. Disparate censorship in patients of equivalent risk
leads to undertesting in certain groups, and in turn, more biased labels for
such groups. Using such biased labels in standard ML pipelines could contribute
to gaps in model performance across patient groups. Here, we theoretically and
empirically characterize conditions in which disparate censorship or
undertesting affect model performance across subgroups. Our findings call
attention to disparate censorship as a source of label bias in clinical ML
models.Comment: 48 pages, 18 figures. Machine Learning for Healthcare Conference
(MLHC 2022
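The censoring mechanism the abstract describes can be made concrete with a small simulation. This is an illustrative sketch, not the paper's code: two groups share an identical true-risk distribution, but one group is tested less often, untested patients receive a negative label, and a standard classifier trained on those labels ends up less sensitive for the undertested group. All parameter values (testing rates, risk function, sample size) are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 20000
group = rng.integers(0, 2, n)            # 0 = well-tested group, 1 = undertested group
x = rng.normal(size=n)                   # risk feature, same distribution in both groups
p_true = 1 / (1 + np.exp(-(x - 1)))      # identical true risk in both groups
y_true = rng.random(n) < p_true          # true (unobserved) outcome

# Disparate censorship: at equivalent risk, group 1 is tested far less often
test_rate = np.where(group == 0, 0.8, 0.3)
tested = rng.random(n) < test_rate

# Standard pipeline assumption: untested patients are labeled negative
y_obs = np.where(tested, y_true, False)

# Train on the biased labels, as a naive pipeline would
X = np.column_stack([x, group])
model = LogisticRegression().fit(X, y_obs)
pred = model.predict_proba(X)[:, 1] > 0.5

for g in (0, 1):
    mask = (group == g) & y_true
    print(f"group {g} sensitivity on true positives: {pred[mask].mean():.2f}")
```

Because group 1's positives are mostly censored into negatives, the model learns to under-predict for that group, so the measured sensitivity gap reflects label bias rather than any true risk difference.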
When do confounding by indication and inadequate risk adjustment bias critical care studies? A simulation study
Abstract
Introduction
In critical care observational studies, when clinicians administer different treatments to sicker patients, any treatment comparisons will be confounded by differences in severity of illness between patients. We sought to investigate the extent that observational studies assessing treatments are at risk of incorrectly concluding such treatments are ineffective or even harmful due to inadequate risk adjustment.
Methods
We performed Monte Carlo simulations of observational studies evaluating the effect of a hypothetical treatment on mortality in critically ill patients. We set the treatment to have either no association with mortality or a truly beneficial effect, but to be administered more often to sicker patients. We varied the strength of the treatment’s true effect, strength of confounding, study size, patient population, and accuracy of the severity-of-illness risk adjustment (area under the receiver operating characteristic curve, AUROC). We measured the rates at which studies reached inaccurate conclusions about the treatment’s true effect due to confounding, and the measured odds ratios for mortality for such false associations.
Results
Simulated observational studies employing adequate risk-adjustment were generally able to measure a treatment’s true effect. As risk-adjustment worsened, rates of studies incorrectly concluding the treatment provided no benefit or harm increased, especially when sample size was large (n = 10,000). Even in scenarios of only low confounding, studies using the lower accuracy risk-adjustors (AUROC < 0.66) falsely concluded that a beneficial treatment was harmful. Measured odds ratios for mortality of 1.4 or higher were possible when the treatment’s true beneficial effect was an odds ratio for mortality of 0.6 or 0.8.
Conclusions
Large observational studies confounded by severity of illness have a high likelihood of obtaining incorrect results even after employing conventionally “acceptable” levels of risk-adjustment, with large effect sizes that may be construed as true associations. Reporting the AUROC of the risk-adjustment used in the analysis may facilitate an evaluation of a study’s risk for confounding.
http://deepblue.lib.umich.edu/bitstream/2027.42/111639/1/13054_2015_Article_923.pd
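The simulation design described in the Methods can be sketched in a few lines. This is a simplified illustration under assumed parameters, not the study's actual code: sicker patients are preferentially treated, the treatment is truly beneficial (odds ratio 0.6), and the analysis adjusts for a noisy severity measurement. Degrading the severity measurement (a weaker risk adjuster) biases the estimated treatment odds ratio toward harm.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

def simulate(n, noise_sd):
    """Estimate the treatment OR when adjusting for a noisy severity score."""
    s = rng.normal(size=n)                            # true severity of illness
    treat = rng.random(n) < 1 / (1 + np.exp(-s))      # confounding by indication
    true_or = 0.6                                     # truly beneficial treatment
    logit = -1 + 1.5 * s + np.log(true_or) * treat
    died = rng.random(n) < 1 / (1 + np.exp(-logit))
    s_meas = s + rng.normal(scale=noise_sd, size=n)   # imperfect risk adjuster
    X = np.column_stack([treat, s_meas])
    m = LogisticRegression().fit(X, died)
    return np.exp(m.coef_[0][0])                      # estimated treatment OR

or_good = simulate(10000, 0.1)   # near-perfect severity measurement
or_poor = simulate(10000, 2.0)   # weak risk adjuster (low effective AUROC)
print(f"estimated OR, good adjustment: {or_good:.2f}")
print(f"estimated OR, poor adjustment: {or_poor:.2f}")
```

With good adjustment the estimate recovers the true benefit; with poor adjustment, residual confounding pulls the estimate toward (or past) the null, mirroring the paper's finding that a beneficial treatment can appear ineffective or harmful.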
IHPI Policy Brief: Pulse oximeters are less accurate in hospitalized Black patients
http://deepblue.lib.umich.edu/bitstream/2027.42/175331/1/IHPI Policy Brief - Puse oximeters are less accurate in hosptalized Black patients - September 2022.pdf