12 research outputs found
Counterfactual-Augmented Importance Sampling for Semi-Offline Policy Evaluation
In applying reinforcement learning (RL) to high-stakes domains, quantitative
and qualitative evaluation using observational data can help practitioners
understand the generalization performance of new policies. However, this type
of off-policy evaluation (OPE) is inherently limited since offline data may not
reflect the distribution shifts resulting from the application of new policies.
On the other hand, online evaluation by collecting rollouts according to the
new policy is often infeasible, as deploying new policies in these domains can
be unsafe. In this work, we propose a semi-offline evaluation framework as an
intermediate step between offline and online evaluation, where human users
provide annotations of unobserved counterfactual trajectories. While it is
tempting to simply augment existing data with such annotations, we show that
this naive approach can lead to biased results. Instead, we design a new
family of OPE
estimators based on importance sampling (IS) and a novel weighting scheme that
incorporate counterfactual annotations without introducing additional bias. We
analyze the theoretical properties of our approach, showing its potential to
reduce both bias and variance compared to standard IS estimators. Our analyses
reveal important practical considerations for handling biased, noisy, or
missing annotations. In a series of proof-of-concept experiments involving
bandits and a healthcare-inspired simulator, we demonstrate that our approach
outperforms purely offline IS estimators and is robust to imperfect
annotations. Our framework, combined with principled human-centered design of
annotation solicitation, can enable the application of RL in high-stakes
domains.
Comment: 36 pages, 12 figures, 5 tables. NeurIPS 2023. Code available at
https://github.com/MLD3/CounterfactualAnnot-SemiOP
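For context on the estimators discussed in the abstract above, the sketch below shows the standard trajectory-wise importance sampling estimator that the proposed counterfactual-augmented family builds on; the annotation-aware weighting scheme itself is detailed in the paper and the linked repository. The dictionary field names here are illustrative assumptions, not the repository's API.

```python
import numpy as np

def ordinary_is_estimate(trajectories):
    """Trajectory-wise importance sampling (IS) estimate of a target policy's
    value from data collected under a behavior policy.

    Each trajectory is assumed to be a dict with:
      'rewards'        : per-step rewards r_1, ..., r_T
      'behavior_probs' : pi_b(a_t | s_t) for each step
      'target_probs'   : pi_e(a_t | s_t) for each step
    """
    per_traj = []
    for traj in trajectories:
        # Cumulative importance ratio rho = prod_t pi_e(a_t|s_t) / pi_b(a_t|s_t)
        rho = np.prod(np.asarray(traj["target_probs"]) /
                      np.asarray(traj["behavior_probs"]))
        per_traj.append(rho * np.sum(traj["rewards"]))
    return float(np.mean(per_traj))

# Toy usage: two observed two-step trajectories
data = [
    {"rewards": [0.0, 1.0], "behavior_probs": [0.5, 0.5], "target_probs": [0.9, 0.8]},
    {"rewards": [1.0, 0.0], "behavior_probs": [0.5, 0.5], "target_probs": [0.1, 0.2]},
]
print(ordinary_is_estimate(data))
```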
Leveraging Factored Action Spaces for Off-Policy Evaluation
Off-policy evaluation (OPE) aims to estimate the benefit of following a
counterfactual sequence of actions, given data collected from executed
sequences. However, existing OPE estimators often exhibit high bias and high
variance in problems involving large, combinatorial action spaces. We
investigate how to mitigate this issue using factored action spaces, i.e.,
expressing each action as a combination of independent sub-actions from smaller
action spaces. This approach facilitates a finer-grained analysis of how
actions differ in their effects. In this work, we propose a new family of
"decomposed" importance sampling (IS) estimators based on factored action
spaces. Given certain assumptions on the underlying problem structure, we prove
that the decomposed IS estimators have less variance than their original
non-decomposed versions, while preserving the property of zero bias. Through
simulations, we empirically verify our theoretical results, probing the
validity of various assumptions. Provided with a technique that can derive the
action space factorisation for a given problem, our work shows that OPE can be
improved "for free" by utilising this inherent problem structure.
Comment: Main paper: 8 pages, 7 figures. Appendix: 30 pages, 17 figures.
Accepted at ICML 2023 Workshop on Counterfactuals in Minds and Machines,
Honolulu, Hawaii, USA. Camera ready version.
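To make the decomposition described above concrete, here is a minimal bandit-setting sketch of how a decomposed IS estimate can differ from the standard one, assuming that both policies factor across sub-actions and that the reward decomposes additively; the paper's general estimators, assumptions, and variance results are in the full text, and the field names below are illustrative.

```python
import numpy as np

def naive_is(samples):
    """One-step (bandit) IS baseline: the full reward is weighted by the
    importance ratio of the complete, combinatorial action."""
    vals = []
    for s in samples:
        full_ratio = np.prod(np.asarray(s["sub_target_probs"]) /
                             np.asarray(s["sub_behavior_probs"]))
        vals.append(full_ratio * np.sum(s["sub_rewards"]))
    return float(np.mean(vals))

def decomposed_is(samples):
    """Decomposed one-step IS: each reward component r^k is weighted only by
    the importance ratio of its own sub-action a^k.

    Assumes (i) both policies factor across sub-actions,
    pi(a | s) = prod_k pi^k(a^k | s), and (ii) the reward decomposes
    additively, r = sum_k r^k, with r^k depending only on a^k (and the state).
    """
    vals = []
    for s in samples:
        ratios = (np.asarray(s["sub_target_probs"]) /
                  np.asarray(s["sub_behavior_probs"]))
        vals.append(float(np.dot(ratios, s["sub_rewards"])))
    return float(np.mean(vals))

# Toy usage: three independent binary sub-actions (a combinatorial space of 8 actions)
data = [
    {"sub_rewards": [1.0, 0.0, 0.5],
     "sub_behavior_probs": [0.5, 0.5, 0.5],
     "sub_target_probs": [0.9, 0.4, 0.7]},
]
print(naive_is(data), decomposed_is(data))
```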
Respecting Autonomy and Enabling Diversity: The Effect of Eligibility and Enrollment on Research Data Demographics
Many promising advances in precision health and other Big Data research rely on large data sets to analyze correlations among genetic variants, behavior, environment, and outcomes to improve population health. But these data sets are generally populated with demographically homogeneous cohorts. We conducted a retrospective cohort study of patients at a major academic medical center during 2012–19 to explore how recruitment and enrollment approaches affected the demographic diversity of participants in its research biospecimen and data bank. We found that compared with the overall clinical population, patients who consented to enroll in the research data bank were significantly less diverse in terms of age, sex, race, ethnicity, and socioeconomic status. Compared with patients who were recruited for the data bank, patients who enrolled were younger and less likely to be Black or African American, Asian, or Hispanic. The overall demographic diversity of the data bank was affected as much (and in some cases more) by which patients were considered eligible for recruitment as by which patients consented to enroll. Our work underscores the need for systemic commitment to diversify data banks so that different communities can benefit from research.
Machine Learning for Health symposium 2023 -- Findings track
A collection of the accepted Findings papers that were presented at the 3rd
Machine Learning for Health symposium (ML4H 2023), which was held on December
10, 2023, in New Orleans, Louisiana, USA. ML4H 2023 invited high-quality
submissions on relevant problems in a variety of health-related disciplines
including healthcare, biomedicine, and public health. Two submission tracks
were offered: the archival Proceedings track, and the non-archival Findings
track. The Proceedings track targeted mature work with strong technical
sophistication and a high impact on health. The Findings track looked for new
ideas that could spark insightful discussion, serve as valuable resources for
the community, or could enable new collaborations. Submissions to the
Proceedings track, if not accepted, were automatically considered for the
Findings track. All manuscripts submitted to the ML4H Symposium underwent a
double-blind peer-review process.
Proceedings of the 3rd Machine Learning for Health Symposium, ML4H 2023
The proceedings contain 40 papers. The topics discussed include: towards equitable kidney tumor segmentation: bias evaluation and mitigation; diffusion model-based data augmentation for lung ultrasound classification with limited data; representing visual classification as a linear combination of words; learning temporal higher-order patterns to detect anomalous brain activity; multi-modal graph learning over UMLS knowledge graphs; LLMs accelerate annotation for medical information extraction; towards reliable dermatology evaluation benchmarks; a probabilistic method to predict classifier accuracy on larger datasets given small pilot data; diffusion models to predict 3D late mechanical activation from sparse 2D cardiac MRIs; and NoteContrast: contrastive language-diagnostic pretraining for medical text.
Early identification of patients admitted to hospital for covid-19 at risk of clinical deterioration: model development and multisite external validation study
Objective: To create and validate a simple and transferable machine learning model from electronic health record data to accurately predict clinical deterioration in patients with covid-19 across institutions, through use of a novel paradigm for model development and code sharing.
Design: Retrospective cohort study.
Setting: One US hospital during 2015-21 was used for model training and internal validation. External validation was conducted on patients admitted to hospital with covid-19 at 12 other US medical centers during 2020-21.
Participants: 33 119 adults (≥18 years) admitted to hospital with respiratory distress or covid-19.
Main outcome measures: An ensemble of linear models was trained on the development cohort to predict a composite outcome of clinical deterioration within the first five days of hospital admission, defined as in-hospital mortality or any of three treatments indicating severe illness: mechanical ventilation, heated high flow nasal cannula, or intravenous vasopressors. The model was based on nine clinical and personal characteristic variables selected from 2686 variables available in the electronic health record. Internal and external validation performance was measured using the area under the receiver operating characteristic curve (AUROC) and the expected calibration error (the difference between predicted risk and actual risk). Potential bed day savings were estimated by calculating how many bed days hospitals could save per patient if low risk patients identified by the model were discharged early.
Results: 9291 covid-19 related hospital admissions at 13 medical centers were used for model validation, of which 1510 (16.3%) were related to the primary outcome. When the model was applied to the internal validation cohort, it achieved an AUROC of 0.80 (95% confidence interval 0.77 to 0.84) and an expected calibration error of 0.01 (95% confidence interval 0.00 to 0.02). Performance was consistent when validated in the 12 external medical centers (AUROC range 0.77-0.84), across subgroups of sex, age, race, and ethnicity (AUROC range 0.78-0.84), and across quarters (AUROC range 0.73-0.83). Using the model to triage low risk patients could potentially save up to 7.8 bed days per patient resulting from early discharge.
Conclusion: A model to predict clinical deterioration was developed rapidly in response to the covid-19 pandemic at a single hospital, was applied externally without the sharing of data, and performed well across multiple medical centers, patient subgroups, and time periods, showing its potential as a tool for use in optimizing healthcare resources.
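The expected calibration error reported above is described as the difference between predicted risk and actual risk; the sketch below shows one common binned formulation of that idea. The paper specifies its own calibration procedure, so treat the binning scheme here as an assumption for illustration only.

```python
import numpy as np

def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Binned expected calibration error: average absolute gap between
    predicted risk and observed event rate, weighted by the fraction of
    patients falling in each risk bin."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for i in range(n_bins):
        lo, hi = edges[i], edges[i + 1]
        # The last bin is closed on the right so that predictions of 1.0 are counted.
        in_bin = (y_prob >= lo) & ((y_prob < hi) | (i == n_bins - 1))
        if in_bin.any():
            gap = abs(y_prob[in_bin].mean() - y_true[in_bin].mean())
            ece += in_bin.mean() * gap
    return float(ece)
```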