Mixed-Integer Optimization with Constraint Learning
We establish a broad methodological foundation for mixed-integer optimization
with learned constraints. We propose an end-to-end pipeline for data-driven
decision making in which constraints and objectives are directly learned from
data using machine learning, and the trained models are embedded in an
optimization formulation. We exploit the mixed-integer
optimization-representability of many machine learning methods, including
linear models, decision trees, ensembles, and multi-layer perceptrons, which
allows us to capture various underlying relationships between decisions,
contextual variables, and outcomes. We also introduce two approaches for
handling the inherent uncertainty of learning from data. First, we characterize
a decision trust region using the convex hull of the observations, to ensure
credible recommendations and avoid extrapolation. We efficiently incorporate
this representation using column generation and propose a more flexible
formulation to deal with low-density regions and high-dimensional datasets.
Then, we propose an ensemble learning approach that enforces constraint
satisfaction over multiple bootstrapped estimators or multiple algorithms. In
combination with domain-driven components, the embedded models and trust region
define a mixed-integer optimization problem for prescription generation. We
implement this framework as a Python package (OptiCL) for practitioners. We
demonstrate the method in both World Food Programme planning and chemotherapy
optimization. The case studies illustrate the framework's ability to generate
high-quality prescriptions as well as the value added by the trust region, the
use of ensembles to control model robustness, the consideration of multiple
machine learning methods, and the inclusion of multiple learned constraints.
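As a rough illustration of the two uncertainty-handling ideas in this abstract, the trust region and the ensemble constraint can be written together in one problem. The notation is assumed here for illustration and does not come from the abstract: $x$ are the decisions, $w$ the contextual variables, $f$ the objective, $\hat{x}_1,\dots,\hat{x}_N$ the observed decisions, $\hat{h}_1,\dots,\hat{h}_P$ the learned estimators of an outcome, and $\tau$ a chosen bound. Requiring every estimator to respect the bound is only one simple variant; the paper's own formulation may be more flexible.

```latex
\begin{align*}
\min_{x,\,\lambda}\quad & f(x, w) \\
\text{s.t.}\quad & \hat{h}_p(x, w) \le \tau, && p = 1,\dots,P, \\
& x = \sum_{i=1}^{N} \lambda_i \hat{x}_i, \quad \sum_{i=1}^{N} \lambda_i = 1, \\
& \lambda_i \ge 0, && i = 1,\dots,N.
\end{align*}
```

The last two lines restrict $x$ to the convex hull of the observations, which is what keeps the recommendation from extrapolating beyond the data.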
Data-driven healthcare via constraint learning and analytics
The proliferation of digitally-available medical data has enabled a new paradigm of decision-making in medicine. Machine learning allows us to glean large-scale insights directly from data, systematizing the heuristic risk assessment process that physicians use on a local scale. Optimization similarly adds rigor to decision-making, providing a quantitative framework for optimizing decisions under certain constraints. The rise in data, coupled with methodological and computational advancements in these fields, presents both opportunities and challenges. In this thesis, we leverage machine learning and optimization to learn from data and drive better decisions in healthcare. We propose novel approaches motivated by current methodological gaps, and we use analytics to tackle clinically-driven problems. This thesis develops methods and applied models to bridge the gap between research and clinical practice, with interpretability and impact as guiding principles.
The first part of the thesis focuses on the development of new approaches for data-driven insights and decision-making. Chapter 2 introduces a constraint learning framework that embeds trained machine learning models directly into mixed-integer optimization formulations. We train machine learning models to approximate functional relationships between decisions and outcomes of interest and subsequently optimize decisions under these data-driven learned constraints and/or objectives. We also highlight an application of this framework in chemotherapy regimen design. In Chapter 3, we propose an interpretable clustering algorithm which learns a tree-based data partition in which each leaf comprises a distinct cluster. We recover high-quality clusters that can be explicitly described by their decision paths.
The second part of the thesis leverages machine learning and optimization to improve risk prediction and treatment decisions in various domains. We present three such applications. In Chapter 4, we study neutropenic events in chemotherapy patients. We propose a risk prediction model based on a patient's dynamic clinical trajectory over the course of multiple chemotherapy cycles. Chapter 5 demonstrates the use of analytics to address the COVID-19 pandemic. We curate a multi-center, international database of COVID-19 patients and their outcomes, which forms the basis for a COVID-19 mortality risk model for hospitalized patients. Finally, Chapter 6 examines the effectiveness of in-person vs. virtual care from a causal inference lens, considering the effect of visit modality on both operational and clinical outcomes. The resultant machine learning models inform an optimization formulation for allocating telehealth and in-person visits for diabetic patients.
Ph.D.
Interpretable clustering: an optimization approach
Abstract
State-of-the-art clustering algorithms provide little insight into the rationale for cluster membership, limiting their interpretability. In complex real-world applications, this lack of insight poses a barrier to machine learning adoption when experts are asked to provide detailed explanations of their algorithms’ recommendations. We present a new unsupervised learning method that leverages mixed-integer optimization techniques to generate interpretable tree-based clustering models. Utilizing a flexible optimization-driven framework, our algorithm approximates the globally optimal solution, leading to high-quality partitions of the feature space. We propose a novel method that can optimize for various internal clustering validation metrics and naturally determines the optimal number of clusters. It successfully addresses the challenge of mixed numerical and categorical data and achieves comparable or superior performance to other clustering methods on both synthetic and real-world datasets while offering significantly higher interpretability.
Prediction of Neutropenic Events in Chemotherapy Patients: A Machine Learning Approach
PURPOSE Severe and febrile neutropenia present serious hazards to patients with cancer undergoing chemotherapy. We seek to develop a machine learning–based neutropenia prediction model that can be used to assess risk at the initiation of a chemotherapy cycle. MATERIALS AND METHODS We leverage rich electronic medical record (EMR) data from a large health care system and apply machine learning methods to predict severe and febrile neutropenic events. We outline the data curation process and the challenges posed by EMR data. We explore a range of algorithms with an emphasis on model interpretability and ease of use in a clinical setting. RESULTS Our final proposed model demonstrates an out-of-sample area under the receiver operating characteristic curve of 0.865 (95% CI, 0.830 to 0.891) in the prediction of neutropenic events on the basis of only 20 clinical features. The model validates known risk factors and offers insight into potential novel clinical indicators and treatment characteristics that elevate risk. It relies on factors that are directly extractable from EMRs, providing a tool that can be easily integrated into existing workflows. A cost-based analysis provides insight into optimal risk thresholds and offers a framework for tailoring algorithms to individual hospital needs. CONCLUSION A better understanding of neutropenic risk on an individual level enables a more informed approach to patient monitoring and treatment decisions.
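The cost-based choice of a risk threshold mentioned in the results can be sketched as below. This is a minimal illustration, not the authors' analysis: the function names, the toy risks, and the two cost values (`cost_fp` for an unnecessary intervention, `cost_fn` for a missed neutropenic event) are all assumptions made for the example.

```python
from typing import Sequence


def expected_cost(risks: Sequence[float], labels: Sequence[int],
                  threshold: float, cost_fp: float, cost_fn: float) -> float:
    """Average misclassification cost when flagging patients with risk >= threshold."""
    total = 0.0
    for risk, label in zip(risks, labels):
        flagged = risk >= threshold
        if flagged and label == 0:
            total += cost_fp  # flagged a patient who had no event
        elif not flagged and label == 1:
            total += cost_fn  # missed a patient who had an event
    return total / len(risks)


def best_threshold(risks: Sequence[float], labels: Sequence[int],
                   cost_fp: float, cost_fn: float) -> float:
    """Return the candidate threshold with the lowest average cost.

    Candidates are the observed risk values plus infinity (flag nobody);
    ties are broken in favour of the lowest threshold.
    """
    candidates = sorted(set(risks)) + [float("inf")]
    return min(candidates,
               key=lambda t: expected_cost(risks, labels, t, cost_fp, cost_fn))


# Hypothetical predicted risks and observed outcomes (1 = neutropenic event).
risks = [0.1, 0.2, 0.6, 0.9]
labels = [0, 0, 1, 1]
chosen = best_threshold(risks, labels, cost_fp=1.0, cost_fn=5.0)
```

Changing the ratio between `cost_fn` and `cost_fp` shifts the chosen threshold, which is one way a hospital could adapt the same risk model to its own priorities.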
COVID-19 Mortality Risk Assessment: An International Multi-Center Study
Background: Timely identification of COVID-19 patients at high risk of mortality can
significantly improve patient management and resource allocation within hospitals. This
study seeks to develop and validate a data-driven personalized mortality risk calculator for
hospitalized COVID-19 patients.
Methods: De-identified data was obtained for 3,927 COVID-19 positive patients from six
independent centers, comprising 33 different hospitals. Demographic, clinical, and laboratory
variables were collected at hospital admission. The COVID-19 Mortality Risk (CMR) tool
was developed using the XGBoost algorithm to predict mortality. Its discrimination
performance was subsequently evaluated on three validation cohorts.
Findings: The derivation cohort of 3,062 patients has an observed mortality rate of 26.84%.
Increased age, decreased oxygen saturation (≤ 93%), elevated levels of C-reactive protein (≥
130 mg/L), blood urea nitrogen (≥ 18 mg/dL), and blood creatinine (≥ 1.2 mg/dL) were
identified as primary risk factors, validating clinical findings. The model obtains an out-of-sample AUC of 0.90 (95% CI, 0.87-0.94) on the derivation cohort. In the validation cohorts,
the model obtains AUCs of 0.92 (95% CI, 0.88-0.95) on Seville patients, 0.87 (95% CI, 0.84-
0.91) on Hellenic COVID-19 Study Group patients, and 0.81 (95% CI, 0.76-0.85) on Hartford
Hospital patients. The CMR tool is available as an online application at
covidanalytics.io/mortality_calculator and is currently in clinical use.
Interpretation: The CMR model leverages machine learning to generate accurate mortality
predictions using commonly available clinical features. This is the first risk score trained and
validated on a cohort of COVID-19 patients from Europe and the United States.
HW is supported by the National Science Foundation Graduate Research
Fellowship under Grant No. 174530.N
Personalized prescription of ACEI/ARBs for hypertensive COVID-19 patients
Abstract
The COVID-19 pandemic has prompted an international effort to develop and repurpose medications and procedures to effectively combat the disease. Several groups have focused on the potential treatment utility of angiotensin-converting–enzyme inhibitors (ACEIs) and angiotensin-receptor blockers (ARBs) for hypertensive COVID-19 patients, with inconclusive evidence thus far. We couple electronic medical record (EMR) and registry data of 3,643 patients from Spain, Italy, Germany, Ecuador, and the US with a machine learning framework to personalize the prescription of ACEIs and ARBs to hypertensive COVID-19 patients. Our approach leverages clinical and demographic information to identify hospitalized individuals whose probability of mortality or morbidity can decrease by prescribing this class of drugs. In particular, the algorithm proposes increasing ACEI/ARB prescriptions for patients with cardiovascular disease and decreasing prescriptions for those with low oxygen saturation at admission. We show that personalized recommendations can improve patient outcomes by 1.0% compared to the standard of care when applied to external populations. We develop an interactive interface for our algorithm, providing physicians with an actionable tool to easily assess treatment alternatives and inform clinical decisions. This work offers the first personalized recommendation system to accurately evaluate the efficacy and risks of prescribing ACEIs and ARBs to hypertensive COVID-19 patients.
COVID-19 mortality risk assessment: An international multi-center study
Timely identification of COVID-19 patients at high risk of mortality can significantly improve patient management and resource allocation within hospitals. This study seeks to develop and validate a data-driven personalized mortality risk calculator for hospitalized COVID-19 patients. De-identified data was obtained for 3,927 COVID-19 positive patients from six independent centers, comprising 33 different hospitals. Demographic, clinical, and laboratory variables were collected at hospital admission. The COVID-19 Mortality Risk (CMR) tool was developed using the XGBoost algorithm to predict mortality. Its discrimination performance was subsequently evaluated on three validation cohorts. The derivation cohort of 3,062 patients has an observed mortality rate of 26.84%. Increased age, decreased oxygen saturation (≤ 93%), elevated levels of C-reactive protein (≥ 130 mg/L), blood urea nitrogen (≥ 18 mg/dL), and blood creatinine (≥ 1.2 mg/dL) were identified as primary risk factors, validating clinical findings. The model obtains an out-of-sample AUC of 0.90 (95% CI, 0.87–0.94) on the derivation cohort. In the validation cohorts, the model obtains AUCs of 0.92 (95% CI, 0.88–0.95) on Seville patients, 0.87 (95% CI, 0.84–0.91) on Hellenic COVID-19 Study Group patients, and 0.81 (95% CI, 0.76–0.85) on Hartford Hospital patients. The CMR tool is available as an online application at https://www.covidanalytics.io/mortality_calculator and is currently in clinical use. The CMR model leverages machine learning to generate accurate mortality predictions using commonly available clinical features. This is the first risk score trained and validated on a cohort of COVID-19 patients from Europe and the United States.
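For readers unfamiliar with how an AUC and its 95% confidence interval can be obtained, the following is a minimal sketch. The abstract does not say how the intervals were computed; the percentile bootstrap used here, the function names, and the toy scores are assumptions for illustration only.

```python
import random
from typing import List, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive case scores above a random negative one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(scores: Sequence[float], labels: Sequence[int],
                     n_boot: int = 1000, alpha: float = 0.05,
                     seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval for the AUC (resampling patients with replacement)."""
    rng = random.Random(seed)
    n = len(scores)
    stats: List[float] = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples where the AUC is undefined
        stats.append(auc([scores[i] for i in idx], ys))
    if not stats:
        raise ValueError("no valid bootstrap resample")
    stats.sort()
    lower = stats[int((alpha / 2) * (len(stats) - 1))]
    upper = stats[int((1 - alpha / 2) * (len(stats) - 1))]
    return lower, upper


# Hypothetical predicted mortality risks and observed outcomes (1 = death).
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
point_estimate = auc(scores, labels)
interval = bootstrap_auc_ci(scores, labels, n_boot=200)
```

The pairwise comparison is quadratic in the number of patients, which is fine for a sketch; a rank-based computation would be preferable for cohorts of several thousand patients.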