The utility of general purpose versus specialty clinical databases for research: Warfarin dose estimation from extracted clinical variables
There is debate about the utility of clinical data warehouses for research. Using a clinical warfarin dosing algorithm derived from research-quality data, we evaluated the data quality of both a general-purpose database and a coagulation-specific database. We evaluated the functional utility of these repositories by using data extracted from them to predict warfarin dose. We reasoned that high-quality clinical data would predict doses nearly as accurately as research data, while poor-quality clinical data would predict doses less accurately. We evaluated the Mean Absolute Error (MAE) in predicted weekly dose as a metric of data quality. The MAE was comparable between the clinical gold standard (10.1 mg/wk) and the specialty database (10.4 mg/wk), but the MAE for the clinical warehouse was 40% greater (14.1 mg/wk). Our results indicate that the research utility of clinical data collected in focused clinical settings is greater than that of data collected during general-purpose clinical care.
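The study's data-quality metric, mean absolute error in predicted weekly dose, is simple to compute; a minimal sketch, using hypothetical doses rather than the study's data:

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and observed weekly doses (mg/wk)."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical weekly warfarin doses (mg/wk); not the study's data.
actual = [35.0, 28.0, 42.5, 31.5]
predicted = [30.0, 30.0, 40.0, 35.0]
mae = mean_absolute_error(predicted, actual)  # (5 + 2 + 2.5 + 3.5) / 4 = 3.25
```

Under this metric, a lower MAE against research-grade doses indicates higher functional data quality.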
Cross-Modal Data Programming Enables Rapid Medical Machine Learning
Labeling training datasets has become a key barrier to building medical machine learning models. One strategy is to generate training labels programmatically, for example by applying natural language processing pipelines to text reports associated with imaging studies. We propose cross-modal data programming, which generalizes this intuitive strategy in a theoretically grounded way that enables simpler, clinician-driven input, reduces required labeling time, and improves with additional unlabeled data. In this approach, clinicians generate training labels for models defined over a target modality (e.g. images or time series) by writing rules over an auxiliary modality (e.g. text reports). The resulting technical challenge consists of estimating the accuracies and correlations of these rules; we extend a recent unsupervised generative modeling technique to handle this cross-modal setting in a provably consistent way. Across four applications in radiography, computed tomography, and electroencephalography, and using only several hours of clinician time, our approach matches or exceeds the efficacy of physician-months of hand-labeling with statistical significance, demonstrating a fundamentally faster and more flexible way of building machine learning models in medicine.
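The clinician rule-writing step can be illustrated with a toy labeling function over the auxiliary text modality; the rule, label values, and report snippets below are illustrative assumptions, not taken from the paper:

```python
import re

# Illustrative label scheme: rules may also abstain when a report is uninformative.
ABSTAIN, NORMAL, ABNORMAL = -1, 0, 1

def lf_effusion(report_text):
    """Rule over the auxiliary modality (a radiology report): vote on effusion."""
    if re.search(r"\bno (pleural )?effusion\b", report_text, re.I):
        return NORMAL
    if re.search(r"\beffusion\b", report_text, re.I):
        return ABNORMAL
    return ABSTAIN

# Votes from many such rules would be denoised and used to label the paired images.
print(lf_effusion("Small right pleural effusion."))  # 1
print(lf_effusion("No effusion or pneumothorax."))   # 0
print(lf_effusion("Lungs are clear."))               # -1
```

Estimating the accuracies and correlations of many such overlapping, abstaining rules is the modeling problem the paper's generative technique addresses.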
An integrative method for scoring candidate genes from association studies: application to warfarin dosing
Background: A key challenge in pharmacogenomics is the identification of genes whose variants contribute to drug response phenotypes, which can include severe adverse effects. Pharmacogenomics GWAS attempt to elucidate genotypes predictive of drug response. However, the size of these studies has severely limited their power and potential application. We propose a novel knowledge integration and SNP aggregation approach for identifying genes impacting drug response. Our SNP aggregation method characterizes the degree to which uncommon alleles of a gene are associated with drug response. We first use pre-existing knowledge sources to rank pharmacogenes by their likelihood to affect drug response. We then define a summary score for each gene based on allele frequencies and train linear and logistic regression classifiers to predict drug response phenotypes. Results: We applied our method to a published warfarin GWAS data set comprising 181 individuals. We find that our method can increase the power of the GWAS to identify both VKORC1 and CYP2C9 as warfarin pharmacogenes, where the original analysis had only identified VKORC1. Additionally, we find that our method can be used to discriminate between low-dose (AUROC=0.886) and high-dose (AUROC=0.764) responders. Conclusions: Our method offers a new route for candidate pharmacogene discovery from pharmacogenomics GWAS, and serves as a foundation for future work in methods for predictive pharmacogenomics.
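A gene-level summary score of the kind described, aggregating uncommon alleles with rarer variants weighted more heavily, might be sketched as follows; the 1/sqrt(MAF) weighting and all values are illustrative assumptions, not the paper's exact formula:

```python
import math

def gene_burden_score(minor_allele_counts, minor_allele_freqs):
    """Aggregate one individual's minor-allele counts (0/1/2) over a gene's SNPs,
    up-weighting rarer variants by 1/sqrt(MAF)."""
    return sum(count / math.sqrt(maf)
               for count, maf in zip(minor_allele_counts, minor_allele_freqs))

# Hypothetical genotype at three SNPs in a gene, with hypothetical minor allele
# frequencies; the rare SNP (MAF 0.04) dominates the score.
counts = [1, 0, 2]
mafs = [0.25, 0.40, 0.04]
score = gene_burden_score(counts, mafs)  # 1/0.5 + 0 + 2/0.2 = 12.0
```

Such per-gene scores can then serve as features for the linear and logistic regression classifiers the abstract describes.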
Diagnoses and clinical features associated with high risk for unplanned readmission in vascular surgery. A cohort study
Background: Readmission rate is an established health quality indicator. Preventable readmissions bear an unnecessary, high cost on the healthcare system. An analysis performed by the National Centre for Health Outcomes Development (NCHOD) has demonstrated an increasing trend in emergency readmissions in the UK. Vascular surgery has been reported to have high readmission rates, second only to congestive heart failure. This study aims to identify diagnoses and other clinical risk factors for high unplanned readmission rates. This may be the first step to sparing both the healthcare system and patients unnecessary readmissions. Results: The overall 30-day readmission rate for Leeds Vascular Institute was 8.8%. The two diagnoses with the highest readmission rates were lower limb ischaemia and diabetic foot sepsis. The readmission rate for medical reasons was overwhelmingly higher than for surgical reasons (6.5% and 2.3% respectively). The most common medical diagnoses were renal disease and COPD. The majority of the patients readmitted under the care of vascular surgery required further surgical treatment. Conclusion: Vascular units should focus on holistic and multidisciplinary treatment of lower limb ischaemia and diabetic foot sepsis, in order to prevent readmissions. Furthermore, the early involvement and input of physicians in the treatment of vascular patients with renal disease and COPD may be appropriate.
Population Physiology: Leveraging Electronic Health Record Data to Understand Human Endocrine Dynamics
Studying physiology and pathophysiology over a broad population for long periods of time is difficult primarily because collecting human physiologic data can be intrusive, dangerous, and expensive. One solution is to use data that have been collected for a different purpose. Electronic health record (EHR) data promise to support the development and testing of mechanistic physiologic models on diverse populations and allow correlation with clinical outcomes, but limitations in the data have thus far thwarted such use. For example, using uncontrolled population-scale EHR data to verify the outcome of time-dependent behavior of mechanistic, constructive models can be difficult because: (i) aggregation of the population can obscure or generate a signal, (ii) there is often no control population with a well-understood health state, and (iii) diversity in how the population is measured can make the data difficult to fit into conventional analysis techniques. This paper shows that it is possible to use EHR data to test a physiological model for a population and over long time scales. Specifically, a methodology is developed and demonstrated for testing a mechanistic, time-dependent, physiological model of serum glucose dynamics with uncontrolled, population-scale, physiological patient data extracted from an EHR repository. It is shown that there is no observable daily variation in the normalized mean glucose for any EHR subpopulations. In contrast, a derived value, daily variation in nonlinear correlation quantified by the time-delayed mutual information (TDMI), did reveal the intuitively expected diurnal variation in glucose levels amongst a random population of humans. Moreover, in a population of continuously (tube) fed patients, there was no observable TDMI-based diurnal signal. These TDMI-based signals, via a glucose insulin model, were then connected with human feeding patterns.
In particular, a constructive physiological model was shown to correctly predict the difference between the general uncontrolled population and a subpopulation whose feeding was controlled.
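Time-delayed mutual information on a binned series can be sketched directly from its definition, the mutual information between a series and a lagged copy of itself; the toy periodic "glucose" series below stands in for real EHR measurements:

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Discrete mutual information I(X;Y) in bits from paired samples."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum p(x,y) * log2( p(x,y) / (p(x) p(y)) ) over observed pairs.
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def tdmi(series, tau):
    """Time-delayed mutual information: MI between the series and itself shifted by tau
    (circular shift, for simplicity)."""
    return mutual_information(series, series[tau:] + series[:tau])

# A toy binned series with period 4: TDMI is largest at the true period,
# the kind of lag structure a diurnal glucose rhythm would produce.
series = [0, 1, 2, 1] * 20
assert tdmi(series, 4) > tdmi(series, 1)
```

At the true period the shifted copy aligns perfectly, so TDMI reaches the series' entropy; at other lags it is strictly smaller.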
Pathway analysis of genome-wide data improves warfarin dose prediction
Background: Many genome-wide association studies focus on associating single loci with target phenotypes. However, in the setting of rare variation, accumulating sufficient samples to assess these associations can be difficult. Moreover, multiple variations in a gene or a set of genes within a pathway may all contribute to the phenotype, suggesting that the aggregation of variations found over the gene or pathway may be useful for improving the power to detect associations. Results: Here, we present a method for aggregating single nucleotide polymorphisms (SNPs) along biologically relevant pathways in order to seek genetic associations with phenotypes. Our method uses all available genetic variants and does not remove those in linkage disequilibrium (LD). Instead, it uses a novel SNP weighting scheme to down-weight the contributions of correlated SNPs. We apply our method to three cohorts of patients taking warfarin: two European descent cohorts and an African American cohort. Although the clinical covariates and key pharmacogenetic loci for warfarin have been characterized, our association metric identifies a significant association with mutations distributed throughout the pathway of warfarin metabolism. We improve dose prediction after using all known clinical covariates and pharmacogenetic variants in VKORC1 and CYP2C9. In particular, we find that at least 1% of the missing heritability in warfarin dose may be due to the aggregated effects of variations in the warfarin metabolic pathway, even though the SNPs do not individually show a significant association. Conclusions: Our method allows researchers to study aggregative SNP effects in an unbiased manner by not preselecting SNPs. It retains all the available information by accounting for LD structure through weighting, which eliminates the need for LD pruning.
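The LD-aware weighting idea, down-weighting rather than pruning correlated SNPs, might be sketched as follows; the specific 1/(1 + sum of r-squared) form and the r-squared values are illustrative assumptions, not the paper's exact scheme:

```python
def ld_weights(r2):
    """Down-weight each SNP by 1 / (1 + sum of its r^2 with the other SNPs).
    An independent SNP keeps weight 1; a perfectly duplicated pair is halved,
    so a block of correlated SNPs contributes roughly once in aggregate."""
    n = len(r2)
    return [1.0 / (1.0 + sum(r2[i][j] for j in range(n) if j != i))
            for i in range(n)]

# Hypothetical pairwise r^2 among three SNPs: SNP0 and SNP1 are in perfect LD,
# SNP2 is independent of both.
r2 = [
    [1.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
weights = ld_weights(r2)  # [0.5, 0.5, 1.0]
```

Unlike pruning, this keeps every variant in the aggregate score while preventing an LD block from being counted multiple times.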
Clinical Practice Recommendations on Genetic Testing of CYP2C9 and VKORC1 Variants in Warfarin Therapy
Predictive Model for ICU Readmission Based on Discharge Summaries Using Machine Learning and Natural Language Processing
Predicting ICU readmission risk will help physicians make decisions regarding discharge. We used discharge summaries to predict ICU 30-day readmission risk using text mining and machine learning (ML) with data from the Medical Information Mart for Intensive Care III (MIMIC-III). We used Natural Language Processing (NLP) and the Bag-of-Words approach on discharge summaries to build a Document-Term-Matrix with 3000 features. We compared the performance of support vector machines with the radial basis function kernel (SVM-RBF), adaptive boosting (AdaBoost), quadratic discriminant analysis (QDA), least absolute shrinkage and selection operator (LASSO), and Ridge Regression. A total of 4000 patients were used for model training and 6000 were used for validation. Using the bag-of-words determined by NLP, the area under the receiver operating characteristic (AUROC) curve was 0.71, 0.68, 0.65, 0.69, and 0.65 respectively for SVM-RBF, AdaBoost, QDA, LASSO, and Ridge Regression. We then used the SVM-RBF model for feature selection by incrementally adding features to the model from 1 to 3000 bag-of-words. Through this exhaustive search approach, only 825 features (words) were found to be dominant. Using those selected features, we trained and validated all ML models. The AUROC was 0.74, 0.69, 0.67, 0.70, and 0.71 respectively for SVM-RBF, AdaBoost, QDA, LASSO, and Ridge Regression. Overall, this technique could predict ICU readmission relatively well.
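The Document-Term-Matrix construction from discharge summaries can be sketched with a minimal bag-of-words builder; the toy summaries are invented, not MIMIC-III text, and the tokenization is a simplifying assumption:

```python
import re
from collections import Counter

def build_dtm(documents, max_features=3000):
    """Build a Document-Term-Matrix: one row per document, one column per word,
    keeping only the most frequent words in the corpus (the study kept 3000)."""
    tokenized = [re.findall(r"[a-z]+", doc.lower()) for doc in documents]
    corpus_counts = Counter(tok for toks in tokenized for tok in toks)
    vocab = [word for word, _ in corpus_counts.most_common(max_features)]
    index = {word: i for i, word in enumerate(vocab)}
    dtm = []
    for toks in tokenized:
        row = [0] * len(vocab)
        for tok in toks:
            if tok in index:
                row[index[tok]] += 1
        dtm.append(row)
    return vocab, dtm

# Invented discharge-summary snippets for illustration.
docs = ["Patient stable, discharged home.",
        "Patient readmitted with sepsis; patient unstable."]
vocab, dtm = build_dtm(docs, max_features=5)
```

Each row of `dtm` would then be the feature vector fed to the classifiers (SVM-RBF, AdaBoost, and so on) that the abstract compares.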
Automated Lung Ultrasound Pulmonary Disease Quantification Using an Unsupervised Machine Learning Technique for COVID-19
COVID-19 is an ongoing global health pandemic. Although COVID-19 can be diagnosed with various tests such as PCR, these tests do not establish pulmonary disease burden. Point-of-care lung ultrasound (POCUS) can directly assess the severity of characteristic pulmonary findings of COVID-19, and ultrasound has the advantages of being inexpensive, portable, and widely available in many clinical settings. For automated assessment of pulmonary findings, we have developed an unsupervised learning technique termed the calculated lung ultrasound (CLU) index. The CLU can quantify various types of lung findings, such as A or B lines, consolidations, and pleural effusions, and it uses these findings to calculate a CLU index score, which is a quantitative measure of pulmonary disease burden. This is accomplished using an unsupervised, patient-specific approach that does not require training on a large dataset. The CLU was tested on 52 lung ultrasound examinations from several institutions. CLU demonstrated excellent concordance with radiologist findings in different pulmonary disease states. Given the global nature of COVID-19, the CLU would be useful for sonographers and physicians in resource-strapped areas with limited ultrasound training and diagnostic capacities for more accurate assessment of pulmonary status.