Search CORE

25 research outputs found

The utility of general purpose versus specialty clinical databases for research: Warfarin dose estimation from extracted clinical variables

Author: Altman Russ B.
Sagreiya Hersh
Publication venue: Elsevier Inc.
Publication date: 01/10/2010
Field of study

AbstractThere is debate about the utility of clinical data warehouses for research. Using a clinical warfarin dosing algorithm derived from research-quality data, we evaluated the data quality of both a general-purpose database and a coagulation-specific database. We evaluated the functional utility of these repositories by using data extracted from them to predict warfarin dose. We reasoned that high-quality clinical data would predict doses nearly as accurately as research data, while poor-quality clinical data would predict doses less accurately. We evaluated the Mean Absolute Error (MAE) in predicted weekly dose as a metric of data quality. The MAE was comparable between the clinical gold standard (10.1mg/wk) and the specialty database (10.4mg/wk), but the MAE for the clinical warehouse was 40% greater (14.1mg/wk). Our results indicate that the research utility of clinical data collected in focused clinical settings is greater than that of data collected during general-purpose clinical care

Elsevier - Publisher Connector

PubMed Central

Automated Lung Ultrasound Pulmonary Disease Quantification Using an Unsupervised Machine Learning Technique for COVID-19

Author: Akhbardeh Alireza
Jacobs Michael A
Sagreiya Hersh
Publication venue: DigitalCommons@TMC
Publication date: 16/08/2023
Field of study

COVID-19 is an ongoing global health pandemic. Although COVID-19 can be diagnosed with various tests such as PCR, these tests do not establish pulmonary disease burden. Whereas point-of-care lung ultrasound (POCUS) can directly assess the severity of characteristic pulmonary findings of COVID-19, the advantage of using US is that it is inexpensive, portable, and widely available for use in many clinical settings. For automated assessment of pulmonary findings, we have developed an unsupervised learning technique termed the calculated lung ultrasound (CLU) index. The CLU can quantify various types of lung findings, such as A or B lines, consolidations, and pleural effusions, and it uses these findings to calculate a CLU index score, which is a quantitative measure of pulmonary disease burden. This is accomplished using an unsupervised, patient-specific approach that does not require training on a large dataset. The CLU was tested on 52 lung ultrasound examinations from several institutions. CLU demonstrated excellent concordance with radiologist findings in different pulmonary disease states. Given the global nature of COVID-19, the CLU would be useful for sonographers and physicians in resource-strapped areas with limited ultrasound training and diagnostic capacities for more accurate assessment of pulmonary status

DigitalCommons@The Texas Medical Center

An integrative method for scoring candidate genes from association studies: application to warfarin dosing

Author: Altman Russ B
Butte Atul J
Dudley Joel T
Sagreiya Hersh
Tatonetti Nicholas P
Publication venue: BioMed Central
Publication date: 01/01/2010
Field of study

BackgroundA key challenge in pharmacogenomics is the identification of genes whose variants contribute to drug response phenotypes, which can include severe adverse effects. Pharmacogenomics GWAS attempt to elucidate genotypes predictive of drug response. However, the size of these studies has severely limited their power and potential application. We propose a novel knowledge integration and SNP aggregation approach for identifying genes impacting drug response. Our SNP aggregation method characterizes the degree to which uncommon alleles of a gene are associated with drug response. We first use pre-existing knowledge sources to rank pharmacogenes by their likelihood to affect drug response. We then define a summary score for each gene based on allele frequencies and train linear and logistic regression classifiers to predict drug response phenotypes.ResultsWe applied our method to a published warfarin GWAS data set comprising 181 individuals. We find that our method can increase the power of the GWAS to identify both VKORC1 and CYP2C9 as warfarin pharmacogenes, where the original analysis had only identified VKORC1. Additionally, we find that our method can be used to discriminate between low-dose (AUROC=0.886) and high-dose (AUROC=0.764) responders.ConclusionsOur method offers a new route for candidate pharmacogene discovery from pharmacogenomics GWAS, and serves as a foundation for future work in methods for predictive pharmacogenomics

Crossref

Springer - Publisher Connector

PubMed Central

eScholarship - University of California

Cross-Modal Data Programming Enables Rapid Medical Machine Learning

Author: Dunnmon Jared
Goldman Roger
Khandwala Nishith
Lee-Messer Christopher
Lungren Matthew
Markert Matthew
Ratner Alexander
Rubin Daniel
Ré Christopher
Saab Khaled
Sagreiya Hersh
Publication venue
Publication date: 26/03/2019
Field of study

Labeling training datasets has become a key barrier to building medical machine learning models. One strategy is to generate training labels programmatically, for example by applying natural language processing pipelines to text reports associated with imaging studies. We propose cross-modal data programming, which generalizes this intuitive strategy in a theoretically-grounded way that enables simpler, clinician-driven input, reduces required labeling time, and improves with additional unlabeled data. In this approach, clinicians generate training labels for models defined over a target modality (e.g. images or time series) by writing rules over an auxiliary modality (e.g. text reports). The resulting technical challenge consists of estimating the accuracies and correlations of these rules; we extend a recent unsupervised generative modeling technique to handle this cross-modal setting in a provably consistent way. Across four applications in radiography, computed tomography, and electroencephalography, and using only several hours of clinician time, our approach matches or exceeds the efficacy of physician-months of hand-labeling with statistical significance, demonstrating a fundamentally faster and more flexible way of building machine learning models in medicine

arXiv.org e-Print Archive

eScholarship - University of California

Recommended from our members

Pathway analysis of genome-wide data improves warfarin dose prediction

Author: Altman Russ B.
Bourgeois Stephane
Burmester James K.
Cavallari Larisa H.
Daneshjou Roxana
Drozda Katarzyna
Johnson Julie A.
Karczewski Konrad J.
Klein Teri E.
Kubo Michiaki
Limdi Nita A.
Nakamura Yusuke
Perera Minoli
Sagreiya Hersh
Tatonetti Nicholas P.
Tector Matthew
Tsunoda Tatsuhiko
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2013
Field of study

Background: Many genome-wide association studies focus on associating single loci with target phenotypes. However, in the setting of rare variation, accumulating sufficient samples to assess these associations can be difficult. Moreover, multiple variations in a gene or a set of genes within a pathway may all contribute to the phenotype, suggesting that the aggregation of variations found over the gene or pathway may be useful for improving the power to detect associations. Results: Here, we present a method for aggregating single nucleotide polymorphisms (SNPs) along biologically relevant pathways in order to seek genetic associations with phenotypes. Our method uses all available genetic variants and does not remove those in linkage disequilibrium (LD). Instead, it uses a novel SNP weighting scheme to down-weight the contributions of correlated SNPs. We apply our method to three cohorts of patients taking warfarin: two European descent cohorts and an African American cohort. Although the clinical covariates and key pharmacogenetic loci for warfarin have been characterized, our association metric identifies a significant association with mutations distributed throughout the pathway of warfarin metabolism. We improve dose prediction after using all known clinical covariates and pharmacogenetic variants in VKORC1 and CYP2C9. In particular, we find that at least 1% of the missing heritability in warfarin dose may be due to the aggregated effects of variations in the warfarin metabolic pathway, even though the SNPs do not individually show a significant association. Conclusions: Our method allows researchers to study aggregative SNP effects in an unbiased manner by not preselecting SNPs. It retains all the available information by accounting for LD-structure through weighting, which eliminates the need for LD pruning

Columbia University Academic Commons

PubMed Central

University of Illinois at Chicago: UIC INDIGO (INtellectual property in DIGital form available online in an Open environment)

Predictive Model for ICU Readmission Based on Discharge Summaries Using Machine Learning and Natural Language Processing

Author: Alireza Akhbardeh
Hersh Sagreiya
Negar Orangi-Fard
Publication venue: 'MDPI AG'
Publication date: 26/01/2022
Field of study

Predicting ICU readmission risk will help physicians make decisions regarding discharge. We used discharge summaries to predict ICU 30-day readmission risk using text mining and machine learning (ML) with data from the Medical Information Mart for Intensive Care III (MIMIC-III). We used Natural Language Processing (NLP) and the Bag-of-Words approach on discharge summaries to build a Document-Term-Matrix with 3000 features. We compared the performance of support vector machines with the radial basis function kernel (SVM-RBF), adaptive boosting (AdaBoost), quadratic discriminant analysis (QDA), least absolute shrinkage and selection operator (LASSO), and Ridge Regression. A total of 4000 patients were used for model training and 6000 were used for validation. Using the bag-of-words determined by NLP, the area under the receiver operating characteristic (AUROC) curve was 0.71, 0.68, 0.65, 0.69, and 0.65 correspondingly for SVM-RBF, AdaBoost, QDA, LASSO, and Ridge Regression. We then used the SVM-RBF model for feature selection by incrementally adding features to the model from 1 to 3000 bag-of-words. Through this exhaustive search approach, only 825 features (words) were dominant. Using those selected features, we trained and validated all ML models. The AUROC curve was 0.74, 0.69, 0.67, 0.70, and 0.71 respectively for SVM-RBF, AdaBoost, QDA, LASSO, and Ridge Regression. Overall, this technique could predict ICU readmission relatively well

Multidisciplinary Digital Publishing Institute

Automated Lung Ultrasound Pulmonary Disease Quantification Using an Unsupervised Machine Learning Technique for COVID-19

Author: Alireza Akhbardeh
Hersh Sagreiya
Michael A. Jacobs
Publication venue: MDPI AG
Publication date: 01/08/2023
Field of study

Directory of Open Access Journals

Predictive Model for ICU Readmission Based on Discharge Summaries Using Machine Learning and Natural Language Processing

Author: Alireza Akhbardeh
Hersh Sagreiya
Negar Orangi-Fard
Publication venue: MDPI AG
Publication date: 01/01/2022
Field of study

Directory of Open Access Journals

The utility of general purpose versus specialty clinical databases for research: Warfarin dose estimation from extracted clinical variables

Author: Budnitz
Butte
Cunningham
Elias
Hersh Sagreiya
Klein
Mailman
Manolio
Rieder
Russ B. Altman
Publication venue: 'Elsevier BV'
Publication date
Field of study

Crossref

The double-anchoring theory of lightness perception: A comment on Bressan (2006).

Author: Chengjie Zheng
Dwight L. Curtis
Hersh Sagreiya
Margaret S. Livingstone
Piers D. L. Howe
Publication venue: 'American Psychological Association (APA)'
Publication date: 01/01/2007
Field of study

Crossref