25 research outputs found

    The utility of general purpose versus specialty clinical databases for research: Warfarin dose estimation from extracted clinical variables

    Get PDF
    AbstractThere is debate about the utility of clinical data warehouses for research. Using a clinical warfarin dosing algorithm derived from research-quality data, we evaluated the data quality of both a general-purpose database and a coagulation-specific database. We evaluated the functional utility of these repositories by using data extracted from them to predict warfarin dose. We reasoned that high-quality clinical data would predict doses nearly as accurately as research data, while poor-quality clinical data would predict doses less accurately. We evaluated the Mean Absolute Error (MAE) in predicted weekly dose as a metric of data quality. The MAE was comparable between the clinical gold standard (10.1mg/wk) and the specialty database (10.4mg/wk), but the MAE for the clinical warehouse was 40% greater (14.1mg/wk). Our results indicate that the research utility of clinical data collected in focused clinical settings is greater than that of data collected during general-purpose clinical care

    Automated Lung Ultrasound Pulmonary Disease Quantification Using an Unsupervised Machine Learning Technique for COVID-19

    Get PDF
    COVID-19 is an ongoing global health pandemic. Although COVID-19 can be diagnosed with various tests such as PCR, these tests do not establish pulmonary disease burden. Whereas point-of-care lung ultrasound (POCUS) can directly assess the severity of characteristic pulmonary findings of COVID-19, the advantage of using US is that it is inexpensive, portable, and widely available for use in many clinical settings. For automated assessment of pulmonary findings, we have developed an unsupervised learning technique termed the calculated lung ultrasound (CLU) index. The CLU can quantify various types of lung findings, such as A or B lines, consolidations, and pleural effusions, and it uses these findings to calculate a CLU index score, which is a quantitative measure of pulmonary disease burden. This is accomplished using an unsupervised, patient-specific approach that does not require training on a large dataset. The CLU was tested on 52 lung ultrasound examinations from several institutions. CLU demonstrated excellent concordance with radiologist findings in different pulmonary disease states. Given the global nature of COVID-19, the CLU would be useful for sonographers and physicians in resource-strapped areas with limited ultrasound training and diagnostic capacities for more accurate assessment of pulmonary status

    An integrative method for scoring candidate genes from association studies: application to warfarin dosing

    Get PDF
    BackgroundA key challenge in pharmacogenomics is the identification of genes whose variants contribute to drug response phenotypes, which can include severe adverse effects. Pharmacogenomics GWAS attempt to elucidate genotypes predictive of drug response. However, the size of these studies has severely limited their power and potential application. We propose a novel knowledge integration and SNP aggregation approach for identifying genes impacting drug response. Our SNP aggregation method characterizes the degree to which uncommon alleles of a gene are associated with drug response. We first use pre-existing knowledge sources to rank pharmacogenes by their likelihood to affect drug response. We then define a summary score for each gene based on allele frequencies and train linear and logistic regression classifiers to predict drug response phenotypes.ResultsWe applied our method to a published warfarin GWAS data set comprising 181 individuals. We find that our method can increase the power of the GWAS to identify both VKORC1 and CYP2C9 as warfarin pharmacogenes, where the original analysis had only identified VKORC1. Additionally, we find that our method can be used to discriminate between low-dose (AUROC=0.886) and high-dose (AUROC=0.764) responders.ConclusionsOur method offers a new route for candidate pharmacogene discovery from pharmacogenomics GWAS, and serves as a foundation for future work in methods for predictive pharmacogenomics

    Cross-Modal Data Programming Enables Rapid Medical Machine Learning

    Full text link
    Labeling training datasets has become a key barrier to building medical machine learning models. One strategy is to generate training labels programmatically, for example by applying natural language processing pipelines to text reports associated with imaging studies. We propose cross-modal data programming, which generalizes this intuitive strategy in a theoretically-grounded way that enables simpler, clinician-driven input, reduces required labeling time, and improves with additional unlabeled data. In this approach, clinicians generate training labels for models defined over a target modality (e.g. images or time series) by writing rules over an auxiliary modality (e.g. text reports). The resulting technical challenge consists of estimating the accuracies and correlations of these rules; we extend a recent unsupervised generative modeling technique to handle this cross-modal setting in a provably consistent way. Across four applications in radiography, computed tomography, and electroencephalography, and using only several hours of clinician time, our approach matches or exceeds the efficacy of physician-months of hand-labeling with statistical significance, demonstrating a fundamentally faster and more flexible way of building machine learning models in medicine

    Predictive Model for ICU Readmission Based on Discharge Summaries Using Machine Learning and Natural Language Processing

    No full text
    Predicting ICU readmission risk will help physicians make decisions regarding discharge. We used discharge summaries to predict ICU 30-day readmission risk using text mining and machine learning (ML) with data from the Medical Information Mart for Intensive Care III (MIMIC-III). We used Natural Language Processing (NLP) and the Bag-of-Words approach on discharge summaries to build a Document-Term-Matrix with 3000 features. We compared the performance of support vector machines with the radial basis function kernel (SVM-RBF), adaptive boosting (AdaBoost), quadratic discriminant analysis (QDA), least absolute shrinkage and selection operator (LASSO), and Ridge Regression. A total of 4000 patients were used for model training and 6000 were used for validation. Using the bag-of-words determined by NLP, the area under the receiver operating characteristic (AUROC) curve was 0.71, 0.68, 0.65, 0.69, and 0.65 correspondingly for SVM-RBF, AdaBoost, QDA, LASSO, and Ridge Regression. We then used the SVM-RBF model for feature selection by incrementally adding features to the model from 1 to 3000 bag-of-words. Through this exhaustive search approach, only 825 features (words) were dominant. Using those selected features, we trained and validated all ML models. The AUROC curve was 0.74, 0.69, 0.67, 0.70, and 0.71 respectively for SVM-RBF, AdaBoost, QDA, LASSO, and Ridge Regression. Overall, this technique could predict ICU readmission relatively well

    Automated Lung Ultrasound Pulmonary Disease Quantification Using an Unsupervised Machine Learning Technique for COVID-19

    No full text
    COVID-19 is an ongoing global health pandemic. Although COVID-19 can be diagnosed with various tests such as PCR, these tests do not establish pulmonary disease burden. Whereas point-of-care lung ultrasound (POCUS) can directly assess the severity of characteristic pulmonary findings of COVID-19, the advantage of using US is that it is inexpensive, portable, and widely available for use in many clinical settings. For automated assessment of pulmonary findings, we have developed an unsupervised learning technique termed the calculated lung ultrasound (CLU) index. The CLU can quantify various types of lung findings, such as A or B lines, consolidations, and pleural effusions, and it uses these findings to calculate a CLU index score, which is a quantitative measure of pulmonary disease burden. This is accomplished using an unsupervised, patient-specific approach that does not require training on a large dataset. The CLU was tested on 52 lung ultrasound examinations from several institutions. CLU demonstrated excellent concordance with radiologist findings in different pulmonary disease states. Given the global nature of COVID-19, the CLU would be useful for sonographers and physicians in resource-strapped areas with limited ultrasound training and diagnostic capacities for more accurate assessment of pulmonary status

    Predictive Model for ICU Readmission Based on Discharge Summaries Using Machine Learning and Natural Language Processing

    No full text
    Predicting ICU readmission risk will help physicians make decisions regarding discharge. We used discharge summaries to predict ICU 30-day readmission risk using text mining and machine learning (ML) with data from the Medical Information Mart for Intensive Care III (MIMIC-III). We used Natural Language Processing (NLP) and the Bag-of-Words approach on discharge summaries to build a Document-Term-Matrix with 3000 features. We compared the performance of support vector machines with the radial basis function kernel (SVM-RBF), adaptive boosting (AdaBoost), quadratic discriminant analysis (QDA), least absolute shrinkage and selection operator (LASSO), and Ridge Regression. A total of 4000 patients were used for model training and 6000 were used for validation. Using the bag-of-words determined by NLP, the area under the receiver operating characteristic (AUROC) curve was 0.71, 0.68, 0.65, 0.69, and 0.65 correspondingly for SVM-RBF, AdaBoost, QDA, LASSO, and Ridge Regression. We then used the SVM-RBF model for feature selection by incrementally adding features to the model from 1 to 3000 bag-of-words. Through this exhaustive search approach, only 825 features (words) were dominant. Using those selected features, we trained and validated all ML models. The AUROC curve was 0.74, 0.69, 0.67, 0.70, and 0.71 respectively for SVM-RBF, AdaBoost, QDA, LASSO, and Ridge Regression. Overall, this technique could predict ICU readmission relatively well
    corecore