34 research outputs found

Erratum to: Text mining facilitates database curation - extraction of mutation-disease associations from Bio-medical literature

Author: Dingcheng Li
Hongfang Liu
Jean-Pierre Kocher
Kavishwar B. Wagholikar
Komandur Elayavilli Ravikumar
Publication venue: Springer Science and Business Media LLC

Crossref

Evolving Research Data Sharing Networks to Clinical App Sharing Networks

Author: Carton Thomas
Colas Ricardo
Jain Rahul
Klann Jeffery
Mandel Joshua
Mandl Kenneth D.
Murphy Shawn N.
Oliveira Eliel
Patil Prasad
Wagholikar Kavishwar B.
Yadav Kuladip
Publication venue: American Medical Informatics Association
Publication date: 21/11/2017

Research networks for data sharing are growing into a large platform for pragmatic clinical trials to generate quality evidence for shared medical decision-making. Institutions partnering in the networks have made large investments in developing the infrastructure for sharing data. We investigate whether institutions partnering on Patient-Centered Outcomes Research Institute’s (PCORI) network can share clinical apps. At two different sites, we imported patient data in PCORI’s clinical data model (CDM) format into i2b2 repositories, and adapted the SMART-on-FHIR cell to perform CDM-to-FHIR translation, serving demographics, laboratory results and diagnoses. We performed manual validations and tested the platform using four apps from the SMART app gallery. Our study demonstrates an approach to extend the research infrastructure to allow the partnering institutions to run shared clinical apps, and highlights the challenges involved. Our results, tooling and publicly accessible data service can potentially transform research networks into clinical app sharing networks and pave the way towards a learning health system.
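The CDM-to-FHIR translation step mentioned in the abstract can be pictured with a minimal sketch. It is not the authors' SMART-on-FHIR cell: the function name, the example row, and the SEX-code mapping below are illustrative assumptions, and only a few Patient fields are produced.

```python
from datetime import date

# Assumed mapping from CDM SEX codes to FHIR administrative-gender codes.
# Any code not listed here (or a missing value) is reported as "unknown".
SEX_TO_FHIR_GENDER = {"F": "female", "M": "male", "OT": "other"}


def cdm_demographic_to_fhir_patient(row):
    """Translate one CDM DEMOGRAPHIC-style row (a dict) into a FHIR Patient dict.

    The function name and row layout are hypothetical, chosen for illustration.
    """
    patient = {
        "resourceType": "Patient",
        "id": str(row["PATID"]),
        "gender": SEX_TO_FHIR_GENDER.get(row.get("SEX"), "unknown"),
    }
    birth = row.get("BIRTH_DATE")
    if birth is not None:
        # FHIR expresses dates in ISO 8601 form (YYYY-MM-DD).
        patient["birthDate"] = birth.isoformat()
    return patient


# Made-up example row, not real patient data.
example_row = {"PATID": "12345", "SEX": "F", "BIRTH_DATE": date(1970, 4, 2)}
print(cdm_demographic_to_fhir_patient(example_row))
```

Laboratory results and diagnoses would need analogous mappings to other FHIR resources; those are omitted here.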

Harvard University - DASH

Implementation of informatics for integrating biology and the bedside (i2b2) platform as Docker containers.

Author: Wagholikar Kavishwar B
Publication date: 15/06/2020

Ezid

Automating Installation of the Integrating Biology and the Bedside (i2b2) Platform.

Author: Wagholikar Kavishwar B
Publication date: 15/06/2020

Ezid

Pooling annotated corpora for clinical concept extraction

Author: Hongfang Liu
Kavishwar B Wagholikar
Manabu Torii
Siddhartha R Jonnalagadda
Publication venue: Springer Berlin
Publication date: 08/01/2013

Springer

Springer - Publisher Connector

Medical subdomain classification of clinical notes using a machine learning-based natural language processing approach

Author: Chueh Henry C.
McCray Alexa T.
Szolovits Peter
Wagholikar Kavishwar B.
Weng Wei-Hung
Publication venue: Springer Science and Business Media LLC
Publication date: 03/12/2017

Background: The medical subdomain of a clinical note, such as cardiology or neurology, is useful content-derived metadata for developing machine learning downstream applications. To classify the medical subdomain of a note accurately, we have constructed a machine learning-based natural language processing (NLP) pipeline and developed medical subdomain classifiers based on the content of the note. Methods: We constructed the pipeline using the clinical NLP system, clinical Text Analysis and Knowledge Extraction System (cTAKES), the Unified Medical Language System (UMLS) Metathesaurus, Semantic Network, and learning algorithms to extract features from two datasets, clinical notes from the Integrating Data for Analysis, Anonymization, and Sharing (iDASH) data repository (n = 431) and Massachusetts General Hospital (MGH) (n = 91,237), and built medical subdomain classifiers with different combinations of data representation methods and supervised learning algorithms. We evaluated the performance of classifiers and their portability across the two datasets. Results: The medical subdomain classifier trained with a convolutional recurrent neural network and neural word embeddings yielded the best performance on the iDASH and MGH datasets, with area under the receiver operating characteristic curve (AUC) of 0.975 and 0.991, and F1 scores of 0.845 and 0.870, respectively. Considering better clinical interpretability, the medical subdomain classifier trained with a linear support vector machine, using hybrid bag-of-words and clinically relevant UMLS concepts as the feature representation with term frequency-inverse document frequency (tf-idf) weighting, outperformed other shallow learning classifiers on the iDASH and MGH datasets, with AUC of 0.957 and 0.964, and F1 scores of 0.932 and 0.934, respectively. We trained classifiers on one dataset, applied them to the other dataset, and reached the F1 score threshold of 0.7 in classifiers for half of the medical subdomains we studied.
Conclusion: Our study shows that a supervised learning-based NLP approach is useful to develop medical subdomain classifiers. The deep learning algorithm with distributed word representation yields better performance, yet shallow learning algorithms with the word and concept representation achieve comparable performance with better clinical interpretability. Portable classifiers may also be used across datasets from different institutions. Keywords: Medical Decision Making; Computer-assisted; Natural Language Processing; Unified Medical Language System; Machine Learning; Deep Learning; Distributed Representation
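As a rough illustration of two quantities named in the abstract, the sketch below computes tf-idf weights for bag-of-words features and an F1 score for one class. It is not the authors' pipeline: the tf-idf variant (raw term frequency times log(N / df)), the function names, and the toy notes and labels are all assumptions made for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: raw term frequency times log(N / document frequency).
    Smoothed or normalized variants are also common.
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors


def f1_score(y_true, y_pred, positive):
    """F1 (harmonic mean of precision and recall) for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical tokenized notes and subdomain labels, for illustration only.
notes = [["chest", "pain", "ecg"], ["seizure", "eeg", "pain"], ["ecg", "stent"]]
print(tfidf_vectors(notes)[0])
print(f1_score(["cardiology", "neurology", "cardiology"],
               ["cardiology", "cardiology", "cardiology"],
               positive="cardiology"))
```

Terms that occur in every document receive weight zero under this variant, which is why rarer, more discriminative terms dominate the feature vector.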

DSpace@MIT

Medical subdomain classification of clinical notes using a machine learning-based natural language processing approach

Author: Chueh Henry C.
McCray Alexa T.
Szolovits Peter
Wagholikar Kavishwar B.
Weng Wei-Hung
Publication venue: Springer Science and Business Media LLC
Publication date: 01/12/2017

Background: The medical subdomain of a clinical note, such as cardiology or neurology, is useful content-derived metadata for developing machine learning downstream applications. To classify the medical subdomain of a note accurately, we have constructed a machine learning-based natural language processing (NLP) pipeline and developed medical subdomain classifiers based on the content of the note. Methods: We constructed the pipeline using the clinical NLP system, clinical Text Analysis and Knowledge Extraction System (cTAKES), the Unified Medical Language System (UMLS) Metathesaurus, Semantic Network, and learning algorithms to extract features from two datasets, clinical notes from the Integrating Data for Analysis, Anonymization, and Sharing (iDASH) data repository (n = 431) and Massachusetts General Hospital (MGH) (n = 91,237), and built medical subdomain classifiers with different combinations of data representation methods and supervised learning algorithms. We evaluated the performance of classifiers and their portability across the two datasets. Results: The medical subdomain classifier trained with a convolutional recurrent neural network and neural word embeddings yielded the best performance on the iDASH and MGH datasets, with area under the receiver operating characteristic curve (AUC) of 0.975 and 0.991, and F1 scores of 0.845 and 0.870, respectively. Considering better clinical interpretability, the medical subdomain classifier trained with a linear support vector machine, using hybrid bag-of-words and clinically relevant UMLS concepts as the feature representation with term frequency-inverse document frequency (tf-idf) weighting, outperformed other shallow learning classifiers on the iDASH and MGH datasets, with AUC of 0.957 and 0.964, and F1 scores of 0.932 and 0.934, respectively. We trained classifiers on one dataset, applied them to the other dataset, and reached the F1 score threshold of 0.7 in classifiers for half of the medical subdomains we studied.
Conclusion: Our study shows that a supervised learning-based NLP approach is useful to develop medical subdomain classifiers. The deep learning algorithm with distributed word representation yields better performance, yet shallow learning algorithms with the word and concept representation achieve comparable performance with better clinical interpretability. Portable classifiers may also be used across datasets from different institutions. Electronic supplementary material: The online version of this article (10.1186/s12911-017-0556-8) contains supplementary material, which is available to authorized users.

Harvard University - DASH

Directory of Open Access Journals

Predicting COVID-19 mortality with electronic medical records

Author: Estiri Hossein
Klann Jeffy G.
Murphy Shawn N.
Naseri Pourandokht
Strasser Zachary H.
Wagholikar Kavishwar B.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2021

This study aims to predict death after COVID-19 using only the past medical information routinely collected in electronic health records (EHRs) and to understand the differences in risk factors across age groups. Combining computational methods and clinical expertise, we curated clusters that represent 46 clinical conditions as potential risk factors for death after a COVID-19 infection. We trained age-stratified generalized linear models (GLMs) with component-wise gradient boosting to predict the probability of death based on what we know about the patients before they contracted the virus. Despite relying only on previously documented demographics and comorbidities, our models demonstrated similar performance to other prognostic models that require an assortment of symptoms, laboratory values, and images at the time of diagnosis or during the course of the illness. In general, we found age to be the most important predictor of mortality in COVID-19 patients. A history of pneumonia, which is rarely asked about in typical epidemiology studies, was one of the most important risk factors for predicting COVID-19 mortality. A history of diabetes with complications and cancer (breast and prostate) were notable risk factors for patients between the ages of 45 and 65 years. In patients aged 65–85 years, diseases that affect the pulmonary system, including interstitial lung disease, chronic obstructive pulmonary disease, lung cancer, and a smoking history, were important for predicting mortality. The ability to compute precise individual-level risk scores exclusively based on the EHR is crucial for effectively allocating and distributing resources, such as prioritizing vaccination among the general population.
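The idea of a GLM fitted by component-wise gradient boosting can be sketched in a few lines. This is a simplified illustration under stated assumptions (a logistic link, single-feature least-squares base learners, a fixed step size `nu`, and made-up toy data); it is not the authors' model or software.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def componentwise_boost(X, y, n_iter=200, nu=0.1):
    """Fit logistic-regression coefficients by component-wise gradient boosting.

    X: list of rows (lists of floats); y: list of 0/1 labels.
    Each iteration updates only the single component (intercept or feature)
    whose least-squares fit to the negative gradient reduces squared error most.
    Returns (intercept, coefs); never-selected features keep coefficient 0.
    """
    n, p = len(X), len(X[0])
    # A constant column lets the intercept be treated as one more component.
    cols = [[1.0] * n] + [[row[j] for row in X] for j in range(p)]
    beta = [0.0] * (p + 1)
    for _ in range(n_iter):
        eta = [sum(b * c[i] for b, c in zip(beta, cols)) for i in range(n)]
        # Negative gradient of the log-loss with respect to the linear predictor.
        u = [y[i] - sigmoid(eta[i]) for i in range(n)]
        best_j, best_gain, best_b = None, 0.0, 0.0
        for j, c in enumerate(cols):
            sxx = sum(v * v for v in c)
            if sxx == 0.0:
                continue
            sxu = sum(v * r for v, r in zip(c, u))
            gain = sxu * sxu / sxx  # squared-error reduction for this component
            if gain > best_gain:
                best_j, best_gain, best_b = j, gain, sxu / sxx
        if best_j is None:
            break
        beta[best_j] += nu * best_b
    return beta[0], beta[1:]


# Hypothetical toy data: two scaled covariates and a binary outcome.
X = [[-1.0, 0.2], [-0.5, -0.1], [0.5, 0.1], [1.0, -0.2], [-0.8, 0.0], [0.9, 0.1]]
y = [0, 0, 1, 1, 1, 0]
print(componentwise_boost(X, y))
```

Because only one coefficient changes per iteration, stopping early leaves weak predictors at zero, which gives this kind of fitting a built-in form of variable selection.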

Sydney eScholarship