27 research outputs found

Special issue of BMC medical informatics and decision making on health natural language processing

Author: Vydiswaran V. G. V.
Wang Yanshan
Xu Hua
Zhang Yaoyun
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/04/2019

https://deepblue.lib.umich.edu/bitstream/2027.42/148521/1/12911_2019_Article_777.pd

Directory of Open Access Journals

Deep Blue Documents

Complexities, variations, and errors of numbering within clinical notes: the potential impact on information extraction and cohort-identification

Author: Hanauer David A
Landis-Lewis Zach
Mei Qiaozhu
Singh Karandeep
Vydiswaran V. G. V.
Weng Chunhua
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

Background: Numbers and numerical concepts appear frequently in free text clinical notes from electronic health records. Knowledge of the frequent lexical variations of these numerical concepts, and their accurate identification, is important for many information extraction tasks. This paper describes an analysis of the variation in how numbers and numerical concepts are represented in clinical notes. Methods: We used an inverted index of approximately 100 million notes to obtain the frequency of various permutations of numbers and numerical concepts, including the use of Roman numerals, numbers spelled as English words, and invalid dates, among others. Overall, twelve types of lexical variants were analyzed. Results: We found substantial variation in how these concepts were represented in the notes, including multiple data quality issues. We also demonstrate that not considering these variations could have substantial real-world implications for cohort identification tasks, with one case missing > 80% of potential patients. Conclusions: Numbering within clinical notes can be variable, and not taking these variations into account could result in missing or inaccurate information for natural language processing and information retrieval tasks.

https://deepblue.lib.umich.edu/bitstream/2027.42/148519/1/12911_2019_Article_784.pd

Columbia University Academic Commons

Directory of Open Access Journals

Deep Blue Documents

Complexities, variations, and errors of numbering within clinical notes: the potential impact on information extraction and cohort-identification

Author: Hanauer David A.
Landis-Lewis Zach
Mei Qiaozhu
Singh Karandeep
Vydiswaran V. G. V.
Weng Chunhua
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2019

Background: Numbers and numerical concepts appear frequently in free text clinical notes from electronic health records. Knowledge of the frequent lexical variations of these numerical concepts, and their accurate identification, is important for many information extraction tasks. This paper describes an analysis of the variation in how numbers and numerical concepts are represented in clinical notes. Methods: We used an inverted index of approximately 100 million notes to obtain the frequency of various permutations of numbers and numerical concepts, including the use of Roman numerals, numbers spelled as English words, and invalid dates, among others. Overall, twelve types of lexical variants were analyzed. Results: We found substantial variation in how these concepts were represented in the notes, including multiple data quality issues. We also demonstrate that not considering these variations could have substantial real-world implications for cohort identification tasks, with one case missing > 80% of potential patients. Conclusions: Numbering within clinical notes can be variable, and not taking these variations into account could result in missing or inaccurate information for natural language processing and information retrieval tasks.

Columbia University Academic Commons

Learning to extract information from large domain-specific websites using sequential models

Author: Diligenti Michelangelo
Lafferty John
Rennie Jason
Sunita Sarawagi
V. G. Vinod Vydiswaran
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref

An Assessment of Mentions of Adverse Drug Events on Social Media With Natural Language Processing: Model Development and Analysis

Author: Deahan Yu
V G Vinod Vydiswaran
Publication venue: JMIR Publications
Publication date: 01/09/2022

Background: Adverse reactions to drugs attract significant concern in both clinical practice and public health monitoring. Multiple measures have been put into place to increase postmarketing surveillance of the adverse effects of drugs and to improve drug safety. These measures include implementing spontaneous reporting systems and developing automated natural language processing systems based on data from electronic health records and social media to collect evidence of adverse drug events that can be further investigated as possible adverse reactions. Objective: While using social media for collecting evidence of adverse drug events has potential, it is not clear whether social media are a reliable source for this information. Our work aims to (1) develop natural language processing approaches to identify adverse drug events on social media and (2) assess the reliability of social media data to identify adverse drug events. Methods: We propose a collocated long short-term memory network model with attentive pooling and aggregated, contextual representation generated by a pretrained model. We applied this model on large-scale Twitter data to identify adverse drug event–related tweets. We conducted a qualitative content analysis of these tweets to validate the reliability of social media data as a means to collect such information. Results: The model outperformed a variant without contextual representation during both the validation and evaluation phases. Through the content analysis of adverse drug event tweets, we observed that adverse drug event–related discussions had 7 themes. Mental health–related, sleep-related, and pain-related adverse drug event discussions were most frequent. We also contrast known adverse drug reactions to those mentioned in tweets. Conclusions: We observed a distinct improvement in the model when it used contextual information. However, our results reveal weak generalizability of the current systems to unseen data. Additional research is needed to fully utilize social media data and improve the robustness and reliability of natural language processing systems. The content analysis, on the other hand, showed that Twitter covered a sufficiently wide range of adverse drug events, as well as known adverse reactions, for the drugs mentioned in tweets. Our work demonstrates that social media can be a reliable data source for collecting adverse drug event mentions.

Directory of Open Access Journals

PubMed Central