4,049 research outputs found
Detecting Hypoglycemia Incidents Reported in Patients\u27 Secure Messages: Using Cost-Sensitive Learning and Oversampling to Reduce Data Imbalance
BACKGROUND: Improper dosing of medications such as insulin can cause hypoglycemic episodes, which may lead to severe morbidity or even death. Although secure messaging was designed for exchanging nonurgent messages, patients sometimes report hypoglycemia events through secure messaging. Detecting these patient-reported adverse events may help alert clinical teams and enable early corrective actions to improve patient safety.
OBJECTIVE: We aimed to develop a natural language processing system, called HypoDetect (Hypoglycemia Detector), to automatically identify hypoglycemia incidents reported in patients\u27 secure messages.
METHODS: An expert in public health annotated 3000 secure message threads between patients with diabetes and US Department of Veterans Affairs clinical teams as containing patient-reported hypoglycemia incidents or not. A physician independently annotated 100 threads randomly selected from this dataset to determine interannotator agreement. We used this dataset to develop and evaluate HypoDetect. HypoDetect incorporates 3 machine learning algorithms widely used for text classification: linear support vector machines, random forest, and logistic regression. We explored different learning features, including new knowledge-driven features. Because only 114 (3.80%) messages were annotated as positive, we investigated cost-sensitive learning and oversampling methods to mitigate the challenge of imbalanced data.
RESULTS: The interannotator agreement was Cohen kappa=.976. Using cross-validation, logistic regression with cost-sensitive learning achieved the best performance (area under the receiver operating characteristic curve=0.954, sensitivity=0.693, specificity 0.974, F1 score=0.590). Cost-sensitive learning and the ensembled synthetic minority oversampling technique improved the sensitivity of the baseline systems substantially (by 0.123 to 0.728 absolute gains). Our results show that a variety of features contributed to the best performance of HypoDetect.
CONCLUSIONS: Despite the challenge of data imbalance, HypoDetect achieved promising results for the task of detecting hypoglycemia incidents from secure messages. The system has a great potential to facilitate early detection and treatment of hypoglycemia
Towards Constructing a Corpus for Studying the Effects of Treatments and Substances Reported in PubMed Abstracts
We present the construction of an annotated corpus of PubMed abstracts
reporting about positive, negative or neutral effects of treatments or
substances. Our ultimate goal is to annotate one sentence (rationale) for each
abstract and to use this resource as a training set for text classification of
effects discussed in PubMed abstracts. Currently, the corpus consists of 750
abstracts. We describe the automatic processing that supports the corpus
construction, the manual annotation activities and some features of the medical
language in the abstracts selected for the annotated corpus. It turns out that
recognizing the terminology and the abbreviations is key for determining the
rationale sentence. The corpus will be applied to improve our classifier, which
currently has accuracy of 78.80% achieved with normalization of the abstract
terms based on UMLS concepts from specific semantic groups and an SVM with a
linear kernel. Finally, we discuss some other possible applications of this
corpus.Comment: medical relation extraction, rationale extraction, effects and
treatments, bioNL
Application of Biomedical Text Mining
With the enormous volume of biological literature, increasing growth phenomenon due to the high rate of new publications is one of the most common motivations for the biomedical text mining. Aiming at this massive literature to process, it could extract more biological information for mining biomedical knowledge. Using the information will help understand the mechanism of disease generation, promote the development of disease diagnosis technology, and promote the development of new drugs in the field of biomedical research. Based on the background, this chapter introduces the rise of biomedical text mining. Then, it describes the biomedical text-mining technology, namely natural language processing, including the several components. This chapter emphasizes the two aspects in biomedical text mining involving static biomedical information recognization and dynamic biomedical information extraction using instance analysis from our previous works. The aim is to provide a way to quickly understand biomedical text mining for some researchers
Natural Language Processing of Clinical Notes on Chronic Diseases: Systematic Review
Novel approaches that complement and go beyond evidence-based medicine are required in the domain of chronic diseases, given the growing incidence of such conditions on the worldwide population. A promising avenue is the secondary use of electronic health records (EHRs), where patient data are analyzed to conduct clinical and translational research. Methods based on machine learning to process EHRs are resulting in improved understanding of patient clinical trajectories and chronic disease risk prediction, creating a unique opportunity to derive previously unknown clinical insights. However, a wealth of clinical histories remains locked behind clinical narratives in free-form text. Consequently, unlocking the full potential of EHR data is contingent on the development of natural language processing (NLP) methods to automatically transform clinical text into structured clinical data that can guide clinical decisions and potentially delay or prevent disease onset
A rule-based approach to embedding techniques for text document classification
With the growth of online information and sudden expansion in the number of electronic documents provided on websites and in electronic libraries, there is difficulty in categorizing text documents. Therefore, a rule-based approach is a solution to this problem; the purpose of this study is to classify documents by using a rule-based. This paper deals with the rule-based approach with the embedding technique for a document to vector (doc2vec) files. An experiment was performed on two data sets Reuters-21578 and the 20 Newsgroups to classify the top ten categories of these data sets by using a document to vector rule-based (D2vecRule). Finally, this method provided us a good classification result according to the F-measures and implementation time metrics. In conclusion, it was observed that our algorithm document to vector rule-based (D2vecRule) was good when compared with other algorithms such as JRip, One R, and ZeroR applied to the same Reuters-21578 dataset. Keywords: text classification; rule-based; word embedding; Doc2vec.publishedVersio
Identifying Patients With Hypoglycemia Using Natural Language Processing: Systematic Literature Review.
BACKGROUND: Accurately identifying patients with hypoglycemia is key to preventing adverse events and mortality. Natural language processing (NLP), a form of artificial intelligence, uses computational algorithms to extract information from text data. NLP is a scalable, efficient, and quick method to extract hypoglycemia-related information when using electronic health record data sources from a large population.
OBJECTIVE: The objective of this systematic review was to synthesize the literature on the application of NLP to extract hypoglycemia from electronic health record clinical notes.
METHODS: Literature searches were conducted electronically in PubMed, Web of Science Core Collection, CINAHL (EBSCO), PsycINFO (Ovid), IEEE Xplore, Google Scholar, and ACL Anthology. Keywords included hypoglycemia, low blood glucose, NLP, and machine learning. Inclusion criteria included studies that applied NLP to identify hypoglycemia, reported the outcomes related to hypoglycemia, and were published in English as full papers.
RESULTS: This review (n=8 studies) revealed heterogeneity of the reported results related to hypoglycemia. Of the 8 included studies, 4 (50%) reported that the prevalence rate of any level of hypoglycemia was 3.4% to 46.2%. The use of NLP to analyze clinical notes improved the capture of undocumented or missed hypoglycemic events using International Classification of Diseases, Ninth Revision (ICD-9), and International Classification of Diseases, Tenth Revision (ICD-10), and laboratory testing. The combination of NLP and ICD-9 or ICD-10 codes significantly increased the identification of hypoglycemic events compared with individual methods; for example, the prevalence rates of hypoglycemia were 12.4% for International Classification of Diseases codes, 25.1% for an NLP algorithm, and 32.2% for combined algorithms. All the reviewed studies applied rule-based NLP algorithms to identify hypoglycemia.
CONCLUSIONS: The findings provided evidence that the application of NLP to analyze clinical notes improved the capture of hypoglycemic events, particularly when combined with the ICD-9 or ICD-10 codes and laboratory testing
- …