A Short Review of Ethical Challenges in Clinical Natural Language Processing
Clinical NLP has immense potential to help revolutionize clinical practice
through the large-scale processing of clinical records. However, this
potential has remained largely untapped because of slow progress, caused
primarily by strict data access policies for researchers.
In this paper, we discuss privacy concerns and the measures they entail.
We also suggest sources of less sensitive data. Finally, we draw attention to
biases that can compromise the validity of empirical research and lead to
socially harmful applications.
Comment: First Workshop on Ethics in Natural Language Processing (EACL'17)
Machine learning model for clinical named entity recognition
To extract important concepts (named entities) from clinical notes, the most widely used NLP task is named entity recognition (NER), and the literature shows that many researchers have relied extensively on machine learning models for clinical NER. The most fundamental tasks in medical data mining are medical named entity recognition and normalization. Medical NER differs from general NER in several ways. The huge number of alternative spellings and synonyms causes an explosion in vocabulary size, which reduces the effectiveness of medical dictionaries. Entities often consist of long sequences of tokens, making it harder to detect their exact boundaries. Notes written by clinicians are less structured, minimally grammatical, and full of cryptic shorthand, which poses further challenges for named entity recognition. Generally, NER systems are either rule-based or pattern-based, but such rules and patterns do not generalize well because clinicians' writing styles are so diverse. Systems that take a machine learning-based approach to these issues focus on choosing effective features for building the classifier. In this work, a machine learning-based approach has been used to extract the clinical data in the required manner.
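The feature-engineering step described above can be sketched as a function that turns each token into a feature dictionary, the list-of-dicts input accepted by CRF-style sequence labelers such as sklearn-crfsuite. This is a minimal sketch under stated assumptions: the particular features (word shape, three-character affixes, a one-word context window) and the example sentence are illustrative choices, not the features used in this work.

```python
from typing import Dict, List, Union

Feature = Union[str, bool]

def word_shape(token: str) -> str:
    """Coarse word shape: X = uppercase, x = lowercase, d = digit, other chars kept.
    Runs of the same symbol are collapsed, so "Metformin" -> "Xx", "500mg" -> "dx"."""
    shape: List[str] = []
    for ch in token:
        if ch.isupper():
            sym = "X"
        elif ch.islower():
            sym = "x"
        elif ch.isdigit():
            sym = "d"
        else:
            sym = ch
        if not shape or shape[-1] != sym:
            shape.append(sym)
    return "".join(shape)

def token_features(tokens: List[str], i: int, window: int = 1) -> Dict[str, Feature]:
    """Feature dictionary for tokens[i] (illustrative feature set, an assumption)."""
    tok = tokens[i]
    feats: Dict[str, Feature] = {
        "lower": tok.lower(),
        "prefix3": tok[:3].lower(),
        "suffix3": tok[-3:].lower(),
        "shape": word_shape(tok),
        "is_upper": tok.isupper(),                   # acronyms such as "SOB", "CHF"
        "has_digit": any(c.isdigit() for c in tok),  # doses, lab values
    }
    # Neighbouring words help with shorthand whose meaning depends on context.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        feats[f"{offset:+d}:lower"] = tokens[j].lower() if 0 <= j < len(tokens) else "<PAD>"
    return feats

def sentence_features(tokens: List[str]) -> List[Dict[str, Feature]]:
    """One feature dict per token, the shape expected by CRF-style taggers."""
    return [token_features(tokens, i) for i in range(len(tokens))]

if __name__ == "__main__":
    tokens = "Pt c/o SOB , hx of CHF".split()  # hypothetical clinical shorthand
    for tok, feats in zip(tokens, sentence_features(tokens)):
        print(tok, feats["shape"], feats["-1:lower"], feats["+1:lower"])
```

Hand-crafted shape and affix features let the classifier generalize across abbreviations and spelling variants that a dictionary lookup would miss.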
How essential are unstructured clinical narratives and information fusion to clinical trial recruitment?
Electronic health records capture patient information using structured
controlled vocabularies and unstructured narrative text. While structured data
typically encodes lab values, encounters and medication lists, unstructured
data captures the physician's interpretation of the patient's condition,
prognosis, and response to therapeutic intervention. In this paper, we
demonstrate that information extraction from unstructured clinical narratives
is essential to most clinical applications. We perform an empirical study to
validate the argument and show that structured data alone is insufficient in
resolving eligibility criteria for recruiting patients onto clinical trials for
chronic lymphocytic leukemia (CLL) and prostate cancer. Unstructured data is
essential to solving 59% of the CLL trial criteria and 77% of the prostate
cancer trial criteria. More specifically, for resolving eligibility criteria
with temporal constraints, we show the need for temporal reasoning and
information integration with medical events within and across unstructured
clinical narratives and structured data.
Comment: AMIA TBI 2014, 6 pages
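Resolving an eligibility criterion with a temporal constraint can be sketched as fusing dated events from structured data and from NLP output over narratives, then checking the time window. Everything here is a hypothetical illustration: the `Event` type, the 180-day chemotherapy washout criterion, and the dates are assumptions, and a real system would also need to normalize relative or partial dates found in narrative text.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, Iterable, List, Tuple

@dataclass(frozen=True)
class Event:
    concept: str   # normalized concept, e.g. "chemotherapy"
    when: date     # date the event occurred
    source: str    # "structured" (e.g. medication list) or "narrative" (NLP output)

def merge_events(*sources: Iterable[Event]) -> List[Event]:
    """Fuse events from several sources, keeping one event per (concept, date)."""
    merged: Dict[Tuple[str, date], Event] = {}
    for events in sources:
        for ev in events:
            merged.setdefault((ev.concept, ev.when), ev)  # first source wins on duplicates
    return sorted(merged.values(), key=lambda e: e.when)

def within_days_before(events: Iterable[Event], concept: str, anchor: date, days: int) -> bool:
    """True if `concept` occurred in the window [anchor - days, anchor]."""
    start = anchor - timedelta(days=days)
    return any(e.concept == concept and start <= e.when <= anchor for e in events)

if __name__ == "__main__":
    screening = date(2014, 3, 1)
    # Hypothetical criterion: exclude patients with chemotherapy in the 180 days before screening.
    structured = [Event("chemotherapy", date(2013, 1, 10), "structured")]
    # Hypothetical event extracted from a note stating treatment ended in mid-November 2013.
    narrative = [Event("chemotherapy", date(2013, 11, 15), "narrative")]
    print(within_days_before(structured, "chemotherapy", screening, 180))  # False
    print(within_days_before(merge_events(structured, narrative), "chemotherapy", screening, 180))  # True
```

With structured data alone the patient looks eligible; only after fusing the narrative event does the exclusion fire, which mirrors the argument that structured data alone is insufficient.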
Extracting periodontitis diagnosis in clinical notes with RoBERTa and regular expression
This study aimed to use text processing and natural language processing
(NLP) models to mine clinical notes for the diagnosis of periodontitis and to
evaluate the performance of a named entity recognition (NER) model trained on
data generated by different regular expression (RE) methods. RE methods at two
complexity levels were used to extract and generate the training data. The
spaCy package and RoBERTa transformer models were used to build the NER model,
and its performance was evaluated against manually labeled gold standards.
Comparing the RE methods with the gold standard showed that as the complexity
of the RE algorithms increased, the F1 score rose from 0.3-0.4 to around 0.9.
The NER models performed very well: the model trained with the simple RE method
scored 0.84-0.92 on the evaluation metrics, and the models trained with the
advanced and combined RE methods scored 0.95-0.99. This study provides an
example of the benefit of combining NER methods and NLP models to turn target
information in free text into structured data and to recover missing diagnoses
from unstructured notes.
Comment: IEEE ICHI 2023, see https://ieeeichi.github.io/ICHI2023/program.htm
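The contrast between simple and advanced RE methods, and their use for generating NER training data, can be sketched as follows. This is a minimal sketch, assuming hypothetical patterns, a hypothetical `PERIODONTITIS` label, and a made-up note; the output uses the `(text, {"entities": [(start, end, label)]})` annotation format that spaCy uses for NER training data. The F1 here is exact-span matching on a single toy sentence and does not reproduce the study's numbers.

```python
import re
from typing import Dict, List, Tuple

LABEL = "PERIODONTITIS"  # hypothetical entity label
Span = Tuple[int, int, str]

# Simple RE (hypothetical): the bare keyword only.
SIMPLE_RE = re.compile(r"\bperiodontitis\b", re.IGNORECASE)

# Advanced RE (hypothetical): optional extent/type modifiers before the keyword and
# optional stage (I-IV) and grade (A-C) after it, so the whole diagnosis is one span.
ADVANCED_RE = re.compile(
    r"\b(?:(?:generali[sz]ed|locali[sz]ed|chronic|aggressive)\s+)*"
    r"periodontitis"
    r"(?:,?\s+stage\s+(?:IV|I{1,3}))?"
    r"(?:,?\s+grade\s+[ABC])?\b",
    re.IGNORECASE,
)

def to_training_example(text: str, pattern: re.Pattern) -> Tuple[str, Dict[str, List[Span]]]:
    """Turn regex matches into (text, {"entities": [(start, end, label)]})."""
    spans = [(m.start(), m.end(), LABEL) for m in pattern.finditer(text)]
    return text, {"entities": spans}

def span_f1(predicted: List[Span], gold: List[Span]) -> float:
    """Exact-match span F1 against a manually labeled gold standard."""
    pred_set, gold_set = set(predicted), set(gold)
    tp = len(pred_set & gold_set)
    if tp == 0:
        return 0.0
    precision = tp / len(pred_set)
    recall = tp / len(gold_set)
    return 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    note = "Dx: Generalized periodontitis, stage III, grade B. Hx of gingivitis."
    gold = [(4, 49, LABEL)]  # hand-labeled span: "Generalized periodontitis, stage III, grade B"
    for name, pattern in [("simple", SIMPLE_RE), ("advanced", ADVANCED_RE)]:
        _, ann = to_training_example(note, pattern)
        print(name, ann["entities"], span_f1(ann["entities"], gold))
```

The simple pattern finds only the keyword, so its span boundaries disagree with the gold annotation; the advanced pattern captures the full diagnosis, which is the kind of boundary difference that separates the two RE complexity levels.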
Clinical Text Classification with Rule-based Features and Knowledge-guided Convolutional Neural Networks
Clinical text classification is an important problem in medical natural
language processing. Existing studies have conventionally focused on feature
engineering based on rules or knowledge sources, but only a few have exploited
the effective feature learning capability of deep learning methods. In this study,
we propose a novel approach which combines rule-based features and
knowledge-guided deep learning techniques for effective disease classification.
Critical steps of our method include identifying trigger phrases, predicting
classes with very few examples using trigger phrases and training a
convolutional neural network with word embeddings and Unified Medical Language
System (UMLS) entity embeddings. We evaluated our method on the 2008
Integrating Informatics with Biology and the Bedside (i2b2) obesity challenge.
The results show that our method outperforms state-of-the-art methods.
Comment: arXiv admin note: text overlap with arXiv:1806.04820 by other authors
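The idea of predicting classes with very few examples from trigger phrases, and deferring everything else to the learned model, can be sketched as a rule layer placed in front of a classifier. This is a minimal sketch under loud assumptions: the trigger and cue lists are hypothetical, the label set assumes judgments like Y (present), N (absent), and Q (questionable), and `model` is a stand-in callable for the CNN rather than the paper's network.

```python
import re
from typing import Callable, Dict, List, Optional

# Hypothetical trigger phrases per disease; not the lists used in the paper.
TRIGGERS: Dict[str, List[str]] = {
    "obesity": ["obese", "obesity", "morbid obesity"],
    "asthma": ["asthma", "reactive airway disease"],
}
# Hypothetical cue words for the rare classes.
NEGATION_CUES = ["no", "denies", "negative for", "without"]
UNCERTAINTY_CUES = ["possible", "probable", "questionable", "rule out"]

def _has_cue(window: str, cues: List[str]) -> bool:
    return any(re.search(r"\b" + re.escape(cue) + r"\b", window) for cue in cues)

def rule_label(note: str, disease: str, n_tokens: int = 5) -> Optional[str]:
    """Label rare classes from trigger phrases: "Q" (questionable) or "N" (absent).
    Returns None when a plain positive mention is found or no trigger occurs,
    meaning the decision is deferred to the learned classifier."""
    found = set()
    for sentence in re.split(r"(?<=[.!?])\s+", note.lower()):
        for phrase in TRIGGERS[disease]:
            for m in re.finditer(r"\b" + re.escape(phrase) + r"\b", sentence):
                # Look only at the few tokens just before the trigger phrase.
                window = " ".join(sentence[: m.start()].split()[-n_tokens:])
                if _has_cue(window, UNCERTAINTY_CUES):
                    found.add("Q")
                elif _has_cue(window, NEGATION_CUES):
                    found.add("N")
                else:
                    found.add("Y")
    if not found or "Y" in found:
        return None
    return "Q" if "Q" in found else "N"

def classify(note: str, disease: str, model: Callable[[str, str], str]) -> str:
    """Apply the trigger-phrase rules first; fall back to `model` (e.g. a CNN)."""
    label = rule_label(note, disease)
    return label if label is not None else model(note, disease)

if __name__ == "__main__":
    stand_in_cnn = lambda note, disease: "Y"  # hypothetical placeholder for the CNN
    print(classify("Patient is obese. Denies asthma.", "asthma", stand_in_cnn))
    print(classify("Possible asthma per PCP.", "asthma", stand_in_cnn))
```

Routing only the sparse classes through rules keeps the data-hungry network focused on classes with enough training examples.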
Big data driven co-occurring evidence discovery in chronic obstructive pulmonary disease patients
© 2017, The Author(s). Background: Chronic Obstructive Pulmonary Disease (COPD) is a chronic lung disease that affects airflow to the lungs. Discovering the co-occurrence of COPD with other diseases, symptoms, and medications is invaluable to medical staff. Building co-occurrence indexes and finding causal relationships with COPD can be difficult because disease prevalence within a population often influences the results. A method that better separates occurrence within COPD patients from population prevalence would therefore be desirable. Large hospital systems may have tens of millions of patient records spanning decades of collection, so a scalable big data approach is also desirable. The proposed method, Co-Occurring Evidence Discovery (COED), provides a methodology and framework to address these issues. Methods: Natural Language Processing methods are used to examine 64,371 deidentified clinical notes and discover associations between COPD and medical terms. Apache cTAKES is leveraged to annotate and structure clinical notes, and several extensions to cTAKES have been written to parallelize the annotation of large sets of clinical notes. A co-occurrence score that can penalize scores based on term prevalence is presented, along with a baseline method traditionally used for finding co-occurrence. These scoring systems are implemented using Apache Spark. Dictionaries of ground truth terms for diseases, medications, and symptoms have been created using clinical domain knowledge. COED and baseline methods are compared using precision, recall, and F1 score. Results: The highest-scoring diseases using COED are lung and respiratory diseases. In contrast, baseline co-occurrence methods rank diseases with high population prevalence highest. Medications and symptoms evaluated with COED show similar results. 
When evaluated against ground truth dictionaries, the maximum improvements in recall for symptoms, diseases, and medications were 0.212, 0.130, and 0.174, respectively. The maximum improvements in precision for symptoms, diseases, and medications were 0.303, 0.333, and 0.180, respectively. The median increases in F1 score for symptoms, diseases, and medications were 38.1%, 23.0%, and 17.1%, respectively. A paired t-test showed that the F1 score increases were statistically significant (p < 0.01). Conclusion: Penalizing terms that are highly frequent in the corpus results in better precision and recall. Penalizing frequently occurring terms gives a better picture of the diseases, symptoms, and medications that co-occur with COPD. Using a mathematical and computational approach rather than a purely expert-driven one, large dictionaries of COPD-related terms can be assembled in a short amount of time.
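The contrast between a raw co-occurrence baseline and a prevalence-penalized score, evaluated against a ground truth dictionary, can be sketched in a few lines. The abstract does not give COED's exact formula, so this sketch swaps in lift, P(term | COPD) / P(term), a standard prevalence-adjusted association measure, as an assumption. Notes are represented as sets of normalized terms (a stand-in for cTAKES output), plain Python replaces the Spark implementation, and the toy corpus and dictionary are made up purely for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

def baseline_scores(notes: List[Set[str]], target: str) -> Dict[str, float]:
    """Raw co-occurrence: fraction of target-positive notes that mention each term."""
    target_notes = [n for n in notes if target in n]
    counts = Counter(t for n in target_notes for t in n if t != target)
    return {t: c / len(target_notes) for t, c in counts.items()}

def lift_scores(notes: List[Set[str]], target: str) -> Dict[str, float]:
    """Prevalence-penalized score (lift, an assumed stand-in for COED):
    P(term | target) / P(term), so corpus-wide common terms are divided down."""
    doc_freq = Counter(t for n in notes for t in n)
    return {t: p / (doc_freq[t] / len(notes)) for t, p in baseline_scores(notes, target).items()}

def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Highest-scoring terms; ties broken alphabetically for determinism."""
    return [t for t, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]

def precision_recall_f1(predicted: Iterable[str], truth: Iterable[str]) -> Tuple[float, float, float]:
    pred, gold = set(predicted), set(truth)
    tp = len(pred & gold)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1

# Toy corpus (made up): "hypertension" is prevalent everywhere, not just in COPD notes.
NOTES = [
    {"copd", "hypertension", "emphysema"},
    {"copd", "hypertension", "chronic bronchitis"},
    {"copd", "hypertension", "emphysema"},
    {"hypertension"},
    {"hypertension", "diabetes"},
    {"hypertension"},
]
TRUTH = {"emphysema", "chronic bronchitis"}  # hypothetical ground truth dictionary

if __name__ == "__main__":
    for name, scorer in [("baseline", baseline_scores), ("lift", lift_scores)]:
        ranked = top_k(scorer(NOTES, "copd"), 2)
        print(name, ranked, precision_recall_f1(ranked, TRUTH))
```

The baseline ranks the corpus-wide common term first, while dividing by prevalence promotes the COPD-specific terms, the same qualitative effect the Results report for high-prevalence diseases.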