80 research outputs found

    A Short Review of Ethical Challenges in Clinical Natural Language Processing

    Full text link
    Clinical NLP has an immense potential in contributing to how clinical practice will be revolutionized by the advent of large scale processing of clinical records. However, this potential has remained largely untapped due to slow progress primarily caused by strict data access policies for researchers. In this paper, we discuss the concern for privacy and the measures it entails. We also suggest sources of less sensitive data. Finally, we draw attention to biases that can compromise the validity of empirical research and lead to socially harmful applications.Comment: First Workshop on Ethics in Natural Language Processing (EACL'17

    Machine learning model for clinical named entity recognition

    Get PDF
    To extract important concepts (named entities) from clinical notes, most widely used NLP task is named entity recognition (NER). It is found from the literature that several researchers have extensively used machine learning models for clinical NER.The most fundamental tasks among the medical data mining tasks are medical named entity recognition and normalization. Medical named entity recognition is different from general NER in various ways. Huge number of alternate spellings and synonyms create explosion of word vocabulary sizes. This reduces the medicine dictionary efficiency. Entities often consist of long sequences of tokens, making harder to detect boundaries exactly. The notes written by clinicians written notes are less structured and are in minimal grammatical form with cryptic short hand. Because of this, it poses challenges in named entity recognition. Generally, NER systems are either rule based or pattern based. The rules and patterns are not generalizable because of the diverse writing style of clinicians. The systems that use machine learning based approach to resolve these issues focus on choosing effective features for classifier building. In this work, machine learning based approach has been used to extract the clinical data in a required manne

    How essential are unstructured clinical narratives and information fusion to clinical trial recruitment?

    Full text link
    Electronic health records capture patient information using structured controlled vocabularies and unstructured narrative text. While structured data typically encodes lab values, encounters and medication lists, unstructured data captures the physician's interpretation of the patient's condition, prognosis, and response to therapeutic intervention. In this paper, we demonstrate that information extraction from unstructured clinical narratives is essential to most clinical applications. We perform an empirical study to validate the argument and show that structured data alone is insufficient in resolving eligibility criteria for recruiting patients onto clinical trials for chronic lymphocytic leukemia (CLL) and prostate cancer. Unstructured data is essential to solving 59% of the CLL trial criteria and 77% of the prostate cancer trial criteria. More specifically, for resolving eligibility criteria with temporal constraints, we show the need for temporal reasoning and information integration with medical events within and across unstructured clinical narratives and structured data.Comment: AMIA TBI 2014, 6 page

    Extracting periodontitis diagnosis in clinical notes with RoBERTa and regular expression

    Full text link
    This study aimed to utilize text processing and natural language processing (NLP) models to mine clinical notes for the diagnosis of periodontitis and to evaluate the performance of a named entity recognition (NER) model on different regular expression (RE) methods. Two complexity levels of RE methods were used to extract and generate the training data. The SpaCy package and RoBERTa transformer models were used to build the NER model and evaluate its performance with the manual-labeled gold standards. The comparison of the RE methods with the gold standard showed that as the complexity increased in the RE algorithms, the F1 score increased from 0.3-0.4 to around 0.9. The NER models demonstrated excellent predictions, with the simple RE method showing 0.84-0.92 in the evaluation metrics, and the advanced and combined RE method demonstrating 0.95-0.99 in the evaluation. This study provided an example of the benefit of combining NER methods and NLP models in extracting target information from free-text to structured data and fulfilling the need for missing diagnoses from unstructured notes.Comment: IEEE ICHI 2023, see https://ieeeichi.github.io/ICHI2023/program.htm

    Clinical Text Classification with Rule-based Features and Knowledge-guided Convolutional Neural Networks

    Full text link
    Clinical text classification is an important problem in medical natural language processing. Existing studies have conventionally focused on rules or knowledge sources-based feature engineering, but only a few have exploited effective feature learning capability of deep learning methods. In this study, we propose a novel approach which combines rule-based features and knowledge-guided deep learning techniques for effective disease classification. Critical Steps of our method include identifying trigger phrases, predicting classes with very few examples using trigger phrases and training a convolutional neural network with word embeddings and Unified Medical Language System (UMLS) entity embeddings. We evaluated our method on the 2008 Integrating Informatics with Biology and the Bedside (i2b2) obesity challenge. The results show that our method outperforms the state of the art methods.Comment: arXiv admin note: text overlap with arXiv:1806.04820 by other author

    Big data driven co-occurring evidence discovery in chronic obstructive pulmonary disease patients

    Full text link
    © 2017, The Author(s). Background: Chronic Obstructive Pulmonary Disease (COPD) is a chronic lung disease that affects airflow to the lungs. Discovering the co-occurrence of COPD with other diseases, symptoms, and medications is invaluable to medical staff. Building co-occurrence indexes and finding causal relationships with COPD can be difficult because often times disease prevalence within a population influences results. A method which can better separate occurrence within COPD patients from population prevalence would be desirable. Large hospital systems may potentially have tens of millions of patient records spanning decades of collection and a big data approach that is scalable is desirable. The presented method, Co-Occurring Evidence Discovery (COED), presents a methodology and framework to address these issues. Methods: Natural Language Processing methods are used to examine 64,371 deidentified clinical notes and discover associations between COPD and medical terms. Apache cTAKES is leveraged to annotate and structure clinical notes. Several extensions to cTAKES have been written to parallelize the annotation of large sets of clinical notes. A co-occurrence score is presented which can penalize scores based on term prevalence, as well as a baseline method traditionally used for finding co-occurrence. These scoring systems are implemented using Apache Spark. Dictionaries of ground truth terms for diseases, medications, and symptoms have been created using clinical domain knowledge. COED and baseline methods are compared using precision, recall, and F1 score. Results: The highest scoring diseases using COED are lung and respiratory diseases. In contrast, baseline methods for co-occurrence rank diseases with high population prevalence highest. Medications and symptoms evaluated with COED share similar results. When evaluated against ground truth dictionaries, the maximum improvements in recall for symptoms, diseases, and medications were 0.212, 0.130, and 0.174. The maximum improvements in precision for symptoms, diseases, and medications were 0.303, 0.333, and 0.180. Median increase in F1 score for symptoms, diseases, and medications were 38.1%, 23.0%, and 17.1%. A paired t-test was performed and F1 score increases were found to be statistically significant, where p < 0.01. Conclusion: Penalizing terms which are highly frequent in the corpus results in better precision and recall performance. Penalizing frequently occurring terms gives a better picture of the diseases, symptoms, and medications co-occurring with COPD. Using a mathematical and computational approach rather than purely expert driven approach, large dictionaries of COPD related terms can be assembled in a short amount of time
    • …
    corecore