1,207 research outputs found

    MGCN: Medical Relation Extraction Based on GCN

    With the progress of society and the improvement of living standards, people pay more and more attention to personal health, and WITMED (Wise Information Technology of med) has occupied an important position. Relation prediction in the medical field places high demands on the interpretability of the method, but the relationships between medical entities are complex, and existing methods struggle to meet these requirements. This paper proposes a novel medical information relation extraction method, MGCN, which combines contextual information to provide global interpretability for relation prediction of medical entities. The method uses a Co-occurrence Graph and a Graph Convolutional Network to build up a network of relations between entities, uses the Open-world Assumption to construct potential relations between associated entities, and applies a Knowledge-aware Attention mechanism to give a relation prediction for the entity pair of interest. Experiments were conducted on a public medical dataset, CTF, on which MGCN achieved a score of 0.831, demonstrating its effectiveness in medical relation extraction.
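The abstract above names a co-occurrence graph and a graph convolutional network but gives no implementation details. The following is only a minimal sketch of those two generic building blocks, not the MGCN method itself. It assumes that entities are linked when they appear in the same sentence and uses the widely used normalized propagation rule ReLU(D^-1/2 (A + I) D^-1/2 X W). The function names, entity names, and toy sentences are all hypothetical.

```python
import math
from itertools import combinations


def cooccurrence_adjacency(sentences, entities):
    """Symmetric 0/1 adjacency: two entities are linked if they share a sentence."""
    index = {entity: i for i, entity in enumerate(entities)}
    n = len(entities)
    adj = [[0.0] * n for _ in range(n)]
    for sentence in sentences:
        present = sorted({index[tok] for tok in sentence if tok in index})
        for i, j in combinations(present, 2):
            adj[i][j] = adj[j][i] = 1.0
    return adj


def gcn_layer(adj, features, weights):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalization.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    n_in, n_out = len(weights), len(weights[0])
    # Aggregate neighbour features, then apply the linear map and ReLU.
    agg = [[sum(norm[i][k] * features[k][f] for k in range(n)) for f in range(n_in)]
           for i in range(n)]
    return [[max(0.0, sum(agg[i][f] * weights[f][h] for f in range(n_in)))
             for h in range(n_out)] for i in range(n)]


# Hypothetical toy data: three entities, two sentences.
entities = ["aspirin", "headache", "fever"]
sentences = [["aspirin", "headache"], ["headache", "fever"]]
adj = cooccurrence_adjacency(sentences, entities)

# One-hot features and an identity weight matrix, so the output is the
# normalized adjacency itself and the propagation is easy to inspect.
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
hidden = gcn_layer(adj, identity, identity)
```

In this toy graph, "aspirin" and "fever" never co-occur, so after one layer neither receives any signal from the other. A second layer would pass information between them through "headache".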

    Named Entity Recognition in Chinese Clinical Text

    Objective: Named entity recognition (NER) is one of the fundamental tasks in natural language processing (NLP). In the medical domain, there have been a number of studies on NER in English clinical notes; however, very limited NER research has been done on clinical notes written in Chinese. The goal of this study is to develop corpora, methods, and systems for NER in Chinese clinical text. Materials and methods: To study entities in Chinese clinical text, we started with building annotated clinical corpora in Chinese. We developed an NER annotation guideline in Chinese by extending the one used in the 2010 i2b2 NLP challenge. We randomly selected 400 admission notes and 400 discharge summaries from Peking Union Medical College Hospital (PUMCH) in China. For each note, four types of entities including clinical problems, procedures, labs, and medications were annotated according to the developed guideline. In addition, an annotation tool was developed to assist two MD students to annotate Chinese clinical documents. A comparison of entity distribution between Chinese and English clinical notes (646 English and 400 Chinese discharge summaries) was performed using the annotated corpora, to identify the important features for NER. In the NER study, two-thirds of the 400 notes were used for training the NER systems and one-third were used for testing. We investigated the effects of different types of features including bag-of-characters, word segmentation, part-of-speech, and section information, with different machine learning (ML) algorithms including Conditional Random Fields (CRF), Support Vector Machines (SVM), Maximum Entropy (ME), and Structural Support Vector Machines (SSVM) on the Chinese clinical NER task. All classifiers were trained on the training dataset, evaluated on the test set, and microaveraged precision, recall, and F-measure were reported. 
Results: Our evaluation on the independent test set showed that most types of features were beneficial to Chinese NER systems, although the improvements were limited. By combining word segmentation and section information, the system achieved the highest performance, indicating that these two types of features are complementary to each other. When the same types of optimized features were used, CRF and SSVM outperformed SVM and ME. More specifically, SSVM reached the highest performance among the four algorithms, with F-measures of 93.51% and 90.01% for admission notes and discharge summaries respectively. Conclusions: In this study, we created large annotated datasets of Chinese admission notes and discharge summaries and then systematically evaluated different types of features (e.g., syntactic, semantic, and segmentation information) and four ML algorithms including CRF, SVM, SSVM, and ME for clinical NER in Chinese. To the best of our knowledge, this is one of the earliest comprehensive efforts in Chinese clinical NER research and we believe it will provide valuable insights to NLP research in Chinese clinical text. Our results suggest that both word segmentation and section information improve NER in Chinese clinical text, and SSVM, a recent sequential labelling algorithm, outperformed CRF and other classification algorithms. Our best system achieved F-measures of 90.01% and 93.52% on Chinese discharge summaries and admission notes, respectively, indicating a promising start on Chinese NLP research.
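The abstract above reports micro-averaged precision, recall, and F-measure without spelling out the computation. As a rough sketch: micro-averaging pools true positives, false positives, and false negatives over all entity types before taking the ratios. The exact-match criterion on `(doc_id, start, end, type)` tuples is an assumption made here for illustration, and the function name and toy annotations are hypothetical.

```python
def micro_prf(gold, predicted):
    """Micro-averaged precision, recall, and F-measure.

    gold and predicted are sets of (doc_id, start, end, entity_type) tuples;
    a prediction counts as correct only on an exact match (an assumption).
    """
    tp = fp = fn = 0
    for entity_type in {e[-1] for e in gold | predicted}:
        g = {e for e in gold if e[-1] == entity_type}
        p = {e for e in predicted if e[-1] == entity_type}
        # Pool the counts over all types instead of averaging per-type scores.
        tp += len(g & p)
        fp += len(p - g)
        fn += len(g - p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure


# Hypothetical annotations using the four entity types named in the abstract.
gold = {
    (0, 0, 2, "problem"),
    (0, 5, 7, "medication"),
    (1, 3, 4, "lab"),
    (1, 8, 10, "procedure"),
}
predicted = {
    (0, 0, 2, "problem"),
    (0, 5, 7, "medication"),
    (1, 3, 4, "procedure"),  # right span, wrong type: one FP and one FN
    (1, 8, 10, "procedure"),
}
precision, recall, f_measure = micro_prf(gold, predicted)
```

Because the counts are pooled, frequent entity types dominate the micro-averaged score. A macro average would instead weight each type equally.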

    A Named Entity Recognition System Applied to Arabic Text in the Medical Domain

    Currently, 30-35% of the global population uses the Internet. Furthermore, there is a rapidly increasing number of non-English language internet users, accompanied by a growing amount of unstructured text online. One area replete with underexploited online text is the Arabic medical domain, and one method that can be used to extract valuable data from Arabic medical texts is Named Entity Recognition (NER). NER is the process by which a system can automatically detect and categorise Named Entities (NE). NER has numerous applications in many domains, and medical texts are no exception. NER applied to the medical domain could assist in detection of patterns in medical records, allowing doctors to make better diagnoses and treatment decisions, enabling medical staff to quickly assess a patient's records and ensuring that patients are informed about their data, as just a few examples. However, all these applications would require a very high level of accuracy. To improve the accuracy of NER in this domain, new approaches need to be developed that are tailored to the types of named entities to be extracted and categorised. In an effort to solve this problem, this research applied Bayesian Belief Networks (BBN) to the process. BBN, a probabilistic model for prediction of random variables and their dependencies, can be used to detect and predict entities. The aim of this research is to apply BBN to the NER task to extract relevant medical entities such as disease names, symptoms, treatment methods, and diagnosis methods from modern Arabic texts in the medical domain. To achieve this aim, a new corpus related to the medical domain has been built and annotated. Our BBN approach achieved 96.60% precision, 90.79% recall, and 93.60% F-measure for the disease entity, while for the treatment method entity, it achieved 69.33%, 70.99%, and 70.15% for precision, recall, and F-measure, respectively. 
For the diagnosis method and symptom categories, our system achieved precision of 84.91% and 71.34%, recall of 53.36% and 49.34%, and F-measure of 65.53% and 58.33%, respectively. Our BBN strategy achieved good accuracy for NEs in the categories of disease and treatment method. However, the average word length of the other two NE categories observed, diagnosis method and symptom, may have had a negative effect on their accuracy. Overall, the application of BBN to Arabic medical NER is successful, but more development is needed to improve accuracy to a standard at which the results can be applied to real medical systems.
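The precision, recall, and F-measure figures quoted above can be cross-checked against one another. The sketch below assumes that the reported F-measure is the balanced F1, the harmonic mean of precision and recall, which the abstract does not state explicitly. The function name `f_measure` is illustrative.

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F-measure %) as quoted in the abstract.
reported = {
    "disease": (96.60, 90.79, 93.60),
    "treatment method": (69.33, 70.99, 70.15),
    "diagnosis method": (84.91, 53.36, 65.53),
    "symptom": (71.34, 49.34, 58.33),
}

for entity, (p, r, f_reported) in reported.items():
    f = f_measure(p, r)
    print(f"{entity}: computed {f:.2f}, reported {f_reported:.2f}")
```

Under that assumption, the computed values agree with the reported ones to within 0.01 percentage points for all four entity categories.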

    Enhancing Drug Overdose Mortality Surveillance through Natural Language Processing and Machine Learning

    Epidemiological surveillance is key to monitoring and assessing the health of populations. Drug overdose surveillance has become an increasingly important part of public health practice as overdose morbidity and mortality have increased, due in large part to the opioid crisis. Monitoring drug overdose mortality relies on death certificate data, which has several limitations, including timeliness and the coding structure used to identify specific substances that caused death. These limitations stem from the need to analyze the free-text cause-of-death sections of the death certificate that are completed by the medical certifier during death investigation. Other fields, including clinical sciences, have utilized natural language processing (NLP) methods to gain insight from free-text data, but thus far, adoption of NLP methods in epidemiological surveillance has been limited. Through a narrative review of NLP methods currently used in public health surveillance and the integration of two NLP tasks, classification and named entity recognition, this dissertation enhances the capabilities of public health practitioners and researchers to perform drug overdose mortality surveillance. This dissertation advances both surveillance science and public health practice by integrating methods from bioinformatics into the surveillance pipeline, providing more timely and higher-quality overdose mortality surveillance, which is essential to guiding an effective public health response to the continuing drug overdose epidemic.

    A Review of Reinforcement Learning for Natural Language Processing, and Applications in Healthcare

    Reinforcement learning (RL) has emerged as a powerful approach for tackling complex medical decision-making problems such as treatment planning, personalized medicine, and optimizing the scheduling of surgeries and appointments. It has gained significant attention in the field of Natural Language Processing (NLP) due to its ability to learn optimal strategies for tasks such as dialogue systems, machine translation, and question-answering. This paper presents a review of RL techniques in NLP, highlighting key advancements, challenges, and applications in healthcare. The review begins by visualizing a roadmap of machine learning and its applications in healthcare. It then explores the integration of RL with NLP tasks. We examine dialogue systems where RL enables the learning of conversational strategies, RL-based machine translation models, question-answering systems, text summarization, and information extraction. Additionally, ethical considerations and biases in RL-NLP systems are addressed.

    Theory and Applications for Advanced Text Mining

    Due to the growth of computer and web technologies, we can easily collect and store large amounts of text data, and it is reasonable to believe that these data contain useful knowledge. Text mining techniques have been studied intensively since the late 1990s in order to extract that knowledge. Although many important techniques have been developed, the text mining research field continues to expand to meet the needs arising from various application fields. This book is composed of 9 chapters introducing advanced text mining techniques, which range from relation extraction to techniques for under-resourced languages. I believe that this book will contribute new knowledge to the text mining field and help many readers open up new research fields.

    Biomedical Question Answering: A Survey of Approaches and Challenges

    Automatic Question Answering (QA) has been successfully applied in various domains such as search engines and chatbots. Biomedical QA (BQA), as an emerging QA task, enables innovative applications to effectively perceive, access and understand complex biomedical knowledge. There have been tremendous developments of BQA in the past two decades, which we classify into 5 distinctive approaches: classic, information retrieval, machine reading comprehension, knowledge base and question entailment approaches. In this survey, we introduce available datasets and representative methods of each BQA approach in detail. Despite the developments, BQA systems are still immature and rarely used in real-life settings. We identify and characterize several key challenges in BQA that might lead to this issue, and discuss some potential future directions to explore. Comment: In submission to ACM Computing Survey

    Automatic text filtering using limited supervision learning for epidemic intelligence

    [no abstract]