
    Building a semantically annotated corpus of clinical texts

    In this paper, we describe the construction of a semantically annotated corpus of clinical texts for use in the development and evaluation of systems for automatically extracting clinically significant information from the textual component of patient records. The paper details the sampling of textual material from a collection of 20,000 cancer patient records, the development of a semantic annotation scheme, the annotation methodology, the distribution of annotations in the final corpus, and the use of the corpus for development of an adaptive information extraction system. The resulting corpus is the most richly semantically annotated resource for clinical text processing built to date, whose value has been demonstrated through its use in developing an effective information extraction system. The detailed presentation of our corpus construction and annotation methodology will be of value to others seeking to build high-quality semantically annotated corpora in biomedical domains.

    Developing a manually annotated clinical document corpus to identify phenotypic information for inflammatory bowel disease

    Background: Natural Language Processing (NLP) systems can be used for specific Information Extraction (IE) tasks such as extracting phenotypic data from the electronic medical record (EMR). These data are useful for translational research and are often found only in free text clinical notes. A key required step for IE is the manual annotation of clinical corpora and the creation of a reference standard for (1) training and validation tasks and (2) to focus and clarify NLP system requirements. These tasks are time consuming, expensive, and require considerable effort on the part of human reviewers. Methods: Using a set of clinical documents from the VA EMR for a particular use case of interest, we identify specific challenges and present several opportunities for annotation tasks. We demonstrate specific methods using an open source annotation tool, a customized annotation schema, and a corpus of clinical documents for patients known to have a diagnosis of Inflammatory Bowel Disease (IBD). We report clinician annotator agreement at the document, concept, and concept attribute level. We estimate concept yield in terms of annotated concepts within specific note sections and document types. Results: Annotator agreement at the document level for documents that contained concepts of interest for IBD using estimated Kappa statistic (95% CI) was very high at 0.87 (0.82, 0.93). At the concept level, F-measure ranged from 0.61 to 0.83. However, agreement varied greatly at the specific concept attribute level. For this particular use case (IBD), clinical documents producing the highest concept yield per document included GI clinic notes and primary care notes. Within the various types of notes, the highest concept yield was in sections representing patient assessment and history of presenting illness. Ancillary service documents and family history and plan note sections produced the lowest concept yield. Conclusion: Challenges include defining and building appropriate annotation schemas, adequately training clinician annotators, and determining the appropriate level of information to be annotated. Opportunities include narrowing the focus of information extraction to use case specific note types and sections, especially in cases where NLP systems will be used to extract information from large repositories of electronic clinical note documents.
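
    The document-level agreement reported above is a kappa statistic. As a minimal sketch of how such a value can be computed, assuming two annotators and Cohen's kappa (the abstract does not state which kappa variant or how many annotators were involved), and using invented labels rather than data from the study:

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label proportions.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Invented example: 1 = document contains a concept of interest, 0 = it does not.
annotator_1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
annotator_2 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
kappa = cohen_kappa(annotator_1, annotator_2)
```

    Here the annotators agree on 9 of 10 documents and chance agreement is 0.5, so kappa is about 0.8. A confidence interval such as the one quoted above would need an additional step (for example bootstrapping), which this sketch omits.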

    Differential Diagnosis Documentation In Emergency Medicine

    Diagnosis is a central aspect of emergency medicine. Reaching the correct diagnosis affects patient morbidity and mortality as well as healthcare expenditures. Medical decision making is driven by the process of working out the differential diagnosis. Once an adequate Natural Language Processing (NLP) system is developed, including a general characterization of differential diagnoses and their association with downstream testing, diagnostic error, etc., we would be able to automatically extract differential diagnoses from clinical notes, which would have a large impact on healthcare. The main purpose of our investigative study is the characterization of differential diagnosis documentation within emergency provider notes and the development of an annotated corpus that could be used for further downstream development of NLP applications. We conducted a retrospective analysis of emergency provider notes to identify, categorize, and extract information around differential diagnoses using manual annotation. We used a light annotation framework within the MATTER cycle and extracted the information from our annotations based on a random sample of 1545 medical records. We describe the demographic information and note that only 18.1% of patients were actually given a differential diagnosis by the physicians. We examined factors that could lead to differences in differential diagnosis rates among patients, including age group, race and ethnicity group, preferred language, acuity level, and major complaint. Within the differential diagnosis groups, evidence support and probability terms are reported. We also examined the top six complaints: cough, chest pain, shortness of breath, abdominal pain, back pain, and falling. The study has limitations, including sample size and the accuracy of the annotations, among others.

    Doctor of Philosophy

    Manual annotation of clinical texts is often used as a method of generating reference standards that provide data for training and evaluation of Natural Language Processing (NLP) systems. Manually annotating clinical texts is time consuming, expensive, and requires considerable cognitive effort on the part of human reviewers. Furthermore, reference standards must be generated in ways that produce consistent and reliable data but must also be valid in order to adequately evaluate the performance of those systems. The amount of labeled data necessary varies depending on the level of analysis, the complexity of the clinical use case, and the methods that will be used to develop automated machine systems for information extraction and classification. Evaluating methods that potentially reduce cost and manual human workload, introduce task efficiencies, and reduce the amount of labeled data necessary to train NLP tools for specific clinical use cases is an active area of research inquiry in the clinical NLP domain. This dissertation integrates a mixed methods approach using methodologies from cognitive science and artificial intelligence with manual annotation of clinical texts. Aim 1 of this dissertation identifies factors that affect manual annotation of clinical texts. These factors are further explored by evaluating approaches that may introduce efficiencies into manual review tasks applied to two different NLP development areas: semantic annotation of clinical concepts and identification of information representing Protected Health Information (PHI) as defined by HIPAA. Both experiments integrate different priming mechanisms using noninteractive and machine-assisted methods. The main hypothesis for this research is that integrating pre-annotation or other machine-assisted methods within manual annotation workflows will improve efficiency of manual annotation tasks without diminishing the quality of generated reference standards.

    Doctor of Philosophy

    Domain adaptation of natural language processing systems is challenging because it requires human expertise. While manual effort is effective in creating a high quality knowledge base, it is expensive and time consuming. Clinical text adds another layer of complexity to the task due to privacy and confidentiality restrictions that hinder the ability to share training corpora among different research groups. Semantic ambiguity is a major barrier for effective and accurate concept recognition by natural language processing systems. In my research I propose an automated domain adaptation method that utilizes sublanguage semantic schema for all-word word sense disambiguation of clinical narrative. According to the sublanguage theory developed by Zellig Harris, domain-specific language is characterized by a relatively small set of semantic classes that combine into a small number of sentence types. Previous research relied on manual analysis to create language models that could be used for more effective natural language processing. Building on previous semantic type disambiguation research, I propose a method of resolving semantic ambiguity utilizing automatically acquired semantic type disambiguation rules applied on clinical text ambiguously mapped to a standard set of concepts. This research aims to provide an automatic method to acquire Sublanguage Semantic Schema (S3) and apply this model to disambiguate terms that map to more than one concept with different semantic types. The research is conducted using unmodified MetaMap version 2009, a concept recognition system provided by the National Library of Medicine, applied on a large set of clinical text. The project includes creating and comparing models, which are based on unambiguous concept mappings found in seventeen clinical note types. The effectiveness of the final application was validated through a manual review of a subset of processed clinical notes using recall, precision and F-score metrics.
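
    The validation described above reports recall, precision and F-score. A minimal sketch of how these follow from manual review counts, assuming the balanced F1 form of the F-score; the function name and the counts are illustrative and are not taken from the dissertation:

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Return (precision, recall, F1) from manual review counts.

    Hypothetical helper: assumes each reviewed mapping was judged as a
    correct hit, a spurious hit, or a missed concept.
    """
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    # Guard against empty denominators so the function never divides by zero.
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    total = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Invented counts: 80 correct disambiguations, 20 spurious, 20 missed.
precision, recall, f1 = precision_recall_f1(80, 20, 20)
```

    With these invented counts, precision and recall are both 0.8, so F1 is also about 0.8.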

    The holistic perspective of the INCISIVE Project: artificial intelligence in screening mammography

    Finding new ways to cost-effectively facilitate population screening and improve cancer diagnoses at an early stage supported by data-driven AI models provides unprecedented opportunities to reduce cancer related mortality. This work presents the INCISIVE project initiative towards enhancing AI solutions for health imaging by unifying, harmonizing, and securely sharing scattered cancer-related data to ensure large datasets which are critically needed to develop and evaluate trustworthy AI models. The adopted solutions of the INCISIVE project have been outlined in terms of data collection, harmonization, data sharing, and federated data storage in compliance with legal, ethical, and FAIR principles. Experiences and examples feature breast cancer data integration and mammography collection, indicating the current progress, challenges, and future directions. This research received funding mainly from the European Union’s Horizon 2020 research and innovation program under grant agreement no 952179. It was also partially funded by the Ministry of Economy, Industry, and Competitiveness of Spain under contracts PID2019-107255GB and 2017-SGR-1414. Peer reviewed. Article signed by 30 authors: Ivan Lazic (1), Ferran Agullo (2), Susanna Ausso (3), Bruno Alves (4), Caroline Barelle (4), Josep Ll. Berral (2), Paschalis Bizopoulos (5), Oana Bunduc (6), Ioanna Chouvarda (7), Didier Dominguez (3), Dimitrios Filos (7), Alberto Gutierrez-Torre (2), Iman Hesso (8), Nikša Jakovljević (1), Reem Kayyali (8), Magdalena Kogut-Czarkowska (9), Alexandra Kosvyra (7), Antonios Lalas (5), Maria Lavdaniti (10,11), Tatjana Loncar-Turukalo (1), Sara Martinez-Alabart (3), Nassos Michas (4,12), Shereen Nabhani-Gebara (8), Andreas Raptopoulos (6), Yiannis Roussakis (13), Evangelia Stalika (7,11), Chrysostomos Symvoulidis (6,14), Olga Tsave (7), Konstantinos Votis (5), Andreas Charalambous (15) / (1) Faculty of Technical Sciences, University of Novi Sad, 21000 Novi Sad, Serbia; (2) Barcelona Supercomputing Center, 08034 Barcelona, Spain; (3) Fundació TIC Salut Social, Ministry of Health of Catalonia, 08005 Barcelona, Spain; (4) European Dynamics, 1466 Luxembourg, Luxembourg; (5) Centre for Research and Technology Hellas, 57001 Thessaloniki, Greece; (6) Telesto IoT Solutions, London N7 7PX, UK; (7) School of Medicine, Faculty of Health Sciences, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece; (8) Department of Pharmacy, Kingston University London, London KT1 2EE, UK; (9) Timelex BV/SRL, 1000 Brussels, Belgium; (10) Nursing Department, International Hellenic University, 57400 Thessaloniki, Greece; (11) Hellenic Cancer Society, 11521 Athens, Greece; (12) European Dynamics, 15124 Athens, Greece; (13) German Oncology Center, Department of Medical Physics, Limassol 4108, Cyprus; (14) Department of Digital Systems, University of Piraeus, 18534 Piraeus, Greece; (15) Department of Nursing, Cyprus University of Technology, Limassol 3036, Cyprus. Postprint (published version)