1,419 research outputs found

    Annotating patient clinical records with syntactic chunks and named entities: the Harvey corpus

    Get PDF
    The free text notes typed by physicians during patient consultations contain valuable information for the study of disease and treatment. These notes are difficult to process by existing natural language analysis tools since they are highly telegraphic (omitting many words), and contain many spelling mistakes, inconsistencies in punctuation, and non-standard word order. To support information extraction and classification tasks over such text, we describe a de-identified corpus of free text notes, a shallow syntactic and named entity annotation scheme for this kind of text, and an approach to training domain specialists with no linguistic background to annotate the text. Finally, we present a statistical chunking system for such clinical text with a stable learning rate and good accuracy, indicating that the manual annotation is consistent and that the annotation scheme is tractable for machine learning

    Building a semantically annotated corpus of clinical texts

    Get PDF
    In this paper, we describe the construction of a semantically annotated corpus of clinical texts for use in the development and evaluation of systems for automatically extracting clinically significant information from the textual component of patient records. The paper details the sampling of textual material from a collection of 20,000 cancer patient records, the development of a semantic annotation scheme, the annotation methodology, the distribution of annotations in the final corpus, and the use of the corpus for development of an adaptive information extraction system. The resulting corpus is the most richly semantically annotated resource for clinical text processing built to date, whose value has been demonstrated through its use in developing an effective information extraction system. The detailed presentation of our corpus construction and annotation methodology will be of value to others seeking to build high-quality semantically annotated corpora in biomedical domains

    EliXR-TIME: A Temporal Knowledge Representation for Clinical Research Eligibility Criteria.

    Get PDF
    Effective clinical text processing requires accurate extraction and representation of temporal expressions. Multiple temporal information extraction models were developed but a similar need for extracting temporal expressions in eligibility criteria (e.g., for eligibility determination) remains. We identified the temporal knowledge representation requirements of eligibility criteria by reviewing 100 temporal criteria. We developed EliXR-TIME, a frame-based representation designed to support semantic annotation for temporal expressions in eligibility criteria by reusing applicable classes from well-known clinical temporal knowledge representations. We used EliXR-TIME to analyze a training set of 50 new temporal eligibility criteria. We evaluated EliXR-TIME using an additional random sample of 20 eligibility criteria with temporal expressions that have no overlap with the training data, yielding 92.7% (76 / 82) inter-coder agreement on sentence chunking and 72% (72 / 100) agreement on semantic annotation. We conclude that this knowledge representation can facilitate semantic annotation of the temporal expressions in eligibility criteria

    Improving Syntactic Parsing of Clinical Text Using Domain Knowledge

    Get PDF
    Syntactic parsing is one of the fundamental tasks of Natural Language Processing (NLP). However, few studies have explored syntactic parsing in the medical domain. This dissertation systematically investigated different methods to improve the performance of syntactic parsing of clinical text, including (1) Constructing two clinical treebanks of discharge summaries and progress notes by developing annotation guidelines that handle missing elements in clinical sentences; (2) Retraining four state-of-the-art parsers, including the Stanford parser, Berkeley parser, Charniak parser, and Bikel parser, using clinical treebanks, and comparing their performance to identify better parsing approaches; and (3) Developing new methods to reduce syntactic ambiguity caused by Prepositional Phrase (PP) attachment and coordination using semantic information. Our evaluation showed that clinical treebanks greatly improved the performance of existing parsers. The Berkeley parser achieved the best F-1 score of 86.39% on the MiPACQ treebank. For PP attachment, our proposed methods improved the accuracies of PP attachment by 2.35% on the MiPACQ corpus and 1.77% on the I2b2 corpus. For coordination, our method achieved a precision of 94.9% and a precision of 90.3% for the MiPACQ and i2b2 corpus, respectively. To further demonstrate the effectiveness of the improved parsing approaches, we applied outputs of our parsers to two external NLP tasks: semantic role labeling and temporal relation extraction. The experimental results showed that performance of both tasks’ was improved by using the parse tree information from our optimized parsers, with an improvement of 3.26% in F-measure for semantic role labelling and an improvement of 1.5% in F-measure for temporal relation extraction

    Temporal disambiguation of relative temporal expressions in clinical texts using temporally fine-tuned contextual word embeddings.

    Get PDF
    Temporal reasoning is the ability to extract and assimilate temporal information to reconstruct a series of events such that they can be reasoned over to answer questions involving time. Temporal reasoning in the clinical domain is challenging due to specialized medical terms and nomenclature, shorthand notation, fragmented text, a variety of writing styles used by different medical units, redundancy of information that has to be reconciled, and an increased number of temporal references as compared to general domain texts. Work in the area of clinical temporal reasoning has progressed, but the current state-of-the-art still has a ways to go before practical application in the clinical setting will be possible. Much of the current work in this field is focused on direct and explicit temporal expressions and identifying temporal relations. However, there is little work focused on relative temporal expressions, which can be difficult to normalize, but are vital to ordering events on a timeline. This work introduces a new temporal expression recognition and normalization tool, Chrono, that normalizes temporal expressions into both SCATE and TimeML schemes. Chrono advances clinical timeline extraction as it is capable of identifying more vague and relative temporal expressions than the current state-of-the-art and utilizes contextualized word embeddings from fine-tuned BERT models to disambiguate temporal types, which achieves state-of-the-art performance on relative temporal expressions. In addition, this work shows that fine-tuning BERT models on temporal tasks modifies the contextualized embeddings so that they achieve improved performance in classical SVM and CNN classifiers. Finally, this works provides a new tool for linking temporal expressions to events or other entities by introducing a novel method to identify which tokens an entire temporal expression is paying the most attention to by summarizing the attention weight matrices output by BERT models

    One, no one and one hundred thousand events: Defining and processing events in an inter-disciplinary perspective

    Get PDF
    We present an overview of event definition and processing spanning 25 years of research in NLP. We first provide linguistic background to the notion of event, and then present past attempts to formalize this concept in annotation standards to foster the development of benchmarks for event extraction systems. This ranges from MUC-3 in 1991 to the Time and Space Track challenge at SemEval 2015. Besides, we shed light on other disciplines in which the notion of event plays a crucial role, with a focus on the historical domain. Our goal is to provide a comprehensive study on event definitions and investigate which potential past efforts in the NLP community may have in a different research domain. We present the results of a questionnaire, where the notion of event for historians is put in relation to the NLP perspective
    • …
    corecore