
    Recognition and normalization of temporal expressions in Serbian medical narratives

    The temporal dimension is one of the essential concepts in the field of medicine, providing a basis for the proper interpretation and understanding of medically relevant information, which is often recorded only in unstructured texts. Automatic processing of temporal expressions involves their identification and their formalization in a language understandable to computers. This paper aims to apply the existing system for automatic processing of temporal expressions in Serbian natural language texts to medical narrative texts, to evaluate the system’s efficiency in the recognition and normalization of temporal expressions, and to determine the degree of adaptation required by the characteristics and requirements of the medical domain.
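To illustrate what "identification and formalization" of temporal expressions involves, here is a minimal sketch; the two patterns and the day.month.year convention are assumptions for illustration only, not the actual rules of the system described in the abstract:

```python
import re
from datetime import date

# Hypothetical minimal patterns; a real system covers the language's full
# range of date formats, durations, and relative phrases.
PATTERNS = [
    # e.g. "12.03.2021." (day.month.year with trailing dot, common in Serbian texts)
    (re.compile(r"\b(\d{1,2})\.(\d{1,2})\.(\d{4})\.?"),
     lambda m: date(int(m[3]), int(m[2]), int(m[1])).isoformat()),
    # e.g. "2021-03-12" (already ISO 8601)
    (re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b"),
     lambda m: m[0]),
]

def extract_timex(text):
    """Return (surface form, normalized ISO 8601 value) pairs."""
    found = []
    for pattern, normalize in PATTERNS:
        for m in pattern.finditer(text):
            found.append((m[0], normalize(m)))
    return found

print(extract_timex("Primljen 12.03.2021. godine"))
# [('12.03.2021.', '2021-03-12')]
```

Normalization maps every surface variant to one machine-readable value, which is what makes downstream chronological ordering possible.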

    Improving Syntactic Parsing of Clinical Text Using Domain Knowledge

    Syntactic parsing is one of the fundamental tasks of Natural Language Processing (NLP). However, few studies have explored syntactic parsing in the medical domain. This dissertation systematically investigated different methods to improve the performance of syntactic parsing of clinical text, including (1) constructing two clinical treebanks of discharge summaries and progress notes by developing annotation guidelines that handle missing elements in clinical sentences; (2) retraining four state-of-the-art parsers, including the Stanford parser, Berkeley parser, Charniak parser, and Bikel parser, using clinical treebanks, and comparing their performance to identify better parsing approaches; and (3) developing new methods to reduce syntactic ambiguity caused by Prepositional Phrase (PP) attachment and coordination using semantic information. Our evaluation showed that clinical treebanks greatly improved the performance of existing parsers. The Berkeley parser achieved the best F-1 score of 86.39% on the MiPACQ treebank. For PP attachment, our proposed methods improved the accuracy of PP attachment by 2.35% on the MiPACQ corpus and 1.77% on the i2b2 corpus. For coordination, our method achieved precisions of 94.9% and 90.3% on the MiPACQ and i2b2 corpora, respectively. To further demonstrate the effectiveness of the improved parsing approaches, we applied the outputs of our parsers to two external NLP tasks: semantic role labeling and temporal relation extraction. The experimental results showed that the performance of both tasks was improved by using the parse tree information from our optimized parsers, with an improvement of 3.26% in F-measure for semantic role labeling and an improvement of 1.5% in F-measure for temporal relation extraction.
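The abstract does not spell out how semantic information resolves PP attachment, so the following is only a toy sketch of the general idea: score the preposition against each candidate head and attach to the higher-scoring one. The lexical-preference table and scores are invented for illustration, not taken from the dissertation:

```python
# Illustrative (head, preposition) preference scores, e.g. as could be
# derived from semantic classes observed in a clinical corpus.
VERB_PREP = {("treated", "with"): 0.9, ("examined", "in"): 0.7}
NOUN_PREP = {("infection", "with"): 0.2, ("mass", "in"): 0.8}

def attach_pp(verb, noun, prep):
    """Attach the PP to whichever candidate head prefers the preposition more."""
    v_score = VERB_PREP.get((verb, prep), 0.0)
    n_score = NOUN_PREP.get((noun, prep), 0.0)
    return "verb" if v_score >= n_score else "noun"

# "treated the infection with antibiotics": the PP modifies the verb
print(attach_pp("treated", "infection", "with"))  # verb
# "found a mass in the left lung": the PP modifies the noun
print(attach_pp("found", "mass", "in"))  # noun
```

Real systems replace the hand-built table with preferences learned over semantic types rather than raw words, which is what lets them generalize to unseen heads.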

    Temporal disambiguation of relative temporal expressions in clinical texts using temporally fine-tuned contextual word embeddings.

    Temporal reasoning is the ability to extract and assimilate temporal information to reconstruct a series of events such that they can be reasoned over to answer questions involving time. Temporal reasoning in the clinical domain is challenging due to specialized medical terms and nomenclature, shorthand notation, fragmented text, a variety of writing styles used by different medical units, redundancy of information that has to be reconciled, and an increased number of temporal references as compared to general domain texts. Work in the area of clinical temporal reasoning has progressed, but the current state-of-the-art still has a long way to go before practical application in the clinical setting will be possible. Much of the current work in this field is focused on direct and explicit temporal expressions and on identifying temporal relations. However, there is little work focused on relative temporal expressions, which can be difficult to normalize but are vital to ordering events on a timeline. This work introduces a new temporal expression recognition and normalization tool, Chrono, that normalizes temporal expressions into both SCATE and TimeML schemes. Chrono advances clinical timeline extraction, as it is capable of identifying more vague and relative temporal expressions than the current state-of-the-art, and it utilizes contextualized word embeddings from fine-tuned BERT models to disambiguate temporal types, achieving state-of-the-art performance on relative temporal expressions. In addition, this work shows that fine-tuning BERT models on temporal tasks modifies the contextualized embeddings so that they achieve improved performance in classical SVM and CNN classifiers. Finally, this work provides a new tool for linking temporal expressions to events or other entities by introducing a novel method to identify which tokens an entire temporal expression is paying the most attention to, by summarizing the attention weight matrices output by BERT models.
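The core difficulty with relative temporal expressions is that they only resolve to a calendar date against an anchor such as the document creation time. A minimal sketch of that resolution step, handling only fully specified phrases (the pattern and unit table are assumptions for illustration; Chrono's actual disambiguation of vaguer cases uses fine-tuned BERT embeddings, which is not reproduced here):

```python
import re
from datetime import date, timedelta

UNITS = {"day": 1, "week": 7}  # illustrative subset of units

def normalize_relative(expr, doc_time):
    """Resolve a simple relative expression against an anchor date."""
    m = re.fullmatch(r"(\d+) (day|week)s? (ago|later)", expr)
    if not m:
        return None  # vague expressions need contextual disambiguation
    delta = timedelta(days=int(m[1]) * UNITS[m[2]])
    anchor = doc_time - delta if m[3] == "ago" else doc_time + delta
    return anchor.isoformat()

# "3 days ago", anchored to a note written on 2020-05-10
print(normalize_relative("3 days ago", date(2020, 5, 10)))  # 2020-05-07
```

Expressions like "recently" or "post-op day two" fall through to `None` here; bridging exactly that gap is the contribution the abstract describes.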

    Automatic recognition and normalization of temporal expressions in Serbian unstructured newspaper and medical texts

    People in everyday life use time as a universal reference system within which events or states are sequenced one after the other, the length of their duration is established, and it is stated when an event occurred. The meaning of time and the way humans perceive it are reflected in communication, above all in the linguistic expressions frequently used in everyday speech. Temporal expressions, as natural language phrases that refer directly to time, provide information on when something happened, how long it lasted, and how often it occurs. Along with the development of the information society, the amount of freely available digital information has increased, which offers greater possibilities for finding the necessary information but also adds to the complexity of this process, requiring advanced computer tools and more powerful methods for the automatic processing of natural language texts. Since the meaning of most electronic information changes depending on the time expressed in it, successful understanding of natural language texts requires tools that can automatically mark time-referring information and enable the establishment of the chronological order of the described events. It is therefore necessary to develop tools for the extraction of temporal expressions that achieve high precision and recall and that can be quickly and easily adapted to new requirements or to texts from other domains. Such a system can greatly improve the effectiveness of many other language technology applications (information extraction, information retrieval, question answering, text summarization, etc.) and also contribute to the preservation of the Serbian language in the contemporary digital environment.
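The precision and recall the abstract calls for are standard span-level measures; a short sketch of how they are computed for an extraction system, using invented character-offset spans as the example data:

```python
def precision_recall(predicted, gold):
    """Exact-match precision and recall over sets of extracted spans."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)  # true positives: spans found and correct
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall

# Gold standard has 4 temporal expressions; the system found 3, 2 correct.
gold = {(0, 5), (10, 18), (25, 30), (40, 44)}
pred = {(0, 5), (10, 18), (50, 55)}
p, r = precision_recall(pred, gold)
print(round(p, 2), round(r, 2))  # 0.67 0.5
```

Exact span matching is the strictest variant; evaluations often also report relaxed (overlap-based) matching, which credits partially correct spans.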

    DEVELOPING A CLINICAL LINGUISTIC FRAMEWORK FOR PROBLEM LIST GENERATION FROM CLINICAL TEXT

    Regulatory institutions such as the Institute of Medicine and the Joint Commission endorse problem lists as an effective method to facilitate transitions of care for patients. In practice, the problem list is a common model for documenting a care provider's medical reasoning with respect to a problem and its status during patient care. Although natural language processing (NLP) systems have been developed to support problem list generation, encoding many information layers - morphological, syntactic, semantic, discourse, and pragmatic - can prove computationally expensive. The contribution of each information layer to accurate problem list generation has not been formally assessed. We would expect a problem list generator that relies on natural language processing to improve its performance with the addition of rich semantic features. We hypothesize that problem list generation can be approached as a two-step classification problem - problem mention status (Aim One) and patient problem status (Aim Two) classification. In Aim One, we will automatically classify the status of each problem mention using semantic features about problems described in the clinical narrative. In Aim Two, we will classify active patient problems from individual problem mentions and their statuses. We believe our proposal is significant in two ways. First, our experiments will develop and evaluate semantic features, some commonly modeled in clinical text and others not. The annotations we use will be made openly available to other NLP researchers to encourage future research on this task and other related problems, including foundational NLP algorithms (assertion classification and coreference resolution) and applied clinical applications (patient timeline and record visualization). Second, by generating and evaluating existing NLP systems, we are building an open-source problem list generator and demonstrating its performance for problem list generation using these features.
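The two-step scheme can be made concrete with a deliberately simple rule-based sketch; the cue words, status labels, and aggregation rule below are invented for illustration, whereas the proposal itself uses learned classifiers over rich semantic features:

```python
# Step 1: classify each problem mention's status from surface cue words.
# Step 2: aggregate mention statuses into one patient-level problem status.

NEGATION_CUES = {"no", "denies", "negative"}   # assumed cue lexicons
RESOLVED_CUES = {"resolved", "history"}

def mention_status(tokens):
    words = {t.lower() for t in tokens}
    if words & NEGATION_CUES:
        return "absent"
    if words & RESOLVED_CUES:
        return "resolved"
    return "active"

def patient_problem_status(mentions):
    """A problem is active for the patient if any mention is active."""
    statuses = [mention_status(m) for m in mentions]
    return "active" if "active" in statuses else statuses[-1]

mentions = [["denies", "chest", "pain"],
            ["chest", "pain", "on", "exertion"]]
print(patient_problem_status(mentions))  # active
```

The separation matters because mention-level errors (e.g. missed negation) and patient-level aggregation errors can then be measured and improved independently.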

    Preface


    Proceedings of the Fifth Italian Conference on Computational Linguistics CLiC-it 2018 : 10-12 December 2018, Torino

    On behalf of the Program Committee, a very warm welcome to the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018). This edition of the conference is held in Torino. The conference is locally organised by the University of Torino and hosted in its prestigious main lecture hall, "Cavallerizza Reale". The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after five years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges.

    Nonlinear Parametric and Neural Network Modelling for Medical Image Classification

    System identification and artificial neural networks (ANN) are families of algorithms, used in systems engineering and machine learning respectively, that use structure detection and learning strategies to build models of complex systems from input-output data. These models play an essential role in science and engineering because they fill the gap in those cases where we know the input-output behaviour of a system but have no mathematical model to understand and predict its future changes, or even prevent threats. In this context, nonlinear approximation of systems is nowadays very popular, since it better describes complex instances. On the other hand, digital image processing is an area of systems engineering that is expanding the analysis dimension level in a variety of real-life problems, while becoming more attractive and affordable over time. Medicine has made the most of it by supporting important human decision-making processes through computer-aided diagnosis (CAD) systems. This thesis presents three different frameworks for breast cancer detection, with approaches ranging from nonlinear system identification, through nonlinear system identification coupled with simple neural networks, to multilayer neural networks. In particular, the nonlinear system identification approaches, namely the Nonlinear AutoRegressive with eXogenous inputs (NARX) model and the MultiScales Radial Basis Function (MSRBF) neural networks, appear for the first time in image processing. Alongside these contributions, the thesis presents the Multilayer-Fuzzy Extreme Learning Machine (ML-FELM) neural network for faster training and more accurate image classification. A central research aim is to take advantage of nonlinear system identification and multilayer neural networks to enhance the feature extraction process while bolstering classification in CAD systems.
In the case of multilayer neural networks, extraction is carried out through stacked autoencoders, a bottleneck network architecture that promotes a data transformation between layers. In the case of nonlinear system identification, the goal is to add flexible models capable of capturing distinctive features from digital images that simpler approaches may fail to recognise. The purpose of detecting nonlinearities in digital images is complementary to that of linear models, since the goal is to extract features in greater depth, so that both linear and nonlinear elements can be captured. This aim is relevant because, according to previous work cited in the first chapter, not all spatial relationships existing in digital images can be explained appropriately with linear dependencies. Experimental results show that the methodologies based on system identification produced reliable image models with customised mathematical structure. The models included nonlinearities in different proportions, depending on the case under examination. The information about nonlinearity and model structure was used as part of the whole image model. It was found that, in some instances, the models from different clinical classes in the breast cancer detection problem presented a particular structure. For example, NARX models of the malignant class showed a higher nonlinearity percentage and depended more on exogenous inputs than those of other classes. Regarding classification performance, comparisons of the three new CAD systems with existing methods gave variable results. The NARX model performed better in three cases but was surpassed in two; however, the comparison must be taken with caution, since different databases were used. The MSRBF model was better in 5 out of 6 cases and had superior specificity in all instances, surpassing the closest model by 3.5%.
The ML-FELM model was the best in 6 out of 6 cases, although it was outperformed in accuracy by 0.6% in one case and in specificity by 0.22% in another.
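The NARX structure mentioned above makes the next output a nonlinear function of lagged outputs and lagged exogenous inputs. A minimal one-step predictor as a sketch; the lag orders, the single cross term, and the coefficient values are invented for illustration, since in practice system identification estimates both the term set and the parameters from data:

```python
# Minimal NARX-style one-step prediction:
# y(t) = a1*y(t-1) + a2*y(t-2) + b1*u(t-1) + c1*y(t-1)*u(t-1)
# The cross term y(t-1)*u(t-1) is what makes the model nonlinear.

def narx_predict(y_lags, u_lags):
    """Predict y(t) from lagged outputs y_lags and lagged inputs u_lags."""
    a1, a2, b1, c1 = 0.5, -0.1, 0.3, 0.05  # assumed identified parameters
    return (a1 * y_lags[0] + a2 * y_lags[1]
            + b1 * u_lags[0] + c1 * y_lags[0] * u_lags[0])

# With y(t-1)=1.0, y(t-2)=0.8, u(t-1)=2.0:
# 0.5*1.0 - 0.1*0.8 + 0.3*2.0 + 0.05*1.0*2.0 ≈ 1.12
print(narx_predict([1.0, 0.8], [2.0]))
```

In the imaging setting described above, pixel neighbourhoods play the role of the lagged signals, and the share of selected nonlinear terms is the "nonlinearity percentage" the abstract reports per clinical class.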