A modular, open-source information extraction framework for identifying clinical concepts and processes of care in clinical narratives
In this thesis, a synthesis is presented of the knowledge models required by clinical information systems that provide decision support for longitudinal processes of care. Qualitative research techniques and thematic analysis are applied in a novel way to a systematic review of the literature on the challenges in implementing such systems, leading to the development of an original conceptual framework. The thesis demonstrates how these process-oriented systems make use of a knowledge base derived from workflow models and clinical guidelines, and argues that one of the major barriers to implementation is the need to extract explicit and implicit information from diverse resources in order to construct the knowledge base. Moreover, concepts in both the knowledge base and in the electronic health record (EHR) must be mapped to a common ontological model. However, the majority of clinical guideline information remains in text form, and much of the useful clinical information in the EHR resides in the free text fields of progress notes and laboratory reports. In this thesis, it is shown how natural language processing and information extraction techniques provide a means to identify and formalise the knowledge components required by the knowledge base. Original contributions are made in the development of lexico-syntactic patterns and the use of external domain knowledge resources to tackle a variety of information extraction tasks in the clinical domain, such as recognition of clinical concepts, events, temporal relations, term disambiguation and abbreviation expansion. Methods are developed for adapting existing tools and resources in the biomedical domain to the processing of clinical texts, and approaches to improving the scalability of these tools are proposed and evaluated. These tools and techniques are then combined in the creation of a novel approach to identifying processes of care in the clinical narrative.
It is demonstrated that resolution of coreferential and anaphoric relations as narratively and temporally ordered chains provides a means to extract linked narrative events and processes of care from clinical notes. Coreference performance in discharge summaries and progress notes is largely dependent on correct identification of protagonist chains (patient, clinician, family relation), pronominal resolution, and string matching that takes account of experiencer, temporal, spatial, and anatomical context; whereas for laboratory reports additional, external domain knowledge is required. The types of external knowledge and their effects on system performance are identified and evaluated. Results are compared against existing systems for solving these tasks and are found to improve on them, or to approach the performance of recently reported, state-of-the-art systems. Software artefacts developed in this research have been made available as open-source components within the General Architecture for Text Engineering framework.
Recognition and normalization of temporal expressions in Serbian medical narratives
The temporal dimension is one of the essential concepts in medicine, providing a basis for the proper interpretation and understanding of medically relevant information, which is often recorded only in unstructured texts. Automatic processing of temporal expressions involves their identification and formalization in a language understandable to computers. This paper aims to apply the existing system for automatic processing of temporal expressions in Serbian natural language texts to medical narrative texts, to evaluate the system’s efficiency in recognition and normalization of temporal expressions, and to determine the degree of adaptation required by the characteristics and requirements of the medical domain.
Improving Syntactic Parsing of Clinical Text Using Domain Knowledge
Syntactic parsing is one of the fundamental tasks of Natural Language Processing (NLP). However, few studies have explored syntactic parsing in the medical domain. This dissertation systematically investigated different methods to improve the performance of syntactic parsing of clinical text, including (1) Constructing two clinical treebanks of discharge summaries and progress notes by developing annotation guidelines that handle missing elements in clinical sentences; (2) Retraining four state-of-the-art parsers, including the Stanford parser, Berkeley parser, Charniak parser, and Bikel parser, using clinical treebanks, and comparing their performance to identify better parsing approaches; and (3) Developing new methods to reduce syntactic ambiguity caused by Prepositional Phrase (PP) attachment and coordination using semantic information.
Our evaluation showed that clinical treebanks greatly improved the performance of existing parsers. The Berkeley parser achieved the best F-1 score of 86.39% on the MiPACQ treebank. For PP attachment, our proposed methods improved accuracy by 2.35% on the MiPACQ corpus and 1.77% on the i2b2 corpus. For coordination, our method achieved precisions of 94.9% and 90.3% on the MiPACQ and i2b2 corpora, respectively. To further demonstrate the effectiveness of the improved parsing approaches, we applied outputs of our parsers to two external NLP tasks: semantic role labeling and temporal relation extraction. The experimental results showed that the performance of both tasks was improved by using the parse tree information from our optimized parsers, with an improvement of 3.26% in F-measure for semantic role labeling and 1.5% in F-measure for temporal relation extraction.
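For readers unfamiliar with how parser F-1 scores of this kind are usually obtained, the following is a minimal sketch of labeled-bracket precision, recall, and F-1 in the PARSEVAL style. It is an illustration only: the abstract does not state which scorer was used, and the function name `bracket_prf`, the example sentence, and its spans are hypothetical.

```python
from collections import Counter

def bracket_prf(gold, pred):
    """Labeled-bracket precision, recall and F1.

    gold, pred: lists of (label, start, end) constituent spans.
    Counters are used so that duplicate brackets are matched one-to-one.
    """
    gold_counts, pred_counts = Counter(gold), Counter(pred)
    matched = sum((gold_counts & pred_counts).values())
    precision = matched / len(pred) if pred else 0.0
    recall = matched / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Hypothetical 6-token sentence: "Patient denies chest pain on exertion".
# The two analyses differ only in where the PP "on exertion" attaches.
gold = [("S", 0, 6), ("NP", 0, 1), ("VP", 1, 6),
        ("NP", 2, 4), ("PP", 4, 6), ("NP", 5, 6)]
pred = [("S", 0, 6), ("NP", 0, 1), ("VP", 1, 6),
        ("NP", 2, 6), ("NP", 2, 4), ("PP", 4, 6), ("NP", 5, 6)]

precision, recall, f1 = bracket_prf(gold, pred)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

In this toy case a single PP attachment difference adds one unmatched bracket to the prediction, which lowers precision while recall stays at 1.0.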
Temporal disambiguation of relative temporal expressions in clinical texts using temporally fine-tuned contextual word embeddings.
Temporal reasoning is the ability to extract and assimilate temporal information to reconstruct a series of events such that they can be reasoned over to answer questions involving time. Temporal reasoning in the clinical domain is challenging due to specialized medical terms and nomenclature, shorthand notation, fragmented text, a variety of writing styles used by different medical units, redundancy of information that has to be reconciled, and an increased number of temporal references as compared to general domain texts. Work in the area of clinical temporal reasoning has progressed, but the current state-of-the-art still has some way to go before practical application in the clinical setting will be possible. Much of the current work in this field is focused on direct and explicit temporal expressions and on identifying temporal relations. However, there is little work focused on relative temporal expressions, which can be difficult to normalize but are vital to ordering events on a timeline. This work introduces a new temporal expression recognition and normalization tool, Chrono, that normalizes temporal expressions into both SCATE and TimeML schemes. Chrono advances clinical timeline extraction as it is capable of identifying more vague and relative temporal expressions than the current state-of-the-art and utilizes contextualized word embeddings from fine-tuned BERT models to disambiguate temporal types, which achieves state-of-the-art performance on relative temporal expressions. In addition, this work shows that fine-tuning BERT models on temporal tasks modifies the contextualized embeddings so that they achieve improved performance in classical SVM and CNN classifiers. Finally, this work provides a new tool for linking temporal expressions to events or other entities by introducing a novel method to identify which tokens an entire temporal expression is paying the most attention to by summarizing the attention weight matrices output by BERT models.
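To make concrete what normalizing a relative temporal expression involves, here is a small rule-based sketch that resolves a handful of expressions against an anchor date. It is not Chrono's method, which the abstract describes as relying on fine-tuned BERT embeddings; the function `normalize_relative`, the supported vocabulary, and the choice of the document date as anchor are all assumptions made for illustration.

```python
import re
from datetime import date, timedelta

# Hypothetical, deliberately tiny vocabularies.
UNIT_DAYS = {"day": 1, "week": 7}
WORD_NUMBERS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}
FIXED_OFFSETS = {"today": 0, "yesterday": -1, "tomorrow": 1}

def normalize_relative(expr, anchor):
    """Return an ISO 8601 date string for a supported relative expression.

    anchor is the date the expression is relative to (assumed here to be
    the document creation date). Returns None for unsupported input.
    """
    text = expr.lower().strip()
    if text in FIXED_OFFSETS:
        return (anchor + timedelta(days=FIXED_OFFSETS[text])).isoformat()
    match = re.fullmatch(r"(\d+|[a-z]+) (day|week)s? (ago|later)", text)
    if not match:
        return None
    quantity, unit, direction = match.groups()
    count = int(quantity) if quantity.isdigit() else WORD_NUMBERS.get(quantity)
    if count is None:
        return None
    sign = -1 if direction == "ago" else 1
    return (anchor + timedelta(days=sign * count * UNIT_DAYS[unit])).isoformat()

note_date = date(2020, 3, 10)  # hypothetical document date
for phrase in ["yesterday", "two days ago", "3 weeks later", "last spring"]:
    print(phrase, "->", normalize_relative(phrase, note_date))
```

A phrase such as "last spring" returns None here, which hints at why vague and relative expressions are the hard cases: their value depends on context that fixed rules of this kind cannot see.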
Automatic recognition and normalization of temporal expressions in Serbian unstructured newspaper and medical texts
In everyday life, people use time as a universal reference system within which events or states are ordered one after another, their duration is established, and the time at which an event occurred is stated. The meaning of time and the way humans perceive it are reflected in communication, above all in linguistic expressions frequently used in everyday speech. Temporal expressions, as natural language phrases that refer directly to time, provide information on when something happened, how long it lasted, or how often it occurs.
Along with the development of the information society, the amount of freely available digital information has increased, which makes it more likely that the necessary information can be found but also makes the search more complex, requiring advanced computational tools and more powerful methods for the automatic processing of natural language texts. Because the meaning of most electronic information changes depending on the time expressed in it, successful understanding of natural language texts requires tools that can automatically mark time-related information and enable the chronological ordering of the described events. It is therefore necessary to develop tools for extracting temporal expressions that achieve high precision and recall and that can be adapted quickly and easily to new requirements or to texts from other domains. Such a system could greatly improve the performance of many other language technology applications (information extraction, information retrieval, question answering, text summarization, etc.) and also contribute to the preservation of the Serbian language in the contemporary digital environment.
DEVELOPING A CLINICAL LINGUISTIC FRAMEWORK FOR PROBLEM LIST GENERATION FROM CLINICAL TEXT
Regulatory institutions such as the Institute of Medicine and Joint Commission endorse problem lists as an effective method to facilitate transitions of care for patients. In practice, the problem list is a common model for documenting a care provider's medical reasoning with respect to a problem and its status during patient care. Although natural language processing (NLP) systems have been developed to support problem list generation, encoding many information layers - morphological, syntactic, semantic, discourse, and pragmatic - can prove computationally expensive. The contribution of each information layer to accurate problem list generation has not been formally assessed. We would expect a problem list generator that relies on natural language processing to improve its performance with the addition of rich semantic features.
We hypothesize that problem list generation can be approached as a two-step classification problem - problem mention status (Aim One) and patient problem status (Aim Two) classification. In Aim One, we will automatically classify the status of each problem mention using semantic features about problems described in the clinical narrative. In Aim Two, we will classify active patient problems from individual problem mentions and their statuses.
We believe our proposal is significant in two ways. First, our experiments will develop and evaluate semantic features, some commonly modeled in clinical text and others not. The annotations we use will be made openly available to other NLP researchers to encourage future research on this task and other related problems, including foundational NLP algorithms (assertion classification and coreference resolution) and applied clinical applications (patient timeline and record visualization). Second, by generating and evaluating existing NLP systems, we are building an open-source problem list generator and demonstrating the performance for problem list generation using these features.
Proceedings of the Fifth Italian Conference on Computational Linguistics CLiC-it 2018 : 10-12 December 2018, Torino
On behalf of the Program Committee, a very warm welcome to the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018). This edition of the conference is held in Torino. The conference is locally organised by the University of Torino and hosted in its prestigious main lecture hall “Cavallerizza Reale”. The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after five years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges.
Nonlinear Parametric and Neural Network Modelling for Medical Image Classification
System identification and artificial neural networks (ANN) are families of algorithms used in systems engineering and machine learning, respectively, that use structure detection and learning strategies to build models of complex systems from input-output data. These models play an essential role in science and engineering because they fill the gap in cases where the input-output behaviour of a system is known but there is no mathematical model to understand and predict its future changes or even prevent threats. In this context, the nonlinear approximation of systems is nowadays very popular since it better describes complex instances. On the other hand, digital image processing is an area of systems engineering that is expanding the level of analysis in a variety of real-life problems while becoming more attractive and affordable over time. Medicine has made the most of it by supporting important human decision-making processes through computer-aided diagnosis (CAD) systems.
This thesis presents three different frameworks for breast cancer detection, with approaches ranging from nonlinear system identification, through nonlinear system identification coupled with simple neural networks, to multilayer neural networks. In particular, the nonlinear system identification approaches termed the Nonlinear AutoRegressive with eXogenous inputs (NARX) model and the MultiScales Radial Basis Function (MSRBF) neural networks appear for the first time in image processing. Alongside these contributions, the thesis presents the Multilayer-Fuzzy Extreme Learning Machine (ML-FELM) neural network for faster training and more accurate image classification.
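For orientation, the NARX model named above is conventionally written in the following general form for a sampled input-output system. The notation is the standard textbook one and is an assumption here; the abstract does not give the thesis's own symbols or say how the inputs and outputs are defined for image data.

```latex
% General NARX form (standard notation, assumed rather than taken from the thesis):
%   y(k)      system output at sample k
%   u(k)      exogenous input at sample k
%   n_y, n_u  maximum output and input lags
%   F[.]      unknown nonlinear function, e.g. a polynomial or an RBF expansion
%   e(k)      noise or modelling error
\[
  y(k) = F\bigl[\, y(k-1), \dots, y(k-n_y),\; u(k-1), \dots, u(k-n_u) \,\bigr] + e(k)
\]
```

Structure detection then amounts to choosing which lagged terms and which nonlinear combinations of them enter $F$.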
A central research aim is to take advantage of nonlinear system identification and multilayer neural networks to enhance the feature extraction process, while bolstering classification in CAD systems. In the case of multilayer neural networks, the extraction is carried out through stacked autoencoders, a bottleneck network architecture that promotes a data transformation between layers. In the case of nonlinear system identification, the goal is to add flexible models capable of capturing distinctive features from digital images that simpler approaches might not fully recognise. The purpose of detecting nonlinearities in digital images is complementary to that of linear models, since the goal is to extract features in greater depth, capturing both linear and nonlinear elements. This aim is relevant because, according to previous work cited in the first chapter, not all spatial relationships existing in digital images can be explained appropriately with linear dependencies.
Experimental results show that the methodologies based on system identification produced reliable image models with customised mathematical structures. The models came to include nonlinearities in different proportions, depending upon the case under examination. The information about nonlinearity and model structure was used as part of the whole image model. It was found that, in some instances, the models from different clinical classes in the breast cancer detection problem presented a particular structure. For example, NARX models of the malignant class showed a higher percentage of nonlinearity and depended more on exogenous inputs than those of other classes.
Regarding classification performance, comparisons of the three new CAD systems with existing methods had variable results. The NARX model was superior in three cases but was outperformed in two. However, the comparison must be treated with caution since different databases were used. The MSRBF model was better in 5 out of 6 cases and had superior specificity in all instances, exceeding the closest model on this measure by 3.5%. The ML-FELM model was the best in 6 out of 6 cases, although it was outperformed in accuracy by 0.6% in one case and in specificity by 0.22% in another.