Information Extraction from Medical Reports of Patients with Multiple Myeloma using Machine Learning

Author: Schlenker Jan

The majority of Austrian medical reports are still written as free narrative text, which makes the automated extraction of important medical information difficult. In recent years, machine learning (ML) approaches have been applied successfully to this problem. This thesis evaluates two ML algorithms, Support Vector Machines (SVMs) and Conditional Random Fields (CRFs), on Austrian reports of patients with multiple myeloma, with the aim of extracting the severity of the disease, the type of myeloma, and cytogenetic anomalies. The required training data is created with a custom token-tagging program, whose development is also part of this work. Detailed evaluations show that CRFs generally outperform SVMs, with top F1 scores of 0.928 versus 0.765. These high scores suggest that clinics or clinical partners in Austria could use these or similar methods for the extraction of further medical information and for studies.

Thesis not yet received by the library; data not verified. Innsbruck, Univ., Master's thesis, 2019. (VLID)449390
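For readers unfamiliar with the F1 scores quoted in the abstract, the following is a minimal sketch of a token-level F1 computation for a single tag. It is not taken from the thesis: the tag names (`TYPE`, `O`), the toy sequences, and the function `f1_for_label` are hypothetical, and the thesis may well compute its scores differently (for example per entity rather than per token).

```python
from typing import Sequence


def f1_for_label(gold: Sequence[str], pred: Sequence[str], label: str) -> float:
    """Token-level F1 for one label: harmonic mean of precision and recall."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    # Count true positives, false positives, and false negatives for `label`.
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        return 0.0  # no correct predictions, so F1 is zero by convention
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical tags: "TYPE" marks myeloma-type tokens, "O" marks everything else.
gold = ["O", "TYPE", "TYPE", "O", "O"]
pred = ["O", "TYPE", "O", "O", "TYPE"]
score = f1_for_label(gold, pred, "TYPE")
```

In this toy case there is one true positive, one false positive, and one false negative, so precision and recall are both 0.5 and `score` is 0.5.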

University of Innsbruck Digital Library

Information Extraction from Medical Reports in the Italian Language for Clinical Timelines Reconstruction

Author: Viani Natalia
Publication venue: Università degli studi di Pavia

Electronic health records are a valuable source of information for both patient care and biomedical research. Despite the efforts put into collecting structured data, much information is available only as free text. For this reason, developing systems that automatically extract relevant information from clinical narratives is essential, as is summarizing all the data related to a single patient. In the field of clinical information extraction, several systems have been developed, especially for texts written in English. However, the related research for non-English languages is still limited.

In this research activity, information extraction techniques and summarization methods were applied to medical reports written in Italian. For this language, shared resources for clinical information extraction are not easily available. A corpus of molecular cardiology reports was used as the main dataset for methods development. To enable the design and evaluation of different approaches, a subset of this corpus was annotated by manually identifying the information to be extracted from the texts.

To access the knowledge contained in textual medical reports, a first step is the identification of clinical events. In the natural language processing community, this task is often addressed with supervised methods. In this research activity, two different approaches were used for event extraction. First, a simple yet effective approach based on dictionary lookup was used. Second, an application of recurrent neural networks was investigated.

In clinical texts, events are often mentioned together with relevant attributes that have to be extracted to characterize the event itself. In this thesis, an ontology-driven approach was used to identify the attributes of events in the cardiology reports. In particular, a domain-specific ontology was manually developed, including all the relevant events with their associated attributes. A hospital database, which stores most of the information written in the reports, served as the gold standard for the evaluation phase.

To correctly reconstruct patients’ clinical histories, it is also necessary to assign a specific time to each event extracted from the text. To this end, the identification of temporal expressions is a first, mandatory step. In this research activity, two existing rule-based systems for temporal information extraction were adapted to the analysis of clinical narratives.

To process each document, the three steps described above (event, attribute, and time expression extraction) were combined into a pipeline. For each event and temporal expression identified in the text, the pipeline also extracts a few properties of interest. Among these properties, the temporal relation between each event and the document creation time is computed (DocTimeRel). On the basis of this relation, each event is further linked to a reference time by applying a set of hand-crafted rules.

Besides processing single medical reports, the system developed in this research activity can summarize multiple documents referring to the same patient. In this case, the information extraction pipeline is first run on all the documents belonging to that patient. Then, the system builds and visualizes a timeline of all the extracted events, using the DocTimeRel information and the event-time links.

As regards evaluation, the overall information extraction pipeline performed well on the Italian cardiology corpus considered. In addition, the possibility of adapting the attribute extraction step to another language was assessed, with promising results. In a similar way, the developed ontology was adapted to another clinical domain, leading to a well-performing system.
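The dictionary-lookup event extraction and the timeline ordering described in this abstract could be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's system: the dictionary entries, the `Event` class, the functions `extract_events` and `build_timeline`, and the two example reports are all hypothetical, and DocTimeRel is passed in by hand here rather than computed from the text.

```python
import re
from dataclasses import dataclass
from datetime import date

# Hypothetical mini-dictionary: lowercase Italian surface form -> event type.
EVENT_DICTIONARY = {
    "ecg": "ECG",
    "ecocardiogramma": "Echocardiogram",
    "sincope": "Syncope",
}


@dataclass(frozen=True)
class Event:
    event_type: str
    doc_date: date      # creation date of the report mentioning the event
    doc_time_rel: str   # assumed values: "BEFORE" or "OVERLAP"


def extract_events(text, doc_date, doc_time_rel="OVERLAP"):
    """Dictionary lookup: emit one event per token found in the dictionary."""
    tokens = re.findall(r"\w+", text.lower())
    return [
        Event(EVENT_DICTIONARY[t], doc_date, doc_time_rel)
        for t in tokens
        if t in EVENT_DICTIONARY
    ]


def build_timeline(events):
    """Sort by document date; within one date, BEFORE events come first."""
    rank = {"BEFORE": 0, "OVERLAP": 1}
    return sorted(events, key=lambda e: (e.doc_date, rank[e.doc_time_rel]))


# Two hypothetical reports for the same patient, given out of order.
reports = [
    ("Eseguito ECG e ecocardiogramma.", date(2015, 3, 10), "OVERLAP"),
    ("Riferita sincope.", date(2014, 11, 2), "BEFORE"),
]
events = [e for text, d, rel in reports for e in extract_events(text, d, rel)]
timeline = build_timeline(events)
```

Here `timeline` lists the syncope event from the earlier report first, followed by the ECG and the echocardiogram from the later one. A real system would also need multi-word terms, attribute extraction, and the hand-crafted linking rules mentioned in the abstract, none of which this sketch attempts.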

Archivio Istituzionale della Ricerca - Università degli Studi di Pavia