Deep learning for information extraction in the biomedical domain

Abstract

International Mention in the doctoral degree.

The main hypothesis of this PhD dissertation is that novel Deep Learning algorithms can outperform classical Machine Learning methods for the task of Information Extraction in the Biomedical Domain. Unlike classical systems, Deep Learning models can learn representations of the data automatically, without expert domain knowledge, and avoid the tedious and time-consuming task of defining relevant features. A Drug-Drug Interaction (DDI), an essential subset of Adverse Drug Reactions (ADRs), is an alteration in the effects of drugs that are taken simultaneously. The early recognition of interacting drugs is vital for preventing serious health problems that, in the worst cases, can cause death. Health-care professionals and researchers in this domain find discovering information about these incidents very challenging because of the vast number of pharmacovigilance documents. For this reason, several shared tasks and datasets have been developed to address the problem with automated annotation systems capable of extracting this information. In the present document, the DDI corpus, an annotated dataset of DDIs, is used with Deep Learning architectures, without any external information, for the tasks of Named Entity Recognition and Relation Extraction in order to validate the hypothesis. Furthermore, some other datasets are tested to assess the performance of these systems. To sum up, the results suggest that the most common Deep Learning methods, such as Convolutional Neural Networks and Recurrent Neural Networks, outperform the traditional algorithms, indicating that Deep Learning is a real alternative for a specific and complex scenario like Information Extraction in the Biomedical domain.
As a final goal, a complete architecture that covers the two tasks is developed to structure the named entities and their relationships from raw pharmacological texts.

This thesis has been supported by: Pre-doctoral research training scholarship of the Carlos III University of Madrid (PIF UC3M 02-1415) for four years. Research Program of the Ministry of Economy and Competitiveness - Government of Spain (DeepEMR project TIN2017-87548-C2-1-R). Research Program of the Ministry of Economy and Competitiveness - Government of Spain (eGovernAbility-Access project TIN2014-52665-C2-2-R). Doctoral stay under the TEAM - Technologies for information and communication, Europe - east Asia Mobilities project (Erasmus Mundus Action 2-Strand 2 Programme), funded by the European Commission and carried out at the University of Tokyo, Japan, for the Aizawa Laboratory in the National Institute of Informatics (NII), for seven months.

Published. Doctoral Program in Computer Science and Technology, Universidad Carlos III de Madrid.

President: Ricardo Aler Mur. Secretary: Alberto Díaz Esteban. Member: María Herrero Zaz
