3,502 research outputs found

    Bipartite Flat-Graph Network for Nested Named Entity Recognition

    In this paper, we propose a novel bipartite flat-graph network (BiFlaG) for nested named entity recognition (NER), which contains two subgraph modules: a flat NER module for outermost entities and a graph module for all the entities located in inner layers. A bidirectional LSTM (BiLSTM) and a graph convolutional network (GCN) are adopted to jointly learn flat entities and their inner dependencies. Unlike previous models, which only consider the unidirectional delivery of information from innermost layers to outer ones (or outside-to-inside), our model effectively captures the bidirectional interaction between them. We first use the entities recognized by the flat NER module to construct an entity graph, which is fed to the graph module. The richer representation learned by the graph module carries the dependencies of inner entities and can be exploited to improve outermost entity predictions. Experimental results on three standard nested NER datasets demonstrate that our BiFlaG outperforms previous state-of-the-art models. Comment: Accepted by ACL202

    Biomedical Event Extraction with Machine Learning

    Biomedical natural language processing (BioNLP) is a subfield of natural language processing, an area of computational linguistics concerned with developing programs that work with natural language: written texts and speech. Biomedical relation extraction concerns the detection of semantic relations such as protein-protein interactions (PPI) from scientific texts. The aim is to enhance information retrieval by detecting relations between concepts, not just individual concepts as with a keyword search. In recent years, events have been proposed as a more detailed alternative for simple pairwise PPI relations. Events provide a systematic, structural representation for annotating the content of natural language texts. Events are characterized by annotated trigger words, directed and typed arguments and the ability to nest other events. For example, the sentence “Protein A causes protein B to bind protein C” can be annotated with the nested event structure CAUSE(A, BIND(B, C)). Converted to such formal representations, the information of natural language texts can be used by computational applications. Biomedical event annotations were introduced by the BioInfer and GENIA corpora, and event extraction was popularized by the BioNLP'09 Shared Task on Event Extraction. In this thesis we present a method for automated event extraction, implemented as the Turku Event Extraction System (TEES). A unified graph format is defined for representing event annotations and the problem of extracting complex event structures is decomposed into a number of independent classification tasks. These classification tasks are solved using SVM and RLS classifiers, utilizing rich feature representations built from full dependency parsing. Building on earlier work on pairwise relation extraction and using a generalized graph representation, the resulting TEES system is capable of detecting binary relations as well as complex event structures. 
We show that this event extraction system performs well, taking first place in the BioNLP'09 Shared Task on Event Extraction. Subsequently, TEES has achieved several first ranks in the BioNLP'11 and BioNLP'13 Shared Tasks, and has shown competitive performance in the binary relation Drug-Drug Interaction Extraction 2011 and 2013 shared tasks. The Turku Event Extraction System is published as a freely available open-source project, documenting the research in detail and making the method available for practical applications. In particular, in this thesis we describe the application of the event extraction method to PubMed-scale text mining, showing that the developed approach not only performs well but is also generalizable and applicable to large-scale real-world text mining projects. Finally, we discuss related literature, summarize the contributions of the work and present some thoughts on future directions for biomedical event extraction. This thesis includes and builds on six original research publications. The first of these introduces the analysis of dependency parses that led to the development of TEES. The entries in the three BioNLP Shared Tasks, as well as in the DDIExtraction 2011 task, are covered in four publications, and the sixth one demonstrates the application of the system to PubMed-scale text mining.
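    The nested event structure mentioned in the abstract above, CAUSE(A, BIND(B, C)), can be illustrated with a small sketch. This is not the TEES graph format: the `Protein` and `Event` classes and the `render` helper are hypothetical names chosen here only to show how a typed event can take another event as an argument.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Protein:
    """A named entity that can serve as an event argument (hypothetical class)."""
    name: str


@dataclass(frozen=True)
class Event:
    """A typed event whose arguments may be proteins or other events (hypothetical class)."""
    trigger: str                 # event type signalled by the trigger word
    args: Tuple["Arg", ...]      # ordered arguments; nesting is allowed


Arg = Union[Protein, Event]


def render(node: Arg) -> str:
    """Write an entity or a (possibly nested) event in functional notation."""
    if isinstance(node, Protein):
        return node.name
    inner = ", ".join(render(arg) for arg in node.args)
    return f"{node.trigger}({inner})"


# "Protein A causes protein B to bind protein C"
bind = Event("BIND", (Protein("B"), Protein("C")))
cause = Event("CAUSE", (Protein("A"), bind))

print(render(cause))
```

    Because `Event` accepts other `Event` objects as arguments, the same recursive structure covers both flat pairwise relations and arbitrarily deep nesting.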

    A Hybrid Method of Coreference Resolution in Information Security


    Semi-supervised method for biomedical event extraction

    Introduction. In Colombia, malaria represents a serious public health problem. It is estimated that approximately 60% of the population is at risk of the disease. Objective. To describe the mortality trends for malaria in Colombia from 1979 to 2008. Materials and methods. A descriptive study was carried out to determine the trends in malaria mortality. The information sources were the databases of registered deaths and the population projections from 1979 to 2008 of the National Statistics Department (DANE). The indicator used was the mortality rate. The trend was analyzed by joinpoint regression. Results. A total of 6,965 deaths caused by malaria were certified, for an age-adjusted rate of 0.74 deaths per 100,000 inhabitants over the study period. In 74.3% of the deaths, the parasite species was not mentioned. The mortality rate showed a statistically significant decreasing trend, which was less pronounced from the second half of the 1990s than in the 1980s. Conclusions. The magnitude of malaria mortality in Colombia is not high, in spite of the evident underreporting. A marked downward trend was observed between 1979 and 2008. The information obtained from death certificates, along with that of the public health surveillance system, will make it possible to modify the recommendations and improve the implementation of preventive and control measures to further reduce the mortality caused by malaria.