Bipartite Flat-Graph Network for Nested Named Entity Recognition
In this paper, we propose a novel bipartite flat-graph network (BiFlaG) for
nested named entity recognition (NER), which contains two subgraph modules: a
flat NER module for outermost entities and a graph module for all the entities
located in inner layers. Bidirectional LSTM (BiLSTM) and graph convolutional
network (GCN) are adopted to jointly learn flat entities and their inner
dependencies. Unlike previous models, which only consider the
unidirectional flow of information from innermost layers to outer ones (or
outside-to-inside), our model effectively captures the bidirectional
interaction between them. We first use the entities recognized by the flat NER
module to construct an entity graph, which is fed to the next graph module. The
richer representation learned by the graph module carries the dependencies of
inner entities and can be exploited to improve outermost entity predictions.
Experimental results on three standard nested NER datasets demonstrate that
BiFlaG outperforms previous state-of-the-art models. Comment: Accepted by ACL202
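As an illustration of the graph-module idea, the following minimal sketch builds a token graph from the spans returned by a flat NER step and applies one standard graph convolution (symmetrically normalized adjacency with self-loops, then a linear projection and ReLU). It assumes that every pair of tokens inside a detected entity span is connected; the abstract does not specify the edge construction, and in BiFlaG the node features would come from the BiLSTM rather than the toy vectors used here. The names `build_entity_graph` and `gcn_layer` are hypothetical, and this is not the authors' implementation.

```python
# Hypothetical sketch, not the BiFlaG authors' code: build a token graph from
# flat-NER spans, then apply one graph-convolution (GCN) layer in pure Python.
from typing import List, Tuple

Matrix = List[List[float]]
Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def build_entity_graph(n_tokens: int, spans: List[Span]) -> Matrix:
    """Adjacency matrix with self-loops; tokens inside the same detected
    entity span are fully connected (an assumed edge construction)."""
    adj = [[1.0 if i == j else 0.0 for j in range(n_tokens)] for i in range(n_tokens)]
    for start, end in spans:
        for i in range(start, end):
            for j in range(start, end):
                adj[i][j] = 1.0
    return adj


def gcn_layer(adj: Matrix, feats: Matrix, weight: Matrix) -> Matrix:
    """One GCN step: ReLU(D^-1/2 A D^-1/2 H W), where D is the degree matrix."""
    n = len(adj)
    deg = [sum(row) for row in adj]  # self-loops guarantee every degree is > 0
    norm = [[adj[i][j] / (deg[i] * deg[j]) ** 0.5 for j in range(n)] for i in range(n)]
    in_dim, out_dim = len(weight), len(weight[0])
    # Aggregate neighbor features: A_hat @ H.
    agg = [[sum(norm[i][k] * feats[k][d] for k in range(n)) for d in range(in_dim)]
           for i in range(n)]
    # Project with W and apply ReLU.
    return [[max(0.0, sum(agg[i][d] * weight[d][o] for d in range(in_dim)))
             for o in range(out_dim)] for i in range(n)]


# Usage: 4 tokens; the flat NER step found one entity covering tokens 0-2.
adj = build_entity_graph(4, [(0, 3)])
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]  # toy features
identity = [[1.0, 0.0], [0.0, 1.0]]
print(gcn_layer(adj, feats, identity))
```

Because the three entity tokens have the same degree, each of them receives the average of the clique's features, while token 3, which lies outside every entity, keeps its own representation.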
HOLMES: A Hybrid Ontology-Learning Materials Engineering System
Designing and discovering novel materials is a challenging problem in many domains, such as fuel additives, composites, and pharmaceuticals. At the core of all of this are models that capture how the domain-specific data, information, and knowledge about the structures and properties of materials relate to one another. This dissertation explores the difficult task of developing an artificial-intelligence-based knowledge modeling environment, the Hybrid Ontology-Learning Materials Engineering System (HOLMES), which can assist humans in populating a materials science and engineering ontology by automatically extracting information from journal article abstracts. Although our approach could be adapted to a generic materials engineering application, this thesis focuses on the needs of the pharmaceutical industry. We develop the Columbia Ontology for Pharmaceutical Engineering (COPE), a modification of the Purdue Ontology for Pharmaceutical Engineering, which serves as the basis for HOLMES.
The HOLMES framework starts with journal articles in Portable Document Format (PDF) and ends with the entries extracted from those articles assigned to the ontology. This might seem like a simple information extraction task, but extracting the information so that a fully developed ontology is filled as completely and correctly as possible is not easy.
In developing the information extraction tasks, we encountered new problems that have not arisen in previous information extraction work in the literature. The first is the need to extract auxiliary information in the form of concepts such as actions, ideas, problem specifications, and properties. The second is that, because of these concepts, a single token can carry multiple labels. These two problems are the focus of this dissertation.
In this work, the HOLMES framework is first presented as a whole, describing both our successes and the unsolved problems, which may help future research on this topic. The ontology is then presented to help identify the relevant information that needs to be retrieved. Next, annotations are developed to create the data sets needed to train the machine learning algorithms. The current level of information extraction for these concepts is then explored and extended by introducing entity feature sets based on entities previously extracted in the entity recognition task. Finally, the new task of assigning multiple labels to a single entity is explored using multi-label algorithms drawn primarily from image processing.
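The multiple-label problem can be made concrete with binary relevance, the simplest multi-label scheme, which makes an independent yes/no decision per label instead of picking a single best tag. This is an assumed, generic technique chosen for illustration; the abstract does not say which image-processing algorithms HOLMES adopted, and the label names, scores, and threshold below are made up.

```python
# Binary-relevance multi-label decoding (an assumed, generic technique; not
# necessarily the algorithm used in HOLMES). Labels and scores are hypothetical.
from typing import Dict, List, Set


def decode_single(scores: List[Dict[str, float]]) -> List[str]:
    """Conventional tagging: keep only the highest-scoring label per token."""
    return [max(tok, key=tok.get) for tok in scores]


def decode_multilabel(scores: List[Dict[str, float]], threshold: float = 0.5) -> List[Set[str]]:
    """Binary relevance: an independent yes/no decision per label, so a token
    may receive several labels, or none (reported as "O")."""
    return [{label for label, p in tok.items() if p >= threshold} or {"O"} for tok in scores]


# Hypothetical per-token label probabilities from some tagger.
scores = [
    {"Property": 0.90, "Action": 0.10},
    {"Material": 0.80, "Property": 0.60},
    {"Material": 0.20, "Action": 0.30},
]
print(decode_single(scores))  # ['Property', 'Material', 'Action']
print(decode_multilabel(scores))
```

With single-label decoding, the second token loses its Property label; binary relevance keeps both, at the cost of a threshold that has to be tuned.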
Biomedical Event Extraction with Machine Learning
Biomedical natural language processing (BioNLP) is a subfield of natural
language processing, an area of computational linguistics concerned with
developing programs that work with natural language: written texts and
speech. Biomedical relation extraction concerns the detection of semantic
relations such as protein-protein interactions (PPI) from scientific texts.
The aim is to enhance information retrieval by detecting relations between
concepts, not just individual concepts as with a keyword search.
In recent years, events have been proposed as a more detailed alternative
to simple pairwise PPI relations. Events provide a systematic, structural
representation for annotating the content of natural language texts. Events
are characterized by annotated trigger words, directed and typed arguments
and the ability to nest other events. For example, the sentence “Protein A
causes protein B to bind protein C” can be annotated with the nested event
structure CAUSE(A, BIND(B, C)). Converted to such formal representations,
the information of natural language texts can be used by computational
applications. Biomedical event annotations were introduced by the
BioInfer and GENIA corpora, and event extraction was popularized by the
BioNLP'09 Shared Task on Event Extraction.
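The nested structure CAUSE(A, BIND(B, C)) from the example above can be represented as a small recursive data structure in which an event argument is either an entity or another event. The class names, field names, and role labels (`Cause`, `Theme`) are assumptions for illustration, not the BioInfer, GENIA, or TEES formats.

```python
# Illustrative data structure for nested events; class, field, and role names
# are assumptions, not the BioInfer/GENIA/TEES annotation formats.
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class Entity:
    name: str  # e.g. a protein mention such as "A"


@dataclass
class Event:
    event_type: str  # e.g. "CAUSE", "BIND"
    trigger: str     # the trigger word in the sentence
    # (role, argument) pairs; an argument may be an Entity or another Event.
    args: List[Tuple[str, Union[Entity, "Event"]]] = field(default_factory=list)


def to_notation(node: Union[Entity, Event]) -> str:
    """Render an event tree in the TYPE(arg, ...) notation used in the text."""
    if isinstance(node, Entity):
        return node.name
    inner = ", ".join(to_notation(arg) for _, arg in node.args)
    return f"{node.event_type}({inner})"


def depth(node: Union[Entity, Event]) -> int:
    """Nesting depth: 0 for an entity, 1 + deepest argument for an event."""
    if isinstance(node, Entity):
        return 0
    return 1 + max((depth(arg) for _, arg in node.args), default=0)


# "Protein A causes protein B to bind protein C"
bind = Event("BIND", "bind", [("Theme", Entity("B")), ("Theme", Entity("C"))])
cause = Event("CAUSE", "causes", [("Cause", Entity("A")), ("Theme", bind)])
print(to_notation(cause))  # CAUSE(A, BIND(B, C))
print(depth(cause))        # 2
```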
In this thesis we present a method for automated event extraction, implemented
as the Turku Event Extraction System (TEES). A unified graph
format is defined for representing event annotations, and the problem of
extracting complex event structures is decomposed into a number of independent
classification tasks. These classification tasks are solved using SVM
and RLS classifiers, utilizing rich feature representations built from full dependency
parsing. Building on earlier work on pairwise relation extraction
and using a generalized graph representation, the resulting TEES system is
capable of detecting binary relations as well as complex event structures.
We show that this event extraction system performs well, reaching
first place in the BioNLP'09 Shared Task on Event Extraction.
Subsequently, TEES has achieved several first ranks in the BioNLP'11 and
BioNLP'13 Shared Tasks and has shown competitive performance in the
binary-relation Drug-Drug Interaction Extraction 2011 and 2013 shared
tasks.
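The decomposition of event extraction into independent classification tasks over a graph can be sketched as follows. The sketch assumes only two stages (trigger detection as token classification, then argument-edge detection as pair classification) and replaces the trained SVM/RLS classifiers with hypothetical hand-written rules so that the example runs on its own. `TRIGGER_LEXICON`, `classify_edge`, and the distance window are invented for illustration; TEES itself uses learned classifiers with dependency-parse features and additional processing steps.

```python
# Hypothetical two-stage sketch of event extraction as independent
# classification tasks over a graph; rules stand in for trained SVM/RLS models.
from typing import Dict, List, Optional, Set, Tuple

Edges = Dict[int, List[Tuple[str, int]]]  # trigger index -> [(role, node index)]

# Stand-in for a trained trigger classifier (assumed lexicon, not from TEES).
TRIGGER_LEXICON = {"causes": "CAUSE", "bind": "BIND"}


def detect_triggers(tokens: List[str]) -> Dict[int, str]:
    """Task 1: classify each token as an event trigger type or as no trigger."""
    return {i: TRIGGER_LEXICON[tok] for i, tok in enumerate(tokens) if tok in TRIGGER_LEXICON}


def classify_edge(src: int, dst: int, triggers: Dict[int, str], entities: Set[int]) -> Optional[str]:
    """Task 2: classify one (trigger, candidate argument) pair on its own.
    Hand-written rules replace a learned model here."""
    kind = triggers[src]
    if kind == "BIND" and dst in entities and abs(dst - src) <= 3:
        return "Theme"
    if kind == "CAUSE" and dst in entities and dst < src:
        return "Cause"
    if kind == "CAUSE" and dst in triggers and dst > src:
        return "Theme"
    return None  # no edge


def detect_edges(triggers: Dict[int, str], entities: Set[int]) -> Edges:
    """Run the edge classifier over every trigger-to-node pair independently."""
    nodes = sorted(set(triggers) | entities)
    edges: Edges = {}
    for src in triggers:
        edges[src] = []
        for dst in nodes:
            if dst != src:
                role = classify_edge(src, dst, triggers, entities)
                if role is not None:
                    edges[src].append((role, dst))
    return edges


def render(node: int, tokens: List[str], triggers: Dict[int, str], edges: Edges) -> str:
    """Read a nested event back out of the graph (entities are leaves)."""
    if node not in triggers:
        return tokens[node]
    args = ", ".join(render(dst, tokens, triggers, edges) for _, dst in edges[node])
    return f"{triggers[node]}({args})"


def extract_events(tokens: List[str], entities: Set[int]) -> List[str]:
    triggers = detect_triggers(tokens)
    edges = detect_edges(triggers, entities)
    # Top-level events are triggers that are not an argument of another event.
    nested = {dst for args in edges.values() for _, dst in args if dst in triggers}
    return [render(t, tokens, triggers, edges) for t in sorted(triggers) if t not in nested]


tokens = "Protein A causes protein B to bind protein C".split()
print(extract_events(tokens, entities={1, 4, 8}))  # ['CAUSE(A, BIND(B, C))']
```

Because `classify_edge` looks at one pair at a time, each stage is an ordinary classification problem that a learned model could replace; the nested event is only recovered when the predicted graph is read back out.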
The Turku Event Extraction System is published as a freely available
open-source project, documenting the research in detail as well as making
the method available for practical applications. In particular, in this thesis
we describe the application of the event extraction method to PubMed-scale
text mining, showing that the developed approach not only performs well
but is also generalizable and applicable to large-scale, real-world
text mining projects.
Finally, we discuss related literature, summarize the contributions of the
work and present some thoughts on future directions for biomedical event
extraction. This thesis includes and builds on six original research publications.
The first of these introduces the analysis of dependency parses that
led to the development of TEES. The entries in the three BioNLP Shared
Tasks, as well as in the DDIExtraction 2011 task, are covered in four publications,
and the sixth demonstrates the application of the system to
PubMed-scale text mining.
Semi-supervised method for biomedical event extraction
Introduction. In Colombia, malaria is a serious public health problem; an estimated 60% of the population is at risk of falling ill or dying from the disease. Objective. To describe malaria mortality trends in Colombia from 1979 to 2008. Materials and methods. A descriptive study of malaria mortality trends was carried out. The information sources were the databases of registered deaths and the population projections for 1979 to 2008 from the National Statistics Department (DANE). The indicator used was the mortality rate, and the trend was analyzed by joinpoint regression. Results. A total of 6,965 malaria deaths were certified, giving an age-adjusted rate of 0.74 deaths per 100,000 inhabitants for the study period. The parasite species was not specified in 74.3% of the deaths. Mortality rates showed a statistically significant downward trend, which was less pronounced from the second half of the 1990s than in the 1980s. Conclusions. Malaria mortality in Colombia is not high, despite evident underreporting, and a marked downward trend was observed between 1979 and 2008. Information from death certificates, together with that from the public health surveillance system, will make it possible to modify the recommendations and improve the implementation of preventive and control measures to further reduce malaria mortality.
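The joinpoint regression mentioned in the methods can be illustrated with a simplified sketch: a continuous log-linear model with a single joinpoint, log(rate) = b0 + b1·t + b2·max(0, t − k), where every candidate year k is tried and the one with the smallest residual sum of squares is kept. The annual percent change (APC) of each segment is 100·(e^slope − 1). This is a deliberate simplification: the Joinpoint software used in the study can fit several joinpoints and chooses how many with statistical tests. The function names are hypothetical, and the data below are synthetic, not the Colombian mortality series.

```python
# Simplified single-joinpoint log-linear regression, fitted by grid search.
# An illustrative sketch, not the Joinpoint software used in the study.
import math
from typing import List, Sequence, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows: List[List[float]], y: List[float]) -> Tuple[List[float], float]:
    """Ordinary least squares via the normal equations; returns (beta, SSE)."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    sse = sum((yi - sum(bk * xk for bk, xk in zip(beta, r))) ** 2 for r, yi in zip(rows, y))
    return beta, sse


def fit_one_joinpoint(years: Sequence[int], rates: Sequence[float], min_pts: int = 3):
    """Try each candidate joinpoint year k (with at least min_pts points on each
    side) and keep the best fit. Returns (k, APC before k, APC after k)."""
    y = [math.log(r) for r in rates]
    t0 = years[0]  # shift time so the normal equations stay well conditioned
    best = None
    for i in range(min_pts - 1, len(years) - min_pts + 1):
        k = years[i]
        rows = [[1.0, float(t - t0), float(max(0, t - k))] for t in years]
        beta, sse = ols(rows, y)
        if best is None or sse < best[2]:
            best = (k, beta, sse)
    k, beta, _ = best

    def apc(slope: float) -> float:
        # Annual percent change of a log-linear segment: 100 * (e^slope - 1).
        return 100.0 * (math.exp(slope) - 1.0)

    return k, apc(beta[1]), apc(beta[1] + beta[2])


# Synthetic rates: log-slope -0.10 per year until 2005, then -0.02 per year.
years = list(range(2000, 2010))
rates = [math.exp(-0.1 * (t - 2000) + 0.08 * max(0, t - 2005)) for t in years]
k, apc_before, apc_after = fit_one_joinpoint(years, rates)
print(k, round(apc_before, 2), round(apc_after, 2))  # 2005 -9.52 -1.98
```

The continuity constraint (the hinge term max(0, t − k)) is what distinguishes joinpoint regression from fitting two unrelated lines: both segments meet at the joinpoint year.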