20,200 research outputs found
Classifying Relations using Recurrent Neural Network with Ontological-Concept Embedding
Relation extraction and classification represents a fundamental and challenging aspect of Natural Language Processing (NLP) research which depends on other tasks such as entity detection and word sense disambiguation. Traditional relation extraction methods based on pattern-matching using regular expressions grammars and lexico-syntactic pattern rules suffer from several drawbacks including the labor involved in handcrafting and maintaining large number of rules that are difficult to reuse. Current research has focused on using Neural Networks to help improve the accuracy of relation extraction tasks using a specific type of Recurrent Neural Network (RNN). A promising approach for relation classification uses an RNN that incorporates an ontology-based concept embedding layer in addition to word embeddings. This dissertation presents several improvements to this approach by addressing its main limitations. First, several different types of semantic relationships between concepts are incorporated into the model; prior work has only considered is-a hierarchical relationships. Secondly, a significantly larger vocabulary of concepts is used. Thirdly, an improved method for concept matching was devised. The results of adding these improvements to two state-of-the-art baseline models demonstrated an improvement to accuracy when evaluated on benchmark data used in prior studies
Ontologies and Information Extraction
This report argues that, even in the simplest cases, IE is an ontology-driven
process. It is not a mere text filtering method based on simple pattern
matching and keywords, because the extracted pieces of texts are interpreted
with respect to a predefined partial domain model. This report shows that
depending on the nature and the depth of the interpretation to be done for
extracting the information, more or less knowledge must be involved. This
report is mainly illustrated in biology, a domain in which there are critical
needs for content-based exploration of the scientific literature and which
becomes a major application domain for IE
Information Extraction, Data Integration, and Uncertain Data Management: The State of The Art
Information Extraction, data Integration, and uncertain data management are different areas of research that got vast focus in the last two decades. Many researches tackled those areas of research individually. However, information extraction systems should have integrated with data integration methods to make use of the extracted information. Handling uncertainty in extraction and integration process is an important issue to enhance the quality of the data in such integrated systems. This article presents the state of the art of the mentioned areas of research and shows the common grounds and how to integrate information extraction and data integration under uncertainty management cover
Knowledge Base Population using Semantic Label Propagation
A crucial aspect of a knowledge base population system that extracts new
facts from text corpora, is the generation of training data for its relation
extractors. In this paper, we present a method that maximizes the effectiveness
of newly trained relation extractors at a minimal annotation cost. Manual
labeling can be significantly reduced by Distant Supervision, which is a method
to construct training data automatically by aligning a large text corpus with
an existing knowledge base of known facts. For example, all sentences
mentioning both 'Barack Obama' and 'US' may serve as positive training
instances for the relation born_in(subject,object). However, distant
supervision typically results in a highly noisy training set: many training
sentences do not really express the intended relation. We propose to combine
distant supervision with minimal manual supervision in a technique called
feature labeling, to eliminate noise from the large and noisy initial training
set, resulting in a significant increase of precision. We further improve on
this approach by introducing the Semantic Label Propagation method, which uses
the similarity between low-dimensional representations of candidate training
instances, to extend the training set in order to increase recall while
maintaining high precision. Our proposed strategy for generating training data
is studied and evaluated on an established test collection designed for
knowledge base population tasks. The experimental results show that the
Semantic Label Propagation strategy leads to substantial performance gains when
compared to existing approaches, while requiring an almost negligible manual
annotation effort.Comment: Submitted to Knowledge Based Systems, special issue on Knowledge
Bases for Natural Language Processin
- …