20,200 research outputs found

    Classifying Relations using Recurrent Neural Network with Ontological-Concept Embedding

    Get PDF
    Relation extraction and classification represents a fundamental and challenging aspect of Natural Language Processing (NLP) research which depends on other tasks such as entity detection and word sense disambiguation. Traditional relation extraction methods based on pattern-matching using regular expressions grammars and lexico-syntactic pattern rules suffer from several drawbacks including the labor involved in handcrafting and maintaining large number of rules that are difficult to reuse. Current research has focused on using Neural Networks to help improve the accuracy of relation extraction tasks using a specific type of Recurrent Neural Network (RNN). A promising approach for relation classification uses an RNN that incorporates an ontology-based concept embedding layer in addition to word embeddings. This dissertation presents several improvements to this approach by addressing its main limitations. First, several different types of semantic relationships between concepts are incorporated into the model; prior work has only considered is-a hierarchical relationships. Secondly, a significantly larger vocabulary of concepts is used. Thirdly, an improved method for concept matching was devised. The results of adding these improvements to two state-of-the-art baseline models demonstrated an improvement to accuracy when evaluated on benchmark data used in prior studies

    Ontologies and Information Extraction

    Full text link
    This report argues that, even in the simplest cases, IE is an ontology-driven process. It is not a mere text filtering method based on simple pattern matching and keywords, because the extracted pieces of texts are interpreted with respect to a predefined partial domain model. This report shows that depending on the nature and the depth of the interpretation to be done for extracting the information, more or less knowledge must be involved. This report is mainly illustrated in biology, a domain in which there are critical needs for content-based exploration of the scientific literature and which becomes a major application domain for IE

    Information Extraction, Data Integration, and Uncertain Data Management: The State of The Art

    Get PDF
    Information Extraction, data Integration, and uncertain data management are different areas of research that got vast focus in the last two decades. Many researches tackled those areas of research individually. However, information extraction systems should have integrated with data integration methods to make use of the extracted information. Handling uncertainty in extraction and integration process is an important issue to enhance the quality of the data in such integrated systems. This article presents the state of the art of the mentioned areas of research and shows the common grounds and how to integrate information extraction and data integration under uncertainty management cover

    Knowledge Base Population using Semantic Label Propagation

    Get PDF
    A crucial aspect of a knowledge base population system that extracts new facts from text corpora, is the generation of training data for its relation extractors. In this paper, we present a method that maximizes the effectiveness of newly trained relation extractors at a minimal annotation cost. Manual labeling can be significantly reduced by Distant Supervision, which is a method to construct training data automatically by aligning a large text corpus with an existing knowledge base of known facts. For example, all sentences mentioning both 'Barack Obama' and 'US' may serve as positive training instances for the relation born_in(subject,object). However, distant supervision typically results in a highly noisy training set: many training sentences do not really express the intended relation. We propose to combine distant supervision with minimal manual supervision in a technique called feature labeling, to eliminate noise from the large and noisy initial training set, resulting in a significant increase of precision. We further improve on this approach by introducing the Semantic Label Propagation method, which uses the similarity between low-dimensional representations of candidate training instances, to extend the training set in order to increase recall while maintaining high precision. Our proposed strategy for generating training data is studied and evaluated on an established test collection designed for knowledge base population tasks. The experimental results show that the Semantic Label Propagation strategy leads to substantial performance gains when compared to existing approaches, while requiring an almost negligible manual annotation effort.Comment: Submitted to Knowledge Based Systems, special issue on Knowledge Bases for Natural Language Processin
    • …
    corecore