A Dependency-Based Neural Network for Relation Classification
Previous research on relation classification has verified the effectiveness
of using shortest dependency paths or subtrees. In this paper, we further
explore how to make full use of the combination of this dependency
information. We first propose a new structure, termed augmented dependency path
(ADP), which is composed of the shortest dependency path between two entities
and the subtrees attached to the shortest path. To exploit the semantic
representation behind the ADP structure, we develop dependency-based neural
networks (DepNN): a recursive neural network designed to model the subtrees,
and a convolutional neural network to capture the most important features on
the shortest path. Experiments on the SemEval-2010 dataset show that our
proposed method achieves state-of-the-art results.
Comment: This preprint is the full version of a short paper accepted at the
annual meeting of the Association for Computational Linguistics (ACL) 2015
(Beijing, China)
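As an illustration of the augmented dependency path described in this abstract, the following is a minimal sketch that extracts the shortest dependency path between two tokens and the subtrees hanging off that path. The toy sentence, the `heads` map, and both function names are assumptions made for this example; the sketch covers only the structure, not the recursive or convolutional networks of DepNN.

```python
from collections import deque

# Hypothetical dependency parse of "A chef from Paris burned my toast",
# stored as a child -> head map (the root has head None).
heads = {
    "burned": None,
    "chef": "burned",
    "A": "chef",
    "from": "chef",
    "Paris": "from",
    "toast": "burned",
    "my": "toast",
}

def shortest_dependency_path(heads, a, b):
    """Breadth-first search over the undirected dependency tree from a to b."""
    neighbors = {tok: set() for tok in heads}
    for child, head in heads.items():
        if head is not None:
            neighbors[child].add(head)
            neighbors[head].add(child)
    previous = {a: None}
    queue = deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            break
        for nxt in sorted(neighbors[node]):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    # Walk back from b to a, then reverse to get the path in order.
    path = []
    node = b
    while node is not None:
        path.append(node)
        node = previous[node]
    return path[::-1]

def attached_subtrees(heads, path):
    """For each token on the path, list the subtrees rooted at its off-path children."""
    children = {tok: [] for tok in heads}
    for child, head in heads.items():
        if head is not None:
            children[head].append(child)
    on_path = set(path)

    def collect(node):
        # Pre-order listing of a subtree: the node, then its descendants.
        tokens = [node]
        for child in children[node]:
            tokens.extend(collect(child))
        return tokens

    return {
        node: [collect(c) for c in children[node] if c not in on_path]
        for node in path
    }

path = shortest_dependency_path(heads, "chef", "toast")
subtrees = attached_subtrees(heads, path)
```

Here `path` holds the tokens linking the two entities, and `subtrees` maps each path token to the modifiers that a shortest-path-only model would discard.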
A Joint Model for Definition Extraction with Syntactic Connection and Semantic Consistency
Definition Extraction (DE) is one of the well-known topics in Information
Extraction that aims to identify terms and their corresponding definitions in
unstructured texts. This task can be formalized either as a sentence
classification task (i.e., containing term-definition pairs or not) or a
sequential labeling task (i.e., identifying the boundaries of the terms and
definitions). Previous work on DE has focused on only one of the two
approaches, failing to model the inter-dependencies between the two tasks. In
this work, we propose a novel model for DE that simultaneously performs the two
tasks in a single framework to benefit from their inter-dependencies. Our model
features deep learning architectures to exploit the global structures of the
input sentences as well as the semantic consistencies between the terms and the
definitions, thereby improving the quality of the representation vectors for
DE. Besides the joint inference between sentence classification and sequential
labeling, the proposed model differs fundamentally from prior work on DE,
which has employed only the local structures of the input sentences
(i.e., word-to-word relations) and has not yet considered the semantic
consistencies between terms and definitions. In order to implement these novel
ideas, our model presents a multi-task learning framework that employs graph
convolutional neural networks and predicts the dependency paths between the
terms and the definitions. We also seek to enforce the consistency between the
representations of the terms and definitions both globally (i.e., increasing
semantic consistency between the representations of the entire sentences and
the terms/definitions) and locally (i.e., promoting the similarity between the
representations of the terms and the definitions).
Knowledge Base Population using Semantic Label Propagation
A crucial aspect of a knowledge base population system that extracts new
facts from text corpora is the generation of training data for its relation
extractors. In this paper, we present a method that maximizes the effectiveness
of newly trained relation extractors at a minimal annotation cost. Manual
labeling can be significantly reduced by Distant Supervision, which is a method
to construct training data automatically by aligning a large text corpus with
an existing knowledge base of known facts. For example, all sentences
mentioning both 'Barack Obama' and 'US' may serve as positive training
instances for the relation born_in(subject,object). However, distant
supervision typically results in a highly noisy training set: many training
sentences do not really express the intended relation. We propose to combine
distant supervision with minimal manual supervision in a technique called
feature labeling, to eliminate noise from the large and noisy initial training
set, resulting in a significant increase in precision. We further improve on
this approach by introducing the Semantic Label Propagation method, which uses
the similarity between low-dimensional representations of candidate training
instances to extend the training set in order to increase recall while
maintaining high precision. Our proposed strategy for generating training data
is studied and evaluated on an established test collection designed for
knowledge base population tasks. The experimental results show that the
Semantic Label Propagation strategy leads to substantial performance gains when
compared to existing approaches, while requiring an almost negligible manual
annotation effort.
Comment: Submitted to Knowledge Based Systems, special issue on Knowledge
Bases for Natural Language Processing
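The distant supervision step described in this abstract can be sketched as follows, using the abstract's own 'Barack Obama' / 'US' / born_in example. This is a minimal illustration under stated assumptions: the sentences, the `distant_labels` function, and the plain substring matching are invented for the example, and a real system would rely on entity linking rather than string containment.

```python
def distant_labels(sentences, knowledge_base):
    """Label every sentence that mentions both arguments of a known fact.

    Substring matching is a deliberate simplification for this sketch.
    """
    labeled = []
    for sentence in sentences:
        for subject, relation, obj in knowledge_base:
            if subject in sentence and obj in sentence:
                labeled.append((sentence, relation, subject, obj))
    return labeled

# One known fact, written as a (subject, relation, object) triple.
knowledge_base = [("Barack Obama", "born_in", "US")]

# Hypothetical corpus: only the first sentence actually expresses born_in.
sentences = [
    "Barack Obama was born in the US.",
    "Barack Obama returned to the US after the summit.",
    "Angela Merkel visited the US.",
]

examples = distant_labels(sentences, knowledge_base)
for sentence, relation, subject, obj in examples:
    print(f"{relation}({subject}, {obj}): {sentence}")
```

The second sentence is labeled born_in even though it says nothing about a birthplace, which is the kind of noise that the abstract's feature labeling step is meant to filter out.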
Crowdsourcing Semantic Label Propagation in Relation Classification
Distant supervision is a popular method for performing relation extraction
from text that is known to produce noisy labels. Most progress in relation
extraction and classification has been made with crowdsourced corrections to
distant-supervised labels, and there is evidence that indicates still more
would be better. In this paper, we explore the problem of propagating human
annotation signals gathered for open-domain relation classification through the
CrowdTruth methodology for crowdsourcing, which captures ambiguity in
annotations by measuring inter-annotator disagreement. Our approach propagates
annotations to sentences that are similar in a low-dimensional embedding space,
expanding the number of labels by two orders of magnitude. Our experiments show
significant improvement in a sentence-level multi-class relation classifier.
Comment: In publication at the First Workshop on Fact Extraction and
Verification (FeVer) at EMNLP 201
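One simple way to picture propagating annotations through an embedding space is nearest-neighbor label transfer with a similarity cutoff, sketched below. This is an assumption-laden illustration, not the method of the paper: the 2-D vectors, the sentence ids, the 0.9 threshold, and the `propagate` function are all hypothetical, and the sketch ignores the disagreement-aware scores that CrowdTruth produces.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def propagate(labeled, unlabeled, threshold=0.9):
    """Copy the label of the most similar annotated sentence.

    A sentence stays unlabeled when no annotated sentence reaches the
    similarity threshold (an assumed cutoff for this sketch).
    """
    propagated = {}
    for name, vec in unlabeled.items():
        best_label, best_sim = None, threshold
        for seed_vec, label in labeled:
            sim = cosine(vec, seed_vec)
            if sim >= best_sim:
                best_label, best_sim = label, sim
        if best_label is not None:
            propagated[name] = best_label
    return propagated

# Toy sentence embeddings: two annotated seeds and three unlabeled sentences.
labeled = [((1.0, 0.0), "born_in"), ((0.0, 1.0), "no_relation")]
unlabeled = {"s1": (0.9, 0.1), "s2": (0.1, 0.9), "s3": (0.7, 0.7)}

result = propagate(labeled, unlabeled)
```

In this toy setup `s1` and `s2` inherit the label of their nearest seed, while `s3` sits between the two seeds and is left unlabeled, which is how a threshold trades recall for precision.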