53,258 research outputs found
Semi-supervised Relation Extraction via Data Augmentation and Consistency-training
Due to the semantic complexity of the Relation extraction (RE) task,
obtaining high-quality human labelled data is an expensive and noisy process.
To improve the sample efficiency of the models, semi-supervised learning (SSL)
methods aim to leverage unlabelled data in addition to learning from limited
labelled data points. Recently, strong data augmentation combined with
consistency-based semi-supervised learning methods have advanced the state of
the art in several SSL tasks. However, adapting these methods to the RE task
has been challenging due to the difficulty of data augmentation for RE. In this
work, we leverage the recent advances in controlled text generation to perform
high quality data augmentation for the RE task. We further introduce small but
significant changes to model architecture that allows for generation of more
training data by interpolating different data points in their latent space.
These data augmentations along with consistency training result in very
competitive results for semi-supervised relation extraction on four benchmark
datasets.Comment: Previously published at INTERPOLATE @ NeurIPS 2022 worksho
Knowledge Base Population using Semantic Label Propagation
A crucial aspect of a knowledge base population system that extracts new
facts from text corpora, is the generation of training data for its relation
extractors. In this paper, we present a method that maximizes the effectiveness
of newly trained relation extractors at a minimal annotation cost. Manual
labeling can be significantly reduced by Distant Supervision, which is a method
to construct training data automatically by aligning a large text corpus with
an existing knowledge base of known facts. For example, all sentences
mentioning both 'Barack Obama' and 'US' may serve as positive training
instances for the relation born_in(subject,object). However, distant
supervision typically results in a highly noisy training set: many training
sentences do not really express the intended relation. We propose to combine
distant supervision with minimal manual supervision in a technique called
feature labeling, to eliminate noise from the large and noisy initial training
set, resulting in a significant increase of precision. We further improve on
this approach by introducing the Semantic Label Propagation method, which uses
the similarity between low-dimensional representations of candidate training
instances, to extend the training set in order to increase recall while
maintaining high precision. Our proposed strategy for generating training data
is studied and evaluated on an established test collection designed for
knowledge base population tasks. The experimental results show that the
Semantic Label Propagation strategy leads to substantial performance gains when
compared to existing approaches, while requiring an almost negligible manual
annotation effort.Comment: Submitted to Knowledge Based Systems, special issue on Knowledge
Bases for Natural Language Processin
- …