18,132 research outputs found
Knowledge Base Population using Semantic Label Propagation
A crucial aspect of a knowledge base population system that extracts new
facts from text corpora, is the generation of training data for its relation
extractors. In this paper, we present a method that maximizes the effectiveness
of newly trained relation extractors at a minimal annotation cost. Manual
labeling can be significantly reduced by Distant Supervision, which is a method
to construct training data automatically by aligning a large text corpus with
an existing knowledge base of known facts. For example, all sentences
mentioning both 'Barack Obama' and 'US' may serve as positive training
instances for the relation born_in(subject,object). However, distant
supervision typically results in a highly noisy training set: many training
sentences do not really express the intended relation. We propose to combine
distant supervision with minimal manual supervision in a technique called
feature labeling, to eliminate noise from the large and noisy initial training
set, resulting in a significant increase of precision. We further improve on
this approach by introducing the Semantic Label Propagation method, which uses
the similarity between low-dimensional representations of candidate training
instances, to extend the training set in order to increase recall while
maintaining high precision. Our proposed strategy for generating training data
is studied and evaluated on an established test collection designed for
knowledge base population tasks. The experimental results show that the
Semantic Label Propagation strategy leads to substantial performance gains when
compared to existing approaches, while requiring an almost negligible manual
annotation effort.Comment: Submitted to Knowledge Based Systems, special issue on Knowledge
Bases for Natural Language Processin
Connecting Language and Knowledge Bases with Embedding Models for Relation Extraction
This paper proposes a novel approach for relation extraction from free text
which is trained to jointly use information from the text and from existing
knowledge. Our model is based on two scoring functions that operate by learning
low-dimensional embeddings of words and of entities and relationships from a
knowledge base. We empirically show on New York Times articles aligned with
Freebase relations that our approach is able to efficiently use the extra
information provided by a large subset of Freebase data (4M entities, 23k
relationships) to improve over existing methods that rely on text features
alone
Weakly-supervised learning of visual relations
This paper introduces a novel approach for modeling visual relations between
pairs of objects. We call relation a triplet of the form (subject, predicate,
object) where the predicate is typically a preposition (eg. 'under', 'in front
of') or a verb ('hold', 'ride') that links a pair of objects (subject, object).
Learning such relations is challenging as the objects have different spatial
configurations and appearances depending on the relation in which they occur.
Another major challenge comes from the difficulty to get annotations,
especially at box-level, for all possible triplets, which makes both learning
and evaluation difficult. The contributions of this paper are threefold. First,
we design strong yet flexible visual features that encode the appearance and
spatial configuration for pairs of objects. Second, we propose a
weakly-supervised discriminative clustering model to learn relations from
image-level labels only. Third we introduce a new challenging dataset of
unusual relations (UnRel) together with an exhaustive annotation, that enables
accurate evaluation of visual relation retrieval. We show experimentally that
our model results in state-of-the-art results on the visual relationship
dataset significantly improving performance on previously unseen relations
(zero-shot learning), and confirm this observation on our newly introduced
UnRel dataset
Learning to detect chest radiographs containing lung nodules using visual attention networks
Machine learning approaches hold great potential for the automated detection
of lung nodules in chest radiographs, but training the algorithms requires vary
large amounts of manually annotated images, which are difficult to obtain. Weak
labels indicating whether a radiograph is likely to contain pulmonary nodules
are typically easier to obtain at scale by parsing historical free-text
radiological reports associated to the radiographs. Using a repositotory of
over 700,000 chest radiographs, in this study we demonstrate that promising
nodule detection performance can be achieved using weak labels through
convolutional neural networks for radiograph classification. We propose two
network architectures for the classification of images likely to contain
pulmonary nodules using both weak labels and manually-delineated bounding
boxes, when these are available. Annotated nodules are used at training time to
deliver a visual attention mechanism informing the model about its localisation
performance. The first architecture extracts saliency maps from high-level
convolutional layers and compares the estimated position of a nodule against
the ground truth, when this is available. A corresponding localisation error is
then back-propagated along with the softmax classification error. The second
approach consists of a recurrent attention model that learns to observe a short
sequence of smaller image portions through reinforcement learning. When a
nodule annotation is available at training time, the reward function is
modified accordingly so that exploring portions of the radiographs away from a
nodule incurs a larger penalty. Our empirical results demonstrate the potential
advantages of these architectures in comparison to competing methodologies
- …