Knowledge Base Population using Semantic Label Propagation
A crucial aspect of a knowledge base population system that extracts new
facts from text corpora is the generation of training data for its relation
extractors. In this paper, we present a method that maximizes the effectiveness
of newly trained relation extractors at a minimal annotation cost. Manual
labeling can be significantly reduced by Distant Supervision, which is a method
to construct training data automatically by aligning a large text corpus with
an existing knowledge base of known facts. For example, all sentences
mentioning both 'Barack Obama' and 'US' may serve as positive training
instances for the relation born_in(subject,object). However, distant
supervision typically results in a highly noisy training set: many training
sentences do not really express the intended relation. We propose to combine
distant supervision with minimal manual supervision in a technique called
feature labeling, to eliminate noise from the large and noisy initial training
set, resulting in a significant increase of precision. We further improve on
this approach by introducing the Semantic Label Propagation method, which uses
the similarity between low-dimensional representations of candidate training
instances to extend the training set in order to increase recall while
maintaining high precision. Our proposed strategy for generating training data
is studied and evaluated on an established test collection designed for
knowledge base population tasks. The experimental results show that the
Semantic Label Propagation strategy leads to substantial performance gains when
compared to existing approaches, while requiring an almost negligible manual
annotation effort.
Comment: Submitted to Knowledge Based Systems, special issue on Knowledge Bases for Natural Language Processing
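The two ideas in the abstract above, distant supervision and similarity-based extension of the training set, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the toy knowledge base, the substring matching, the cosine-similarity threshold, and all function names are assumptions made for this example.

```python
import math

# Hypothetical toy knowledge base: (subject, object) -> relation.
KB = {("Barack Obama", "US"): "born_in"}


def distant_labels(sentences, kb):
    """Label every sentence that mentions both entities of a known fact.

    Matching is plain substring search (an assumption), so the result is
    noisy: a sentence may mention both entities without expressing the relation.
    """
    labeled = []
    for sent in sentences:
        for (subj, obj), rel in kb.items():
            if subj in sent and obj in sent:
                labeled.append((sent, rel))
    return labeled


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def propagate(trusted, candidates, threshold=0.9):
    """Return ids of candidates whose vector is close to any trusted vector.

    `trusted` is a list of vectors for instances already judged reliable;
    `candidates` maps an id to its low-dimensional vector. The threshold
    value is a hypothetical choice.
    """
    return [cid for cid, vec in candidates.items()
            if any(cosine(vec, t) >= threshold for t in trusted)]


sentences = [
    "Barack Obama was born in Hawaii, US.",
    "Barack Obama visited the US embassy.",  # noise: mentions both, wrong relation
    "Paris is in France.",
]
# Both Obama sentences receive the born_in label, including the noisy one.
training = distant_labels(sentences, KB)
```

The second sentence shows why the abstract calls the distantly supervised set noisy: it is labeled born_in only because both entity names occur in it.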
Semi-supervised Deep Generative Modelling of Incomplete Multi-Modality Emotional Data
Emotion recognition poses three challenges. First, it is difficult
to recognize a person's emotional state from a single modality alone.
Second, it is expensive to manually annotate the emotional data. Third,
emotional data often suffers from missing modalities due to unforeseeable
sensor malfunction or configuration issues. In this paper, we address all these
problems under a novel multi-view deep generative framework. Specifically, we
propose to model the statistical relationships of multi-modality emotional data
using multiple modality-specific generative networks with a shared latent
space. By imposing a Gaussian mixture assumption on the posterior approximation
of the shared latent variables, our framework can learn the joint deep
representation from multiple modalities and evaluate the importance of each
modality simultaneously. To solve the labeled-data-scarcity problem, we extend
our multi-view model to the semi-supervised learning scenario by casting the
semi-supervised classification problem as a specialized missing data imputation
task. To address the missing-modality problem, we further extend our
semi-supervised multi-view model to deal with incomplete data, where a missing
view is treated as a latent variable and integrated out during inference. This
way, the proposed overall framework can utilize all available (both labeled and
unlabeled, as well as both complete and incomplete) data to improve its
generalization ability. The experiments conducted on two real multi-modal
emotion datasets demonstrated the superiority of our framework.
Comment: arXiv admin note: text overlap with arXiv:1704.07548, 2018 ACM Multimedia Conference (MM'18)
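One way to read "a missing view is treated as a latent variable and integrated out" is sketched below for two views. The symbols x^{(1)}, x^{(2)}, and z, and the assumption that the views are conditionally independent given the shared latent variable, are introduced here for illustration only and are not taken from the abstract.

```latex
% Assumed generative model with a shared latent variable z:
p(x^{(1)}, x^{(2)}, z) = p(z)\, p(x^{(1)} \mid z)\, p(x^{(2)} \mid z)

% If view 2 is missing, integrate it out of the joint:
p(x^{(1)}) = \int p(z)\, p(x^{(1)} \mid z)
             \left[ \int p(x^{(2)} \mid z)\, dx^{(2)} \right] dz
           = \int p(z)\, p(x^{(1)} \mid z)\, dz
```

Under this assumption the inner integral equals one, so an incomplete sample still contributes a well-defined likelihood term through its observed view.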
Structure fusion based on graph convolutional networks for semi-supervised classification
Because multi-view data are diverse and complex, most existing graph
convolutional networks for semi-supervised classification focus on network
architecture construction or on preserving the salient graph structure, and
ignore the contribution of the complete graph structure. To mine a more
complete distribution structure from multi-view data while accounting for both
specificity and commonality, we propose structure fusion based on graph
convolutional networks (SF-GCN) to improve semi-supervised classification
performance. SF-GCN not only retains the specific characteristics of each view
through spectral embedding, but also captures the common style of multi-view
data through a distance metric between multi-graph structures. Assuming a
linear relationship between the multi-graph structures, we construct the
optimization function of the structure fusion model by balancing the
specificity loss and the commonality loss. Solving this function
simultaneously yields the fused spectral embedding of the multi-view data and
the fused structure, which serves as the adjacency matrix input to graph
convolutional networks for semi-supervised classification. Experiments
demonstrate that SF-GCN outperforms state-of-the-art methods on three
challenging citation network datasets: Cora, Citeseer, and Pubmed.
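To make the last step concrete, here is a minimal sketch of feeding a fused adjacency matrix into one graph convolutional layer. It does not reproduce SF-GCN's optimization: the fusion below is a plain weighted sum with hand-picked weights (an assumption), and the layer uses the standard symmetric normalization with self-loops. All function names and toy matrices are hypothetical.

```python
import math


def fuse(graphs, weights):
    """Weighted sum of same-sized adjacency matrices (weights are hypothetical)."""
    n = len(graphs[0])
    return [[sum(w * g[i][j] for g, w in zip(graphs, weights))
             for j in range(n)] for i in range(n)]


def normalize(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2.

    Assumes non-negative edge weights, so every degree is at least 1
    once self-loops are added.
    """
    n = len(adj)
    a = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    d = [1.0 / math.sqrt(sum(row)) for row in a]
    return [[d[i] * a[i][j] * d[j] for j in range(n)] for i in range(n)]


def matmul(x, y):
    """Dense matrix product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]


def gcn_layer(adj_norm, features, weight):
    """One propagation step: ReLU(A_hat X W)."""
    out = matmul(matmul(adj_norm, features), weight)
    return [[max(0.0, v) for v in row] for row in out]


# Two toy views over the same three nodes, each seeing a different edge.
view_a = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
view_b = [[0, 0, 0], [0, 0, 1], [0, 1, 0]]

fused = fuse([view_a, view_b], [0.5, 0.5])
adj_norm = normalize(fused)

features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 nodes, 2 features
weight = [[1.0], [-1.0]]                          # 2 features -> 1 output
hidden = gcn_layer(adj_norm, features, weight)
```

The fused graph connects node 1 to both of its neighbors even though each single view contains only one of those edges, which is the kind of more complete structure the abstract argues for.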