Negative Link Prediction in Social Media
Signed network analysis has attracted increasing attention in recent years.
This is in part because research on signed network analysis suggests that
negative links have added value in the analytical process. A major impediment
in their effective use is that most social media sites do not enable users to
specify them explicitly. In other words, a gap exists between the importance of
negative links and their availability in real data sets. Therefore, it is
natural to explore whether one can predict negative links automatically from
the commonly available social network data. In this paper, we investigate the
novel problem of negative link prediction with only positive links and
content-centric interactions in social media. We make a number of important
observations about negative links, and propose a principled framework NeLP,
which can exploit positive links and content-centric interactions to predict
negative links. Our experimental results on real-world social networks
demonstrate that the proposed NeLP framework can accurately predict negative
links with positive links and content-centric interactions. Our detailed
experiments also illustrate the relative importance of various factors to the
effectiveness of the proposed framework.
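The intuition above can be sketched in a few lines: user pairs with no positive link but consistently negative content-centric interactions are candidate negative links. The users, sentiment scores, and threshold below are invented for illustration only; NeLP itself is a principled learning framework, not this simple heuristic.

```python
# Toy signed-network data: explicit positive links plus an average
# sentiment score per pair derived from content-centric interactions
# (all values hypothetical).
positive_links = {("alice", "bob"), ("bob", "carol")}

interaction_sentiment = {
    ("alice", "dave"): -0.8,   # mostly hostile comments
    ("alice", "bob"): 0.6,
    ("carol", "dave"): -0.1,
}

def predict_negative(pair, threshold=-0.5):
    """Flag a pair as a candidate negative link: no positive link,
    and interaction sentiment below the (assumed) threshold."""
    return (pair not in positive_links
            and interaction_sentiment.get(pair, 0.0) < threshold)

candidates = [p for p in interaction_sentiment if predict_negative(p)]
print(candidates)  # [('alice', 'dave')]
```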
Neural Relation Extraction Within and Across Sentence Boundaries
Past work in relation extraction mostly focuses on binary relations between
entity pairs within a single sentence. Recently, the NLP community has gained
interest in relation extraction for entity pairs spanning multiple sentences. In
this paper, we propose a novel architecture for this task: inter-sentential
dependency-based neural networks (iDepNN). iDepNN models the shortest and
augmented dependency paths via recurrent and recursive neural networks to
extract relationships within (intra-) and across (inter-) sentence boundaries.
Compared to SVM and neural network baselines, iDepNN is more robust to false
positives in relationships spanning sentences.
We evaluate our models on four datasets from the newswire (MUC6) and medical
(BioNLP shared task) domains, where they achieve state-of-the-art performance
and show a better balance of precision and recall for inter-sentential
relationships. We perform better than 11 teams participating in the BioNLP
shared task 2016 and achieve a gain of 5.2% (0.587 vs 0.558) in F1 over the
winning team. We also release the cross-sentence annotations for MUC6.
Comment: AAAI201
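The core structural idea, a shortest dependency path that crosses a sentence boundary, can be illustrated without any neural machinery: link the dependency trees of adjacent sentences through their roots and search the combined graph. The toy dependency edges and node labels below are hand-made assumptions; iDepNN additionally encodes such paths with recurrent and recursive networks, which this sketch omits.

```python
from collections import deque

# Undirected dependency edges for two toy sentences; ROOT1/ROOT2 are the
# sentence roots, joined by an artificial edge to cross the boundary.
edges = [
    ("ROOT1", "acquired"), ("acquired", "CompanyA"),
    ("ROOT1", "ROOT2"),                      # inter-sentential link
    ("ROOT2", "based"), ("based", "CityB"),
]

graph = {}
for u, v in edges:
    graph.setdefault(u, []).append(v)
    graph.setdefault(v, []).append(u)

def shortest_dependency_path(graph, src, dst):
    """BFS for the shortest path between two entity nodes."""
    prev, queue, seen = {src: None}, deque([src]), {src}
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nb in graph[node]:
            if nb not in seen:
                seen.add(nb)
                prev[nb] = node
                queue.append(nb)
    return None

print(shortest_dependency_path(graph, "CompanyA", "CityB"))
# ['CompanyA', 'acquired', 'ROOT1', 'ROOT2', 'based', 'CityB']
```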
Finding Streams in Knowledge Graphs to Support Fact Checking
The volume and velocity of information generated online prevent current
journalistic practices from fact-checking claims at the same rate.
Computational approaches for fact checking may be the key to help mitigate the
risks of massive misinformation spread. Such approaches can be designed to not
only be scalable and effective at assessing veracity of dubious claims, but
also to boost a human fact checker's productivity by surfacing relevant facts
and patterns to aid their analysis. To this end, we present a novel,
unsupervised network-flow based approach to determine the truthfulness of a
statement of fact expressed in the form of a (subject, predicate, object)
triple. We view a knowledge graph of background information about real-world
entities as a flow network, and knowledge as a fluid, abstract commodity. We
show that computational fact checking of such a triple then amounts to finding
a "knowledge stream" that emanates from the subject node and flows toward the
object node through paths connecting them. Evaluation on a range of real-world
and hand-crafted datasets of facts related to entertainment, business, sports,
geography and more reveals that this network-flow model can be very effective
in discerning true statements from false ones, outperforming existing
algorithms on many test cases. Moreover, the model is expressive in its ability
to automatically discover several useful path patterns and surface relevant
facts that may help a human fact checker corroborate or refute a claim.
Comment: Extended version of the paper in proceedings of ICDM 201
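The flow-network view described above can be sketched with a textbook max-flow computation: the amount of "knowledge" that can flow from the subject node to the object node serves as a truthfulness signal. The entities and edge capacities below are invented, and the paper's full model also accounts for predicate specificity, which this sketch ignores.

```python
from collections import deque

# Toy knowledge-graph edges with capacities (hypothetical data).
capacity = {
    ("AlbertEinstein", "Physics"): 2,
    ("Physics", "NobelPrize"): 1,
    ("AlbertEinstein", "Germany"): 1,
    ("Germany", "NobelPrize"): 1,
}

def max_flow(capacity, source, sink):
    """Edmonds-Karp max flow over a dict of edge capacities."""
    residual = dict(capacity)
    for u, v in list(residual):          # add zero-capacity reverse edges
        residual.setdefault((v, u), 0)
    flow = 0
    while True:
        # BFS for an augmenting path in the residual graph
        prev, queue = {source: None}, deque([source])
        while queue and sink not in prev:
            u = queue.popleft()
            for (a, b), cap in residual.items():
                if a == u and cap > 0 and b not in prev:
                    prev[b] = u
                    queue.append(b)
        if sink not in prev:
            return flow
        # trace the path back, find the bottleneck, and push flow
        path, v = [], sink
        while prev[v] is not None:
            path.append((prev[v], v))
            v = prev[v]
        bottleneck = min(residual[e] for e in path)
        for u, v in path:
            residual[(u, v)] -= bottleneck
            residual[(v, u)] += bottleneck
        flow += bottleneck

print(max_flow(capacity, "AlbertEinstein", "NobelPrize"))  # 2
```

A claim whose subject-to-object flow is large relative to competing triples would, under this view, be judged more plausible.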
Knowledge Base Population using Semantic Label Propagation
A crucial aspect of a knowledge base population system that extracts new
facts from text corpora is the generation of training data for its relation
extractors. In this paper, we present a method that maximizes the effectiveness
of newly trained relation extractors at a minimal annotation cost. Manual
labeling can be significantly reduced by Distant Supervision, which is a method
to construct training data automatically by aligning a large text corpus with
an existing knowledge base of known facts. For example, all sentences
mentioning both 'Barack Obama' and 'US' may serve as positive training
instances for the relation born_in(subject,object). However, distant
supervision typically results in a highly noisy training set: many training
sentences do not really express the intended relation. We propose to combine
distant supervision with minimal manual supervision in a technique called
feature labeling, to eliminate noise from the large and noisy initial training
set, resulting in a significant increase of precision. We further improve on
this approach by introducing the Semantic Label Propagation method, which uses
the similarity between low-dimensional representations of candidate training
instances, to extend the training set in order to increase recall while
maintaining high precision. Our proposed strategy for generating training data
is studied and evaluated on an established test collection designed for
knowledge base population tasks. The experimental results show that the
Semantic Label Propagation strategy leads to substantial performance gains when
compared to existing approaches, while requiring an almost negligible manual
annotation effort.
Comment: Submitted to Knowledge-Based Systems, special issue on Knowledge
Bases for Natural Language Processing
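Both steps described in the abstract can be sketched briefly: distant supervision collects sentences mentioning a known entity pair, and label propagation extends a small labeled seed set to unlabeled candidates with similar low-dimensional representations. The corpus reuses the abstract's own ('Barack Obama', 'US') example, but the sentences, vectors, and similarity threshold are invented for illustration.

```python
import math

# (1) Distant supervision: align a known KB fact with a text corpus by
# collecting sentences that mention both entities (toy corpus).
kb_fact = ("Barack Obama", "US")
corpus = [
    "Barack Obama was born in the US.",
    "Barack Obama visited the US embassy.",
    "The US held elections.",
]
candidates = [s for s in corpus if all(e in s for e in kb_fact)]

# (2) Label propagation: hypothetical low-dimensional sentence vectors;
# the first candidate acts as a manually labeled positive seed.
vectors = {
    candidates[0]: [0.9, 0.1],
    candidates[1]: [0.2, 0.95],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

seed = candidates[0]
propagated = [s for s in candidates[1:]
              if cosine(vectors[seed], vectors[s]) > 0.8]  # assumed threshold
print(len(candidates), propagated)
```

Note how the second candidate sentence, despite matching both entities, is kept out of the extended training set because its representation is dissimilar to the seed, which is exactly the noise-filtering effect the abstract describes.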