Automated Fact Checking in the News Room
Fact checking is an essential task in journalism; its importance has been
highlighted due to recently increased concerns and efforts in combating
misinformation. In this paper, we present an automated fact-checking platform
that, given a claim, retrieves relevant textual evidence from a document
collection, predicts whether each piece of evidence supports or refutes the
claim, and returns a final verdict. We describe the architecture of the system
and the user interface, focusing on the choices made to improve its
user-friendliness and transparency. We conduct a user study of the
fact-checking platform in a journalistic setting: we integrate it with a
collection of news articles and evaluate the platform using
feedback from journalists in their workflow. We found that the predictions of
our platform were correct 58% of the time, and 59% of the returned evidence
was relevant.
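The three stages named in the abstract above (retrieve evidence, predict a stance per piece of evidence, return a final verdict) can be sketched as follows. This is a minimal illustration, not the platform's implementation: the token-overlap retriever, the negation-cue stance rule `toy_stance`, and the majority-vote aggregation are all assumptions chosen to keep the example self-contained.

```python
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into a set of word tokens."""
    return set(text.lower().replace(".", " ").replace(",", " ").split())


def retrieve(claim, documents, k=2):
    """Toy retriever (assumption): rank documents by token overlap with the claim."""
    claim_tokens = tokenize(claim)
    scored = [(len(claim_tokens & tokenize(doc)), doc) for doc in documents]
    scored = [(score, doc) for score, doc in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


def toy_stance(claim, evidence):
    """Placeholder stance model (assumption): a negation cue means 'refutes'."""
    negations = {"not", "no", "never"}
    return "refutes" if negations & tokenize(evidence) else "supports"


def verdict(claim, documents, stance_fn=toy_stance, k=2):
    """Retrieve evidence, label each piece, and aggregate by majority vote."""
    evidence = retrieve(claim, documents, k)
    if not evidence:
        return "not enough evidence", []
    labelled = [(piece, stance_fn(claim, piece)) for piece in evidence]
    counts = Counter(label for _, label in labelled)
    return counts.most_common(1)[0][0], labelled
```

Returning the labelled evidence alongside the verdict mirrors the transparency goal mentioned in the abstract: a journalist can inspect why a verdict was reached. In a real system `stance_fn` would be a trained classifier.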
Pruning based Distance Sketches with Provable Guarantees on Random Graphs
Measuring the distances between vertices on graphs is one of the most
fundamental components in network analysis. Since finding shortest paths
requires traversing the graph, it is challenging to obtain distance information
on large graphs very quickly. In this work, we present a preprocessing
algorithm that is able to create landmark based distance sketches efficiently,
with strong theoretical guarantees. When evaluated on a diverse set of social
and information networks, our algorithm significantly improves over existing
approaches by reducing the number of landmarks stored, preprocessing time, or
stretch of the estimated distances.
On Erdős-Rényi graphs and random power law graphs with degree
distribution exponent , our algorithm outputs an exact distance
data structure with space between and
depending on the value of , where is the number of vertices. We
complement the algorithm with tight lower bounds for Erdős-Rényi graphs and the
case when is close to two.
Comment: Full version for the conference paper to appear in The Web
Conference'1
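To make the idea of a landmark-based distance sketch concrete, here is a minimal sketch of the generic technique, assuming an unweighted, undirected graph stored as an adjacency dictionary. It is not the pruning algorithm from the paper: it simply runs a breadth-first search from each landmark and estimates d(u, v) as the smallest d(u, l) + d(l, v) over landmarks l, which is an upper bound on the true distance. The function names are hypothetical.

```python
from collections import deque


def bfs_distances(graph, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def build_sketch(graph, landmarks):
    """Preprocessing: store one BFS distance table per landmark."""
    return {landmark: bfs_distances(graph, landmark) for landmark in landmarks}


def estimate_distance(sketch, u, v):
    """Upper bound on d(u, v): best detour through any landmark."""
    best = float("inf")
    for dist in sketch.values():
        if u in dist and v in dist:
            best = min(best, dist[u] + dist[v])
    return best


# Path graph 0 - 1 - 2 - 3 - 4 with a single landmark in the middle.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
sketch = build_sketch(path, landmarks=[2])
```

On this path graph the estimate for the pair (0, 4) is exact because the landmark lies on the shortest path, while the estimate for (0, 1) is 3 against a true distance of 1. The ratio between estimate and truth is the stretch that the abstract refers to.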
Semantic Hilbert Space for Text Representation Learning
Capturing the meaning of sentences has long been a challenging task. Current
models tend to apply linear combinations of word features to conduct semantic
composition for larger-granularity units, e.g. phrases, sentences, and
documents. However, semantic linearity does not always hold in human
language. For instance, the meaning of the phrase 'ivory tower' cannot be
deduced by linearly combining the meanings of 'ivory' and 'tower'. To address
this issue, we propose a new framework that models different levels of semantic
units (e.g. sememe, word, sentence, and semantic abstraction) on a single
Semantic Hilbert Space, which naturally admits a non-linear semantic
composition by means of a complex-valued vector word representation. An
end-to-end neural network (https://github.com/wabyking/qnn) is
proposed to implement the framework in the text classification task, and
evaluation results on six benchmarking text classification datasets demonstrate
the effectiveness, robustness and self-explanation power of the proposed model.
Furthermore, intuitive case studies are conducted to help end users
understand how the framework works.
Comment: accepted in WWW 201
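As a rough illustration of why complex-valued representations can yield non-linear composition, the sketch below adds two complex amplitudes and compares the squared magnitude of the sum with the sum of the individual squared magnitudes. The difference is an interference term that depends on the relative phase. This is a generic property of complex superposition, not the model from the paper; `word_state` and its weight and phase parameters are hypothetical.

```python
import cmath
import math


def word_state(weight, phase):
    """Hypothetical complex feature: magnitude = weight, angle = phase."""
    return cmath.rect(weight, phase)


def intensity(z):
    """Squared magnitude of a complex amplitude."""
    return abs(z) ** 2


def composed_intensity(w1, p1, w2, p2):
    """Intensity of the superposition of two word states."""
    return intensity(word_state(w1, p1) + word_state(w2, p2))


def interference(w1, p1, w2, p2):
    """Cross term that makes the composition non-additive."""
    return 2 * w1 * w2 * math.cos(p1 - p2)
```

With equal phases two unit-weight states compose to intensity 4 rather than 2, and with opposite phases they cancel to (numerically) zero, so the composed quantity is not a linear combination of the parts.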
Collaborative Recommendation Model Based on Multi-modal Multi-view Attention Network: Movie and literature cases
The existing collaborative recommendation models that use multi-modal
information emphasize the representation of users' preferences but easily
ignore the representation of users' dislikes. Nevertheless, modelling users'
dislikes facilitates comprehensively characterizing user profiles. Thus, the
representation of users' dislikes should be integrated into the user modelling
when we construct a collaborative recommendation model. In this paper, we
propose a novel Collaborative Recommendation Model based on Multi-modal
multi-view Attention Network (CRMMAN), in which the users are represented from
both preference and dislike views. Specifically, the users' historical
interactions are divided into positive and negative interactions, used to model
the user's preference and dislike views, respectively. Furthermore, the
semantic and structural information extracted from the scene is employed to
enrich the item representation. We validate CRMMAN with comparative
experiments on two benchmark datasets, MovieLens-1M and Book-Crossing.
MovieLens-1M has about one million ratings, and Book-Crossing has about 300,000
ratings. Compared with the state-of-the-art knowledge-graph-based and
multi-modal recommendation methods, the AUC, NDCG@5 and NDCG@10 are improved by
2.08%, 2.20% and 2.26%, respectively, on average across the two datasets. We also conduct controlled
experiments to explore the effects of multi-modal information and multi-view
mechanism. The experimental results show that both of them enhance the model's
performance.
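Two ingredients mentioned in the abstract above can be sketched briefly: splitting a user's history into a preference view and a dislike view, and computing NDCG@k. The rating threshold of 4 is an assumption made for illustration (the abstract does not say how positive and negative interactions are separated), and `ndcg_at_k` follows the standard definition with a log2 discount rather than any evaluation code from the paper.

```python
import math


def split_views(interactions, threshold=4):
    """Split (item, rating) pairs into liked and disliked items.

    The threshold is a hypothetical choice, not taken from the paper.
    """
    positive = [item for item, rating in interactions if rating >= threshold]
    negative = [item for item, rating in interactions if rating < threshold]
    return positive, negative


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k positions."""
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG normalised by the DCG of the ideal ordering of the same items."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

Here `relevances` lists the relevance of each recommended item in ranked order, so `ndcg_at_k([1, 0, 0], 5)` is 1.0 and placing the only relevant item second lowers the score to about 0.63.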
Detecting Toxicity in News Articles: Application to Bulgarian
Online media aim to reach ever bigger audiences and to attract ever
longer attention spans. This competition creates an environment that rewards
sensational, fake, and toxic news. To help limit their spread and impact, we
propose and develop a news toxicity detector that can recognize various types
of toxic content. While previous research primarily focused on English, here we
target Bulgarian. We created a new dataset by crawling a website that for five
years has been collecting Bulgarian news articles that were manually
categorized into eight toxicity groups. Then we trained a multi-class
classifier with nine categories: eight toxic and one non-toxic. We experimented
with different representations based on ELMo, BERT, and XLM, as well as with a
variety of domain-specific features. Due to the small size of our dataset, we
created a separate model for each feature type, and we ultimately combined
these models into a meta-classifier. The evaluation results show an accuracy of
59.0% and a macro-F1 score of 39.7%, which represent sizable improvements over
the majority-class baseline (Acc=30.3%, macro-F1=5.2%).
Comment: Fact-checking, source reliability, political ideology, news media,
Bulgarian, RANLP-2019. arXiv admin note: text overlap with arXiv:1810.0176
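The reported majority-class baseline can be sanity-checked with a short calculation. The sketch below assumes that the baseline always predicts the most frequent class, that this class makes up 30.3% of the evaluation data, that all nine classes occur in it, and that classes which are never predicted get an F1 of 0. These assumptions are not stated in the abstract; the function name is hypothetical.

```python
def majority_baseline_scores(majority_share, num_classes):
    """Accuracy and macro-F1 of a classifier that always predicts one class."""
    accuracy = majority_share
    precision = majority_share   # every prediction is the majority class
    recall = 1.0                 # every majority example is recovered
    f1_majority = 2 * precision * recall / (precision + recall)
    # All other classes are never predicted, so their F1 is taken as 0.
    macro_f1 = f1_majority / num_classes
    return accuracy, macro_f1


accuracy, macro_f1 = majority_baseline_scores(0.303, 9)
```

Under these assumptions the macro-F1 comes out at roughly 0.052, which is consistent with the 5.2% baseline quoted above and shows why macro-F1 penalises such a baseline much more heavily than accuracy does.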