Team QCRI-MIT at SemEval-2019 Task 4: Propaganda Analysis Meets Hyperpartisan News Detection
In this paper, we describe our submission to SemEval-2019 Task 4 on Hyperpartisan News Detection. Our system relies on a variety of engineered features originally used to detect propaganda. This is based on the assumption that biased messages are propagandistic in the sense that they promote a particular political cause or viewpoint. We trained a logistic regression model with features ranging from simple bag-of-words to vocabulary richness and text readability features. Our system achieved 72.9% accuracy on the test data that was annotated manually and 60.8% on the test data that was annotated with distant supervision. Additional experiments showed that significant performance improvements can be achieved with better feature pre-processing.
Comment: Hyperpartisanship, propaganda, news media, fake news, SemEval-2019
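A minimal sketch of the kind of model the abstract describes: a bag-of-words logistic regression text classifier built with scikit-learn. The toy corpus, labels, and n-gram range below are illustrative assumptions, not the authors' actual features or data.

```python
# Bag-of-words + logistic regression sketch (scikit-learn).
# Toy corpus and labels are invented for illustration only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

docs = [
    "the radical left is destroying our great nation",
    "lawmakers debated the budget proposal on tuesday",
    "wake up sheeple the elites are lying to you",
    "the committee published its quarterly report",
]
labels = [1, 0, 1, 0]  # 1 = hyperpartisan, 0 = mainstream (toy labels)

# Unigram + bigram counts feed a plain logistic regression classifier.
model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(docs, labels)
print(model.predict(["the elites are destroying our nation"]))
```

The paper's full system adds vocabulary-richness and readability features on top of such a pipeline; those are omitted here for brevity.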
Team Fernando-Pessa at SemEval-2019 Task 4: Back to Basics in Hyperpartisan News Detection
This paper describes our submission to the SemEval-2019 Hyperpartisan News Detection task. Our system aims for linguistics-based document classification from a minimal set of interpretable features, while maintaining good performance. To this end, we follow a feature-based approach and perform several experiments with different machine learning classifiers. On the main task, our model achieved an accuracy of 71.7%, which was improved after the task's end to 72.9%. We also participate in the meta-learning sub-task, classifying documents with the binary classifications of all submitted systems as input, achieving an accuracy of 89.9%.
CIMTDetect: A Community Infused Matrix-Tensor Coupled Factorization Based Method for Fake News Detection
Detecting whether a news article is fake or genuine is a crucial task in today's digital world, where it is easy to create and spread a misleading news article. This is especially true of news stories shared on social media, since they do not undergo the stringent journalistic checking associated with mainstream media. Given the inherent human tendency to share information with one's social connections at a mouse-click, fake news articles masquerading as real ones tend to spread widely and virally. The presence of echo chambers (groups of people sharing the same beliefs) in social networks only adds to the problem of the widespread existence of fake news on social media. In this paper, we tackle the problem of fake news detection on social media by exploiting the very presence of echo chambers within the social network of users to obtain an efficient and informative latent representation of the news article. By modeling the echo chambers as closely connected communities within the social network, we represent a news article as a 3-mode tensor over news, users, and communities, and propose a tensor factorization based method to encode the news article in a latent embedding space that preserves the community structure. We also propose an extension of the above method, which jointly models the community and content information of the news article through a coupled matrix-tensor factorization framework. We empirically demonstrate the efficacy of our method for the task of fake news detection over two real-world datasets. Further, we validate the generalization of the resulting embeddings on two other auxiliary tasks, namely: 1) News Cohort Analysis and 2) Collaborative News Recommendation. Our proposed method outperforms appropriate baselines on both tasks, establishing its generalization.
Comment: Presented at ASONAM'18
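The factorization step can be illustrated with a generic CP (CANDECOMP/PARAFAC) decomposition of a small 3-mode tensor via alternating least squares. This is a plain CP-ALS sketch, not the paper's coupled matrix-tensor method; the synthetic tensor, rank, and iteration count are assumptions for illustration.

```python
# Generic CP-ALS factorization of a small news x user x community tensor.
# Synthetic data; not the authors' coupled matrix-tensor framework.
import numpy as np

def khatri_rao(A, B):
    # Column-wise Khatri-Rao product: (I x R), (J x R) -> (I*J x R).
    return np.einsum("ir,jr->ijr", A, B).reshape(-1, A.shape[1])

def cp_als(X, rank, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A, B, C = (rng.standard_normal((n, rank)) for n in (I, J, K))
    for _ in range(iters):
        # Update each factor with the other two fixed (mode-n least squares).
        A = np.linalg.lstsq(khatri_rao(B, C), X.reshape(I, -1).T, rcond=None)[0].T
        B = np.linalg.lstsq(khatri_rao(A, C), np.moveaxis(X, 1, 0).reshape(J, -1).T, rcond=None)[0].T
        C = np.linalg.lstsq(khatri_rao(A, B), np.moveaxis(X, 2, 0).reshape(K, -1).T, rcond=None)[0].T
    return A, B, C

# Synthetic rank-2 "news x user x community" tensor.
rng = np.random.default_rng(1)
G = [rng.standard_normal((n, 2)) for n in (6, 8, 4)]
X = np.einsum("ir,jr,kr->ijk", *G)

A, B, C = cp_als(X, rank=2)
X_hat = np.einsum("ir,jr,kr->ijk", A, B, C)
rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_err:.2e}")
```

In the paper's setting, the rows of the news-mode factor would serve as the latent news embeddings, with the coupled matrix factor tying in article content.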
Language-independent fake news detection: English, Portuguese, and Spanish mutual features
Online Social Media (OSM) have been substantially transforming the process of spreading news, improving its speed, and reducing barriers toward reaching out to a broad audience. However, OSM are very limited in providing mechanisms to check the credibility of news propagated through their structure. The majority of studies on automatic fake news detection are restricted to English documents, with few works evaluating other languages, and none comparing language-independent characteristics. Moreover, the spreading of deceptive news tends to be a worldwide problem; therefore, this work evaluates textual features that are not tied to a specific language when describing textual data for detecting fake news. Corpora of news written in American English, Brazilian Portuguese, and Spanish were explored to study complexity, stylometric, and psychological text features. The extracted features support the detection of fake, legitimate, and satirical news. We compared four machine learning algorithms (k-Nearest Neighbors (k-NN), Support Vector Machine (SVM), Random Forest (RF), and Extreme Gradient Boosting (XGB)) to induce the detection model. Results show our proposed language-independent features are successful in describing fake, satirical, and legitimate news across three different languages, with an average detection accuracy of 85.3% with RF.
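The classifier comparison described above can be sketched as a cross-validated benchmark over a feature matrix. The synthetic features below stand in for the paper's complexity/stylometric/psychological features, and scikit-learn's GradientBoostingClassifier substitutes for XGBoost so the sketch needs no extra dependency; both substitutions are assumptions.

```python
# Compare k-NN, SVM, RF, and a gradient boosting model by 5-fold CV accuracy.
# Synthetic features stand in for the paper's language-independent features;
# GradientBoostingClassifier is a stand-in for XGBoost.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=20, n_informative=8,
                           random_state=0)

models = {
    "k-NN": KNeighborsClassifier(),
    "SVM": SVC(),
    "RF": RandomForestClassifier(random_state=0),
    "GB (XGB stand-in)": GradientBoostingClassifier(random_state=0),
}
scores = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in models.items()}
for name, acc in scores.items():
    print(f"{name}: mean CV accuracy {acc:.3f}")
```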
A Dataset of Fact-Checked Images Shared on WhatsApp During the Brazilian and Indian Elections
Recently, messaging applications such as WhatsApp have reportedly been abused by misinformation campaigns, especially in Brazil and India. A notable form of abuse on WhatsApp relies on manipulated images and memes containing all kinds of fake stories. In this work, we performed an extensive data collection from a large set of publicly accessible WhatsApp groups and fact-checking agency websites. This paper opens a novel dataset to the research community containing fact-checked fake images shared through WhatsApp in two distinct scenarios known for the spread of fake news on the platform: the 2018 Brazilian elections and the 2019 Indian elections.
Comment: 7 pages. This is a preprint version of an accepted paper at ICWSM'20. Please consider citing the conference version instead of this one.
A Topic-Agnostic Approach for Identifying Fake News Pages
Fake news and misinformation have been increasingly used to manipulate popular opinion and influence political processes. To better understand fake news, how it is propagated, and how to counter its effects, it is necessary first to identify it. Recently, approaches have been proposed to automatically classify articles as fake based on their content. An important challenge for these approaches comes from the dynamic nature of news: as new political events are covered, topics and discourse constantly change, and thus a classifier trained on content from articles published at a given time is likely to become ineffective in the future. To address this challenge, we propose a topic-agnostic (TAG) classification strategy that uses linguistic and web-markup features to identify fake news pages. We report experimental results on multiple datasets which show that our approach attains high accuracy in the identification of fake news, even as topics evolve over time.
Comment: Accepted for publication in the Companion Proceedings of the 2019 World Wide Web Conference (WWW'19 Companion). Presented in the 2019 International Workshop on Misinformation, Computational Fact-Checking and Credible Web (MisinfoWorkshop2019). 6 pages.
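Topic-agnostic feature extraction of the kind the abstract describes can be sketched with stdlib-only code: a few style cues from the text plus simple web-markup counts from the page's HTML. The specific features below (tag counts, exclamation density, all-caps ratio) are illustrative stand-ins for the paper's actual TAG feature set.

```python
# Sketch of topic-agnostic features: markup counts + simple linguistic cues.
# Feature choices are illustrative, not the paper's exact feature set.
from html.parser import HTMLParser

class MarkupCounter(HTMLParser):
    """Count a few markup elements as crude web-markup features."""
    def __init__(self):
        super().__init__()
        self.counts = {"a": 0, "script": 0, "img": 0}

    def handle_starttag(self, tag, attrs):
        if tag in self.counts:
            self.counts[tag] += 1

def tag_features(html, text):
    parser = MarkupCounter()
    parser.feed(html)
    words = text.split()
    return {
        **parser.counts,
        # Style cues that do not depend on the news topic itself.
        "exclaim_density": text.count("!") / max(len(words), 1),
        "allcaps_ratio": sum(w.isupper() for w in words) / max(len(words), 1),
    }

page = "<html><body><script></script><a href='#'>x</a><img src='y'></body></html>"
feats = tag_features(page, "SHOCKING! You WON'T believe this!")
print(feats)
```

Because none of these features mention topical vocabulary, a classifier trained on them is less tied to the news cycle at training time, which is the core idea behind the topic-agnostic strategy.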