Identifying Clickbait: A Multi-Strategy Approach Using Neural Networks
Online media outlets, in a bid to expand their reach and subsequently
increase revenue through ad monetisation, have begun adopting clickbait
techniques to lure readers to click on articles. The article itself then fails
to fulfill the promise made by the headline. Traditional methods for clickbait detection
have relied heavily on feature engineering which, in turn, is dependent on the
dataset it is built for. The application of neural networks for this task has
only been explored partially. We propose a novel approach considering all
information found in a social media post. We train a bidirectional LSTM with an
attention mechanism to learn the extent to which a word contributes to the
post's clickbait score in a differential manner. We also employ a Siamese net
to capture the similarity between source and target information. Information
gleaned from images has not been considered in previous approaches. We learn
image embeddings from large amounts of data using Convolutional Neural Networks
to add another layer of complexity to our model. Finally, we concatenate the
outputs from the three separate components, serving them as input to a fully
connected layer. We conduct experiments over a test corpus of 19538 social
media posts, attaining an F1 score of 65.37% on the dataset, bettering the
previous state-of-the-art, as well as other proposed approaches, feature
engineering or otherwise.
Comment: Accepted at SIGIR 2018 as Short Paper
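The attention-weighted pooling and the final concatenation step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the paper trains a bidirectional LSTM, a Siamese net, and a CNN, whereas this sketch uses small hand-written vectors in their place. The dot-product scoring against a `query` vector, and all function and parameter names, are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Normalise raw scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Pool per-word hidden states into one vector.

    Each word's weight is a softmax over its dot product with `query`
    (a hypothetical learned vector), so words differ in how much they
    contribute to the pooled representation.
    """
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return pooled, weights


def fuse_and_score(text_vec, similarity_vec, image_vec, fc_weights, fc_bias):
    """Concatenate the three component outputs and apply one dense unit.

    A single sigmoid unit stands in for the fully connected layer.
    """
    fused = text_vec + similarity_vec + image_vec  # list concatenation
    logit = sum(w * x for w, x in zip(fc_weights, fused)) + fc_bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy stand-ins for the outputs of the three components.
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # one vector per word
pooled, weights = attention_pool(hidden, query=[1.0, 1.0])
score = fuse_and_score(pooled, [0.3], [0.1, 0.2], [0.5, 0.5, 1.0, -1.0, 0.25], 0.0)
```

Here the third word receives the largest attention weight because its hidden state has the largest dot product with the query. In a trained model the query vector and the dense-layer weights would be learned jointly with the three components.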
A Large-Scale Study of Phishing PDF Documents
Phishing PDFs are malicious PDF documents that do not embed malware but trick
victims into visiting malicious web pages leading to password theft or drive-by
downloads. While recent reports indicate a surge of phishing PDFs, prior works
have largely neglected this new threat, positioning phishing PDFs as
accessories distributed via email phishing campaigns.
This paper challenges this belief and presents the first systematic and
comprehensive study centered on phishing PDFs. Starting from a real-world
dataset, we first identify 44 phishing PDF campaigns via clustering and
characterize them by looking at their volumetric, temporal, and visual
features. Among these, we identify three large campaigns covering 89% of the
dataset, exhibiting significantly different volumetric and temporal properties
compared to classical email phishing, and relying on web UI elements as visual
baits. Finally, we look at the distribution vectors and show that phishing PDFs
are not only distributed via attachments but also via SEO attacks, placing
phishing PDFs outside the email distribution ecosystem.
This paper also assesses the usefulness of the VirusTotal scoring system,
showing that phishing PDFs are ranked considerably low, creating a blind spot
for organizations. While URL blocklists can help to prevent victims from
visiting the attack web pages, PDF documents do not seem to be subjected to any
form of content-based filtering or detection.
The Web of False Information: Rumors, Fake News, Hoaxes, Clickbait, and Various Other Shenanigans
A new era of Information Warfare has arrived. Various actors, including
state-sponsored ones, are weaponizing information on Online Social Networks to
run false information campaigns with targeted manipulation of public opinion on
specific topics. These false information campaigns can have dire consequences
for the public: mutating their opinions and actions, especially with respect to
critical world events like major elections. Evidently, the problem of false
information on the Web is a crucial one, and needs increased public awareness,
as well as immediate attention from law enforcement agencies, public
institutions, and in particular, the research community. In this paper, we make
a step in this direction by providing a typology of the Web's false information
ecosystem, comprising various types of false information, actors, and their
motives. We report a comprehensive overview of existing research on the false
information ecosystem by identifying several lines of work: 1) how the public
perceives false information; 2) understanding the propagation of false
information; 3) detecting and containing false information on the Web; and 4)
false information on the political stage. In this work, we pay particular
attention to political false information as: 1) it can have dire consequences
for the community (e.g., when election results are mutated) and 2) previous work
shows that this type of false information propagates faster and further when
compared to other types of false information. Finally, for each of these lines
of work, we report several future research directions that can help us better
understand and mitigate the emerging problem of false information dissemination
on the Web.