216 research outputs found

    Identifying Clickbait: A Multi-Strategy Approach Using Neural Networks

    Full text link
    Online media outlets, in a bid to expand their reach and subsequently increase revenue through ad monetisation, have begun adopting clickbait techniques to lure readers to click on articles. The article fails to fulfill the promise made by the headline. Traditional methods for clickbait detection have relied heavily on feature engineering which, in turn, is dependent on the dataset it is built for. The application of neural networks for this task has only been explored partially. We propose a novel approach considering all information found in a social media post. We train a bidirectional LSTM with an attention mechanism to learn the extent to which a word contributes to the post's clickbait score in a differential manner. We also employ a Siamese net to capture the similarity between source and target information. Information gleaned from images has not been considered in previous approaches. We learn image embeddings from large amounts of data using Convolutional Neural Networks to add another layer of complexity to our model. Finally, we concatenate the outputs from the three separate components, serving it as input to a fully connected layer. We conduct experiments over a test corpus of 19538 social media posts, attaining an F1 score of 65.37% on the dataset bettering the previous state-of-the-art, as well as other proposed approaches, feature engineering or otherwise.Comment: Accepted at SIGIR 2018 as Short Pape

    A Large-Scale Study of Phishing PDF Documents

    Full text link
    Phishing PDFs are malicious PDF documents that do not embed malware but trick victims into visiting malicious web pages leading to password theft or drive-by downloads. While recent reports indicate a surge of phishing PDFs, prior works have largely neglected this new threat, positioning phishing PDFs as accessories distributed via email phishing campaigns. This paper challenges this belief and presents the first systematic and comprehensive study centered on phishing PDFs. Starting from a real-world dataset, we first identify 44 phishing PDF campaigns via clustering and characterize them by looking at their volumetric, temporal, and visual features. Among these, we identify three large campaigns covering 89% of the dataset, exhibiting significantly different volumetric and temporal properties compared to classical email phishing, and relying on web UI elements as visual baits. Finally, we look at the distribution vectors and show that phishing PDFs are not only distributed via attachments but also via SEO attacks, placing phishing PDFs outside the email distribution ecosystem. This paper also assesses the usefulness of the VirusTotal scoring system, showing that phishing PDFs are ranked considerably low, creating a blind spot for organizations. While URL blocklists can help to prevent victims from visiting the attack web pages, PDF documents seem not subjected to any form of content-based filtering or detection

    The Web of False Information: Rumors, Fake News, Hoaxes, Clickbait, and Various Other Shenanigans

    Full text link
    A new era of Information Warfare has arrived. Various actors, including state-sponsored ones, are weaponizing information on Online Social Networks to run false information campaigns with targeted manipulation of public opinion on specific topics. These false information campaigns can have dire consequences to the public: mutating their opinions and actions, especially with respect to critical world events like major elections. Evidently, the problem of false information on the Web is a crucial one, and needs increased public awareness, as well as immediate attention from law enforcement agencies, public institutions, and in particular, the research community. In this paper, we make a step in this direction by providing a typology of the Web's false information ecosystem, comprising various types of false information, actors, and their motives. We report a comprehensive overview of existing research on the false information ecosystem by identifying several lines of work: 1) how the public perceives false information; 2) understanding the propagation of false information; 3) detecting and containing false information on the Web; and 4) false information on the political stage. In this work, we pay particular attention to political false information as: 1) it can have dire consequences to the community (e.g., when election results are mutated) and 2) previous work show that this type of false information propagates faster and further when compared to other types of false information. Finally, for each of these lines of work, we report several future research directions that can help us better understand and mitigate the emerging problem of false information dissemination on the Web
    corecore