31 research outputs found

    Online Deception Detection Refueled by Real World Data Collection

    The lack of large realistic datasets presents a bottleneck in online deception detection studies. In this paper, we apply a data collection method based on social network analysis to quickly identify high-quality deceptive and truthful online reviews from Amazon. The dataset contains more than 10,000 deceptive reviews and is diverse in product domains and reviewers. Using this dataset, we explore effective general features for online deception detection that perform well across domains. We demonstrate that with generalized features - advertising speak and writing complexity scores - deception detection performance can be further improved by adding additional deceptive reviews from assorted domains in training. Finally, reviewer level evaluation gives an interesting insight into different deceptive reviewers' writing styles. Comment: 10 pages, Accepted to Recent Advances in Natural Language Processing (RANLP) 201

    An Evaluation of Text Representation Techniques for Fake News Detection Using: TF-IDF, Word Embeddings, Sentence Embeddings with Linear Support Vector Machine.

    In a world where anybody can share their views and opinions and make them sound like facts about the current situation of the world, Fake News poses a huge threat, especially to the reputation of people with high stature and to organizations. In the political world, this could lead to opposition parties making use of this opportunity to gain popularity in their elections. In the medical world, a fake scandalous message about a medicine giving side effects, hospital treatment gone wrong or even a false message against a practicing doctor could become a big menace to everyone involved in that news. In the world of business, one false news story becoming a trending topic could definitely disrupt future business earnings. The detection of such false news becomes very important in today’s world, where almost everyone has access to a mobile phone and can cause enough disruption by creating one false statement and making it a viral hit. Generation of fake news articles gathered more attention during the US Presidential Elections in 2016, leading many scientists and researchers to explore this NLP problem with deep interest and a sense of urgency too. This research intends to develop and compare a Fake News classifier using Linear Support Vector Machine Classifier built on traditional text feature representation technique Term Frequency Inverse Document Frequency (Ahmed, Traore & Saad, 2017), against a classifier built on the latest developments for text feature representations such as: word embeddings using ‘word2vec’ and sentence embeddings using ‘Universal Sentence Encoder’.
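As a rough illustration of the Term Frequency Inverse Document Frequency representation named in the abstract above, the sketch below computes TF-IDF weights with the Python standard library only. It assumes the plain textbook weighting (relative term frequency times the unsmoothed log inverse document frequency) and whitespace tokenization; the function name `tf_idf` and the sample sentences are hypothetical, and the thesis itself may use a different variant and preprocessing.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumption: textbook TF-IDF, i.e. (count / doc length) * log(N / df),
    with lowercased whitespace tokens. Not necessarily the thesis's variant.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical two-document corpus for illustration.
docs = ["fake news spreads fast", "real news spreads slowly"]
vectors = tf_idf(docs)
```

Terms that occur in every document (here "news" and "spreads") receive weight zero under this variant, while terms unique to one document receive positive weight. Vectors of this kind are what a linear classifier such as a Linear SVM would take as input.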

    TI-CNN: Convolutional Neural Networks for Fake News Detection

    With the development of social networks, fake news for various commercial and political purposes has been appearing in large numbers and has become widespread in the online world. With deceptive words, people can get infected by the fake news very easily and will share it without any fact-checking. For instance, during the 2016 US presidential election, various kinds of fake news about the candidates spread widely through both official news media and online social networks. Such fake news is usually released either to smear the opponents or to support the candidate on their side. The erroneous information in the fake news is usually written to stir up the voters' irrational emotion and enthusiasm. Such kinds of fake news can sometimes bring about devastating effects, and an important goal in improving the credibility of online social networks is to identify fake news in a timely manner. In this paper, we propose to study the fake news detection problem. Automatic fake news identification is extremely hard, since pure model based fact-checking for news is still an open problem, and few existing models can be applied to solve the problem. With a thorough investigation of a fake news dataset, many useful explicit features are identified from both the text words and images used in the fake news. Besides the explicit features, there also exist some hidden patterns in the words and images used in fake news, which can be captured with a set of latent features extracted via the multiple convolutional layers in our model. A model named TI-CNN (Text and Image information based Convolutional Neural Network) is proposed in this paper. By projecting the explicit and latent features into a unified feature space, TI-CNN is trained with both the text and image information simultaneously. Extensive experiments carried out on real-world fake news datasets have demonstrated the effectiveness of TI-CNN.

    Crowd and AI Powered Manipulation: Characterization and Detection

    User reviews are ubiquitous. They power online review aggregators that influence our daily decisions, from what products to purchase (e.g., Amazon), movies to view (e.g., Netflix, HBO, Hulu), restaurants to patronize (e.g., Yelp), and hotels to book (e.g., TripAdvisor, Airbnb). In addition, policy makers rely on online commenting platforms like Regulations.gov and FCC.gov as a means for citizens to voice their opinions about public policy issues. However, showcasing the opinions of fellow users has a dark side as these reviews and comments are vulnerable to manipulation. And as advances in AI continue, fake reviews generated by AI agents rather than users pose even more scalable and dangerous manipulation attacks. These attacks on online discourse can sway ratings of products, manipulate opinions and perceived support of key issues, and degrade our trust in online platforms. Previous efforts have mainly focused on highly visible anomalous behaviors captured by statistical modeling or clustering algorithms. While detection of such anomalous behaviors helps to improve the reliability of online interactions, it misses subtle and difficult-to-detect behaviors. This research investigates two major research thrusts centered around manipulation strategies. In the first thrust, we study crowd-based manipulation strategies wherein crowds of paid workers organize to spread fake reviews. In the second thrust, we explore AI-based manipulation strategies, where crowd workers are replaced by scalable, and potentially undetectable generative models of fake reviews. In particular, one of the key aspects of this work is to address the research gap in previous efforts for anomaly detection where ground truth data is missing (and hence, evaluation can be challenging). In addition, this work studies the capabilities and impact of model-based attacks as the next generation of online threats. 
We propose inter-related methods for collecting evidence of these attacks, and create new countermeasures for defending against them. The performance of the proposed methods is compared against other state-of-the-art approaches in the literature. We find that although crowd campaigns do not show obvious anomalous behavior, they can be detected given a careful formulation of their behaviors. And, although model-generated fake reviews may appear on the surface to be legitimate, we find that they do not completely mimic the underlying distribution of human-written reviews, so we can leverage this signal to detect them.

    Detection of spam review on mobile app stores, evaluation of helpfulness of user reviews and extraction of quality aspects using machine learning techniques

    As mobile devices have overtaken fixed Internet access, mobile applications and distribution platforms have gained in importance. App stores enable users to search and purchase mobile applications and then to give feedback in the form of reviews and ratings. A review might contain critical information about user experience, feature requests and bug reports. User reviews are valuable not only to developers and software organizations interested in learning the opinion of their customers but also to prospective users who would like to find out what others think about an app. Even though some surveys have inventoried techniques and methods in opinion mining and sentiment analysis, no systematic literature review (SLR) study had yet reported on mobile app store opinion mining and spam review detection problems. Mining opinions from app store reviews requires pre-processing at the text and content levels, including filtering out non-opinionated content and evaluating trustworthiness and genuineness of the reviews. In addition, the relevance of the extracted features is not cross-validated with main software engineering concepts. This research project first conducted a systematic literature review (SLR) on the evaluation of mobile app store opinion mining studies. Next, to fill the identified gaps in the literature, we used a novel convolutional neural network to learn document representation for deceptive spam review detection by characterizing an app store review dataset which includes truthful and spam reviews for the first time in the literature. Our experiments showed that our neural network based method achieved 82.5% accuracy, while a baseline Support Vector Machine (SVM) classification model reached only 70% accuracy despite leveraging various feature combinations. 
We next compared four classification models to assess app store user review helpfulness and proposed a predictive model which makes use of review meta-data along with structural and lexical features for helpfulness prediction. In the last part of this research study, we constructed an annotated app store review dataset for the aspect extraction task, based on the ISO 25010 - Systems and software Product Quality Requirements and Evaluation standard, and built two deep neural network models for aspect extraction from app store user reviews: Bi-directional Long-Short Term Memory and Conditional Random Field (Bi-LSTM+CRF) and Deep Convolutional Neural Networks and Conditional Random Field (CNN+CRF). Both models achieved nearly 80% F1 score (the harmonic mean of precision and recall, which takes both false positives and false negatives into account) in exact aspect matching and 86% F1 score in partial aspect matching.
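For readers unfamiliar with the F1 score reported above, the following minimal sketch computes it from raw counts using the standard definition: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2 · precision · recall / (precision + recall). The function name and the counts in the usage line are hypothetical illustrations and are not taken from the thesis.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from true positives, false positives and false negatives."""
    if tp == 0:
        # No correct positive predictions: F1 is defined as 0 here
        # to avoid dividing by zero.
        return 0.0
    precision = tp / (tp + fp)  # share of predicted aspects that are correct
    recall = tp / (tp + fn)     # share of true aspects that were found
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 8 correct extractions, 2 spurious, 2 missed.
example = f1_score(tp=8, fp=2, fn=2)
```

With these illustrative counts, precision and recall are both 0.8, so the F1 score is also 0.8. Because the score is computed from false positives and false negatives alike, a model cannot raise it by trading one type of error for the other.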

    Towards Improving Generalization of Multi-Task Learning

    Multi-task Learning (MTL), which involves the simultaneous learning of multiple tasks, can achieve better performance than learning each task independently. It has achieved great success in various applications, ranging from computer vision to bioinformatics. However, involving multiple tasks in a single learning process is complicated, for both cooperation and competition exist across the included tasks; furthermore, the cooperation boosts the generalization of MTL while the competition degrades it. There is a lack of systematic study on how to improve MTL's generalization by handling the cooperation and competition. This thesis systematically studies this problem and proposes four novel MTL methods to enhance the between-task cooperation or reduce the between-task competition. Specifically, for the between-task cooperation, adversarial multi-task representation learning (AMTRL) and semi-supervised multi-task learning (Semi-MTL) are studied; furthermore, a novel adaptive AMTRL method and a novel representation consistency regularization-based Semi-MTL method are proposed respectively. As to the between-task competition, this thesis analyzes the task variance and task imbalance; furthermore, a novel task variance regularization-based MTL method and a novel task-imbalance-aware MTL method are proposed respectively. The proposed methods can improve the generalization of MTL and achieve state-of-the-art performance in real-world MTL applications.

    Explainable NLP for Human-AI Collaboration

    With more data and computing resources available these days, we have seen many novel Natural Language Processing (NLP) models breaking one performance record after another. Some of them even outperform humans in some specific tasks. Meanwhile, many researchers have revealed weaknesses and irrationality of such models, e.g., having biases against some sub-populations, producing inconsistent predictions, and failing to work effectively in the wild due to overfitting. Therefore, in real applications, especially in high-stakes domains, humans cannot rely carelessly on predictions of NLP models, but they need to work closely with the models to ensure that every final decision made is accurate and benevolent. In this thesis, we devise and utilize explainable NLP techniques to support human-AI collaboration using text classification as a target task. Overall, our contributions can be divided into three main parts. First, we study how useful explanations are for humans according to three different purposes: revealing model behavior, justifying model predictions, and helping humans investigate uncertain predictions. Second, we propose a framework that enables humans to debug simple deep text classifiers informed by model explanations. Third, leveraging computational argumentation, we develop a novel local explanation method for pattern-based logistic regression models that aligns better with human judgements and effectively assists humans in performing an unfamiliar task in real-time. Altogether, our contributions are paving the way towards the synergy of profound knowledge of human users and the tireless power of AI machines.

    Comparison mining from text
