
    Hate is not Binary: Studying Abusive Behavior of #GamerGate on Twitter

    Over the past few years, online bullying and aggression have become increasingly prominent and have manifested in many different forms on social media. However, there is little work analyzing the characteristics of abusive users and what distinguishes them from typical social media users. In this paper, we start addressing this gap by analyzing tweets containing a large amount of abusiveness. We focus on a Twitter dataset revolving around the Gamergate controversy, which led to many incidents of cyberbullying and cyberaggression on various gaming and social media platforms. We study the properties of the users tweeting about Gamergate, the content they post, and the differences in their behavior compared to typical Twitter users. We find that while their tweets are often seemingly about aggressive and hateful subjects, "Gamergaters" do not exhibit common expressions of online anger, and in fact primarily differ from typical users in that their tweets are less joyful. They are also more engaged than typical Twitter users, which is an indication as to how and why this controversy is still ongoing. Surprisingly, we find that Gamergaters are less likely to be suspended by Twitter; we therefore analyze their properties to identify differences from typical users and what may have led to their suspension. We perform an unsupervised machine learning analysis to detect clusters of users who, though currently active, could be considered for suspension since they exhibit behaviors similar to those of suspended users. Finally, we confirm the usefulness of our analyzed features by emulating the Twitter suspension mechanism with a supervised learning method, achieving very good precision and recall. Comment: In 28th ACM Conference on Hypertext and Social Media (ACM HyperText 2017)

    Data Sets: Word Embeddings Learned from Tweets and General Data

    A word embedding is a low-dimensional, dense, and real-valued vector representation of a word. Word embeddings have been used in many NLP tasks. They are usually generated from a large text corpus. The embedding of a word captures both its syntactic and semantic aspects. Tweets are short, noisy, and have unique lexical and semantic features that are different from other types of text. Therefore, it is necessary to have word embeddings learned specifically from tweets. In this paper, we present ten word embedding data sets. In addition to the data sets learned from just tweet data, we also built embedding sets from the general data and the combination of tweets with the general data. The general data consist of news articles, Wikipedia data, and other web data. These ten embedding models were learned from about 400 million tweets and 7 billion words from the general text. In this paper, we also present two experiments demonstrating how to use the data sets in some NLP tasks, such as tweet sentiment analysis and tweet topic classification.
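As a rough illustration of how word embeddings like those described above are commonly used for tweet-level tasks, the sketch below averages the word vectors of a tweet's tokens and compares tweets with cosine similarity. The three-dimensional vectors, the `EMBEDDINGS` table, and the helper names `tweet_vector` and `cosine` are invented for this example; they are not taken from the published data sets, and the abstract does not say the authors build tweet features this way.

```python
import math

# Toy 3-dimensional embeddings. The values are invented for illustration
# and do not come from the data sets described in the abstract.
EMBEDDINGS = {
    "good": [0.9, 0.1, 0.0],
    "great": [0.8, 0.2, 0.1],
    "bad": [-0.7, 0.3, 0.1],
}


def tweet_vector(tokens, embeddings):
    """Average the vectors of the tokens that have an embedding.

    Returns None when no token is in the vocabulary.
    """
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


positive = tweet_vector(["good", "great", "lol"], EMBEDDINGS)  # "lol" is skipped
negative = tweet_vector(["bad"], EMBEDDINGS)
print(cosine(positive, EMBEDDINGS["good"]), cosine(positive, negative))
```

Averaging is only one simple way to turn word vectors into a tweet representation; a classifier for sentiment or topic would then be trained on such vectors.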

    Detecting and Tracking the Spread of Astroturf Memes in Microblog Streams

    Online social media are complementing and in some cases replacing person-to-person social interaction and redefining the diffusion of information. In particular, microblogs have become crucial grounds on which public relations, marketing, and political battles are fought. We introduce an extensible framework that will enable the real-time analysis of meme diffusion in social media by mining, visualizing, mapping, classifying, and modeling massive streams of public microblogging events. We describe a Web service that leverages this framework to track political memes in Twitter and help detect astroturfing, smear campaigns, and other misinformation in the context of U.S. political elections. We present some cases of abusive behaviors uncovered by our service. Finally, we discuss promising preliminary results on the detection of suspicious memes via supervised learning based on features extracted from the topology of the diffusion networks, sentiment analysis, and crowdsourced annotations

    A Survey on Cybercrime Using Social Media

    There is growing interest in automating crime detection and prevention for large populations as a result of the increased usage of social media for victimization and criminal activities. This area is frequently researched because social media enable criminals to reach a large audience. While several studies have investigated specific crimes on social media, a comprehensive review paper that examines all types of social media crimes, their similarities, and detection methods is still lacking. The identification of similarities among crimes and detection methods can facilitate knowledge and data transfer across domains. The goal of this study is to collect a library of social media crimes and establish their connections using a crime taxonomy. The survey also identifies publicly accessible datasets and suggests directions for further study in this area.

    Seminar Users in the Arabic Twitter Sphere

    We introduce the notion of "seminar users", who are social media users engaged in propaganda in support of a political entity. We develop a framework that can identify such users with 84.4% precision and 76.1% recall. While our dataset is from the Arab region, omitting language-specific features has only a minor impact on classification performance, and thus our approach could work for detecting seminar users in other parts of the world and in other languages. We further explored a controversial political topic to observe the prevalence and potential potency of such users. In our case study, we found that 25% of the users engaged in the topic are in fact seminar users and that their tweets make up nearly a third of the on-topic tweets. Moreover, they are often successful in affecting mainstream discourse with coordinated hashtag campaigns. Comment: to appear in SocInfo 201

    STREAM-EVOLVING BOT DETECTION FRAMEWORK USING GRAPH-BASED AND FEATURE-BASED APPROACHES FOR IDENTIFYING SOCIAL BOTS ON TWITTER

    This dissertation focuses on the problem of evolving social bots in online social networks, particularly Twitter. Such accounts spread misinformation and inflate social network content to mislead the masses. The main objective of this dissertation is to propose a stream-based evolving bot detection framework (SEBD), which was constructed using both graph- and feature-based models. It was built using Python, a real-time streaming engine (Apache Kafka version 3.2), and our pretrained model, the bot multi-view graph attention network (Bot-MGAT). The feature-based model was used to identify predictive features for bot detection and evaluate the SEBD predictions. The graph-based model was used to facilitate multi-view graph attention networks (GATs) with fellowship links to build our framework for predicting account labels from streams. A probably approximately correct learning framework was applied to confirm the accuracy and confidence levels of SEBD. The results showed that SEBD can effectively identify bots from streams and that profile features are sufficient for detecting social bots. The pretrained Bot-MGAT model uses fellowship links to reveal hidden information that can aid in identifying bot accounts. The significant contributions of this study are the development of a stream-based bot detection framework for detecting social bots based on a given hashtag and the proposal of a hybrid approach for feature selection to identify predictive features for identifying bot accounts. Our findings indicate that Twitter has a higher percentage of active bots than humans in hashtags. The results indicated that stream-based detection is more effective than offline detection, achieving an accuracy score of 96.9%. Finally, semi-supervised learning (SSL) can address the shortage of labeled data in bot detection tasks.

    Enhanced Spam Detection System for Twitter Social Networking Platform

    Twitter is one of the most popular online social networking (OSN) sites, used by prominent people and organizations such as ministers, businesspeople, large companies, and actors to share information. Around 500 million tweets are posted on the site monthly by its 313 million active users. Twitter's wide reach has drawn the interest of spammers. These malicious actors exploit the platform for various nefarious purposes, including monitoring authentic users, disseminating harmful software, and promoting their agendas through URLs embedded in tweets. They engage in tactics such as secretly following and unfollowing legitimate users, all with the intent of gathering sensitive information. To resolve this problem, a secure spam detection system based on a machine learning approach is designed. The design uses stop-word removal and a word-to-vector model to refine the data and reduce its dimensionality. To enhance the quality of the data, cosine similarity is also applied to measure the similarity score among tweets, and an Artificial Neural Network (ANN) is trained on that basis. The trained network is then tested, with performance examined in terms of precision, recall, and F-measure. A comparative analysis is also performed to demonstrate the effectiveness of the work. The proposed spam detection model obtains an average precision, recall, and F-measure of 0.9252, 0.6107, and 0.734, respectively.
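For readers who want to relate the three reported figures to one another, the sketch below computes the standard F-measure (the harmonic mean of precision and recall when beta is 1). The function name `f_measure` is hypothetical. Plugging in the averaged precision and recall gives roughly 0.736, close to the reported 0.734; one possible reason for the small gap, offered here only as an assumption, is that the reported value averages per-run F-measures rather than being computed from the averaged precision and recall.

```python
def f_measure(precision, recall, beta=1.0):
    """F-beta score; with beta=1 this is the harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Averaged precision and recall as reported in the abstract above.
print(round(f_measure(0.9252, 0.6107), 3))
```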

    A Method for Identifying Fake Follower Buyers Using Geographical Distance Information

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, 2019. 2. 김종권. Reputation on social media such as Twitter, Facebook, and Instagram is now regarded as a measure of a person's power in the real world. A person who has more friends or followers can influence more individuals, so a user's influence is associated with the number of friends or followers. In response to the demand for greater social power, an underground market has emerged where a customer can buy fake followers. Those who purchase fake followers are highly active in online social networks, so it is hard to distinguish a customer from a celebrity or cyberstar. Nevertheless, there are unique characteristics of legitimate users that customers or fake followers cannot manipulate, such as the small-world property. The small-world property is mainly characterized by the shortest path length and the clustering coefficient. In a small-world network, most people are linked by short chains. Existing work has largely focused on extracting relationship features such as indegree, outdegree, status, hub, or authority. Although these studies explored relationship features to classify abnormal users of fake follower markets, the use of the small-world property to detect abnormal users has not been studied. In this work, we propose a model that adapts the small-world property. Specifically, we study the geographical distance of 1-hop directional links, using nodes' geographical locations, to verify whether a social graph has the small-world property. Motivated by the difference in the distance ratio of 1-hop directional links, we propose a method that generates a 1-hop link distance ratio and classifies a node as a customer or not. Experimental results on a real-world Twitter dataset demonstrate that the proposed method achieves higher performance than existing models.
    Contents: Chapter 1 Introduction (1.1 Motivations; 1.2 Fake Follower Markets; 1.3 Research Objectives; 1.4 Contributions; 1.5 Thesis Organization). Chapter 2 Related Work (2.1 Small World Phenomenon; 2.2 Online Social Abusing Attack Detection; 2.2.1 Contents-based Detection; 2.2.2 Social Network-based Detection; 2.2.3 Behavior-based Detection). Chapter 3 Characteristic of Customers and Fake Followers (3.1 Data Preparation; 3.2 Fake Follower Properties; 3.3 Customer Properties). Chapter 4 Social Relationship and Geographical Distance (4.1 Geographical Distance in OSNs; 4.2 Follower Ratio). Chapter 5 Detecting Customers (5.1 Key Features for Customer Detection; 5.2 Performance matrices; 5.3 Experiments; 5.4 Comparison with Baseline Method; 5.5 Comparison with Feature-based Method; 5.6 Impact of Balanced Dataset; 5.7 Fake Follower Detection). Chapter 6 Future Work (6.1 The Absence of Location Information; 6.2 Hybrid Detection Method with Link Ratio and Profile Information). Chapter 7 Conclusion. Bibliography. Abstract in Korean.
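The geographical distance of a 1-hop link mentioned in the abstract above can be sketched with the haversine great-circle formula. The abstract does not define the thesis's 1-hop link distance ratio, so `short_link_ratio` below is a hypothetical stand-in: it assumes the ratio is the fraction of a node's located followers that lie within a chosen distance threshold. The 100 km default, the data layout, and all names are assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to guard against rounding slightly above 1.0.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def short_link_ratio(node, followers, locations, threshold_km=100.0):
    """Hypothetical ratio: share of located followers within threshold_km of node.

    Followers without a known location are skipped. Returns None when the
    node itself or all of its followers lack a location.
    """
    if node not in locations:
        return None
    lat, lon = locations[node]
    distances = [
        haversine_km(lat, lon, *locations[f]) for f in followers if f in locations
    ]
    if not distances:
        return None
    return sum(d <= threshold_km for d in distances) / len(distances)


# Invented coordinates: "b" is about 56 km from "a", "c" is a quarter of the globe away.
locations = {"a": (0.0, 0.0), "b": (0.0, 0.5), "c": (0.0, 90.0)}
print(short_link_ratio("a", ["b", "c", "d"], locations))
```

A classifier in the spirit of the abstract would then use such a ratio as a feature, on the intuition that legitimate accounts tend to have more geographically close 1-hop links than accounts padded with purchased followers; how the thesis actually sets thresholds or combines features is not stated here.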