434 research outputs found

    Organized Behavior Classification of Tweet Sets using Supervised Learning Methods

    During the 2016 US elections, Twitter experienced unprecedented levels of propaganda and fake news through the collaboration of bots and hired persons, the ramifications of which are still being debated. This work proposes an approach to identify the presence of organized behavior in tweets. The Random Forest, Support Vector Machine, and Logistic Regression algorithms are each used to train a model on a data set of 850 records consisting of 299 features extracted from tweets gathered during the 2016 US presidential election. The features represent user and temporal synchronization characteristics to capture coordinated behavior. These models are trained to classify tweet sets among the categories: organic vs organized, political vs non-political, and pro-Trump vs pro-Hillary vs neither. The random forest algorithm performs best, with greater than 95% average accuracy and f-measure scores for each category. The most valuable features for classification are identified as user-based features, with media use and marking tweets as favorites being the most dominant. Comment: 51 pages, 5 figures

    A Comparison of Retweet Prediction Approaches: The Superiority of Random Forest Learning Method

    We consider the following retweet prediction task: given a tweet, predict whether it will be retweeted. In the past, a wide range of learning methods and features has been proposed for this task. We provide a systematic comparison of the performance of these learning methods and features in terms of prediction accuracy and feature importance. Specifically, from each previously published approach we take the best performing features and group these into two sets: user features and tweet features. In addition, we contrast five learning methods, both linear and non-linear. On top of that, we examine the added value of a previously proposed time-sensitive modeling approach. To the authors’ knowledge, this is the first attempt to collect best performing features and contrast linear and non-linear learning methods. We perform our comparisons on a single dataset and find that user features such as the number of times a user is listed, number of followers, and average number of tweets published per day most strongly contribute to prediction accuracy across the selected learning methods. We also find that random forest-based learning, which has not been employed in previous studies, achieves the highest performance among the learning methods we consider. Finally, we find that on top of properly tuned learning methods, the benefits of time-sensitive modeling are very limited.

    Hybrid feature selection approach to identify optimal features of profile metadata to detect social bots in Twitter

    The last few years have revealed that social bots in social networks have become more sophisticated in design as they adapt their features to avoid detection systems. The ability of bots to mimic human users stems from advances in artificial intelligence and chatbots, which let these bots learn and adjust very quickly. Therefore, finding the optimal features needed to detect them is an area for further investigation. In this paper, we propose a hybrid feature selection (FS) method to evaluate profile metadata features to find these optimal features, which are evaluated using random forest, naïve Bayes, support vector machines, and neural networks. We found that the cross-validation attribute evaluation performance was the best when compared to other FS methods. Our results show that the random forest classifier with six optimal features achieved the best score of 94.3% for the area under the curve. The results maintained overall 89% accuracy, 83.8% precision, and 83.3% recall for the bot class. We found that using four features (favorites_count, verified, statuses_count, and average_tweets_per_day) achieves good performance metrics for bot detection (84.1% precision, 81.2% recall).
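
The abstract above reports precision and recall but no F1 score. As a rough worked example, the harmonic mean of the two reported pairs can be computed as follows. The function name `f1_score` is an illustrative choice here, not something taken from the paper, and the resulting values are derived from the rounded figures in the abstract rather than reported by the authors.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the F1 score, the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs quoted in the abstract (as fractions).
six_feature_f1 = f1_score(0.838, 0.833)   # six-feature model, bot class
four_feature_f1 = f1_score(0.841, 0.812)  # four-feature model

print(f"{six_feature_f1:.3f}")   # 0.835
print(f"{four_feature_f1:.3f}")  # 0.826
```

Under these figures, the four-feature model loses only about one point of F1 relative to the six-feature model.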

    Predicting the Outcomes of Important Events based on Social Media and Social Network Analysis

    Twitter is a well-known social network website that lets users post their opinions about current affairs, share their social events, and interact with others. It has now become one of the largest sources of news, with over 200 million monthly active users. It is possible to predict the outcomes of events based on social networks using machine learning and big data analytics. Massive data available from social networks can be utilized to improve prediction efficacy and accuracy. Achieving high accuracy in predicting the outcomes of political events using Twitter data is a challenging problem. The focus of this thesis is to investigate novel approaches to predicting the outcomes of political events from social media and social networks. The first proposed method predicts election results based on Twitter data analysis. The method extracts and analyses sentiment information from microblogs to predict the popularity of candidates. Experimental results have shown its advantages over the existing method for predicting outcomes of political events. The second proposed method also predicts election results from Twitter data, but analyses sentiment information using term weighting and selection to predict the popularity of candidates. Scaling factors are used for different types of terms, which helps to select informative terms more effectively and achieve better prediction results than the previous method. The third method proposed in this thesis represents the social network using network connectivity constructed from retweet data together with social media content, leading to a new approach to predicting the outcome of political events. Two approaches, whole-network and sub-network, have been developed and compared. Experimental results show that the sub-network approach, which constructs sub-networks based on different topics, outperformed the whole-network approach.

    Combining Likes-Retweet Analysis and Naive Bayes Classifier within Twitter for Sentiment Analysis

    Sentiment analysis is a field of research that aims to extract the subjectivity of opinions. With the massive growth of user-generated content in social media, Twitter has become one of the most popular microblogging applications, where users freely discuss and share opinions about a specific topic or entity. Twitter has several features that can potentially be used to improve sentiment analysis, such as likes and retweets. Likes and retweets are mechanisms in Twitter for propagating or sharing other users' posts and for showing appreciation of them. This paper proposes a combination of textual and non-textual features to improve the performance of sentiment prediction. In this research we apply Naïve Bayes for textual classification and the Fisher Score to determine non-textual (like and retweet) features. By combining the two kinds of features, our experiments find the optimal values of α and β. Evaluation using the F1-measure gives a score of 0.838, with α and β equal to 0.6 and 0.4, respectively.
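
The abstract does not say how α and β enter the model. One plausible reading is a weighted sum of a textual score and a non-textual score, sketched below. Everything in this block is an assumption made for illustration: the linear form, the assignment of α to the textual score and β to the non-textual score, the 0.5 decision threshold, and the names `combine_scores` and `classify`.

```python
def combine_scores(text_score: float, nontext_score: float,
                   alpha: float = 0.6, beta: float = 0.4) -> float:
    """Weighted sum of two positive-sentiment scores, each assumed to lie in [0, 1].

    Assumption: alpha weights the textual (e.g. Naive Bayes) score and
    beta weights the non-textual (like/retweet) score.
    """
    return alpha * text_score + beta * nontext_score


def classify(text_score: float, nontext_score: float,
             threshold: float = 0.5) -> str:
    """Label a tweet by thresholding the combined score (hypothetical rule)."""
    combined = combine_scores(text_score, nontext_score)
    return "positive" if combined >= threshold else "negative"


# A weak textual signal can be outweighed by strong like/retweet evidence.
print(classify(0.4, 0.9))  # positive (0.6*0.4 + 0.4*0.9 = 0.60)
print(classify(0.4, 0.2))  # negative (0.6*0.4 + 0.4*0.2 = 0.32)
```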