147 research outputs found

    Seminar Users in the Arabic Twitter Sphere

    We introduce the notion of "seminar users", who are social media users engaged in propaganda in support of a political entity. We develop a framework that can identify such users with 84.4% precision and 76.1% recall. While our dataset is from the Arab region, omitting language-specific features has only a minor impact on classification performance, and thus, our approach could work for detecting seminar users in other parts of the world and in other languages. We further explored a controversial political topic to observe the prevalence and potential potency of such users. In our case study, we found that 25% of the users engaged in the topic are in fact seminar users and their tweets make up nearly a third of the on-topic tweets. Moreover, they are often successful in affecting mainstream discourse with coordinated hashtag campaigns. Comment: to appear in SocInfo 201
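
    The abstract above reports precision and recall but no combined score. As a rough illustration only, the F1 score implied by those two figures can be computed as their harmonic mean. The function and variable names below are hypothetical and are not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures reported in the abstract above (84.4% precision, 76.1% recall).
reported_precision = 0.844
reported_recall = 0.761

f1 = f1_score(reported_precision, reported_recall)
print(f"Implied F1: {f1:.3f}")
```

    The paper itself may report a different aggregate, so this value should be read as a derived estimate rather than a published result.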

    A comparison of classification models to detect cyberbullying in the Peruvian Spanish language on Twitter

    Cyberbullying is a social problem in which bullies' actions are more harmful than in traditional forms of bullying, as they have the power to repeatedly humiliate the victim in front of an entire community through social media. Multiple works now aim at detecting acts of cyberbullying by analysing the text of social media publications written in one or more languages; however, few investigations target cyberbullying detection in the Spanish language. In this work, we compare the performance of four traditional supervised machine learning methods in detecting cyberbullying via the identification of four cyberbullying-related categories in Twitter posts written in Peruvian Spanish. Specifically, we trained and tested Naive Bayes, Multinomial Logistic Regression, Support Vector Machine, and Random Forest classifiers on a dataset manually annotated with the help of human participants. The results indicate that the best-performing classifier for the cyberbullying detection task was the Support Vector Machine classifier.

    Detecting Abusive Language on Online Platforms: A Critical Analysis

    Abusive language on online platforms is a major societal problem, often leading to further harms such as the marginalisation of underrepresented minorities. There are many different forms of abusive language, such as hate speech, profanity, and cyber-bullying, and online platforms seek to moderate it in order to limit societal harm, to comply with legislation, and to create a more inclusive environment for their users. Within the field of Natural Language Processing, researchers have developed different methods for automatically detecting abusive language, often focusing on specific subproblems or on narrow communities, as what is considered abusive language differs greatly by context. We argue that there is currently a dichotomy between the types of abusive language that online platforms seek to curb and the research efforts that exist to automatically detect abusive language. We thus survey existing methods as well as content moderation policies by online platforms in this light, and we suggest directions for future work.

    Cyberbullying detection: Current trends and future directions

    With the rapid growth of Web 2.0, online social networks (OSNs) and online communication provide platforms for people all over the world to connect with each other and express their opinions and interests. Online users generate large amounts of data every day. As a result, OSNs also provide opportunities for cybercrime and cyberbullying. Cyberbullying is the harassment, humiliation, or insulting of an online user through threatening or harassing messages sent with online communication tools. This paper provides a comprehensive overview of cyberbullying, which usually occurs on OSN websites, and of current approaches to tackling it. It also highlights the issues and challenges in cyberbullying detection systems and outlines future directions for research in this area. The topics discussed begin with an introduction to OSNs, cyberbullying, and the types of cyberbullying, followed by a review of data accessibility. Lastly, issues and challenges concerning cyberbullying detection are highlighted.

    Sentiment analysis of text with lossless mining

    Social networks are becoming increasingly consequential, with the power to influence public opinion and election outcomes or to create an artificial surge in demand or supply. The continuous stream of information is valuable, but it comes with a big data problem. The question is how to mine social text at a large scale and execute machine learning algorithms to create predictive models or historical views of previous trends. As a case study, this paper introduces a cyber dictionary for every user, which contains only words used in tweets. It then mines all the known and unknown words by their frequency, which provides the analytic capability to run a multi-level classifier.

    Study of the Yahoo-Yahoo Hash-Tag Tweets Using Sentiment Analysis and Opinion Mining Algorithms

    Mining opinion on social media microblogs presents opportunities to extract meaningful insight from public discussion of trending issues such as "yahoo-yahoo", which in Nigeria is synonymous with cybercrime. In this study, content analysis of selected historical tweets from the "yahoo-yahoo" hash-tag was conducted for sentiment analysis and topic modelling. A corpus of 5500 tweets was obtained and pre-processed using a pre-trained tweet tokenizer, while Valence Aware Dictionary for Sentiment Reasoning (VADER), the Liu Hu method, Latent Dirichlet Allocation (LDA), Latent Semantic Indexing (LSI) and Multidimensional Scaling (MDS) graphs were used for sentiment analysis, topic modelling and topic visualization. Results showed the corpus had 173 unique tweet clusters, 5327 duplicate tweets and a frequency of 9555 for "yahoo". Further validation using the mean sentiment scores of ten volunteers returned R and R² of 0.8038 and 0.6402; 0.5994 and 0.3463; 0.5999 and 0.3586 for Human and VADER; Human and Liu Hu; Liu Hu and VADER sentiment scores, respectively. While VADER outperforms Liu Hu in sentiment analysis, LDA and LSI returned similar results in the topic modelling. The study confirms VADER's performance on unstructured social media data containing non-English slang, conjunctions, emoticons, etc. and showed that emojis are more representative of sentiments in tweets than the texts.
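
    The reported counts are consistent with duplicates being the corpus size minus the number of unique clusters (173 + 5327 = 5500). A minimal sketch of that kind of count follows. The normalisation step (lowercasing, dropping URLs, collapsing whitespace) is an assumption made for illustration and is not described in the abstract, and the sample tweets are invented.

```python
import re
from collections import Counter


def normalize(tweet: str) -> str:
    """Hypothetical normalisation: lowercase, drop URLs, collapse whitespace."""
    text = re.sub(r"https?://\S+", "", tweet.lower())
    return " ".join(text.split())


def duplicate_summary(tweets):
    """Return (unique_clusters, duplicates), where duplicates = total - unique."""
    clusters = Counter(normalize(t) for t in tweets)
    unique = len(clusters)
    return unique, len(tweets) - unique


# Invented sample data, not drawn from the study's corpus.
sample = [
    "Yahoo-yahoo is cybercrime",
    "yahoo-yahoo is   cybercrime https://example.com/a",
    "Say no to yahoo-yahoo",
]
unique, duplicates = duplicate_summary(sample)
print(unique, duplicates)
```

    Exact-match grouping after normalisation is the simplest possible choice; the study may have clustered near-duplicates differently.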

    Moving to Digital-Healthy Society: Empathy, Sympathy, and Wellbeing in Social Media

    Background: This research aims to explore the impact of individuals' demographics and their social media use on empathy, sympathy, and wellbeing in Saudi Arabia. This paper can fill a gap in research on a developing country (i.e., the Arab context) by shedding light on sympathetic and empathetic behavior and its effect on wellbeing in social media. Method: We obtained a sample of 431 responses across all Saudi regions. Data were analyzed to evaluate the reliability and validity of the study's constructs, while the hypotheses were tested using a structural equation modeling (SEM) technique. Results: SEM regression results suggest that there is a significant relationship between both age and income and social media use. In addition, social media use has an indirect relationship to individuals' wellbeing. This indirect relationship is better manifested through sympathy than through empathy. Conclusion: Theoretically, this study furthers our understanding of the role of empathy and sympathy in wellbeing in social media among Saudis; practically, it provides insights to industry experts about what matters to social media users in increasing their wellbeing.

    An NLP-Powered Human Rights Monitoring Platform

    Effective information management has long been a problem in organisations that are not of a scale at which they can afford their own department dedicated to this task. Growing information overload has made this problem even more pronounced. On the other hand, we have recently witnessed the emergence of intelligent tools, packages and resources that have made it possible to rapidly transfer knowledge from the academic community to industry, government and other potential beneficiaries. Here we demonstrate how adopting state-of-the-art natural language processing (NLP) and crowdsourcing methods has resulted in measurable benefits for a human rights organisation by transforming their information and knowledge management using a novel approach that supports human rights monitoring in conflict zones. More specifically, we report on mining and classifying Arabic Twitter in order to identify potential human rights abuse incidents in a continuous stream of social media data within a specified geographical region. Results show that deep learning approaches such as LSTM allow us to push the precision close to 85% for this task with an F1-score of 75%. Apart from the scientific insights, we also demonstrate the viability of the framework, which has been deployed as the Ceasefire Iraq portal for more than three years and has already collected thousands of witness reports from within Iraq. This work is a case study of how progress in artificial intelligence has disrupted even the operation of relatively small-scale organisations.