147 research outputs found
Seminar Users in the Arabic Twitter Sphere
We introduce the notion of "seminar users", who are social media users
engaged in propaganda in support of a political entity. We develop a framework
that can identify such users with 84.4% precision and 76.1% recall. While our
dataset is from the Arab region, omitting language-specific features has only a
minor impact on classification performance, and thus, our approach could work
for detecting seminar users in other parts of the world and in other languages.
We further explored a controversial political topic to observe the prevalence
and potential potency of such users. In our case study, we found that 25% of
the users engaged in the topic are in fact seminar users and their tweets make
up nearly a third of the on-topic tweets. Moreover, they are often successful in
affecting mainstream discourse with coordinated hashtag campaigns.
Comment: to appear in SocInfo 201
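The abstract reports precision and recall but no combined score. As a rough illustration, the two figures can be folded into an F1 value using the standard harmonic-mean definition. The function name below is illustrative and does not come from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures quoted in the abstract: 84.4% precision, 76.1% recall.
f1 = f1_score(0.844, 0.761)
print(f"F1 = {f1:.3f}")  # prints "F1 = 0.800"
```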
A comparison of classification models to detect cyberbullying in the Peruvian Spanish language on Twitter
Cyberbullying is a social problem in which bullies'
actions are more harmful than in traditional forms of bullying as
they have the power to repeatedly humiliate the victim in front of
an entire community through social media. Nowadays, multiple
works aim at detecting acts of cyberbullying via the analysis of
texts in social media publications written in one or more
languages; however, few investigations target cyberbullying
detection in the Spanish language. In this work, we aim to
compare the performance of four traditional supervised machine
learning methods in detecting cyberbullying via the identification of
four cyberbullying-related categories on Twitter posts written in
the Peruvian Spanish language. Specifically, we trained and
tested Naive Bayes, Multinomial Logistic Regression, Support
Vector Machine, and Random Forest classifiers on a dataset
manually annotated by human participants.
The results indicate that the best-performing classifier for the
cyberbullying detection task was the Support Vector Machine
classifier.
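To make one of the four compared methods concrete, here is a minimal sketch of a multinomial Naive Bayes text classifier with add-one smoothing, written with the Python standard library only. It is not the authors' implementation: the whitespace tokenizer, the English toy sentences, and the labels ("insult", "threat", "neutral") are all assumptions made for illustration, since the abstract does not name the four categories or describe the features used.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesText:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # token counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = text.lower().split()         # naive whitespace tokenizer
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = text.lower().split()
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(n_docs / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data; labels are placeholders, not the paper's categories.
train = [
    ("you are stupid and ugly", "insult"),
    ("i will hurt you", "threat"),
    ("nice game today", "neutral"),
    ("stupid idiot", "insult"),
    ("i will find you and hurt you", "threat"),
    ("see you at the game", "neutral"),
]
model = NaiveBayesText().fit([t for t, _ in train], [l for _, l in train])
print(model.predict("you stupid idiot"))
print(model.predict("i will hurt you"))
```

A comparison like the one the abstract describes would train each candidate classifier on the same annotated split and compare a common metric on held-out tweets.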
Detecting Abusive Language on Online Platforms: A Critical Analysis
Abusive language on online platforms is a major societal problem, often
leading to further harms such as the marginalisation of
underrepresented minorities. There are many different forms of abusive language
such as hate speech, profanity, and cyber-bullying, and online platforms seek
to moderate it in order to limit societal harm, to comply with legislation, and
to create a more inclusive environment for their users. Within the field of
Natural Language Processing, researchers have developed different methods for
automatically detecting abusive language, often focusing on specific
subproblems or on narrow communities, as what is considered abusive language
very much differs by context. We argue that there is currently a dichotomy
between what types of abusive language online platforms seek to curb, and what
research efforts there are to automatically detect abusive language. We thus
survey existing methods as well as content moderation policies by online
platforms in this light, and we suggest directions for future work.
Cyberbullying detection: Current trends and future directions
With the rapid growth of Web 2.0, online social networks (OSNs) and online communication provide platforms for people all over the world to connect and to express their opinions and interests. Online users generate large amounts of data every day. As a result, OSNs also create opportunities for cybercrime and cyberbullying. Cyberbullying is the harassment, humiliation, or insulting of an online user by sending threatening or harassing text messages through online communication tools. This paper provides a comprehensive overview of cyberbullying as it usually occurs on OSN websites and reviews current approaches to tackling it. It also highlights the issues and challenges in cyberbullying detection systems and outlines future directions for research in this area. The paper begins with an introduction to OSNs, cyberbullying, and types of cyberbullying, and then reviews data accessibility. Lastly, issues and challenges concerning cyberbullying detection are highlighted.
Sentiment analysis of text with lossless mining
Social networks are becoming more and more real with their power to influence public opinions, election outcomes, or the creation of an artificial surge in demand or supply. The continuous stream of information is valuable, but it comes with a big data problem. The question is how to mine social text at a large scale and execute machine learning algorithms to create predictive models or historical views of previous trends. This paper introduces a cyber dictionary for every user which, as a case study, contains only words used in tweets. It then mines all the known and unknown words by their frequency, which provides the analytic capability to run a multi-level classifier.
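The per-user dictionary idea can be sketched as follows. This is a hedged illustration, not the paper's data structure: the tokenizer pattern, the function names, the toy tweets, and the reference lexicon used to separate "known" from "unknown" words are all assumptions. The point shown is that unknown words are kept with their frequencies instead of being discarded.

```python
import re
from collections import Counter, defaultdict

# Assumed tokenizer: lowercase letters, digits, and a few tweet symbols.
TOKEN_RE = re.compile(r"[a-z0-9#@']+")


def build_user_dictionaries(tweets):
    """Map each user to a Counter of the words appearing in their tweets."""
    dictionaries = defaultdict(Counter)
    for user, text in tweets:
        dictionaries[user].update(TOKEN_RE.findall(text.lower()))
    return dictionaries


def split_known_unknown(word_counts, lexicon):
    """Partition a user's word frequencies by membership in a lexicon."""
    known = {w: c for w, c in word_counts.items() if w in lexicon}
    unknown = {w: c for w, c in word_counts.items() if w not in lexicon}
    return known, unknown


# Hypothetical input: (user, tweet text) pairs and a tiny reference lexicon.
tweets = [
    ("alice", "Great match today, great goal!"),
    ("alice", "omg gr8 match"),
    ("bob", "Traffic is bad today"),
]
lexicon = {"great", "match", "today", "goal", "traffic", "is", "bad"}

user_dicts = build_user_dictionaries(tweets)
known, unknown = split_known_unknown(user_dicts["alice"], lexicon)
print(known)
print(unknown)
```

The resulting frequency tables could then serve as features for a downstream classifier.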
Study of the Yahoo-Yahoo Hash-Tag Tweets Using Sentiment Analysis and Opinion Mining Algorithms
Mining opinion on social media microblogs presents opportunities to extract meaningful insight from the public on trending issues like "yahoo-yahoo", which in Nigeria is synonymous with cybercrime. In this study, content analysis of selected historical tweets from the "yahoo-yahoo" hash-tag was conducted for sentiment and topic modelling. A corpus of 5500 tweets was obtained and pre-processed using a pre-trained tweet tokenizer, while Valence Aware Dictionary for Sentiment Reasoning (VADER), the Liu Hu method, Latent Dirichlet Allocation (LDA), Latent Semantic Indexing (LSI) and Multidimensional Scaling (MDS) graphs were used for sentiment analysis, topic modelling and topic visualization. Results showed the corpus had 173 unique tweet clusters, 5327 duplicate tweets and a frequency of 9555 for "yahoo". Further validation using the mean sentiment scores of ten volunteers returned R and R² of 0.8038 and 0.6402; 0.5994 and 0.3463; 0.5999 and 0.3586 for Human and VADER; Human and Liu Hu; Liu Hu and VADER sentiment scores, respectively. While VADER outperforms Liu Hu in sentiment analysis, LDA and LSI returned similar results in the topic modelling. The study confirms VADER's performance on unstructured social media data containing non-English slang, conjunctions, emoticons, etc., and proved that emojis are more representative of sentiments in tweets than the texts.
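The validation step compares human sentiment scores with tool scores via a correlation coefficient R. A minimal sketch of Pearson's correlation is shown below. The abstract does not say how R and R² were computed, so treating R as Pearson's r is an assumption, and the score lists are invented placeholders, not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-tweet scores in [-1, 1]; not taken from the paper.
human_scores = [0.6, -0.4, 0.1, 0.8, -0.7]
tool_scores = [0.5, -0.2, 0.0, 0.9, -0.6]

r = pearson_r(human_scores, tool_scores)
print(f"R = {r:.4f}, R squared = {r * r:.4f}")
```

For a simple linear fit with an intercept, the coefficient of determination equals the square of Pearson's r, which is why the sketch prints both.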
Moving to Digital-Healthy Society: Empathy, Sympathy, and Wellbeing in Social Media
Background: This research aims to explore the impact of individuals' demographics and their social media use on empathy, sympathy, and wellbeing in Saudi Arabia. This paper can fill a gap in a developing country (i.e., the Arab context) by shedding light on sympathetic and empathetic behavior and its effect on wellbeing in social media.
Method: We obtained a sample of 431 responses across all Saudi regions. Data were analyzed to evaluate the reliability and validity of the study's constructs, while the hypotheses were tested using a structural equation modeling (SEM) technique.
Results: SEM regression results suggest that there is a significant relationship between both age and income and social media use. In addition, social media use has an indirect relationship to individuals' wellbeing. This indirect relationship is better manifested through sympathy rather than empathy.
Conclusion: Theoretically, this study furthers our understanding of the role of empathy and sympathy in wellbeing in social media among Saudis, while practically it provides insights to industry experts about what matters to social media users to increase their wellbeing.
An NLP-Powered Human Rights Monitoring Platform
Effective information management has long been a problem in organisations that are not of a scale that they can afford their own department dedicated to this task. Growing information overload has made this problem even more pronounced. On the other hand, we have recently witnessed the emergence of intelligent tools, packages and resources that have made it possible to rapidly transfer knowledge from the academic community to industry, government and other potential beneficiaries. Here we demonstrate how adopting state-of-the-art natural language processing (NLP) and crowdsourcing methods has resulted in measurable benefits for a human rights organisation by transforming their information and knowledge management using a novel approach that supports human rights monitoring in conflict zones. More specifically, we report on mining and classifying Arabic Twitter in order to identify potential human rights abuse incidents in a continuous stream of social media data within a specified geographical region. Results show that deep learning approaches such as LSTM allow us to push the precision close to 85% for this task with an F1-score of 75%. Apart from the scientific insights, we also demonstrate the viability of the framework, which has been deployed as the Ceasefire Iraq portal for more than three years and has already collected thousands of witness reports from within Iraq. This work is a case study of how progress in artificial intelligence has disrupted even the operation of relatively small-scale organisations.