Large scale crowdsourcing and characterization of Twitter abusive behavior

Author: Blackburn Jeremy
Chatzakou Despoina
Djouvas Constantinos
Founta Antigoni-Maria
Kourtellis Nicolas
Leontiadis Ilias
Sirivianos Michael
Stringhini Gianluca
Vakali Athena
Publication venue: AAAI Press
Publication date: 01/01/2018

In recent years, online social networks have suffered an increase in sexism, racism, and other types of aggressive and cyberbullying behavior, often manifesting through offensive, abusive, or hateful language. Past scientific work has focused on studying these forms of abusive activity in popular online social networks, such as Facebook and Twitter. Building on such work, we present an eight-month study of the various forms of abusive behavior on Twitter, in a holistic fashion. Departing from past work, we examine a wide variety of labeling schemes, which cover different forms of abusive behavior. We propose an incremental and iterative methodology that leverages the power of crowdsourcing to annotate a large collection of tweets with a set of abuse-related labels. By applying our methodology and performing statistical analysis for label merging or elimination, we identify a reduced but robust set of labels to characterize abuse-related tweets. Finally, we offer a characterization of our annotated dataset of 80 thousand tweets, which we make publicly available for further scientific exploration. Accepted manuscript

Boston University Institutional Repository (OpenBU)

On Identifying Disaster-Related Tweets: Matching-based or Learning-based?

Author: Agrawal Sumeet
Kim Seon Ho
Shahabi Cyrus
To Hien
Publication date: 04/05/2017

Social media posts such as tweets are emerging as a source of situational awareness during disasters. Information shared on Twitter by both the affected population (e.g., requesting assistance, issuing warnings) and those outside the impact zone (e.g., providing assistance) can help first responders, decision makers, and the public understand the situation first-hand. Effective use of such information requires timely selection and analysis of tweets that are relevant to a particular disaster. Although abundant tweets are a promising data source, automatically identifying relevant messages is challenging because tweets are short and unstructured, which results in unsatisfactory classification performance for conventional learning-based approaches. We therefore propose a simple yet effective algorithm that identifies relevant messages by matching keywords and hashtags, and we compare matching-based and learning-based approaches. To evaluate the two approaches, we place them in a framework specifically proposed for analyzing disaster-related tweets. Analysis results on eleven datasets covering various disaster types show that, compared with the learning-based approach, our technique yields higher-quality relevant tweets and more interpretable sentiment analysis results.

arXiv.org e-Print Archive

Crossref

The Best Answers? Think Twice: Online Detection of Commercial Campaigns in the CQA Forums

Author: Chen Cheng
R Kesav Bharadwaj
Srinivasan Venkatesh
Wu Kui
Publication date: 01/01/2012

In an emerging trend, more and more Internet users search for information on Community Question and Answer (CQA) websites, as the interactive communication on such websites gives users a rare sense of trust. More often than not, end users look for instant help when they browse CQA websites for the best answers. Hence, it is imperative that they be warned of any potential commercial campaigns hidden behind the answers. However, existing research focuses mainly on answer quality and does not meet this need. In this paper, we develop a system that automatically analyzes the hidden patterns of commercial spam and alerts end users instantly whenever a potential commercial campaign is detected. Our detection method integrates semantic analysis with posters' track records and exploits features specific to CQA websites that differ substantially from those of other types of forums, such as microblogs or news reports. Our system is adaptive and incorporates new evidence uncovered by the detection algorithms over time. Validated with real-world trace data from a popular Chinese CQA website over a period of three months, our system shows great potential for adaptive online detection of CQA spam. Comment: 9 pages, 10 figures

arXiv.org e-Print Archive

CiteSeerX

Technical report and user guide: the 2010 EU Kids Online survey

Publication venue: London School of Economics and Political Science
Publication date: 01/01/2011

This technical report describes the design and implementation of the EU Kids Online survey of 9- to 16-year-old internet-using children and their parents in 25 European countries.

LSE Research Online