19,733 research outputs found
Large scale crowdsourcing and characterization of Twitter abusive behavior
In recent years online social networks have suffered an increase in sexism, racism, and other types of aggressive and cyberbullying behavior, often manifesting itself through offensive, abusive, or hateful language. Past scientific work focused on studying these forms of abusive activity in popular online social networks, such as Facebook and Twitter. Building on such work, we present an eight-month study of the various forms of abusive behavior on Twitter, in a holistic fashion. Departing from past work, we examine a wide variety of labeling schemes, which cover different forms of abusive behavior. We propose an incremental and iterative methodology that leverages the power of crowdsourcing to annotate a large collection of tweets with a set of abuse-related labels. By applying our methodology and performing statistical analysis for label merging or elimination, we identify a reduced but robust set of labels to characterize abuse-related tweets. Finally, we offer a characterization of our annotated dataset of 80 thousand tweets, which we make publicly available for further scientific exploration. Accepted manuscript
On Identifying Disaster-Related Tweets: Matching-based or Learning-based?
Social media such as tweets are emerging as platforms contributing to
situational awareness during disasters. Information shared on Twitter by both
affected population (e.g., requesting assistance, warning) and those outside
the impact zone (e.g., providing assistance) would help first responders,
decision makers, and the public to understand the situation first-hand.
Effective use of such information requires timely selection and analysis of
tweets that are relevant to a particular disaster. Even though abundant tweets
are promising as a data source, it is challenging to automatically identify
relevant messages since tweets are short and unstructured, resulting in
unsatisfactory classification performance of conventional learning-based
approaches. Thus, we propose a simple yet effective algorithm to identify
relevant messages based on matching keywords and hashtags, and provide a
comparison between matching-based and learning-based approaches. To evaluate
the two approaches, we put them into a framework specifically proposed for
analyzing disaster-related tweets. Analysis results on eleven datasets with
various disaster types show that our technique provides relevant tweets of
higher quality and more interpretable results of sentiment analysis tasks when
compared to the learning-based approach.
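As a rough illustration of what matching on keywords and hashtags can look like, here is a minimal Python sketch. It is not the authors' algorithm: the function name `is_relevant`, the exact-token matching rule, and the sample keyword and hashtag sets are all assumptions made for this example.

```python
import re

# Hypothetical patterns for this sketch: hashtags are '#' followed by word
# characters; words are runs of lowercase letters, digits, or apostrophes.
HASHTAG_RE = re.compile(r"#(\w+)")
WORD_RE = re.compile(r"[a-z0-9']+")


def is_relevant(tweet, keywords, hashtags):
    """Return True if the tweet contains any listed keyword or hashtag.

    Matching is case-insensitive and uses whole tokens only, so the
    keyword 'flood' does not match the word 'flooding'.
    """
    text = tweet.lower()
    tags = set(HASHTAG_RE.findall(text))
    words = set(WORD_RE.findall(text))
    return bool(tags & hashtags) or bool(words & keywords)


# Illustrative (assumed) vocabulary for a single disaster event.
keywords = {"flood", "evacuate"}
hashtags = {"harvey"}

print(is_relevant("Streets under water, need help #Harvey", keywords, hashtags))
print(is_relevant("Great coffee this morning", keywords, hashtags))
```

A rule of this kind is easy to inspect, since every selected tweet can be traced back to the keyword or hashtag that triggered it. In this sketch the keyword and hashtag lists have to be chosen by hand for each event.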
The Best Answers? Think Twice: Online Detection of Commercial Campaigns in the CQA Forums
In an emerging trend, more and more Internet users search for information
from Community Question and Answer (CQA) websites, as interactive communication
in such websites provides users with a rare feeling of trust. More often than
not, end users look for instant help when they browse the CQA websites for the
best answers. Hence, it is imperative that they should be warned of any
potential commercial campaigns hidden behind the answers. However, existing
research focuses more on the quality of answers and does not meet the above
need. In this paper, we develop a system that automatically analyzes the hidden
patterns of commercial spam and raises alarms instantaneously to end users
whenever a potential commercial campaign is detected. Our detection method
integrates semantic analysis and posters' track records and utilizes the
special features of CQA websites largely different from those in other types of
forums such as microblogs or news reports. Our system is adaptive and
accommodates new evidence uncovered by the detection algorithms over time.
Validated with real-world trace data from a popular Chinese CQA website over a
period of three months, our system shows great potential towards adaptive
online detection of CQA spam. Comment: 9 pages, 10 figures
Technical report and user guide: the 2010 EU kids online survey
This technical report describes the design and implementation of the EU Kids Online survey of 9-16 year old internet-using children and their parents in 25 European countries.