358 research outputs found
Common Sense Reasoning for Detection, Prevention, and Mitigation of Cyberbullying
Cyberbullying (harassment on social networks) is widely recognized as a serious social problem, especially for adolescents. It is as much a threat to the viability of online social networks for youth today as spam once was to email in the early days of the Internet. Current work to tackle this problem has involved social and psychological studies on its prevalence as well as its negative effects on adolescents. While true solutions rest on teaching youth to have healthy personal relationships, few have considered innovative design of social network software as a tool for mitigating this problem. Mitigating cyberbullying involves two key components: robust techniques for effective detection and reflective user interfaces that encourage users to reflect upon their behavior and their choices.
Spam filters have been successful by applying statistical approaches like Bayesian networks and hidden Markov models. They can, like Google's Gmail, aggregate human spam judgments because spam is sent nearly identically to many people. Bullying is more personalized, varied, and contextual. In this work, we present an approach for bullying detection based on state-of-the-art natural language processing and a common sense knowledge base, which permits recognition over a broad spectrum of topics in everyday life. We analyze a narrower range of particular subject matter associated with bullying (e.g. appearance, intelligence, racial and ethnic slurs, social acceptance, and rejection), and construct BullySpace, a common sense knowledge base that encodes particular knowledge about bullying situations. We then perform joint reasoning with common sense knowledge about a wide range of everyday life topics. We analyze messages using our novel AnalogySpace common sense reasoning technique. We also take into account social network analysis and other factors. We evaluate the model on real-world instances that have been reported by users on Formspring, a social networking website that is popular with teenagers.
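The vector-space intuition behind AnalogySpace-style reasoning can be sketched as follows: messages and bullying topics are represented as vectors over common-sense concept dimensions and compared by cosine similarity. The concepts, words, and weights below are invented for illustration; BullySpace's actual knowledge base is far larger and built from a common-sense corpus, not hand-coded weights.

```python
import math

# Toy concept axes standing in for common-sense dimensions.
# These three axes and the word weights below are invented
# assumptions, not BullySpace data.
CONCEPTS = ["appearance", "intelligence", "social_acceptance"]

WORD_VECTORS = {
    "ugly":   [1.0, 0.0, 0.2],
    "stupid": [0.0, 1.0, 0.1],
    "loser":  [0.1, 0.2, 1.0],
    "great":  [0.0, 0.0, 0.0],
}

def message_vector(message):
    """Sum the concept vectors of known words in the message."""
    vec = [0.0] * len(CONCEPTS)
    for word in message.lower().split():
        for i, w in enumerate(WORD_VECTORS.get(word, [0.0] * len(CONCEPTS))):
            vec[i] += w
    return vec

def cosine(a, b):
    """Cosine similarity; 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Score a message against the "appearance" bullying axis.
appearance_axis = [1.0, 0.0, 0.0]
score = cosine(message_vector("you are so ugly"), appearance_axis)
```

A message with no bullying-related vocabulary scores zero against every axis, which is why joint reasoning over broad everyday-life knowledge matters for coverage.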
On the intervention side, we explore a set of reflective user-interaction paradigms with the goal of promoting empathy among social network participants. We propose an "air traffic control"-like dashboard, which alerts moderators to large-scale outbreaks that appear to be escalating or spreading and helps them prioritize the current deluge of user complaints. For potential victims, we provide educational material that informs them about how to cope with the situation and connects them with emotional support from others. A user evaluation shows that in-context, targeted, and dynamic help during cyberbullying situations fosters end-user reflection that promotes better coping strategies.
Disturbed YouTube for kids: characterizing and detecting inappropriate videos targeting young children
A large number of the most-subscribed YouTube channels target children of very young age. Hundreds of toddler-oriented channels on YouTube feature inoffensive, well-produced, and educational videos. Unfortunately, inappropriate content that targets this demographic is also common. YouTube's algorithmic recommendation system regrettably suggests inappropriate content because some of it mimics or is derived from otherwise appropriate content. Considering the risk for early childhood development, and an increasing trend in toddlers' consumption of YouTube media, this is a worrisome problem.
In this work, we build a classifier able to discern inappropriate content that targets toddlers on YouTube with 84.3% accuracy, and leverage it to perform a first-of-its-kind, large-scale, quantitative characterization that reveals some of the risks of YouTube media consumption by young children. Our analysis reveals that YouTube is still plagued by such disturbing videos and its currently deployed counter-measures are ineffective in terms of detecting them in a timely manner. Alarmingly, using our classifier we show that young children are not only able, but likely to encounter disturbing videos when they randomly browse the platform starting from benign videos.
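The random-browsing risk described above can be sketched as a random walk over a recommendation graph: starting from a benign video, each click follows a random recommendation, and we estimate how often the walk reaches a disturbing video. The graph, video names, and walk length below are invented toy data, not the paper's crawl.

```python
import random

# Toy recommendation graph: each video maps to its recommended videos.
# Names and edges are invented; "D*" marks disturbing videos.
GRAPH = {
    "benign1": ["benign2", "D1"],
    "benign2": ["benign1", "benign3"],
    "benign3": ["benign2", "D2"],
    "D1": ["D2"],
    "D2": ["D1"],
}
DISTURBING = {"D1", "D2"}

def hits_disturbing(start, steps, rng):
    """Random walk of `steps` clicks; True if a disturbing video is reached."""
    node = start
    for _ in range(steps):
        node = rng.choice(GRAPH[node])
        if node in DISTURBING:
            return True
    return False

def estimate_risk(start="benign1", steps=5, trials=10_000, seed=0):
    """Monte Carlo estimate of the probability of encountering
    a disturbing video within `steps` clicks from `start`."""
    rng = random.Random(seed)
    hits = sum(hits_disturbing(start, steps, rng) for _ in range(trials))
    return hits / trials
```

On real data the estimate would come from crawled recommendation lists and the classifier's labels rather than a hand-built graph.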
XRay: Enhancing the Web's Transparency with Differential Correlation
Today's Web services - such as Google, Amazon, and Facebook - leverage user data for varied purposes, including personalizing recommendations, targeting advertisements, and adjusting prices. At present, users have little insight into how their data is being used. Hence, they cannot make informed choices about the services they choose. To increase transparency, we developed XRay, the first fine-grained, robust, and scalable personal data tracking system for the Web. XRay predicts which data in an arbitrary Web account (such as emails, searches, or viewed products) is being used to target which outputs (such as ads, recommended products, or prices). XRay's core functions are service agnostic and easy to instantiate for new services, and they can track data within and across services. To make predictions independent of the audited service, XRay relies on the following insight: by comparing outputs from different accounts with similar, but not identical, subsets of data, one can pinpoint targeting through correlation. We show both theoretically, and through experiments on Gmail, Amazon, and YouTube, that XRay achieves high precision and recall by correlating data from a surprisingly small number of extra accounts. (Extended version of a paper presented at the 23rd USENIX Security Symposium, USENIX Security 14.)
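The differential-correlation insight can be sketched as follows: given several audit accounts holding overlapping subsets of data items, the item whose presence best correlates with an ad's appearance across accounts is predicted as its target. The account contents, ads, and scoring rule below are a simplified illustration, not XRay's actual matching algorithm.

```python
# Toy audit: each account holds a subset of data items (e.g. emails),
# and we observe which ads each account receives. All names below
# are invented for illustration.
ACCOUNTS = {
    "acct1": {"email_vacation", "email_shoes"},
    "acct2": {"email_vacation", "email_bank"},
    "acct3": {"email_shoes", "email_bank"},
}
ADS_SEEN = {
    "acct1": {"ad_flights", "ad_sneakers"},
    "acct2": {"ad_flights"},
    "acct3": {"ad_sneakers"},
}

def targeting_score(item, ad):
    """Fraction of accounts where presence of `item` matches presence of `ad`."""
    matches = sum(
        (item in data) == (ad in ADS_SEEN[acct])
        for acct, data in ACCOUNTS.items()
    )
    return matches / len(ACCOUNTS)

def predict_target(ad, items):
    """Pick the data item whose presence best correlates with the ad."""
    return max(items, key=lambda item: targeting_score(item, ad))

items = {"email_vacation", "email_shoes", "email_bank"}
```

Because accounts hold similar but not identical subsets, each ad's correlation with its true input stands out even with few extra accounts, which is the property the paper analyzes formally.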
Combating Threats to the Quality of Information in Social Systems
Many large-scale social systems such as Web-based social networks, online social media sites and Web-scale crowdsourcing systems have been growing rapidly, enabling millions of human participants to generate, share and consume content on a massive scale. This reliance on users can lead to many positive effects, including large-scale growth in the size and content in the community, bottom-up discovery of "citizen-experts", serendipitous discovery of new resources beyond the scope of the system designers, and new social-based information search and retrieval algorithms. But the relative openness and reliance on users coupled with the widespread interest and growth of these social systems carries risks and raises growing concerns over the quality of information in these systems.
In this dissertation research, we focus on countering threats to the quality of information in self-managing social systems. Concretely, we identify three classes of threats to these systems: (i) content pollution by social spammers, (ii) coordinated campaigns for strategic manipulation, and (iii) threats to collective attention. To combat these threats, we propose three inter-related methods for detecting evidence of these threats, mitigating their impact, and improving the quality of information in social systems. We augment this three-fold defense with an exploration of their origins in "crowdturfing", a sinister counterpart to the enormous positive opportunities of crowdsourcing. In particular, this dissertation research makes four unique contributions:
• The first contribution of this dissertation research is a framework for detecting and filtering social spammers and content polluters in social systems. To detect and filter individual social spammers and content polluters, we propose and evaluate a novel social honeypot-based approach.
• Second, we present a set of methods and algorithms for detecting coordinated campaigns in large-scale social systems. We propose and evaluate a content-driven framework for effectively linking free text posts with common "talking points" and extracting campaigns from large-scale social systems.
• Third, we present a dual study of the robustness of social systems to collective attention threats through both a data-driven modeling approach and deployment over a real system trace. We evaluate the effectiveness of countermeasures deployed based on the first moments of a bursting phenomenon in a real system.
• Finally, we study the underlying ecosystem of crowdturfing for engaging in each of the three threat types. We present a framework for "pulling back the curtain" on crowdturfers to reveal their underlying ecosystem on both crowdsourcing sites and social media.
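The content-driven linking of posts to common "talking points" can be sketched with word-shingle Jaccard similarity: posts sharing enough word k-grams are greedily grouped as one campaign. The threshold, shingle size, and single-link grouping below are illustrative assumptions, not the dissertation's exact method.

```python
def shingles(text, k=3):
    """Word k-grams used to compare near-duplicate posts."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of two shingle sets; 0.0 when both are empty."""
    return len(a & b) / len(a | b) if a | b else 0.0

def group_campaigns(posts, threshold=0.5):
    """Greedy single-link grouping of posts sharing talking points."""
    groups = []
    for post in posts:
        sig = shingles(post)
        for group in groups:
            if any(jaccard(sig, shingles(p)) >= threshold for p in group):
                group.append(post)
                break
        else:
            groups.append([post])
    return groups
```

Posts that paraphrase the same template still share most of their shingles, so they cluster together, while unrelated posts fall into their own groups.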
Detecting cyberbullying and cyberaggression in social media
Cyberbullying and cyberaggression are increasingly worrisome phenomena affecting people across all demographics. More than half of young social media users worldwide have been exposed to such prolonged and/or coordinated digital harassment. Victims can experience a wide range of emotions and negative consequences, such as embarrassment, depression, and isolation from other community members, which carry the risk of escalating to even more critical outcomes, such as suicide attempts.
In this work, we take the first concrete steps to understand the characteristics of abusive behavior on Twitter, one of today's largest social media platforms. We analyze 1.2 million users and 2.1 million tweets, comparing users participating in discussions around seemingly normal topics like the NBA to those more likely to be hate-related, such as the Gamergate controversy or gender pay inequality at the BBC. We also explore specific manifestations of abusive behavior, i.e., cyberbullying and cyberaggression, in one of the hate-related communities (Gamergate). We present a robust methodology to distinguish bullies and aggressors from normal Twitter users by considering text-, user-, and network-based attributes. Using various state-of-the-art machine-learning algorithms, we classify these accounts with over 90% accuracy and AUC. Finally, we discuss the current status of Twitter user accounts marked as abusive by our methodology and study the performance of potential mechanisms that Twitter could use to suspend users in the future.
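The text-, user-, and network-based attributes mentioned above can be sketched as a simple per-account feature extractor feeding a downstream classifier. The feature names and formulas below are hypothetical examples, not the paper's exact feature set.

```python
def extract_features(tweets, account_age_days, followers, friends):
    """Build a feature vector from an account's tweets and profile.
    All feature definitions here are illustrative assumptions."""
    hashtags = sum(t.count("#") for t in tweets)          # text attribute
    uppercase = sum(1 for t in tweets for c in t if c.isupper())
    total_chars = sum(len(t) for t in tweets) or 1
    return {
        "num_tweets": len(tweets),                         # activity
        "hashtags_per_tweet": hashtags / max(len(tweets), 1),
        "uppercase_ratio": uppercase / total_chars,        # "shouting" proxy
        "account_age_days": account_age_days,              # user attribute
        "followers_to_friends": followers / max(friends, 1),  # network attribute
    }
```

In the paper's setting, vectors like this (with richer network features such as reciprocity and centrality) would be fed to standard machine-learning classifiers to separate bullies and aggressors from normal users.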
A systematic survey of online data mining technology intended for law enforcement
As an increasing amount of crime takes on a digital aspect, law enforcement bodies must tackle an online environment generating huge volumes of data. With manual inspections becoming increasingly infeasible, law enforcement bodies are optimising online investigations through data-mining technologies. Such technologies must be well designed and rigorously grounded, yet no survey of the online data-mining literature exists which examines their techniques, applications and rigour. This article remedies this gap through a systematic mapping study describing online data-mining literature which visibly targets law enforcement applications, using evidence-based practices in survey making to produce a replicable analysis which can be methodologically examined for deficiencies.
- …