Clustering Memes in Social Media
The increasing pervasiveness of social media creates new opportunities to
study human social behavior, while challenging our capability to analyze their
massive data streams. One of the emerging tasks is to distinguish between
different kinds of activities, for example engineered misinformation campaigns
versus spontaneous communication. Such detection problems require a formal
definition of a meme, a unit of information that can spread from person to
person through the social network. Once a meme is identified, supervised
learning methods can be applied to classify different types of communication.
The appropriate granularity of a meme, however, is hardly captured by
existing entities such as tags and keywords. Here we present a framework for
the novel task of detecting memes by clustering messages from large streams of
social data. We evaluate various similarity measures that leverage content,
metadata, network features, and their combinations. We also explore the idea of
pre-clustering on the basis of existing entities. A systematic evaluation is
carried out using a manually curated dataset as ground truth. Our analysis
shows that pre-clustering and a combination of heterogeneous features yield the
best trade-off between number of clusters and their quality, demonstrating that
a simple combination based on pairwise maximization of similarity is as
effective as a non-trivial optimization of parameters. Our approach is fully
automatic, unsupervised, and scalable for real-time detection of memes in
streaming data.
Comment: Proceedings of the 2013 IEEE/ACM International Conference on Advances
in Social Networks Analysis and Mining (ASONAM'13), 201
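
The "pairwise maximization of similarity" mentioned in the abstract above can be read as taking, for each pair of messages, the largest score among several similarity measures. The following is a minimal sketch under that reading, not the authors' implementation. The message fields (`text`, `hashtags`, `mentions`), the use of Jaccard overlap, and all function names are assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, List

Message = Dict[str, object]
Measure = Callable[[Message, Message], float]


def jaccard(a: set, b: set) -> float:
    """Jaccard overlap of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def content_sim(m1: Message, m2: Message) -> float:
    # Hypothetical content measure: word overlap of the message text.
    return jaccard(set(str(m1["text"]).lower().split()),
                   set(str(m2["text"]).lower().split()))


def metadata_sim(m1: Message, m2: Message) -> float:
    # Hypothetical metadata measure: overlap of hashtags.
    return jaccard(set(m1["hashtags"]), set(m2["hashtags"]))


def network_sim(m1: Message, m2: Message) -> float:
    # Hypothetical network measure: overlap of mentioned users.
    return jaccard(set(m1["mentions"]), set(m2["mentions"]))


def max_combined_sim(m1: Message, m2: Message,
                     measures: Iterable[Measure]) -> float:
    """Combine heterogeneous measures by keeping the largest pairwise score."""
    return max(measure(m1, m2) for measure in measures)


MEASURES: List[Measure] = [content_sim, metadata_sim, network_sim]

# Two invented messages that share a hashtag but no words or mentions.
a = {"text": "vote early today", "hashtags": ["election"], "mentions": ["alice"]}
b = {"text": "polls are open", "hashtags": ["election"], "mentions": ["bob"]}

combined = max_combined_sim(a, b, MEASURES)
```

In this toy pair the content and network scores are zero, yet the combined score is driven by the shared hashtag. A combination of this kind needs no tuned weights, which is consistent with the abstract's contrast against a "non-trivial optimization of parameters".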
Semi-Supervised Approach to Monitoring Clinical Depressive Symptoms in Social Media
With the rise of social media, millions of people are routinely expressing
their moods, feelings, and daily struggles with mental health issues on social
media platforms like Twitter. Unlike traditional observational cohort studies
conducted through questionnaires and self-reported surveys, we explore the
reliable detection of clinical depression from tweets obtained unobtrusively.
Based on the analysis of tweets crawled from users with self-reported
depressive symptoms in their Twitter profiles, we demonstrate the potential for
detecting clinical depression symptoms which emulate the PHQ-9 questionnaire
clinicians use today. Our study uses a semi-supervised statistical model to
evaluate how the duration of these symptoms and their expression on Twitter (in
terms of word usage patterns and topical preferences) align with the medical
findings reported via the PHQ-9. Our proactive and automatic screening tool is
able to identify clinical depressive symptoms with an accuracy of 68% and
precision of 72%.
Comment: 8 pages, Advances in Social Networks Analysis and Mining (ASONAM),
2017 IEEE/ACM International Conferenc
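
Accuracy and precision, the two figures reported in the abstract above, answer different questions. The sketch below uses the standard binary-classification definitions. The confusion counts are invented for illustration and are not taken from the paper.

```python
def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + fp + tn + fn)


def precision(tp: int, fp: int) -> float:
    """Share of positive predictions that are truly positive."""
    return tp / (tp + fp)


# Hypothetical counts: 3 true positives, 1 false positive,
# 4 true negatives, 2 false negatives.
acc = accuracy(tp=3, fp=1, tn=4, fn=2)
prec = precision(tp=3, fp=1)
```

Precision ignores missed positives (false negatives), so a screening tool can have higher precision than accuracy while still overlooking some cases.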
A Retrospective Analysis of the Fake News Challenge Stance Detection Task
The 2017 Fake News Challenge Stage 1 (FNC-1) shared task addressed a stance
classification task as a crucial first step towards detecting fake news. To
date, there is no in-depth analysis paper to critically discuss FNC-1's
experimental setup, reproduce the results, and draw conclusions for
next-generation stance classification methods. In this paper, we provide such
an in-depth analysis for the three top-performing systems. We first find that
FNC-1's proposed evaluation metric favors the majority class, which can be
easily classified, and thus overestimates the true discriminative power of the
methods. Therefore, we propose a new F1-based metric yielding a changed system
ranking. Next, we compare the features and architectures used, which leads to a
novel feature-rich stacked LSTM model that performs on par with the best
systems, but is superior in predicting minority classes. To understand the
methods' ability to generalize, we derive a new dataset and perform both
in-domain and cross-domain experiments. Our qualitative and quantitative study
helps interpret the original FNC-1 scores and understand which features help
improve performance and why. Our new dataset and all source code used during
the reproduction study are publicly available for future research.
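
The abstract above argues that a metric favoring the majority class overstates discriminative power. A small sketch of that effect compares plain accuracy with macro-averaged F1. Macro-averaged F1 is used here as one common F1-based choice; the abstract does not give the exact definition of the proposed metric. The label sequences are invented, and the class names follow the FNC-1 label set.

```python
from typing import List, Sequence


def per_class_f1(y_true: Sequence[str], y_pred: Sequence[str], label: str) -> float:
    """F1 for one class, treating that class as the positive label."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # no correct positive predictions for this class
    prec = tp / (tp + fp)
    rec = tp / (tp + fn)
    return 2 * prec * rec / (prec + rec)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str],
             labels: Sequence[str]) -> float:
    """Unweighted mean of per-class F1, so every class counts equally."""
    return sum(per_class_f1(y_true, y_pred, lab) for lab in labels) / len(labels)


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


LABELS: List[str] = ["agree", "disagree", "discuss", "unrelated"]

# Invented, imbalanced gold labels and a classifier that always
# predicts the majority class.
y_true = ["unrelated"] * 7 + ["agree", "disagree", "discuss"]
y_pred = ["unrelated"] * 10

acc = accuracy(y_true, y_pred)
f1m = macro_f1(y_true, y_pred, LABELS)
```

The always-"unrelated" predictor scores well on accuracy but poorly on macro F1, because three of the four classes contribute an F1 of zero.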