Using Linguistic Features to Estimate Suicide Probability of Chinese Microblog Users
If people at high risk of suicide can be identified through social media
such as microblogs, it becomes possible to implement an active intervention
system to save their lives. With this motivation, the current study
administered the Suicide Probability Scale (SPS) to 1041 users of Sina Weibo, a
leading microblog service provider in China. Two NLP (Natural Language
Processing) methods, the Chinese edition of the Linguistic Inquiry and Word
Count (LIWC) lexicon and Latent Dirichlet Allocation (LDA), are used to extract
linguistic features from the Sina Weibo data. We trained prediction models with
machine learning algorithms on these two types of features to estimate
suicide probability. The experimental results
indicate that LDA can find topics related to suicide probability and
improves prediction performance. Our study adds value to the prediction of
the suicide probability of social network users from their behaviors.
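To make the lexicon-based feature step concrete, here is a minimal sketch of LIWC-style category features: each post is reduced to the fraction of its tokens that fall into each word category. The lexicon below is a hypothetical toy stand-in (the real Chinese LIWC lexicon is not reproduced here), the function name is illustrative, and the input is assumed to be already tokenized, which for Chinese text would require a word-segmentation step first.

```python
from collections import Counter

# Hypothetical toy lexicon (category -> words). This is NOT the LIWC lexicon;
# it only illustrates the shape of a category-frequency feature vector.
TOY_LEXICON = {
    "negative_emotion": {"sad", "hopeless", "alone"},
    "death": {"die", "death"},
    "social": {"friend", "family"},
}


def category_features(tokens, lexicon=TOY_LEXICON):
    """Return the fraction of tokens that fall into each lexicon category."""
    total = len(tokens)
    counts = Counter()
    for token in tokens:
        for category, words in lexicon.items():
            if token in words:
                counts[category] += 1
    # Guard against empty posts so we never divide by zero.
    return {c: (counts[c] / total if total else 0.0) for c in lexicon}


# Assumes whitespace tokenization, which is only adequate for this English toy.
features = category_features("i feel sad and alone without my friend".split())
```

Vectors like `features` could then be concatenated with per-user topic proportions from a topic model and passed to any regression model; the abstract does not say which learner the authors used.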
Mining Unfollow Behavior in Large-Scale Online Social Networks via Spatial-Temporal Interaction
Online Social Networks (OSNs) evolve through two pervasive behaviors: follow
and unfollow, which respectively signify relationship creation and relationship
dissolution. Research on social network evolution mainly focuses on the follow
behavior, while the unfollow behavior has largely been ignored. Mining unfollow
behavior is challenging because a user's decision to unfollow is affected not
only by a simple combination of the user's attributes, such as informativeness
and reciprocity, but also by the complex interactions among them.
Meanwhile, prior datasets seldom contain sufficient records for inferring such
complex interaction. To address these issues, we first construct a large-scale
real-world Weibo dataset, which records detailed post content and relationship
dynamics of 1.8 million Chinese users. Next, we divide users' attributes into two
categories: spatial attributes (e.g., a user's social role) and temporal
attributes (e.g., a user's post content). Leveraging the constructed dataset, we
systematically study how the interaction effects between user's spatial and
temporal attributes contribute to the unfollow behavior. Afterwards, we propose
a novel unified model with heterogeneous information (UMHI) for unfollow
prediction. Specifically, our UMHI model: 1) captures a user's spatial attributes
through the social network structure; 2) infers a user's temporal attributes through
user-posted content and unfollow history; and 3) models the interaction between
spatial and temporal attributes with nonlinear MLP layers. Comprehensive
evaluations on the constructed dataset demonstrate that the proposed UMHI model
outperforms baseline methods by 16.44% on average in terms of precision. In
addition, factor analyses verify that both spatial attributes and temporal
attributes are essential for mining unfollow behavior.
Comment: 8 pages, 7 figures, Accepted by AAAI 202
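The idea of modeling the interaction between spatial and temporal attributes with nonlinear MLP layers can be illustrated with a tiny forward pass. This is only a sketch under strong assumptions, not the UMHI architecture: the abstract gives no layer sizes, embeddings, or training details, so the one-dimensional inputs, the hand-set weights in `PARAMS`, and the name `unfollow_score` are all hypothetical.

```python
import math


def relu(vec):
    return [max(0.0, v) for v in vec]


def dense(vec, weights, bias):
    """Affine layer; `weights` holds one row of input weights per output unit."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def unfollow_score(spatial, temporal, params):
    """Concatenate both attribute vectors and pass them through a 2-layer MLP."""
    x = list(spatial) + list(temporal)
    hidden = relu(dense(x, params["w1"], params["b1"]))
    logit = dense(hidden, params["w2"], params["b2"])[0]
    return sigmoid(logit)


# Hypothetical hand-set weights, chosen only to keep the example deterministic.
# Hidden unit 0 is active only when both inputs are high together, which is
# the kind of interaction effect a purely linear model cannot express.
PARAMS = {
    "w1": [[1.0, 1.0], [-1.0, 1.0]],
    "b1": [-1.0, 0.0],
    "w2": [[2.0, -1.0]],
    "b2": [0.0],
}

score = unfollow_score([1.0], [1.0], PARAMS)
```

With these weights the score is highest when both attributes are present at once; in a trained model the weights would be learned from unfollow records instead of being set by hand.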
Language in Our Time: An Empirical Analysis of Hashtags
Hashtags in online social networks have gained tremendous popularity during
the past five years. The resulting large quantity of data has provided a new
lens into modern society. Previously, researchers mainly relied on data collected
from Twitter to study either a certain type of hashtag or a certain property
of hashtags. In this paper, we perform the first large-scale empirical analysis
of hashtags shared on Instagram, the major platform for hashtag-sharing. We
study hashtags from three different dimensions including the temporal-spatial
dimension, the semantic dimension, and the social dimension. Extensive
experiments performed on three large-scale datasets with more than 7 million
hashtags in total provide a series of interesting observations. First, we show
that the temporal patterns of hashtags can be categorized into four different
clusters, and people tend to share fewer hashtags at certain places and more
hashtags at others. Second, we observe that a non-negligible proportion of
hashtags exhibit large semantic displacement. We demonstrate that hashtags that are
more uniformly shared among users, as quantified by the proposed hashtag
entropy, are less prone to semantic displacement. Finally, we propose a
bipartite graph embedding model to summarize users' hashtag profiles, and rely
on these profiles to perform friendship prediction. Evaluation results show
that our approach achieves effective prediction, with an AUC (area under the ROC
curve) above 0.8, which demonstrates the strong social signals carried by
hashtags.
Comment: WWW 201
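Two quantities in this abstract lend themselves to a short worked sketch. The abstract does not define hashtag entropy, so the version below assumes it is the Shannon entropy of how one hashtag's shares are distributed over users; that reading is an assumption, not the paper's stated formula. AUC is computed in its standard pairwise form: the probability that a randomly chosen positive pair is scored above a randomly chosen negative one. Function names and the sample data are illustrative.

```python
import math
from collections import Counter


def hashtag_entropy(sharers):
    """Shannon entropy (bits) of one hashtag's shares across users.

    `sharers` lists the user ID behind each share of the hashtag.
    Assumed definition; the source does not spell out the formula.
    """
    counts = Counter(sharers)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else (0.5 if p == n else 0.0)
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# A hashtag shared once by each of four users vs. one dominated by a single user.
uniform = hashtag_entropy(["u1", "u2", "u3", "u4"])
skewed = hashtag_entropy(["u1", "u1", "u1", "u2"])

# Toy friendship labels (1 = friends) with hypothetical similarity scores.
example_auc = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

Under this reading, a higher entropy means the hashtag is spread more evenly over users, which the abstract associates with less semantic displacement. The pairwise AUC is quadratic in the number of examples, which is fine for a sketch but would be replaced by a rank-based computation on large datasets.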