2,149 research outputs found
Using Linguistic Features to Estimate Suicide Probability of Chinese Microblog Users
If people with high risk of suicide can be identified through social media
like microblog, it is possible to implement an active intervention system to
save their lives. Based on this motivation, the current study administered the
Suicide Probability Scale(SPS) to 1041 weibo users at Sina Weibo, which is a
leading microblog service provider in China. Two NLP (Natural Language
Processing) methods, the Chinese edition of Linguistic Inquiry and Word Count
(LIWC) lexicon and Latent Dirichlet Allocation (LDA), are used to extract
linguistic features from the Sina Weibo data. We trained predicting models by
machine learning algorithm based on these two types of features, to estimate
suicide probability based on linguistic features. The experiment results
indicate that LDA can find topics that relate to suicide probability, and
improve the performance of prediction. Our study adds value in prediction of
suicidal probability of social network users with their behaviors
Understanding and Measuring Psychological Stress using Social Media
A body of literature has demonstrated that users' mental health conditions,
such as depression and anxiety, can be predicted from their social media
language. There is still a gap in the scientific understanding of how
psychological stress is expressed on social media. Stress is one of the primary
underlying causes and correlates of chronic physical illnesses and mental
health conditions. In this paper, we explore the language of psychological
stress with a dataset of 601 social media users, who answered the Perceived
Stress Scale questionnaire and also consented to share their Facebook and
Twitter data. Firstly, we find that stressed users post about exhaustion,
losing control, increased self-focus and physical pain as compared to posts
about breakfast, family-time, and travel by users who are not stressed.
Secondly, we find that Facebook language is more predictive of stress than
Twitter language. Thirdly, we demonstrate how the language based models thus
developed can be adapted and be scaled to measure county-level trends. Since
county-level language is easily available on Twitter using the Streaming API,
we explore multiple domain adaptation algorithms to adapt user-level Facebook
models to Twitter language. We find that domain-adapted and scaled social
media-based measurements of stress outperform sociodemographic variables (age,
gender, race, education, and income), against ground-truth survey-based stress
measurements, both at the user- and the county-level in the U.S. Twitter
language that scores higher in stress is also predictive of poorer health, less
access to facilities and lower socioeconomic status in counties. We conclude
with a discussion of the implications of using social media as a new tool for
monitoring stress levels of both individuals and counties.Comment: Accepted for publication in the proceedings of ICWSM 201
Listening between the Lines: Learning Personal Attributes from Conversations
Open-domain dialogue agents must be able to converse about many topics while
incorporating knowledge about the user into the conversation. In this work we
address the acquisition of such knowledge, for personalization in downstream
Web applications, by extracting personal attributes from conversations. This
problem is more challenging than the established task of information extraction
from scientific publications or Wikipedia articles, because dialogues often
give merely implicit cues about the speaker. We propose methods for inferring
personal attributes, such as profession, age or family status, from
conversations using deep learning. Specifically, we propose several Hidden
Attribute Models, which are neural networks leveraging attention mechanisms and
embeddings. Our methods are trained on a per-predicate basis to output rankings
of object values for a given subject-predicate combination (e.g., ranking the
doctor and nurse professions high when speakers talk about patients, emergency
rooms, etc). Experiments with various conversational texts including Reddit
discussions, movie scripts and a collection of crowdsourced personal dialogues
demonstrate the viability of our methods and their superior performance
compared to state-of-the-art baselines.Comment: published in WWW'1
- …