Sensing Subjective Well-being from Social Media
Subjective well-being (SWB), which refers to how people experience the quality
of their lives, is of great use to public policy-makers as well as to economic
and sociological research. Traditionally, the measurement of SWB relies on
time-consuming and costly self-report questionnaires. Nowadays, people are
motivated to share their experiences and feelings on social media, so we
propose to sense SWB from the vast user-generated data on social media. By
utilizing 1785 users' social media data with SWB labels, we train machine
learning models that are able to "sense" individual SWB from users' social
media. Our model, which attains state-of-the-art prediction accuracy, can then
be used to identify the SWB of a large population of social media users in a
timely manner and at very low cost.
Comment: 12 pages, 1 figure, 2 tables, 10th International Conference, AMT
2014, Warsaw, Poland, August 11-14, 2014. Proceeding
Using Linguistic Features to Estimate Suicide Probability of Chinese Microblog Users
If people at high risk of suicide can be identified through social media
such as microblogs, it is possible to implement an active intervention system to
save their lives. Based on this motivation, the current study administered the
Suicide Probability Scale (SPS) to 1041 users of Sina Weibo, which is a
leading microblog service provider in China. Two NLP (Natural Language
Processing) methods, the Chinese edition of Linguistic Inquiry and Word Count
(LIWC) lexicon and Latent Dirichlet Allocation (LDA), are used to extract
linguistic features from the Sina Weibo data. We trained machine learning
prediction models on these two types of features to estimate suicide
probability. The experimental results
indicate that LDA can find topics that relate to suicide probability and
improve prediction performance. Our study adds value to the prediction of the
suicide probability of social network users from their behaviors.
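As a rough illustration of the lexicon-based feature extraction mentioned in the abstract above, the sketch below computes the share of tokens in a post that fall into each lexicon category. The lexicon, category names, and function name are hypothetical placeholders; this is not the Chinese LIWC dictionary or the authors' pipeline, and the LDA topic features are not shown.

```python
from collections import Counter

# Hypothetical toy lexicon (category -> words). The real Chinese LIWC
# dictionary is far larger and is not reproduced here.
TOY_LEXICON = {
    "negemo": {"sad", "hopeless", "alone"},
    "posemo": {"happy", "glad"},
    "death": {"die", "death"},
}

def category_proportions(tokens, lexicon=TOY_LEXICON):
    """Return the share of tokens that fall into each lexicon category."""
    total = len(tokens)
    counts = Counter()
    for token in tokens:
        for category, words in lexicon.items():
            if token in words:
                counts[category] += 1
    # Guard against division by zero for empty posts.
    return {c: (counts[c] / total if total else 0.0) for c in lexicon}

# Two of the six tokens ("sad", "alone") are in the "negemo" category.
features = category_proportions("i feel sad and alone today".split())
```

A feature dictionary of this kind, one per user, could then be fed to any standard classifier or regressor alongside topic proportions.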
"When and Where?": Behavior Dominant Location Forecasting with Micro-blog Streams
The proliferation of smartphones and wearable devices has increased the
availability of large amounts of geospatial streams, enabling significant
automated discovery of knowledge in pervasive environments, but the most
prominent information related to changing interests has not yet been adequately
capitalized on.
In this paper, we provide a novel algorithm to exploit the dynamic fluctuations
in users' points of interest while forecasting the future place of visit with
fine granularity. Our proposed algorithm is based on the dynamic formation of
collective personality communities using different languages, opinions,
geographical and temporal distributions to find optimized equivalent
content. We performed extensive empirical experiments involving real-time
streams derived from 0.6 million micro-blog stream tuples comprising 1945
social persons, in fusion with a graph algorithm and a feed-forward neural
network model as a predictive classification model. Lastly, the framework
achieves 62.10% mean average precision on 120,000 embeddings on unlabeled users
and, surprisingly, an 85.92% increase over the state-of-the-art approach.
Comment: Accepted as a full paper in the 2nd International Workshop on Social
Computing co-located with ICDM, 2018 Singapor
Automatic Conditional Generation of Personalized Social Media Short Texts
Automatic text generation has received much attention owing to the rapid
development of deep neural networks. In general, text generation systems based
on statistical language models do not consider anthropomorphic
characteristics, which results in machine-like generated texts. To fill the
gap, we propose a conditional language generation model with Big Five
Personality (BFP) feature vectors as input context, which writes human-like
short texts. The short text generator consists of a long short-term memory
(LSTM) network layer, where a BFP feature vector is concatenated as one part of
the input for each cell. To enable supervised training of the generation model,
a text classification model based on a convolutional neural network (CNN) has
been used to
prepare BFP-tagged Chinese micro-blog corpora. Validated by a BFP linguistic
computational model, our generated Chinese short texts exhibit discriminative
personality styles, which are also syntactically correct and semantically
smooth with appropriate emoticons. By combining natural language
generation with psychological linguistics, our proposed BFP-dependent text
generation model can be widely used for individualization in machine
translation, image captioning, dialogue generation and so on.
Comment: published in PRICAI 201
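The conditioning step described in the abstract above, where a personality vector forms part of the input at every LSTM cell, can be sketched roughly as follows. The function name, embedding size, trait ordering, and scores are all assumptions made for illustration; the abstract does not specify them, and the LSTM itself is omitted.

```python
def condition_inputs(token_embeddings, bfp_vector):
    """Append the same personality vector to every time step's embedding."""
    if len(bfp_vector) != 5:
        raise ValueError("expected five Big Five trait scores")
    return [list(embedding) + list(bfp_vector) for embedding in token_embeddings]

# Hypothetical 3-dimensional embeddings for a two-token sequence.
embeddings = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
# Hypothetical trait scores in an assumed order (O, C, E, A, N).
bfp = [0.7, 0.2, 0.9, 0.5, 0.1]

# Each time step now carries its token embedding plus the five trait scores,
# so a recurrent cell consuming these inputs sees the personality context
# at every step.
lstm_inputs = condition_inputs(embeddings, bfp)
```

Repeating the vector at each step, rather than supplying it only once as an initial state, keeps the conditioning signal available throughout the sequence, at the cost of a wider input layer.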
The Value of Alternative Data in Credit Risk Prediction: Evidence from a Large Field Experiment
Recently, the high penetration of mobile devices and internet access has offered a new source of fine-grained user behavior data (aka “alternative data”) to improve financial credit risk assessment. This paper conducts a comprehensive evaluation of the value of alternative data on microloan platforms with a large field experiment. Our machine-learning-based empirical analyses demonstrate that alternative data can significantly improve the prediction accuracy of borrowers’ default behavior and increase platform profits. Cellphone usage and mobility trace information perform the best among the multiple sources of alternative data. Moreover, we find that our proposed framework helps financial institutions extend their service to more lower-income and less-educated loan applicants from less-developed geographical areas, historically disadvantaged populations that have been largely neglected in the past. Our study demonstrates the tremendous potential of leveraging alternative data to alleviate such inequality in the financial service markets while at the same time achieving higher platform revenues.