519 research outputs found

    Sensing Subjective Well-being from Social Media

    Subjective Well-being (SWB), which refers to how people experience the quality of their lives, is of great use to public policy-makers as well as to economic and sociological research. Traditionally, the measurement of SWB relies on time-consuming and costly self-report questionnaires. Nowadays, people are motivated to share their experiences and feelings on social media, so we propose to sense SWB from the vast amount of user-generated data on social media. Using social media data from 1785 users with SWB labels, we train machine learning models that are able to "sense" individual SWB from users' social media. Our model, which attains state-of-the-art prediction accuracy, can then be used to identify the SWB of large populations of social media users in a timely manner and at very low cost.
    Comment: 12 pages, 1 figure, 2 tables, 10th International Conference, AMT 2014, Warsaw, Poland, August 11-14, 2014. Proceeding

    Using Linguistic Features to Estimate Suicide Probability of Chinese Microblog Users

    If people at high risk of suicide can be identified through social media such as microblogs, it becomes possible to implement an active intervention system to save their lives. With this motivation, the current study administered the Suicide Probability Scale (SPS) to 1041 users of Sina Weibo, a leading microblog service provider in China. Two NLP (Natural Language Processing) methods, the Chinese edition of the Linguistic Inquiry and Word Count (LIWC) lexicon and Latent Dirichlet Allocation (LDA), are used to extract linguistic features from the Sina Weibo data. We trained prediction models with machine learning algorithms on these two types of features to estimate suicide probability. The experimental results indicate that LDA can find topics that relate to suicide probability and improve prediction performance. Our study adds value in prediction of suicidal probability of social network users with their behaviors
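
As a rough illustration of the lexicon-based feature extraction the abstract describes, the sketch below computes the share of tokens that fall into each word category. It is only a minimal sketch under stated assumptions: `TOY_LEXICON` is a hypothetical three-category word list, not the LIWC dictionary, and the whitespace tokenizer is a simplification (Chinese microblog text would need a word segmenter first). The abstract does not describe the authors' actual pipeline.

```python
from collections import Counter

# Hypothetical toy lexicon (category -> words). This is NOT the LIWC dictionary.
TOY_LEXICON = {
    "negative_emotion": {"sad", "hopeless", "alone"},
    "positive_emotion": {"happy", "glad"},
    "first_person": {"i", "me", "my"},
}


def category_proportions(text, lexicon=TOY_LEXICON):
    """Return, for each category, the share of tokens in `text` that belong to it."""
    # Simplifying assumption: lowercase and split on whitespace.
    tokens = text.lower().split()
    counts = Counter()
    for token in tokens:
        for category, words in lexicon.items():
            if token in words:
                counts[category] += 1
    total = len(tokens) or 1  # avoid division by zero on empty text
    return {category: counts[category] / total for category in lexicon}


features = category_proportions("I feel sad and alone")
```

A feature dictionary of this kind could be combined with per-user topic proportions from a topic model and passed to any standard classifier or regressor.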

    "When and Where?": Behavior Dominant Location Forecasting with Micro-blog Streams

    The proliferation of smartphones and wearable devices has increased the availability of large amounts of geospatial streams that support automated discovery of knowledge in pervasive environments, but the most prominent information related to changing interests has not yet been adequately capitalized on. In this paper, we provide a novel algorithm to exploit the dynamic fluctuations in a user's point-of-interest while forecasting the future place of visit with fine granularity. Our proposed algorithm is based on the dynamic formation of collective personality communities using different languages, opinions, geographical and temporal distributions for finding optimized equivalent content. We performed extensive empirical experiments involving real-time streams derived from 0.6 million stream tuples of micro-blog comprising 1945 social person fusion with graph algorithm and feed-forward neural network model as a predictive classification model. Lastly, the framework achieves 62.10% mean average precision on 120,000 embeddings of unlabeled users and, surprisingly, an 85.92% increment over the state-of-the-art approach.
    Comment: Accepted as a full paper in the 2nd International Workshop on Social Computing co-located with ICDM, 2018 Singapor
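
For readers unfamiliar with the metric reported above, the following is a minimal sketch of mean average precision (MAP) in its common textbook form: for each user, precision is accumulated at every rank where a relevant item appears and divided by the number of relevant items, and the per-user scores are then averaged. The place names and rankings are made-up illustration data, and the abstract does not state which MAP variant the authors used.

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked list against a set of relevant items."""
    hits = 0
    precision_sum = 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / k  # precision at rank k
    # Normalise by the number of relevant items (0.0 if there are none).
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(rankings, relevants):
    """Mean of the per-user average precision scores."""
    scores = [average_precision(r, rel) for r, rel in zip(rankings, relevants)]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical predictions for two users (ranked candidate places) and
# the places each user actually visited.
rankings = [["cafe", "park", "gym"], ["park", "cafe"]]
relevants = [{"cafe", "gym"}, {"cafe"}]
map_score = mean_average_precision(rankings, relevants)
```

Here the first user scores (1/1 + 2/3) / 2 and the second scores 1/2, so the mean is about 0.667.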

    Automatic Conditional Generation of Personalized Social Media Short Texts

    Automatic text generation has received much attention owing to the rapid development of deep neural networks. In general, text generation systems based on statistical language models do not consider anthropomorphic characteristics, which results in machine-like generated texts. To fill the gap, we propose a conditional language generation model with Big Five Personality (BFP) feature vectors as input context, which writes human-like short texts. The short text generator consists of a layer of long short-term memory (LSTM) network, where a BFP feature vector is concatenated as one part of the input for each cell. To enable supervised training of the generation model, a text classification model based on a convolutional neural network (CNN) has been used to prepare BFP-tagged Chinese micro-blog corpora. Validated by a BFP linguistic computational model, our generated Chinese short texts exhibit discriminative personality styles and are also syntactically correct and semantically smooth with appropriate emoticons. By combining natural language generation with psychological linguistics, our proposed BFP-dependent text generation model can be widely used for individualization in machine translation, image captioning, dialogue generation and so on.
    Comment: published in PRICAI 201
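
The conditioning step described above, where a BFP feature vector forms part of the input to each LSTM cell, can be sketched as a plain concatenation at every time step. This is only an illustrative sketch: the function name, the 4-dimensional embeddings, and the 5-dimensional trait vector (one score per Big Five trait) are assumptions, since the abstract gives neither dimensions nor implementation details.

```python
def condition_on_personality(token_embeddings, bfp_vector):
    """Append the same personality vector to the embedding of every time step."""
    return [list(embedding) + list(bfp_vector) for embedding in token_embeddings]


# Hypothetical values: three time steps with 4-dimensional token embeddings,
# and a 5-dimensional trait vector (assumed: one score per Big Five trait).
embeddings = [
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.6, 0.7, 0.8],
    [0.9, 1.0, 1.1, 1.2],
]
bfp = [0.7, 0.2, 0.5, 0.9, 0.1]

# Each resulting vector (4 + 5 = 9 dimensions) would be fed to the LSTM
# in place of the plain token embedding.
lstm_inputs = condition_on_personality(embeddings, bfp)
```

Because the same vector is repeated at every step, the recurrent layer sees the personality context throughout the sequence rather than only at initialization.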

    The Value of Alternative Data in Credit Risk Prediction: Evidence from a Large Field Experiment

    Recently, the high penetration of mobile devices and internet access has offered a new source of fine-grained user behavior data (aka “alternative data”) to improve financial credit risk assessment. This paper conducts a comprehensive evaluation of the value of alternative data on microloan platforms with a large field experiment. Our machine-learning-based empirical analyses demonstrate that alternative data can significantly improve the prediction accuracy of borrowers’ default behavior and increase platform profits. Cellphone usage and mobility trace information perform the best among the multiple sources of alternative data. Moreover, we find that our proposed framework helps financial institutions extend their service to more lower-income and less-educated loan applicants from less-developed geographical areas, historically disadvantaged populations that have been largely neglected in the past. Our study demonstrates the tremendous potential of leveraging alternative data to alleviate such inequality in the financial service markets, while at the same time achieving higher platform revenues