329 research outputs found

    Tensor Learning for Recovering Missing Information: Algorithms and Applications on Social Media

    Get PDF
    Real-time social systems like Facebook, Twitter, and Snapchat have been growing rapidly, producing exabytes of data in different views or aspects. Coupled with more and more GPS-enabled sharing of videos, images, blogs, and tweets that provide valuable information regarding “who”, “where”, “when” and “what”, these real-time human sensor data promise new research opportunities to uncover models of user behavior, mobility, and information sharing. These real-time dynamics in social systems usually come in multiple aspects, which are able to help better understand the social interactions of the underlying network. However, these multi-aspect datasets are often raw and incomplete owing to various unpredictable or unavoidable reasons; for instance, API limitations and data sampling policies can lead to an incomplete (and often biased) perspective on these multi-aspect datasets. This missing data could raise serious concerns such as biased estimations on structural properties of the network and properties of information cascades in social networks. In order to recover missing values or information in social systems, we identify “4S” challenges: extreme sparsity of the observed multi-aspect datasets, adoption of rich side information that is able to describe the similarities of entities, generation of robust models rather than limiting them on specific applications, and scalability of models to handle real large-scale datasets (billions of observed entries). With these challenges in mind, this dissertation aims to develop scalable and interpretable tensor-based frameworks, algorithms and methods for recovering missing information on social media. In particular, this dissertation research makes four unique contributions: _ The first research contribution of this dissertation research is to propose a scalable framework based on low-rank tensor learning in the presence of incomplete information. Concretely, we formally define the problem of recovering the spatio-temporal dynamics of online memes and tackle this problem by proposing a novel tensor-based factorization approach based on the alternative direction method of multipliers (ADMM) with the integration of the latent relationships derived from contextual information among locations, memes, and times. _ The second research contribution of this dissertation research is to evaluate the generalization of the proposed tensor learning framework and extend it to the recommendation problem. In particular, we develop a novel tensor-based approach to solve the personalized expert recommendation by integrating both the latent relationships between homogeneous entities (e.g., users and users, experts and experts) and the relationships between heterogeneous entities (e.g., users and experts, topics and experts) from the geo-spatial, topical, and social contexts. _ The third research contribution of this dissertation research is to extend the proposed tensor learning framework to the user topical profiling problem. Specifically, we propose a tensor-based contextual regularization model embedded into a matrix factorization framework, which leverages the social, textual, and behavioral contexts across users, in order to overcome identified challenges. _ The fourth research contribution of this dissertation research is to scale up the proposed tensor learning framework to be capable of handling real large-scale datasets that are too big to fit in the main memory of a single machine. Particularly, we propose a novel distributed tensor completion algorithm with the trace-based regularization of the auxiliary information based on ADMM under the proposed tensor learning framework, which is designed to scale up to real large-scale tensors (e.g., billions of entries) by efficiently computing auxiliary variables, minimizing intermediate data, and reducing the workload of updating new tensors

    SentiBench - a benchmark comparison of state-of-the-practice sentiment analysis methods

    Get PDF
    In the last few years thousands of scientific papers have investigated sentiment analysis, several startups that measure opinions on real data have emerged and a number of innovative products related to this theme have been developed. There are multiple methods for measuring sentiments, including lexical-based and supervised machine learning methods. Despite the vast interest on the theme and wide popularity of some methods, it is unclear which one is better for identifying the polarity (i.e., positive or negative) of a message. Accordingly, there is a strong need to conduct a thorough apple-to-apple comparison of sentiment analysis methods, \textit{as they are used in practice}, across multiple datasets originated from different data sources. Such a comparison is key for understanding the potential limitations, advantages, and disadvantages of popular methods. This article aims at filling this gap by presenting a benchmark comparison of twenty-four popular sentiment analysis methods (which we call the state-of-the-practice methods). Our evaluation is based on a benchmark of eighteen labeled datasets, covering messages posted on social networks, movie and product reviews, as well as opinions and comments in news articles. Our results highlight the extent to which the prediction performance of these methods varies considerably across datasets. Aiming at boosting the development of this research area, we open the methods' codes and datasets used in this article, deploying them in a benchmark system, which provides an open API for accessing and comparing sentence-level sentiment analysis methods

    Recommender Systems for Online and Mobile Social Networks: A survey

    Full text link
    Recommender Systems (RS) currently represent a fundamental tool in online services, especially with the advent of Online Social Networks (OSN). In this case, users generate huge amounts of contents and they can be quickly overloaded by useless information. At the same time, social media represent an important source of information to characterize contents and users' interests. RS can exploit this information to further personalize suggestions and improve the recommendation process. In this paper we present a survey of Recommender Systems designed and implemented for Online and Mobile Social Networks, highlighting how the use of social context information improves the recommendation task, and how standard algorithms must be enhanced and optimized to run in a fully distributed environment, as opportunistic networks. We describe advantages and drawbacks of these systems in terms of algorithms, target domains, evaluation metrics and performance evaluations. Eventually, we present some open research challenges in this area

    Social analytics for health integration, intelligence, and monitoring

    Get PDF
    Nowadays, patient-generated social health data are abundant and Healthcare is changing from the authoritative provider-centric model to collaborative and patient-oriented care. The aim of this dissertation is to provide a Social Health Analytics framework to utilize social data to solve the interdisciplinary research challenges of Big Data Science and Health Informatics. Specific research issues and objectives are described below. The first objective is semantic integration of heterogeneous health data sources, which can vary from structured to unstructured and include patient-generated social data as well as authoritative data. An information seeker has to spend time selecting information from many websites and integrating it into a coherent mental model. An integrated health data model is designed to allow accommodating data features from different sources. The model utilizes semantic linked data for lightweight integration and allows a set of analytics and inferences over data sources. A prototype analytical and reasoning tool called “Social InfoButtons” that can be linked from existing EHR systems is developed to allow doctors to understand and take into consideration the behaviors, patterns or trends of patients’ healthcare practices during a patient’s care. The tool can also shed insights for public health officials to make better-informed policy decisions. The second objective is near-real time monitoring of disease outbreaks using social media. The research for epidemics detection based on search query terms entered by millions of users is limited by the fact that query terms are not easily accessible by non-affiliated researchers. Publically available Twitter data is exploited to develop the Epidemics Outbreak and Spread Detection System (EOSDS). EOSDS provides four visual analytics tools for monitoring epidemics, i.e., Instance Map, Distribution Map, Filter Map, and Sentiment Trend to investigate public health threats in space and time. The third objective is to capture, analyze and quantify public health concerns through sentiment classifications on Twitter data. For traditional public health surveillance systems, it is hard to detect and monitor health related concerns and changes in public attitudes to health-related issues, due to their expenses and significant time delays. A two-step sentiment classification model is built to measure the concern. In the first step, Personal tweets are distinguished from Non-Personal tweets. In the second step, Personal Negative tweets are further separated from Personal Non-Negative tweets. In the proposed classification, training data is labeled by an emotion-oriented, clue-based method, and three Machine Learning models are trained and tested. Measure of Concern (MOC) is computed based on the number of Personal Negative sentiment tweets. A timeline trend of the MOC is also generated to monitor public concern levels, which is important for health emergency resource allocations and policy making. The fourth objective is predicting medical condition incidence and progression trajectories by using patients’ self-reported data on PatientsLikeMe. Some medical conditions are correlated with each other to a measureable degree (“comorbidities”). A prediction model is provided to predict the comorbidities and rank future conditions by their likelihood and to predict the possible progression trajectories given an observed medical condition. The novel models for trajectory prediction of medical conditions are validated to cover the comorbidities reported in the medical literature

    Tensor Learning for Recovering Missing Information: Algorithms and Applications on Social Media

    Get PDF
    Real-time social systems like Facebook, Twitter, and Snapchat have been growing rapidly, producing exabytes of data in different views or aspects. Coupled with more and more GPS-enabled sharing of videos, images, blogs, and tweets that provide valuable information regarding “who”, “where”, “when” and “what”, these real-time human sensor data promise new research opportunities to uncover models of user behavior, mobility, and information sharing. These real-time dynamics in social systems usually come in multiple aspects, which are able to help better understand the social interactions of the underlying network. However, these multi-aspect datasets are often raw and incomplete owing to various unpredictable or unavoidable reasons; for instance, API limitations and data sampling policies can lead to an incomplete (and often biased) perspective on these multi-aspect datasets. This missing data could raise serious concerns such as biased estimations on structural properties of the network and properties of information cascades in social networks. In order to recover missing values or information in social systems, we identify “4S” challenges: extreme sparsity of the observed multi-aspect datasets, adoption of rich side information that is able to describe the similarities of entities, generation of robust models rather than limiting them on specific applications, and scalability of models to handle real large-scale datasets (billions of observed entries). With these challenges in mind, this dissertation aims to develop scalable and interpretable tensor-based frameworks, algorithms and methods for recovering missing information on social media. In particular, this dissertation research makes four unique contributions: _ The first research contribution of this dissertation research is to propose a scalable framework based on low-rank tensor learning in the presence of incomplete information. Concretely, we formally define the problem of recovering the spatio-temporal dynamics of online memes and tackle this problem by proposing a novel tensor-based factorization approach based on the alternative direction method of multipliers (ADMM) with the integration of the latent relationships derived from contextual information among locations, memes, and times. _ The second research contribution of this dissertation research is to evaluate the generalization of the proposed tensor learning framework and extend it to the recommendation problem. In particular, we develop a novel tensor-based approach to solve the personalized expert recommendation by integrating both the latent relationships between homogeneous entities (e.g., users and users, experts and experts) and the relationships between heterogeneous entities (e.g., users and experts, topics and experts) from the geo-spatial, topical, and social contexts. _ The third research contribution of this dissertation research is to extend the proposed tensor learning framework to the user topical profiling problem. Specifically, we propose a tensor-based contextual regularization model embedded into a matrix factorization framework, which leverages the social, textual, and behavioral contexts across users, in order to overcome identified challenges. _ The fourth research contribution of this dissertation research is to scale up the proposed tensor learning framework to be capable of handling real large-scale datasets that are too big to fit in the main memory of a single machine. Particularly, we propose a novel distributed tensor completion algorithm with the trace-based regularization of the auxiliary information based on ADMM under the proposed tensor learning framework, which is designed to scale up to real large-scale tensors (e.g., billions of entries) by efficiently computing auxiliary variables, minimizing intermediate data, and reducing the workload of updating new tensors

    Estimating community feedback effect on topic choice in social media with predictive modeling

    Get PDF
    Social media users post content on various topics. A defining feature of social media is that other users can provide feedback—called community feedback—to their content in the form of comments, replies, and retweets. We hypothesize that the amount of received feedback influences the choice of topics on which a social media user posts. However, it is challenging to test this hypothesis as user heterogeneity and external confounders complicate measuring the feedback effect. Here, we investigate this hypothesis with a predictive approach based on an interpretable model of an author’s decision to continue the topic of their previous post. We explore the confounding factors, including author’s topic preferences and unobserved external factors such as news and social events, by optimizing the predictive accuracy. This approach enables us to identify which users are susceptible to community feedback. Overall, we find that 33% and 14% of active users in Reddit and Twitter, respectively, are influenced by community feedback. The model suggests that this feedback alters the probability of topic continuation up to 14%, depending on the user and the amount of feedback

    Finetuning Pre-Trained Language Models for Sentiment Classification of COVID19 Tweets

    Get PDF
    It is a common practice in today’s world for the public to use different micro-blogging and social networking platforms, predominantly Twitter, to share opinions, ideas, news, and information about many things in life. Twitter is also becoming a popular channel for information sharing during pandemic outbreaks and disaster events. The world has been suffering from economic crises ever since COVID-19 cases started to increase rapidly since January 2020. The virus has killed more than 800 thousand people ever since the discovery as per the statistics from Worldometer [1] which is the authorized tracking website. So many researchers around the globe are researching into this new virus from different perspectives. One such area is analysing micro-blogging sites like twitter to understand public sentiments. Traditional sentiment analysis methods require complex feature engineering. Many embedding representations have come these days but, their context-independent nature limits their representative power in rich context, due to which performance gets degraded in NLP tasks. Transfer learning has gained the popularity and pretrained language models like BERT(bi-directional Encoder Representations from Transformers) and XLNet which is a Generalised autoregressive model have started overtaking traditional machine learning and deep learning models like Random Forests, Naïve Bayes, Convolutional Neural Networks etc. Despite the great performance results by pretrained language models, it has been observed that finetuning a large pretrained model on downstream task with less training instances is prone to degrade the performance of the model. This research is based on a regularization technique called Mixout proposed by Lee (Lee, 2020). Mixout stochastically mixes the parameters of vanilla network and dropout network. This work is to understand the performance variations of finetuning BERT and XLNet base models on COVID-19 tweets by using Mixout regularization for sentiment classification
    • …
    corecore