1,304 research outputs found

    A Survey of Location Prediction on Twitter

    Full text link
    Locations, e.g., countries, states, cities, and point-of-interests, are central to news, emergency events, and people's daily lives. Automatic identification of locations associated with or mentioned in documents has been explored for decades. As one of the most popular online social network platforms, Twitter has attracted a large number of users who send millions of tweets on daily basis. Due to the world-wide coverage of its users and real-time freshness of tweets, location prediction on Twitter has gained significant attention in recent years. Research efforts are spent on dealing with new challenges and opportunities brought by the noisy, short, and context-rich nature of tweets. In this survey, we aim at offering an overall picture of location prediction on Twitter. Specifically, we concentrate on the prediction of user home locations, tweet locations, and mentioned locations. We first define the three tasks and review the evaluation metrics. By summarizing Twitter network, tweet content, and tweet context as potential inputs, we then structurally highlight how the problems depend on these inputs. Each dependency is illustrated by a comprehensive review of the corresponding strategies adopted in state-of-the-art approaches. In addition, we also briefly review two related problems, i.e., semantic location prediction and point-of-interest recommendation. Finally, we list future research directions.Comment: Accepted to TKDE. 30 pages, 1 figur

    Real-time detection and sorting of news on microblogging platforms

    Get PDF
    Due to the increasing popularity of microblog-ging platforms (e.g., Twitter), detecting real-Time news from microblogs (e.g., tweets) has recently drawn a lot of attention. Most of the previous work on this subject detect news by analyzing propagation patterns of microblogs. This approach has two limitations: (i) many non-news microblogs (e.g. marketing activi-Ties) have propagation patterns similar to news microblogs and therefore they can be false-ly reported as news; (ii) using propagation patterns to identify news involves a time de-lay until the pattern is formed, therefore news are not detected in real time. We propose an alternative approach, which, motivated by the necessity of real-Time detection of news, does not rely on propagation of posts. More-over, we propose a real-Time sorting strategy that orders the detected news microblogs us-ing a translational approach. An experimen-Tal evaluation on a large-scale microblogging dataset demonstrates the effectiveness of our approach.postprin

    Identification of Online Users' Social Status via Mining User-Generated Data

    Get PDF
    With the burst of available online user-generated data, identifying online users’ social status via mining user-generated data can play a significant role in many commercial applications, research and policy-making in many domains. Social status refers to the position of a person in relation to others within a society, which is an abstract concept. The actual definition of social status is specific in terms of specific measure indicator. For example, opinion leadership measures individual social status in terms of influence and expertise in an online society, while socioeconomic status characterizes personal real-life social status based on social and economic factors. Compared with traditional survey method which is time-consuming, expensive and sometimes difficult, some efforts have been made to identify specific social status of users based on specific user-generated data using classic machine learning methods. However, in fact, regarding specific social status identification based on specific user-generated data, the specific case has several specific challenges. However, classic machine learning methods in existing works fail to address these challenges, which lead to low identification accuracy. Given the importance of improving identification accuracy, this thesis studies three specific cases on identification of online and offline social status. For each work, this thesis proposes novel effective identification method to address the specific challenges for improving accuracy. The first work aims at identifying users’ online social status in terms of topic-sensitive influence and knowledge authority in social community question answering sites, namely identifying topical opinion leaders who are both influential and expert. Social community question answering (SCQA) site, an innovative community question answering platform, not only offers traditional question answering (QA) services but also integrates an online social network where users can follow each other. Identifying topical opinion leaders in SCQA has become an important research area due to the significant role of topical opinion leaders. However, most previous related work either focus on using knowledge expertise to find experts for improving the quality of answers, or aim at measuring user influence to identify influential ones. In order to identify the true topical opinion leaders, we propose a topical opinion leader identification framework called QALeaderRank which takes account of both topic-sensitive influence and topical knowledge expertise. In the proposed framework, to measure the topic-sensitive influence of each user, we design a novel influence measure algorithm that exploits both the social and QA features of SCQA, taking into account social network structure, topical similarity and knowledge authority. In addition, we propose three topic-relevant metrics to infer the topical expertise of each user. The extensive experiments along with an online user study show that the proposed QALeaderRank achieves significant improvement compared with the state-of-the-art methods. Furthermore, we analyze the topic interest change behaviors of users over time and examine the predictability of user topic interest through experiments. The second work focuses on predicting individual socioeconomic status from mobile phone data. Socioeconomic Status (SES) is an important social and economic aspect widely concerned. Assessing individual SES can assist related organizations in making a variety of policy decisions. Traditional approach suffers from the extremely high cost in collecting large-scale SES-related survey data. With the ubiquity of smart phones, mobile phone data has become a novel data source for predicting individual SES with low cost. However, the task of predicting individual SES on mobile phone data also proposes some new challenges, including sparse individual records, scarce explicit relationships and limited labeled samples, unconcerned in prior work restricted to regional or household-oriented SES prediction. To address these issues, we propose a semi-supervised Hypergraph based Factor Graph Model (HyperFGM) for individual SES prediction. HyperFGM is able to efficiently capture the associations between SES and individual mobile phone records to handle the individual record sparsity. For the scarce explicit relationships, HyperFGM models implicit high-order relationships among users on the hypergraph structure. Besides, HyperFGM explores the limited labeled data and unlabeled data in a semi-supervised way. Experimental results show that HyperFGM greatly outperforms the baseline methods on individual SES prediction with using a set of anonymized real mobile phone data. The third work is to predict social media users’ socioeconomic status based on their social media content, which is useful for related organizations and companies in a range of applications, such as economic and social policy-making. Previous work leverage manually defined textual features and platform-based user level attributes from social media content and feed them into a machine learning based classifier for SES prediction. However, they ignore some important information of social media content, containing the order and the hierarchical structure of social media text as well as the relationships among user level attributes. To this end, we propose a novel coupled social media content representation model for individual SES prediction, which not only utilizes a hierarchical neural network to incorporate the order and the hierarchical structure of social media text but also employs a coupled attribute representation method to take into account intra-coupled and inter-coupled interaction relationships among user level attributes. The experimental results show that the proposed model significantly outperforms other stat-of-the-art models on a real dataset, which validate the efficiency and robustness of the proposed model

    Cost-effective online trending topic detection and popularity prediction in microblogging

    Get PDF
    Identifying topic trends on microblogging services such as Twitter and estimating those topics’ future popularity have great academic and business value, especially when the operations can be done in real time. For any third party, however, capturing and processing such huge volumes of real-time data in microblogs are almost infeasible tasks, as there always exist API (Application Program Interface) request limits, monitoring and computing budgets, as well as timeliness requirements. To deal with these challenges, we propose a cost-effective system framework with algorithms that can automatically select a subset of representative users in microblogging networks in offline, under given cost constraints. Then the proposed system can online monitor and utilize only these selected users’ real-time microposts to detect the overall trending topics and predict their future popularity among the whole microblogging network. Therefore, our proposed system framework is practical for real-time usage as it avoids the high cost in capturing and processing full real-time data, while not compromising detection and prediction performance under given cost constraints. Experiments with real microblogs dataset show that by tracking only 500 users out of 0.6 million users and processing no more than 30,000 microposts daily, about 92% trending topics could be detected and predicted by the proposed system and, on average, more than 10 hours earlier than they appear in official trends lists
    • …
    corecore