    Topicality and Social Impact: Diverse Messages but Focused Messengers

    Are users who comment on a variety of matters more likely to achieve high influence than those who delve into one focused field? Do general Twitter hashtags, such as #lol, tend to be more popular than novel ones, such as #instantlyinlove? Questions like these demand a way to detect the topics hidden behind messages associated with an individual or a hashtag, and a gauge of similarity among these topics. Here we develop such an approach, identifying clusters of similar hashtags by detecting communities in the hashtag co-occurrence network. The topical diversity of a user's interests is then quantified by the entropy of her hashtags across different topic clusters. A similar measure is applied to hashtags, based on co-occurring tags. We find that high topical diversity of early adopters or co-occurring tags implies high future popularity of hashtags. In contrast, low diversity helps an individual accumulate social influence. In short, diverse messages and focused messengers are more likely to gain impact. Comment: 9 pages, 7 figures, 6 tables
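
    The entropy-based diversity measure can be illustrated with a minimal sketch. It assumes Shannon entropy in bits (the abstract does not state the log base) and takes the hashtag-to-cluster assignment as a ready-made dict; in the paper the clusters come from community detection on the co-occurrence network, which is not reproduced here. The function name, the `cluster_of` mapping, and every example hashtag other than #lol are hypothetical.

```python
import math
from collections import Counter


def topical_diversity(hashtags, cluster_of):
    """Shannon entropy (bits) of one user's hashtag usage across topic clusters.

    hashtags: hashtags used by the user, repeats included.
    cluster_of: hypothetical mapping hashtag -> topic-cluster id; hashtags
    missing from the mapping are ignored.
    """
    counts = Counter(cluster_of[h] for h in hashtags if h in cluster_of)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for c in counts.values():
        p = c / total  # share of the user's hashtags falling in this cluster
        entropy -= p * math.log2(p)
    return entropy


# Toy cluster assignment (hypothetical).
clusters = {"#lol": 0, "#funny": 0, "#nba": 1, "#election": 2}
focused = ["#nba"] * 4                               # one cluster only
diverse = ["#lol", "#nba", "#election", "#funny"]    # three clusters

print(topical_diversity(focused, clusters))  # 0.0
print(topical_diversity(diverse, clusters))  # 1.5
```

    The same function can score a hashtag instead of a user by passing the tags that co-occur with it, mirroring the abstract's second measure.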

    Modeling Temporal Evidence from External Collections

    Newsworthy events are broadcast through multiple mediums and prompt crowds to produce comments on social media. In this paper, we propose to leverage these behavioral dynamics to estimate the most relevant time periods for an event (i.e., query). Recent advances have shown how to improve the estimation of the temporal relevance of such topics. Our approach builds on two major novelties. First, we mine temporal evidence from hundreds of external sources into topic-based external collections to improve the robustness of the detection of relevant time periods. Second, we propose a formal retrieval model that generalizes the use of the temporal dimension across different aspects of the retrieval process. In particular, we show that temporal evidence from external collections can be used to (i) infer a topic's temporal relevance, (ii) select query expansion terms, and (iii) re-rank the final results for improved precision. Experiments with TREC Microblog collections show that the proposed time-aware retrieval model makes effective and extensive use of the temporal dimension to improve search results over the most recent temporal models. Interestingly, we observe a strong correlation between precision and the temporal distribution of retrieved and relevant documents. Comment: To appear in WSDM 201
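
    One generic way to turn an external collection's timestamps into a re-ranking signal is sketched below under stated assumptions: timestamps are in seconds, the topic's temporal distribution is a normalized histogram of external-document timestamps over fixed-width bins, and each result's final score linearly interpolates a text score in [0, 1] with the probability of its time bin (weight `lam`, default 0.3 chosen arbitrarily). This is not the formal retrieval model proposed in the paper; all names, documents, and values are hypothetical.

```python
from collections import Counter


def temporal_distribution(timestamps, bin_size):
    """Normalized histogram: time-bin index -> share of external documents in it."""
    bins = Counter(int(t // bin_size) for t in timestamps)
    total = sum(bins.values())
    return {b: c / total for b, c in bins.items()}


def rerank(results, temporal_dist, bin_size, lam=0.3):
    """Re-rank (doc_id, text_score, timestamp) triples by an interpolated score.

    Final score = (1 - lam) * text_score + lam * P(time bin of the document).
    Documents in bins with no external evidence get a temporal probability of 0.
    """
    scored = []
    for doc_id, text_score, ts in results:
        p_time = temporal_dist.get(int(ts // bin_size), 0.0)
        scored.append((doc_id, (1 - lam) * text_score + lam * p_time))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


HOUR = 3600
# Hypothetical external evidence: four posts in hour 10, one in hour 2.
external = [36000, 36100, 36500, 37000, 7200]
dist = temporal_distribution(external, HOUR)   # {10: 0.8, 2: 0.2}

# Hypothetical initial results: (doc_id, text score, timestamp).
results = [("a", 0.60, 7300), ("b", 0.55, 36200)]
print(rerank(results, dist, HOUR))             # "b" now ranks above "a"
```

    Document `b` overtakes `a` despite its lower text score because it falls in the hour holding 80% of the external evidence; a larger `lam` lets the temporal signal dominate more strongly.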

    Exploring Time-Sensitive Variational Bayesian Inference LDA for Social Media Data

    There is considerable interest among both researchers and the general public in understanding the topics of discussion on social media as they occur over time. Scholars have thoroughly analysed sampling-based topic modelling approaches for various text corpora, including social media; however, another LDA topic modelling implementation, Variational Bayesian (VB) inference, has not been well studied, despite its known efficiency and its adaptability to the volume and dynamics of social media data. In this paper, we examine the performance of the VB-based topic modelling approach in producing coherent topics, and we further extend the VB approach by proposing a novel time-sensitive Variational Bayesian implementation, denoted TVB. Our TVB approach incorporates time in order to increase the quality of the generated topics. Using a Twitter dataset covering 8 events, our empirical results show that integrating time improves the coherence of the topics in our TVB model. In particular, through a user study, we find that our TVB approach generates fewer mixed topics than state-of-the-art topic modelling approaches. Moreover, TVB estimates topical trends more accurately, making it particularly suitable for helping end-users track emerging topics on social media.

    A large multilingual and multi-domain dataset for recommender systems

    This paper presents a multi-domain interests dataset to train and test recommender systems, and the methodology used to create the dataset from Twitter messages in English and Italian. The English dataset includes an average of 90 preferences per user on music, books, movies, celebrities, sport, politics and much more, for about half a million users. Preferences are either extracted from messages of users who use Spotify, Goodreads and other similar content-sharing platforms, or induced from their "topical" friends, i.e., followees representing an interest rather than a social relation between peers. In addition, preferred items are matched with Wikipedia articles describing them. This unique feature of our dataset provides a means of deriving a semantic categorization of the preferred items, exploiting available semantic resources linked to Wikipedia such as the Wikipedia Category Graph, DBpedia, BabelNet and others.

    Is That Twitter Hashtag Worth Reading

    Online social media such as Twitter, Facebook, wikis and LinkedIn have had a great impact on the way we consume information in our day-to-day lives. It has therefore become increasingly important to find appropriate content on social media and avoid information overload. On Twitter, popular information can be tracked using hashtags. Studying the characteristics of tweets containing hashtags is important for a number of tasks, such as breaking news detection, personalized message recommendation, friend recommendation, and sentiment analysis. In this paper, we analyze Twitter data based on trending hashtags, which are widely used nowadays. We use event-based hashtags to learn users' thoughts on those events and to decide whether other users might find them interesting. We use topic modeling, which reveals the hidden thematic structure of the documents (tweets in this case), together with sentiment analysis to explore and summarize the content of the documents. We propose a technique to determine the interestingness of an event-based Twitter hashtag and its associated sentiment. The proposed technique helps Twitter followers find relevant and interesting hashtags to read. Comment: 10 pages, 6 figures, Presented at the Third International Symposium on Women in Computing and Informatics (WCI-2015)

    Event detection, tracking, and visualization in Twitter: a mention-anomaly-based approach

    The ever-growing number of people using Twitter makes it a valuable source of timely information. However, detecting events in Twitter is difficult, because tweets that report interesting events are overwhelmed by a large volume of tweets on unrelated topics. Existing methods focus on the textual content of tweets and ignore the social aspect of Twitter. In this paper we propose MABED (i.e. mention-anomaly-based event detection), a novel statistical method that relies solely on tweets and leverages the creation frequency of dynamic links (i.e. mentions) that users insert in tweets to detect significant events and estimate the magnitude of their impact on the crowd. MABED also differs from the literature in that it dynamically estimates the period of time during which each event is discussed, rather than assuming a predefined fixed duration for all events. Our experiments on both English and French Twitter data show that the mention-anomaly-based approach leads to more accurate event detection and improved robustness in the presence of noisy Twitter content. Qualitatively, we find that MABED helps with the interpretation of detected events by providing clear textual descriptions and precise temporal descriptions. We also show how MABED can help in understanding users' interests. Furthermore, we describe three visualizations designed to support efficient exploration of the detected events. Comment: 17 pages
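
    The mention-anomaly idea and the dynamic event period can be illustrated with a minimal sketch. It assumes that a word's anomaly in a time slice is its observed count of tweets containing both the word and a mention, minus the count expected if those tweets were spread in proportion to overall tweet volume, and that an event's period is the contiguous run of slices with the largest summed anomaly (found here with Kadane's maximum-subarray algorithm). This is an illustration in the spirit of the abstract, not the authors' MABED implementation; function names and the toy counts are hypothetical.

```python
def anomaly_series(mention_counts, slice_totals):
    """Observed minus expected mention-tweet count per time slice for one word.

    mention_counts[i]: tweets in slice i containing the word and a mention.
    slice_totals[i]: all tweets in slice i.
    """
    total_tweets = sum(slice_totals)
    if total_tweets == 0:
        return [0.0] * len(slice_totals)
    total_mentions = sum(mention_counts)
    return [m - n * total_mentions / total_tweets
            for m, n in zip(mention_counts, slice_totals)]


def max_anomaly_interval(anomaly):
    """Kadane's algorithm: ((start, end), summed anomaly) of the best contiguous run."""
    best_sum, best = float("-inf"), (0, 0)
    cur_sum, cur_start = 0.0, 0
    for i, a in enumerate(anomaly):
        if cur_sum <= 0:          # a non-positive prefix never helps: restart here
            cur_sum, cur_start = a, i
        else:
            cur_sum += a
        if cur_sum > best_sum:
            best_sum, best = cur_sum, (cur_start, i)
    return best, best_sum


# Toy data: five slices of 100 tweets; the word's mention tweets spike in slices 2-3.
series = anomaly_series([2, 2, 20, 14, 2], [100] * 5)  # [-6.0, -6.0, 12.0, 6.0, -6.0]
print(max_anomaly_interval(series))                    # ((2, 3), 18.0)
```

    With 8 expected mention tweets per slice, slices 2 and 3 form the detected period, and their summed anomaly (18.0) plays the role of the event's magnitude. Because the interval is chosen from the data, different events can receive different durations.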

    Bots, Seeds and People: Web Archives as Infrastructure

    The field of web archiving provides a unique mix of human and automated agents collaborating to achieve the preservation of the web. Centuries-old theories of archival appraisal are being transplanted into the sociotechnical environment of the World Wide Web with varying degrees of success. The work of archivists and bots in contact with the material of the web presents a distinctive and understudied CSCW-shaped problem. To investigate this space, we conducted semi-structured interviews with archivists and technologists who were directly involved in selecting web content for archives. These interviews identified thematic areas that inform the appraisal process in web archives, some of which are encoded in heuristics and algorithms. Making the infrastructure of web archives legible to the archivist, the automated agents and the future researcher is presented as a challenge to the CSCW and archival community.