7,017 research outputs found
Topicality and Social Impact: Diverse Messages but Focused Messengers
Are users who comment on a variety of matters more likely to achieve high
influence than those who delve into one focused field? Do general Twitter
hashtags, such as #lol, tend to be more popular than novel ones, such as
#instantlyinlove? Questions like these demand a way to detect topics hidden
behind messages associated with an individual or a hashtag, and a gauge of
similarity among these topics. Here we develop such an approach to identify
clusters of similar hashtags by detecting communities in the hashtag
co-occurrence network. Then the topical diversity of a user's interests is
quantified by the entropy of her hashtags across different topic clusters. A
similar measure is applied to hashtags, based on co-occurring tags. We find
that high topical diversity of early adopters or co-occurring tags implies high
future popularity of hashtags. In contrast, low diversity helps an individual
accumulate social influence. In short, diverse messages and focused messengers
are more likely to gain impact.Comment: 9 pages, 7 figures, 6 table
Modeling Temporal Evidence from External Collections
Newsworthy events are broadcast through multiple mediums and prompt the
crowds to produce comments on social media. In this paper, we propose to
leverage on this behavioral dynamics to estimate the most relevant time periods
for an event (i.e., query). Recent advances have shown how to improve the
estimation of the temporal relevance of such topics. In this approach, we build
on two major novelties. First, we mine temporal evidences from hundreds of
external sources into topic-based external collections to improve the
robustness of the detection of relevant time periods. Second, we propose a
formal retrieval model that generalizes the use of the temporal dimension
across different aspects of the retrieval process. In particular, we show that
temporal evidence of external collections can be used to (i) infer a topic's
temporal relevance, (ii) select the query expansion terms, and (iii) re-rank
the final results for improved precision. Experiments with TREC Microblog
collections show that the proposed time-aware retrieval model makes an
effective and extensive use of the temporal dimension to improve search results
over the most recent temporal models. Interestingly, we observe a strong
correlation between precision and the temporal distribution of retrieved and
relevant documents.Comment: To appear in WSDM 201
Exploring Time-Sensitive Variational Bayesian Inference LDA for Social Media Data
There is considerable interest among both researchers and the mass public in understanding the topics of discussion on social media as they occur over time. Scholars have thoroughly analysed sampling-based topic modelling approaches for various text corpora including social media; however, another LDA topic modelling implementation—Variational Bayesian (VB)—has not been well studied, despite its known efficiency and its adaptability to the volume and dynamics of social media data. In this paper, we examine the performance of the VB-based topic modelling approach for producing coherent topics, and further, we extend the VB approach by proposing a novel time-sensitive Variational Bayesian implementation, denoted as TVB. Our newly proposed TVB approach incorporates time so as to increase the quality of the generated topics. Using a Twitter dataset covering 8 events, our empirical results show that the coherence of the topics in our TVB model is improved by the integration of time. In particular, through a user study, we find that our TVB approach generates less mixed topics than state-of-the-art topic modelling approaches. Moreover, our proposed TVB approach can more accurately estimate topical trends, making it particularly suitable to assist end-users in tracking emerging topics on social media
A large multilingual and multi-domain dataset for recommender systems
This paper presents a multi-domain interests dataset to train and test Recommender Systems, and the methodology to create the dataset
from Twitter messages in English and Italian. The English dataset includes an average of 90 preferences per user on music, books,
movies, celebrities, sport, politics and much more, for about half million users. Preferences are either extracted from messages of
users who use Spotify, Goodreads and other similar content sharing platforms, or induced from their ”topical” friends, i.e., followees
representing an interest rather than a social relation between peers. In addition, preferred items are matched with Wikipedia articles
describing them. This unique feature of our dataset provides a mean to derive a semantic categorization of the preferred items, exploiting
available semantic resources linked to Wikipedia such as the Wikipedia Category Graph, DBpedia, BabelNet and others
Is That Twitter Hashtag Worth Reading
Online social media such as Twitter, Facebook, Wikis and Linkedin have made a
great impact on the way we consume information in our day to day life. Now it
has become increasingly important that we come across appropriate content from
the social media to avoid information explosion. In case of Twitter, popular
information can be tracked using hashtags. Studying the characteristics of
tweets containing hashtags becomes important for a number of tasks, such as
breaking news detection, personalized message recommendation, friends
recommendation, and sentiment analysis among others.
In this paper, we have analyzed Twitter data based on trending hashtags,
which is widely used nowadays. We have used event based hashtags to know users'
thoughts on those events and to decide whether the rest of the users might find
it interesting or not. We have used topic modeling, which reveals the hidden
thematic structure of the documents (tweets in this case) in addition to
sentiment analysis in exploring and summarizing the content of the documents. A
technique to find the interestingness of event based twitter hashtag and the
associated sentiment has been proposed. The proposed technique helps twitter
follower to read, relevant and interesting hashtag.Comment: 10 pages, 6 figures, Presented at the Third International Symposium
on Women in Computing and Informatics (WCI-2015
Event detection, tracking, and visualization in Twitter: a mention-anomaly-based approach
The ever-growing number of people using Twitter makes it a valuable source of
timely information. However, detecting events in Twitter is a difficult task,
because tweets that report interesting events are overwhelmed by a large volume
of tweets on unrelated topics. Existing methods focus on the textual content of
tweets and ignore the social aspect of Twitter. In this paper we propose MABED
(i.e. mention-anomaly-based event detection), a novel statistical method that
relies solely on tweets and leverages the creation frequency of dynamic links
(i.e. mentions) that users insert in tweets to detect significant events and
estimate the magnitude of their impact over the crowd. MABED also differs from
the literature in that it dynamically estimates the period of time during which
each event is discussed, rather than assuming a predefined fixed duration for
all events. The experiments we conducted on both English and French Twitter
data show that the mention-anomaly-based approach leads to more accurate event
detection and improved robustness in presence of noisy Twitter content.
Qualitatively speaking, we find that MABED helps with the interpretation of
detected events by providing clear textual descriptions and precise temporal
descriptions. We also show how MABED can help understanding users' interest.
Furthermore, we describe three visualizations designed to favor an efficient
exploration of the detected events.Comment: 17 page
Transport geography: perspectives upon entering an accomplished research sub-discipline
No abstract available
Bots, Seeds and People: Web Archives as Infrastructure
The field of web archiving provides a unique mix of human and automated
agents collaborating to achieve the preservation of the web. Centuries old
theories of archival appraisal are being transplanted into the sociotechnical
environment of the World Wide Web with varying degrees of success. The work of
the archivist and bots in contact with the material of the web present a
distinctive and understudied CSCW shaped problem. To investigate this space we
conducted semi-structured interviews with archivists and technologists who were
directly involved in the selection of content from the web for archives. These
semi-structured interviews identified thematic areas that inform the appraisal
process in web archives, some of which are encoded in heuristics and
algorithms. Making the infrastructure of web archives legible to the archivist,
the automated agents and the future researcher is presented as a challenge to
the CSCW and archival community
- …