10,517 research outputs found
Topicality and Social Impact: Diverse Messages but Focused Messengers
Are users who comment on a variety of matters more likely to achieve high
influence than those who delve into one focused field? Do general Twitter
hashtags, such as #lol, tend to be more popular than novel ones, such as
#instantlyinlove? Questions like these demand a way to detect topics hidden
behind messages associated with an individual or a hashtag, and a gauge of
similarity among these topics. Here we develop such an approach to identify
clusters of similar hashtags by detecting communities in the hashtag
co-occurrence network. Then the topical diversity of a user's interests is
quantified by the entropy of her hashtags across different topic clusters. A
similar measure is applied to hashtags, based on co-occurring tags. We find
that high topical diversity of early adopters or co-occurring tags implies high
future popularity of hashtags. In contrast, low diversity helps an individual
accumulate social influence. In short, diverse messages and focused messengers
are more likely to gain impact.Comment: 9 pages, 7 figures, 6 table
Agents, Bookmarks and Clicks: A topical model of Web traffic
Analysis of aggregate and individual Web traffic has shown that PageRank is a
poor model of how people navigate the Web. Using the empirical traffic patterns
generated by a thousand users, we characterize several properties of Web
traffic that cannot be reproduced by Markovian models. We examine both
aggregate statistics capturing collective behavior, such as page and link
traffic, and individual statistics, such as entropy and session size. No model
currently explains all of these empirical observations simultaneously. We show
that all of these traffic patterns can be explained by an agent-based model
that takes into account several realistic browsing behaviors. First, agents
maintain individual lists of bookmarks (a non-Markovian memory mechanism) that
are used as teleportation targets. Second, agents can retreat along visited
links, a branching mechanism that also allows us to reproduce behaviors such as
the use of a back button and tabbed browsing. Finally, agents are sustained by
visiting novel pages of topical interest, with adjacent pages being more
topically related to each other than distant ones. This modulates the
probability that an agent continues to browse or starts a new session, allowing
us to recreate heterogeneous session lengths. The resulting model is capable of
reproducing the collective and individual behaviors we observe in the empirical
data, reconciling the narrowly focused browsing patterns of individual users
with the extreme heterogeneity of aggregate traffic measurements. This result
allows us to identify a few salient features that are necessary and sufficient
to interpret the browsing patterns observed in our data. In addition to the
descriptive and explanatory power of such a model, our results may lead the way
to more sophisticated, realistic, and effective ranking and crawling
algorithms.Comment: 10 pages, 16 figures, 1 table - Long version of paper to appear in
Proceedings of the 21th ACM conference on Hypertext and Hypermedi
Clustering based on Random Graph Model embedding Vertex Features
Large datasets with interactions between objects are common to numerous
scientific fields (i.e. social science, internet, biology...). The interactions
naturally define a graph and a common way to explore or summarize such dataset
is graph clustering. Most techniques for clustering graph vertices just use the
topology of connections ignoring informations in the vertices features. In this
paper, we provide a clustering algorithm exploiting both types of data based on
a statistical model with latent structure characterizing each vertex both by a
vector of features as well as by its connectivity. We perform simulations to
compare our algorithm with existing approaches, and also evaluate our method
with real datasets based on hyper-textual documents. We find that our algorithm
successfully exploits whatever information is found both in the connectivity
pattern and in the features
Folks in Folksonomies: Social Link Prediction from Shared Metadata
Web 2.0 applications have attracted a considerable amount of attention
because their open-ended nature allows users to create light-weight semantic
scaffolding to organize and share content. To date, the interplay of the social
and semantic components of social media has been only partially explored. Here
we focus on Flickr and Last.fm, two social media systems in which we can relate
the tagging activity of the users with an explicit representation of their
social network. We show that a substantial level of local lexical and topical
alignment is observable among users who lie close to each other in the social
network. We introduce a null model that preserves user activity while removing
local correlations, allowing us to disentangle the actual local alignment
between users from statistical effects due to the assortative mixing of user
activity and centrality in the social network. This analysis suggests that
users with similar topical interests are more likely to be friends, and
therefore semantic similarity measures among users based solely on their
annotation metadata should be predictive of social links. We test this
hypothesis on the Last.fm data set, confirming that the social network
constructed from semantic similarity captures actual friendship more accurately
than Last.fm's suggestions based on listening patterns.Comment: http://portal.acm.org/citation.cfm?doid=1718487.171852
- …