14,212 research outputs found
Large scale homophily analysis in twitter using a twixonomy
In this paper we perform a large-scale homophily analysis on Twitter using a hierarchical representation of users' interests which we call a Twixonomy. In order to build a population, community, or single-user Twixonomy we first associate "topical" friends in users' friendship lists (i.e. friends representing an interest rather than a social relation between peers) with Wikipedia categories. A wordsense disambiguation algorithm is used to select the appropriate wikipage for each topical friend. Starting from the set of wikipages representing "primitive" interests, we extract all paths connecting these pages with topmost Wikipedia category nodes, and we then prune the resulting graph G efficiently so as to induce a direct acyclic graph. This graph is the Twixonomy. Then, to analyze homophily, we compare different methods to detect communities in a peer friends Twitter network, and then for each community we compute the degree of homophily on the basis of a measure of pairwise semantic similarity. We show that the Twixonomy provides a means for describing users' interests in a compact and readable way and allows for a fine-grained homophily analysis. Furthermore, we show that midlow level categories in the Twixonomy represent the best balance between informativeness and compactness of the representation
A Survey of Location Prediction on Twitter
Locations, e.g., countries, states, cities, and point-of-interests, are
central to news, emergency events, and people's daily lives. Automatic
identification of locations associated with or mentioned in documents has been
explored for decades. As one of the most popular online social network
platforms, Twitter has attracted a large number of users who send millions of
tweets on daily basis. Due to the world-wide coverage of its users and
real-time freshness of tweets, location prediction on Twitter has gained
significant attention in recent years. Research efforts are spent on dealing
with new challenges and opportunities brought by the noisy, short, and
context-rich nature of tweets. In this survey, we aim at offering an overall
picture of location prediction on Twitter. Specifically, we concentrate on the
prediction of user home locations, tweet locations, and mentioned locations. We
first define the three tasks and review the evaluation metrics. By summarizing
Twitter network, tweet content, and tweet context as potential inputs, we then
structurally highlight how the problems depend on these inputs. Each dependency
is illustrated by a comprehensive review of the corresponding strategies
adopted in state-of-the-art approaches. In addition, we also briefly review two
related problems, i.e., semantic location prediction and point-of-interest
recommendation. Finally, we list future research directions.Comment: Accepted to TKDE. 30 pages, 1 figur
On the Role of Social Identity and Cohesion in Characterizing Online Social Communities
Two prevailing theories for explaining social group or community structure
are cohesion and identity. The social cohesion approach posits that social
groups arise out of an aggregation of individuals that have mutual
interpersonal attraction as they share common characteristics. These
characteristics can range from common interests to kinship ties and from social
values to ethnic backgrounds. In contrast, the social identity approach posits
that an individual is likely to join a group based on an intrinsic
self-evaluation at a cognitive or perceptual level. In other words group
members typically share an awareness of a common category membership.
In this work we seek to understand the role of these two contrasting theories
in explaining the behavior and stability of social communities in Twitter. A
specific focal point of our work is to understand the role of these theories in
disparate contexts ranging from disaster response to socio-political activism.
We extract social identity and social cohesion features-of-interest for large
scale datasets of five real-world events and examine the effectiveness of such
features in capturing behavioral characteristics and the stability of groups.
We also propose a novel measure of social group sustainability based on the
divergence in group discussion. Our main findings are: 1) Sharing of social
identities (especially physical location) among group members has a positive
impact on group sustainability, 2) Structural cohesion (represented by high
group density and low average shortest path length) is a strong indicator of
group sustainability, and 3) Event characteristics play a role in shaping group
sustainability, as social groups in transient events behave differently from
groups in events that last longer
Collective Influence of Multiple Spreaders Evaluated by Tracing Real Information Flow in Large-Scale Social Networks
Identifying the most influential spreaders that maximize information flow is
a central question in network theory. Recently, a scalable method called
"Collective Influence (CI)" has been put forward through collective influence
maximization. In contrast to heuristic methods evaluating nodes' significance
separately, CI method inspects the collective influence of multiple spreaders.
Despite that CI applies to the influence maximization problem in percolation
model, it is still important to examine its efficacy in realistic information
spreading. Here, we examine real-world information flow in various social and
scientific platforms including American Physical Society, Facebook, Twitter and
LiveJournal. Since empirical data cannot be directly mapped to ideal
multi-source spreading, we leverage the behavioral patterns of users extracted
from data to construct "virtual" information spreading processes. Our results
demonstrate that the set of spreaders selected by CI can induce larger scale of
information propagation. Moreover, local measures as the number of connections
or citations are not necessarily the deterministic factors of nodes' importance
in realistic information spreading. This result has significance for rankings
scientists in scientific networks like the APS, where the commonly used number
of citations can be a poor indicator of the collective influence of authors in
the community.Comment: 11 pages, 4 figure
- …