Hot Streaks on Social Media
Measuring the impact and success of human performance is common in various
disciplines, including art, science, and sports. Quantifying impact also plays
a key role on social media, where impact is usually defined as the reach of a
user's content as captured by metrics such as the number of views, likes,
retweets, or shares. In this paper, we study entire careers of Twitter users to
understand properties of impact. We show that user impact tends to have certain
characteristics: First, impact is clustered in time, such that the most
impactful tweets of a user appear close to each other. Second, users commonly
have 'hot streaks' of impact, i.e., extended periods of high-impact tweets.
Third, impact tends to gradually build up before, and fall off after, a user's
most impactful tweet. We attempt to explain these characteristics using various
properties measured on social media, including the user's network, content,
activity, and experience, and find that changes in impact are associated with
significant changes in these properties. Our findings open interesting avenues
for future research on virality and influence on social media.
Comment: Accepted as a full paper at ICWSM 2019. Please cite the ICWSM version.
Co-Following on Twitter
We present an in-depth study of co-following on Twitter based on the
observation that two Twitter users whose followers have similar friends are
also similar, even though they might not share any direct links or a single
mutual follower. We show how this observation contributes to (i) a better
understanding of language-agnostic user classification on Twitter, (ii)
eliciting opportunities for Computational Social Science, and (iii) improving
online marketing by identifying cross-selling opportunities.
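
The observation above (two accounts are similar when their followers have similar friends) can be illustrated with a small sketch. This is not the paper's feature construction, which the abstract does not specify; the function names `cofollowing_profile` and `cosine`, the toy `friends_of` data, and the choice of cosine similarity are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def cofollowing_profile(account, friends_of):
    """Count the other accounts followed by the followers of `account`.

    `friends_of` maps each user to the set of accounts that user follows
    (hypothetical toy input, not a real Twitter API response).
    """
    profile = Counter()
    for friends in friends_of.values():
        if account in friends:
            profile.update(f for f in friends if f != account)
    return profile


def cosine(p, q):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(p[k] * q[k] for k in p.keys() & q.keys())
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


# Toy data: brand_a and brand_b share no follower, yet their followers
# follow the same other accounts (x and y).
friends_of = {
    "u1": {"brand_a", "x", "y"},
    "u2": {"brand_a", "x"},
    "u3": {"brand_b", "x", "y"},
    "u4": {"brand_c", "z"},
}

profile_a = cofollowing_profile("brand_a", friends_of)
profile_b = cofollowing_profile("brand_b", friends_of)
profile_c = cofollowing_profile("brand_c", friends_of)

sim_ab = cosine(profile_a, profile_b)  # high: followers have similar friends
sim_ac = cosine(profile_a, profile_c)  # zero: no overlap in followers' friends
```

In this toy example, `brand_a` and `brand_b` come out as similar even though they have no mutual follower and no direct link, which is the situation the abstract describes.
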
We start with a machine learning problem of predicting a user's preference
among two alternative choices of Twitter friends. We show that co-following
information provides strong signals for diverse classification tasks and that
these signals persist even when (i) the most discriminative features are
removed and (ii) only relatively "sparse" users with fewer than 152 but more
than 43 Twitter friends are considered.
Going beyond mere classification performance optimization, we present
applications of our methodology to Computational Social Science. Here we
confirm stereotypes such as that the country singer Kenny Chesney
(@kennychesney) is more popular among @GOP followers, whereas Lady Gaga
(@ladygaga) enjoys more support from @TheDemocrats followers.
In the domain of marketing we give evidence that celebrity endorsement is
reflected in co-following and we demonstrate how our methodology can be used to
reveal the audience similarities between Apple and Puma and, less obviously,
between Nike and Coca-Cola. Concerning a user's popularity we find a
statistically significant connection between having a more "average"
followership and having more followers than direct rivals. Interestingly, a
larger audience also seems to be linked to a less diverse
audience in terms of their co-following.
Comment: full version of a short paper at Hypertext 201
A Motif-based Approach for Identifying Controversy
Among the topics discussed in social media, some lead to controversy. A
number of recent studies have focused on the problem of identifying controversy
in social media, mostly by analyzing textual content or by relying on
global network structure. Such approaches have strong limitations due to the
difficulty of understanding natural language, and of investigating the global
network structure. In this work we show that it is possible to detect
controversy in social media by exploiting network motifs, i.e., local patterns
of user interaction. The proposed approach allows for a language-independent,
fine-grained, and efficient-to-compute analysis of user discussions and
their evolution over time. The supervised model exploiting motif patterns can
achieve 85% accuracy, an improvement of 7% over baselines that use
structural, propagation-based, and temporal network features.
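
As a rough illustration of what "local patterns of user interaction" might look like as features, the sketch below counts three simple motifs in a directed interaction graph. The abstract does not say which motifs the paper uses, so the motif set (mutual dyads, one-way dyads, undirected triangles), the function name `motif_counts`, and the toy edge list are assumptions.

```python
from itertools import combinations


def motif_counts(edges):
    """Count simple local motifs in a directed interaction graph.

    `edges` is an iterable of (source, target) pairs, e.g. "u replied to v".
    Self-loops are ignored. The triangle count uses a brute-force scan over
    node triples, which is fine for a toy graph but not for large networks.
    """
    edge_set = {(u, v) for u, v in edges if u != v}

    # Each reciprocated pair appears twice in edge_set, hence the // 2.
    mutual = sum(1 for (u, v) in edge_set if (v, u) in edge_set) // 2
    one_way = sum(1 for (u, v) in edge_set if (v, u) not in edge_set)

    # Undirected neighbourhoods, used for triangle counting.
    nbrs = {}
    for u, v in edge_set:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)

    triangles = sum(
        1
        for a, b, c in combinations(sorted(nbrs), 3)
        if b in nbrs[a] and c in nbrs[a] and c in nbrs[b]
    )
    return {"mutual": mutual, "one_way": one_way, "triangles": triangles}


# Toy discussion: a and b reply to each other; b -> c, c -> a, d -> a.
toy_edges = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "a"), ("d", "a")]
features = motif_counts(toy_edges)
```

Counts like these could serve as one discussion's feature vector for a supervised classifier; note that nothing in the sketch depends on the text of the messages.
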
Professional Gender Gaps Across US Cities
Gender imbalances in work environments have been a long-standing concern.
Identifying the existence of such imbalances is key to designing policies to
help overcome them. In this work, we study gender trends in employment across
various dimensions in the United States. This is done by analyzing anonymous,
aggregate statistics that were extracted from LinkedIn's advertising platform.
The data contain the number of male and female LinkedIn users with respect to
(i) location, (ii) age, (iii) industry and (iv) certain skills. We studied
which of these categories correlate the most with high relative male or female
presence on LinkedIn. In addition to examining the summary statistics of the
LinkedIn data, we model the gender balance as a function of the different
employee features using linear regression. Our results suggest that the gender
gap varies across all feature types, but the differences are most pronounced
among industries and skills. A high correlation between gender ratios of people
in our LinkedIn data set and data provided by the US Bureau of Labor Statistics
serves as external validation for our results.
Comment: Accepted as a poster at ICWSM 2018. Please cite the ICWSM version.
Scalable Facility Location for Massive Graphs on Pregel-like Systems
We propose a new scalable algorithm for facility location. Facility location
is a classic problem, where the goal is to select a subset of facilities to
open, from a set of candidate facilities F, in order to serve a set of clients
C. The objective is to minimize the total cost of opening facilities plus the
cost of serving each client from the facility it is assigned to. In this work,
we are interested in the graph setting, where the cost of serving a client from
a facility is represented by the shortest-path distance on the graph. This
setting allows us to model natural problems arising on the Web and in social media
applications. It also allows us to leverage the inherent sparsity of such graphs,
as the input is much smaller than the full pairwise distances between all
vertices.
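
The objective described above (opening costs plus shortest-path service costs) can be made concrete with a small sketch. It only evaluates the cost of a given set of open facilities on a toy graph; it is not the paper's parallel algorithm. The names `shortest_path_lengths` and `facility_location_cost`, the adjacency-dict graph format, and the example costs are assumptions.

```python
import heapq


def shortest_path_lengths(graph, source):
    """Dijkstra from `source`; `graph` maps node -> {neighbor: edge weight}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def facility_location_cost(graph, opening_cost, open_facilities, clients):
    """Total opening cost plus each client's distance to its nearest open facility.

    `open_facilities` must be non-empty; unreachable clients cost infinity.
    """
    dists = {f: shortest_path_lengths(graph, f) for f in open_facilities}
    opening = sum(opening_cost[f] for f in open_facilities)
    service = sum(
        min(dists[f].get(c, float("inf")) for f in open_facilities)
        for c in clients
    )
    return opening + service


# Toy path graph a - b - c - d with unit edge weights.
graph = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"b": 1.0, "d": 1.0},
    "d": {"c": 1.0},
}
opening_cost = {"b": 1.5, "d": 1.0}
clients = ["a", "b", "c", "d"]

cost_one = facility_location_cost(graph, opening_cost, ["b"], clients)
cost_two = facility_location_cost(graph, opening_cost, ["b", "d"], clients)
```

Here opening the second facility at `d` adds 1.0 in opening cost but saves 2.0 in service cost, so `cost_two` is lower than `cost_one`. The sketch takes only the sparse adjacency structure as input and never materializes all pairwise distances.
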
To obtain truly scalable performance, we design a parallel algorithm that
operates on clusters of shared-nothing machines. In particular, we target
modern Pregel-like architectures, and we implement our algorithm on Apache
Giraph. Our solution makes use of a recent result to build sketches for massive
graphs, and of a fast parallel algorithm to find maximal independent sets, as
building blocks. In so doing, we show how these problems can be solved on a
Pregel-like architecture, and we investigate the properties of these
algorithms. Extensive experimental results show that our algorithm scales
gracefully to graphs with billions of edges, while obtaining values of the
objective function that are competitive with a state-of-the-art sequential
algorithm.