108 research outputs found

    Hot Streaks on Social Media

    Measuring the impact and success of human performance is common in various disciplines, including art, science, and sports. Quantifying impact also plays a key role on social media, where impact is usually defined as the reach of a user's content as captured by metrics such as the number of views, likes, retweets, or shares. In this paper, we study entire careers of Twitter users to understand properties of impact. We show that user impact tends to have certain characteristics: First, impact is clustered in time, such that the most impactful tweets of a user appear close to each other. Second, users commonly have 'hot streaks' of impact, i.e., extended periods of high-impact tweets. Third, impact tends to gradually build up before, and fall off after, a user's most impactful tweet. We attempt to explain these characteristics using various properties measured on social media, including the user's network, content, activity, and experience, and find that changes in impact are associated with significant changes in these properties. Our findings open interesting avenues for future research on virality and influence on social media. Comment: Accepted as a full paper at ICWSM 2019. Please cite the ICWSM version.

    Co-Following on Twitter

    We present an in-depth study of co-following on Twitter based on the observation that two Twitter users whose followers have similar friends are also similar, even though they might not share any direct links or a single mutual follower. We show how this observation contributes to (i) a better understanding of language-agnostic user classification on Twitter, (ii) eliciting opportunities for Computational Social Science, and (iii) improving online marketing by identifying cross-selling opportunities. We start with a machine learning problem of predicting a user's preference among two alternative choices of Twitter friends. We show that co-following information provides strong signals for diverse classification tasks and that these signals persist even when (i) the most discriminative features are removed and (ii) only relatively "sparse" users with fewer than 152 but more than 43 Twitter friends are considered. Going beyond mere classification performance optimization, we present applications of our methodology to Computational Social Science. Here we confirm stereotypes such as that the country singer Kenny Chesney (@kennychesney) is more popular among @GOP followers, whereas Lady Gaga (@ladygaga) enjoys more support from @TheDemocrats followers. In the domain of marketing, we give evidence that celebrity endorsement is reflected in co-following, and we demonstrate how our methodology can be used to reveal the audience similarities between Apple and Puma and, less obviously, between Nike and Coca-Cola. Concerning a user's popularity, we find a statistically significant connection between having a more "average" followership and having more followers than direct rivals. Interestingly, a larger audience also seems to be linked to a less diverse audience in terms of their co-following. Comment: full version of a short paper at Hypertext 201
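
The co-following observation in this abstract can be illustrated with a small sketch. The abstract does not specify a similarity measure, so the choice below (cosine similarity over the aggregated friend counts of each account's followers) is an assumption made purely for illustration, as are the function names `friend_profile` and `cosine` and the toy data.

```python
from collections import Counter
from math import sqrt


def friend_profile(account, followers_of, friends_of):
    """Count how often each other account is followed by `account`'s followers.

    Hypothetical helper: `followers_of` maps an account to its followers,
    `friends_of` maps a user to the accounts that user follows.
    """
    profile = Counter()
    for follower in followers_of[account]:
        for friend in friends_of[follower]:
            if friend != account:  # ignore the account itself
                profile[friend] += 1
    return profile


def cosine(p, q):
    """Cosine similarity between two sparse count vectors (assumed measure)."""
    dot = sum(p[k] * q[k] for k in p.keys() & q.keys())
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


# Toy data: A and B share no followers and no direct link,
# yet their followers follow the same other accounts (X and Y).
followers_of = {"A": ["u1", "u2"], "B": ["u3", "u4"], "C": ["u5"]}
friends_of = {
    "u1": ["A", "X", "Y"],
    "u2": ["A", "X"],
    "u3": ["B", "X", "Y"],
    "u4": ["B", "X"],
    "u5": ["C", "Z"],
}

profile_a = friend_profile("A", followers_of, friends_of)
profile_b = friend_profile("B", followers_of, friends_of)
profile_c = friend_profile("C", followers_of, friends_of)

sim_ab = cosine(profile_a, profile_b)  # high: followers have similar friends
sim_ac = cosine(profile_a, profile_c)  # zero: no overlap in followers' friends
```

In this toy setting, A and B come out as similar despite having no mutual follower, which is the situation the abstract describes.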

    A Motif-based Approach for Identifying Controversy

    Among the topics discussed on social media, some lead to controversy. A number of recent studies have focused on the problem of identifying controversy in social media, relying mostly on the analysis of textual content or on global network structure. Such approaches have strong limitations due to the difficulty of understanding natural language and of investigating the global network structure. In this work, we show that it is possible to detect controversy in social media by exploiting network motifs, i.e., local patterns of user interaction. The proposed approach allows for a language-independent, fine-grained, and efficient-to-compute analysis of user discussions and their evolution over time. The supervised model exploiting motif patterns can achieve 85% accuracy, with an improvement of 7% compared to baseline structural, propagation-based, and temporal network features.

    Professional Gender Gaps Across US Cities

    Gender imbalances in work environments have been a long-standing concern. Identifying the existence of such imbalances is key to designing policies to help overcome them. In this work, we study gender trends in employment across various dimensions in the United States. This is done by analyzing anonymous, aggregate statistics that were extracted from LinkedIn's advertising platform. The data contain the number of male and female LinkedIn users with respect to (i) location, (ii) age, (iii) industry, and (iv) certain skills. We studied which of these categories correlate the most with high relative male or female presence on LinkedIn. In addition to examining the summary statistics of the LinkedIn data, we model the gender balance as a function of the different employee features using linear regression. Our results suggest that the gender gap varies across all feature types, but the differences are most profound among industries and skills. A high correlation between gender ratios of people in our LinkedIn data set and data provided by the US Bureau of Labor Statistics serves as external validation for our results. Comment: Accepted as a poster at ICWSM 2018. Please cite the ICWSM version.

    Scalable Facility Location for Massive Graphs on Pregel-like Systems

    We propose a new scalable algorithm for facility location. Facility location is a classic problem, where the goal is to select a subset of facilities to open, from a set of candidate facilities F, in order to serve a set of clients C. The objective is to minimize the total cost of opening facilities plus the cost of serving each client from the facility it is assigned to. In this work, we are interested in the graph setting, where the cost of serving a client from a facility is represented by the shortest-path distance on the graph. This setting allows us to model natural problems arising in the Web and in social media applications. It also allows us to leverage the inherent sparsity of such graphs, as the input is much smaller than the full pairwise distances between all vertices. To obtain truly scalable performance, we design a parallel algorithm that operates on clusters of shared-nothing machines. In particular, we target modern Pregel-like architectures, and we implement our algorithm on Apache Giraph. Our solution makes use of a recent result to build sketches for massive graphs, and of a fast parallel algorithm to find maximal independent sets, as building blocks. In so doing, we show how these problems can be solved on a Pregel-like architecture, and we investigate the properties of these algorithms. Extensive experimental results show that our algorithm scales gracefully to graphs with billions of edges, while obtaining values of the objective function that are competitive with a state-of-the-art sequential algorithm.
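
The facility location objective described in this abstract (opening costs plus shortest-path service costs) can be made concrete with a minimal sequential sketch. This is not the paper's parallel Pregel/Giraph algorithm; it only evaluates the objective for a given set of open facilities on a small undirected graph. The function names, the adjacency-dict graph representation, and the toy costs are all assumptions for illustration.

```python
import heapq

INF = float("inf")


def shortest_path_distances(graph, source):
    """Dijkstra's algorithm; `graph` maps vertex -> {neighbor: edge weight}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, INF):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, INF):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def facility_location_cost(graph, opening_cost, clients, open_facilities):
    """Total opening cost plus each client's distance to its nearest open facility.

    Assumes an undirected graph, so the distance from a facility to a
    client equals the distance from the client to the facility.
    """
    if not open_facilities:
        return INF if clients else 0.0
    dist_from = {f: shortest_path_distances(graph, f) for f in open_facilities}
    total = sum(opening_cost[f] for f in open_facilities)
    for c in clients:
        total += min(dist_from[f].get(c, INF) for f in open_facilities)
    return total


# Toy instance: a path a - b - c - d with unit edge weights.
graph = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"b": 1.0, "d": 1.0},
    "d": {"c": 1.0},
}
opening_cost = {v: 1.5 for v in graph}  # hypothetical uniform opening cost
clients = ["a", "b", "c", "d"]

cost_one = facility_location_cost(graph, opening_cost, clients, {"b"})
cost_two = facility_location_cost(graph, opening_cost, clients, {"b", "d"})
```

On this instance, opening a second facility at `d` lowers the total cost because the saved service distance outweighs the extra opening cost; with a higher opening cost the single facility would be preferable.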