Evolution of Ego-networks in Social Media with Link Recommendations
Ego-networks are fundamental structures in social graphs, yet the process of
their evolution is still widely unexplored. In an online context, a key
question is how link recommender systems may skew the growth of these networks,
possibly restraining diversity. To shed light on this matter, we analyze the
complete temporal evolution of 170M ego-networks extracted from Flickr and
Tumblr, comparing links that are created spontaneously with those that have
been algorithmically recommended. We find that the evolution of ego-networks is
bursty, community-driven, and characterized by subsequent phases of explosive
diameter increase, slight shrinking, and stabilization. Recommendations favor
popular and well-connected nodes, limiting the diameter expansion. With a
matching experiment aimed at detecting causal relationships from observational
data, we find that the bias introduced by the recommendations fosters global
diversity in the process of neighbor selection. Last, with two link prediction
experiments, we show how insights from our analysis can be used to improve the
effectiveness of social recommender systems.
Comment: Proceedings of the 10th ACM International Conference on Web Search
and Data Mining (WSDM 2017), Cambridge, UK. 10 pages, 16 figures, 1 table
A multi-scale framework for graph based machine learning problems
Graph data have become essential in representing and modeling relationships between entities and complex network structures in various domains such as social networks and recommender systems. As a main contributor to the recent Big Data trend, the massive scale of graphs in modern machine learning problems easily overwhelms existing methods, and thus sophisticated scalable algorithms are needed for real-world applications. In this thesis, we develop a novel multi-scale framework based on the divide-and-conquer principle as an effective and scalable approach for machine learning tasks involving large sparse graphs. We first demonstrate how our multi-scale framework can be applied to the problem of computing the spectral decomposition of massive graphs, which is one of the most fundamental low-rank matrix approximations used in numerous machine learning tasks. While popular solvers suffer from slow convergence, especially when the desired rank is large, our method exploits the clustering structure of the graph and achieves superior performance compared to existing algorithms in terms of both accuracy and scalability. While the main goal of the divide-and-conquer approach is to efficiently compute solutions for the original problem, the proposed multi-scale framework further admits an attractive but less obvious feature that machine learning problems can benefit from. In particular, we consider partial solutions of the subproblems computed in the process as localized models of the entire problem. By doing so, we can combine models at multiple scales from local to global and generate a holistic view of the underlying problem to achieve better performance than a single global view. We adapt this multi-scale view to the problems of link prediction in social networks and collaborative filtering in recommender systems with additional side information to obtain a model that can make accurate and robust predictions in a scalable manner.
Computer Science
What people study when they study Tumblr: Classifying Tumblr-related academic research
Purpose: Research has been carried out on the popular social networking website Tumblr since its launch in 2007. This paper identifies published Tumblr-based research, classifies it to understand approaches and methods, and provides methodological recommendations for others. / Design/methodology/approach: Research regarding Tumblr was identified. Following a review of the literature, a classification scheme was adapted and applied, to understand research focus. Papers were quantitatively classified using open coded content analysis of method, subject, approach, and topic. / Findings: The majority of published work relating to Tumblr concentrates on conceptual issues, followed by aspects of the messages sent. This has evolved over time. Perceived benefits are the platform’s long-form text posts, ability to track tags, and the multimodal nature of the platform. Severe research limitations are caused by the lack of demographic, geo-spatial, and temporal metadata attached to individual posts, the limited API, restricted access to data, and the large number of ephemeral posts on the site. / Research limitations/implications: This study focuses on Tumblr: the applicability of the approach to other media is not considered. We focus on published research and conference papers: there will be book content which was not found using our method. Tumblr as a platform has falling user numbers which may be of concern to researchers. / Practical implications: We identify practical barriers to research on the Tumblr platform including lack of metadata and access to big data, explaining why Tumblr is not as popular as Twitter in academic studies. / Social implications: This paper highlights the breadth of topics covered by social media researchers, which allows us to understand popular online platforms. / Originality/value: There has not yet been an overarching study to look at the methods and purpose of those who study Tumblr. We identify Tumblr-related research papers from the first appearing in July 2011 until July 2015. Our classification derived here provides a framework that can be used to analyse social media research, and in which to position Tumblr-related work, with recommendations on benefits and limitations of the platform for researchers.
Gender and Interest Targeting for Sponsored Post Advertising at Tumblr
As one of the leading platforms for creative content, Tumblr offers
advertisers a unique way of creating brand identity. Advertisers can tell their
story through images, animation, text, music, video, and more, and promote that
content by sponsoring it to appear as an advertisement in the streams of Tumblr
users. In this paper we present a framework that enabled one of the key
targeted advertising components for Tumblr, specifically gender and interest
targeting. We describe the main challenges involved in development of the
framework, which include creating the ground truth for training gender
prediction models, as well as mapping Tumblr content to an interest taxonomy.
For purposes of inferring user interests we propose a novel semi-supervised
neural language model for categorization of Tumblr content (i.e., post tags and
post keywords). The model was trained on a large-scale data set consisting of
6.8 billion user posts, with a very limited amount of categorized keywords, and
was shown to have superior performance over the bag-of-words model. We
successfully deployed gender and interest targeting capability in Yahoo
production systems, delivering inference for users that cover more than 90% of
daily activities at Tumblr. Online performance results indicate advantages of
the proposed approach, where we observed a 20% lift in user engagement with
sponsored posts as compared to untargeted campaigns.
Comment: 10 pages, 9 figures, Proceedings of the 21st ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining (KDD 2015), Sydney,
Australia
All the Science That Is Fit to Blog: An Analysis of Science Blogging Practices
This dissertation examines science blogging practices, including motivations, routines and content decision rules, across a wide range of science bloggers. Previous research has largely failed to investigate science blogging practices from science bloggers’ perspective or to establish a sociological framework for understanding how science bloggers decide what to blog about. I address this gap in previous research by conducting qualitative in-depth interviews with 50 science bloggers and an extensive survey of blogging motivations, approaches, content decision rules, values and editorial constraints for over 600 active science bloggers. Results reveal that science blog content is shaped heavily not only by individual factors, including personal interest, but also by a variety of social forces at the levels of routines, organizations or blogging communities, and social institutions. Factors revealed herein to shape science blog content are placed into a sociological framework, an adapted version of Shoemaker and Reese’s Hierarchical Model of Influences, in order to guide current and future research on the sociology of science blogging. Shoemaker and Reese’s Hierarchical Model of Influences is a model of the factors that influence mass media content, which has been used previously by mass communication researchers to guide analysis of mass media content production. In the visual model, concentric circles represent relative hierarchical levels of influences on media content, starting at individuals and expanding out to routines, organizations, extra-media influences and ideology. I adapt this model based on the factors found herein to influence science blog content, such as bloggers’ individual motivations, editorial constraints and access to information sources.