    Estimating attention flow in online video networks

    © 2019 Association for Computing Machinery. Online video has driven a tremendous increase in Internet traffic. Most video hosting sites implement recommender systems, which connect the videos into a directed network and conceptually act as a source of pathways for users to navigate. At present, little is known about how human attention is allocated over such large-scale networks, or about the impact of the recommender systems. In this paper, we first construct the Vevo network, a YouTube video network of 60,740 music videos interconnected by recommendation links, and we collect their associated viewing dynamics. This amounts to 310 million views per day over a period of 9 weeks. Next, we present large-scale measurements that connect the structure of the recommendation network with the video attention dynamics. We use the bow-tie structure to characterize the Vevo network and find that its core component (23.1% of the videos), which occupies most of the attention (82.6% of the views), is made up of videos that are mainly recommended among themselves. This points to a link between video recommendation and the inequality of attention allocation. Finally, we address the task of estimating the attention flow in the video recommendation network. We propose a model that accounts for network effects in predicting video popularity, and we show that it consistently outperforms the baselines. This model also identifies a group of artists who gain attention because of the recommendation network. Altogether, our observations and models provide a new set of tools for better understanding the impact of recommender systems on collective social attention.
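The bow-tie characterization can be sketched as follows. This is an illustration, not the authors' code, and it assumes a hypothetical edge list of `(source, target)` pairs, where an edge means the target video is recommended on the source video's page, plus a hypothetical `views` dict. The sketch takes the largest strongly connected component as the core (found with Kosaraju's algorithm), labels nodes that can reach the core as IN and nodes reachable from the core as OUT, and reports the core's share of total views.

```python
from collections import defaultdict, deque


def reachable(adj, sources):
    """Return every node reachable from `sources` (sources included) by BFS."""
    seen = set(sources)
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def strongly_connected_components(nodes, adj, radj):
    """Kosaraju's algorithm, written iteratively to avoid recursion limits."""
    order, visited = [], set()
    for root in nodes:
        if root in visited:
            continue
        visited.add(root)
        stack = [(root, iter(adj[root]))]
        while stack:
            node, children = stack[-1]
            for child in children:
                if child not in visited:
                    visited.add(child)
                    stack.append((child, iter(adj[child])))
                    break
            else:
                # All children explored: record the node's finishing order.
                stack.pop()
                order.append(node)
    components, assigned = [], set()
    for root in reversed(order):
        if root in assigned:
            continue
        # Grow the root's component on the reversed graph, skipping assigned nodes.
        component, queue = {root}, deque([root])
        assigned.add(root)
        while queue:
            node = queue.popleft()
            for parent in radj[node]:
                if parent not in assigned:
                    assigned.add(parent)
                    component.add(parent)
                    queue.append(parent)
        components.append(component)
    return components


def bow_tie(edges, views):
    """Split a directed recommendation network into core / in / out / other."""
    adj, radj = defaultdict(list), defaultdict(list)
    nodes = set(views)
    for src, dst in edges:
        adj[src].append(dst)
        radj[dst].append(src)
        nodes.update((src, dst))
    core = max(strongly_connected_components(nodes, adj, radj), key=len)
    in_part = reachable(radj, core) - core     # nodes that can reach the core
    out_part = reachable(adj, core) - core     # nodes reachable from the core
    other = nodes - core - in_part - out_part  # tendrils, tubes, disconnected nodes
    core_share = sum(views.get(n, 0) for n in core) / sum(views.values())
    return {"core": core, "in": in_part, "out": out_part, "other": other}, core_share


edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "a"), ("c", "e")]
views = {"a": 50, "b": 30, "c": 10, "d": 5, "e": 4, "f": 1}
parts, share = bow_tie(edges, views)
print(sorted(parts["core"]), sorted(parts["in"]), sorted(parts["out"]), share)
```

In this toy graph the core {a, b, c} holds 90% of the views; on a real crawl the same decomposition yields the core size and attention share that the abstract reports.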

    Local Information Diffusion Patterns in Social and Traditional Media: The Estonian Case Study

    Information has become more highly valued among companies and individuals than ever before. With this, interest in how information diffuses among the entities in various structured networks has increased.
A number of studies have been published on the diffusion process in real-life networks, such as web service networks, citation networks, and blog networks. Most studies have focused on one type of network, such as Facebook posts, Twitter tweets, or Blogspot blog entries. A disadvantage of analysing a network containing entities from a single source is that it does not consider outside influence on the diffusion. Recently, some papers have started to incorporate different networks in their studies and have thereby been able to analyse the effect of outside influence on the diffusion process. This thesis aims to shed further light on information diffusion in a real-world network containing entities from different sources by detecting relevant local topological and temporal information diffusion patterns. For topological pattern analysis, frequent subgraph mining techniques are used. Temporal patterns are extracted using time series clustering. The dataset used in this thesis was collected from Estonian mainstream online news media (articles and their comments) and from the social media channels Twitter and Facebook. From this dataset, the relations between the entities were extracted and a network for analysing diffusion patterns was constructed. Temporal patterns reveal the high pace of information diffusion, while topological patterns expose the important role of news media articles and Facebook posts in information diffusion processes. The results of the thesis are applicable in cyber defence, online marketing, campaign management, and information impact estimation, among other application areas.
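The temporal-pattern step relies on time series clustering. A minimal sketch, assuming each entity's activity has already been binned into equal-length count series, is to z-normalize every series (so clusters reflect the shape of the activity rather than its volume) and run k-means with Euclidean distance and a deterministic farthest-first initialization. The abstract does not specify the clustering algorithm, so k-means and the function names here are illustrative assumptions.

```python
import math


def z_normalize(series):
    """Rescale a series to zero mean and unit variance (flat series become all zeros)."""
    mean = sum(series) / len(series)
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / len(series))
    return [0.0] * len(series) if std == 0 else [(x - mean) / std for x in series]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length series."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(data, k):
    """Deterministic initialization: start from the first series, then repeatedly
    add the series that is farthest from every centroid chosen so far."""
    centroids = [data[0]]
    while len(centroids) < k:
        centroids.append(max(data, key=lambda x: min(sq_dist(x, c) for c in centroids)))
    return [list(c) for c in centroids]


def cluster_time_series(series_list, k, iterations=50):
    """k-means on z-normalized series; returns one cluster label per series."""
    data = [z_normalize(s) for s in series_list]
    centroids = farthest_first(data, k)
    labels = [0] * len(data)
    for _ in range(iterations):
        # Assignment step: each series joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(x, centroids[c])) for x in data]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [x for x, label in zip(data, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


series = [[10, 8, 3, 1, 0, 0], [100, 70, 30, 5, 1, 0],
          [0, 0, 1, 3, 8, 10], [0, 1, 5, 30, 70, 100]]
print(cluster_time_series(series, k=2))  # → [0, 0, 1, 1]: early bursts vs late bursts
```

Z-normalization is the design choice that lets a small and a large cascade with the same rise-and-fall shape land in the same cluster.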

    Dynamics of Information Diffusion

    Real diffusion networks are complex and dynamic, since the underlying social structures not only reach far beyond a single homogeneous system but also change frequently with the context of diffusion. Thus, studying topic-related diffusion across multiple social systems is important for a better understanding of such realistic situations. Accordingly, this thesis focuses on uncovering topic-related diffusion dynamics across heterogeneous social networks in both model-driven and model-free ways. We first conduct empirical studies analyzing diffusion phenomena in real-world systems, such as news diffusion in social media and knowledge transfer in academic publications. We observe that large diffusion is more likely attributable to interactions between heterogeneous social networks, as if they were in the same network. Thus, external influences from out-of-network sources, as observed in previous work, need to be explained in the context of interactions between heterogeneous social networks. This observation motivates our new conceptual framework for cross-population diffusion, which extends the traditional diffusion mechanism to a more flexible and general one. Second, we propose both model-driven and model-free approaches to estimate global trends of information diffusion. Based on our conceptual framework, we propose a model-driven approach that allows internal influence to reach heterogeneous populations in a probabilistic way. This approach extends a simple and robust mass action diffusion model by incorporating the structural connectivity and heterogeneity of real-world networks. We then propose a model-free approach using information-theoretic measures that account for both time-delay and memory effects on diffusion.
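The model-driven idea of a mass action diffusion model extended across populations can be sketched in discrete time. This is a hypothetical formulation, not the thesis's exact model: adopters in population $i$ grow as $I_i \leftarrow I_i + \Delta t\,(N_i - I_i)\sum_j \beta_{ij} I_j / N_j$, where the coupling rates `beta[i][j]` are assumed inputs and the off-diagonal entries carry cross-population influence.

```python
def simulate_cross_population(beta, sizes, seeds, steps, dt=0.1):
    """Forward-Euler mass-action adoption across coupled populations.

    beta[i][j]: hypothetical rate at which adopters in population j influence population i
    sizes[i]:   population size N_i
    seeds[i]:   initial number of adopters in population i
    Returns the adopter counts at every step, including the initial state.
    """
    adopters = [float(s) for s in seeds]
    history = [list(adopters)]
    for _ in range(steps):
        fractions = [a / n for a, n in zip(adopters, sizes)]
        updated = []
        for i, n_i in enumerate(sizes):
            # Mass action: non-adopters in i times the weighted adopter fractions everywhere.
            force = sum(b * f for b, f in zip(beta[i], fractions))
            gain = dt * (n_i - adopters[i]) * force
            updated.append(min(n_i, adopters[i] + gain))  # never exceed the population
        adopters = updated
        history.append(list(adopters))
    return history


# Population 1 starts with no adopters; only cross-population influence can seed it.
coupled = simulate_cross_population([[0.5, 0.0], [0.3, 0.5]], [1000, 1000], [10, 0], 200)
isolated = simulate_cross_population([[0.5, 0.0], [0.0, 0.5]], [1000, 1000], [10, 0], 200)
print(coupled[-1], isolated[-1])
```

With the off-diagonal rate set to zero, population 1 never adopts; any positive coupling lets influence cross over, which is the behaviour a cross-population framework is meant to capture.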
In contrast to the model-driven approach, this model-free approach does not require any assumptions about dynamic social interactions in the real world, and it can quantify the nonlinear dynamics of complex systems. Finally, we compare our model-driven and model-free approaches across different diffusion contexts. This gives us a more comprehensive understanding of topic-related diffusion patterns. Both approaches provide a coherent macroscopic view of global diffusion in terms of the strength and directionality of influences among heterogeneous social networks. We find that the two approaches produce similar results from different perspectives, which together explain diffusion better than either approach alone. They also offer flexibility: either or both approaches can be used, as appropriate to the situations arising in different application domains. We expect our proposed approaches to provide ways to quantify and understand cross-population diffusion trends at a macro level. They can also be applied to a wide range of research areas, such as social science, marketing, and even neuroscience, for estimating dynamic influences among target regions or systems.
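One standard information-theoretic measure of directed influence with an explicit time delay is transfer entropy. The abstract does not name the exact measures used, so this is an illustrative choice: a plug-in estimator for discrete series with one step of target history and a hypothetical source lag `delay`, computing $T_{Y\to X} = \sum p(x_{t+1}, x_t, y_{t+1-d}) \log_2 \frac{p(x_{t+1} \mid x_t, y_{t+1-d})}{p(x_{t+1} \mid x_t)}$.

```python
import math
import random
from collections import Counter


def transfer_entropy(source, target, delay=1):
    """Plug-in transfer entropy (in bits) from `source` to `target` for discrete series.

    x1 = target[t+1], x0 = target[t], y = source[t+1-delay].
    """
    if len(source) != len(target):
        raise ValueError("series must have equal length")
    if delay < 1:
        raise ValueError("delay must be at least 1")
    joint, past_src, next_past, past = Counter(), Counter(), Counter(), Counter()
    for t in range(delay - 1, len(target) - 1):
        x1, x0, y = target[t + 1], target[t], source[t + 1 - delay]
        joint[(x1, x0, y)] += 1
        past_src[(x0, y)] += 1
        next_past[(x1, x0)] += 1
        past[x0] += 1
    total = sum(joint.values())
    te = 0.0
    for (x1, x0, y), count in joint.items():
        # p(x1 | x0, y) / p(x1 | x0), expressed with raw counts.
        ratio = (count * past[x0]) / (past_src[(x0, y)] * next_past[(x1, x0)])
        te += (count / total) * math.log2(ratio)
    return te


rng = random.Random(0)
driver = [rng.randint(0, 1) for _ in range(2000)]
follower = [0] + driver[:-1]               # copies the driver one step later
print(transfer_entropy(driver, follower))  # close to 1 bit
print(transfer_entropy(follower, driver))  # close to 0 bits
```

The asymmetry between the two directions is what gives a macroscopic reading of the directionality of influence; in practice the `delay` would be scanned to capture time-delay effects.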

    Measuring Collective Attention in Online Content: Sampling, Engagement, and Network Effects

    The production and consumption of online content have been increasing rapidly, whereas human attention is a scarce resource. Understanding how content captures collective attention has become a challenge of growing importance. In this thesis, we tackle this challenge on three fronts: quantifying the sampling effects of social media data, measuring engagement behaviors towards online content, and estimating the network effects induced by recommender systems. Data sampling is a fundamental problem. To obtain a list of items, one common method is to sample based on item prevalence in social media streams. However, social data is often noisy and incomplete, which may affect subsequent observations. For each item, user behaviors can be conceptualized in two steps: the first relates to content appeal, measured by the number of clicks; the second relates to content quality, measured by post-clicking metrics such as dwell time, likes, or comments. We categorize online attention (behaviors) into two classes: popularity (clicking) and engagement (watching, liking, or commenting). Moreover, modern platforms use recommender systems to present users with tailored content that maximizes satisfaction. A recommendation alters the appeal of an item by changing its ranking, and consequently impacts its popularity. Our research is enabled by data available from the largest video hosting site, YouTube. We use YouTube URLs shared on Twitter as a sampling protocol to obtain a collection of videos, and we track their prevalence from 2015 to 2019. This creates a longitudinal dataset of more than 5 billion tweets. Although the volume is substantial, we find that Twitter still subsamples the data: our dataset covers about 80% of all tweets with YouTube URLs. We present a comprehensive measurement study of Twitter sampling effects across different timescales and subjects.
We find that the volume of missing tweets can be estimated from Twitter rate limit messages, that the true entity ranking can be inferred from sampled observations, and that sampling compromises the quality of network and diffusion models. Next, we present the first large-scale measurement study of how users collectively engage with YouTube videos. We study the time and percentage of each video being watched. We propose a duration-calibrated metric, called relative engagement, which correlates with recognized notions of content quality, is stable over time, and is predictable even before a video is uploaded. Lastly, we examine the network effects induced by the YouTube recommender system. We construct the recommendation network for 60,740 music videos from 4,435 professional artists, in which an edge indicates that the target video is recommended on the webpage of the source video. We discover a popularity bias: videos are disproportionately recommended towards more popular videos. We use the bow-tie structure to characterize the network and find that the largest strongly connected component consists of 23.1% of videos while occupying 82.6% of attention. We also build models to estimate the latent influence between videos and between artists. By taking the network structure into account, we predict video popularity 9.7% better than the baselines. Altogether, we explore the collective consumption patterns of human attention towards online content. The methods and findings of this thesis can be used by content producers, hosting sites, and online users alike to improve content production, advertising strategies, and recommender systems. We expect our new metrics, methods, and observations to generalize to other multimedia platforms, such as the music streaming service Spotify.
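The idea of using network structure to predict popularity can be illustrated with a linear autoregressive model that adds the same-day views of in-network neighbors (videos whose pages recommend the target) as extra regressors, fitted by ordinary least squares. This is a simplified, hypothetical sketch in the spirit of the approach, not the thesis's model: the number of lags, the absence of an intercept, and the function names are assumptions, and each fitted neighbor coefficient plays the role of an estimated influence.

```python
import random


def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def fit_network_ar(target, neighbors, lags=2):
    """Least-squares fit of
        target[t] = sum_i a_i * target[t-i] + sum_u b_u * neighbor_u[t]
    Returns [a_1, ..., a_lags, b_1, ..., b_U].
    """
    rows, ys = [], []
    for t in range(lags, len(target)):
        rows.append([target[t - i] for i in range(1, lags + 1)] + [nb[t] for nb in neighbors])
        ys.append(target[t])
    k = len(rows[0])
    # Normal equations: (X^T X) w = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic check: the target's views follow its own history plus 30% of a neighbor's views.
rng = random.Random(1)
neighbor = [rng.uniform(0, 100) for _ in range(200)]
target = [50.0, 60.0]
for t in range(2, 200):
    target.append(0.5 * target[t - 1] + 0.2 * target[t - 2] + 0.3 * neighbor[t])
print(fit_network_ar(target, [neighbor], lags=2))  # recovers roughly [0.5, 0.2, 0.3]
```

Comparing the prediction error of this model against the same model without neighbor regressors is one simple way to quantify how much the network adds.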

    News vertical search using user-generated content

    The thesis investigates how content produced by end-users on the World Wide Web, referred to as user-generated content, can enhance the news vertical aspect of a universal Web search engine, so that news-related queries can be satisfied more accurately, more comprehensively, and in a more timely manner. We propose a news search framework to describe the news vertical aspect of a universal Web search engine. This framework comprises four components, each providing a different piece of functionality. The Top Events Identification component identifies the most important events happening at any given moment, using discussion in user-generated content streams. The News Query Classification component classifies incoming queries as news-related or not in real time. The Ranking News-Related Content component finds and ranks relevant content for news-related user queries from multiple streams of news and user-generated content. Finally, the News-Related Content Integration component merges the previously ranked content for the user query into the Web search ranking. In this thesis, we argue that user-generated content can be leveraged in one or more of these components to better satisfy news-related user queries. Potential enhancements include faster identification of news queries relating to breaking news events, more accurate classification of news-related queries, increased coverage of the events the user searches for, and increased freshness of the returned results. For each of the four components of the news search framework, we propose approaches that aim to leverage user-generated content. Together, these approaches form the news vertical component of a universal Web search engine. Each approach proposed for a component is thoroughly evaluated using one or more datasets developed for that component.
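One simple kind of timely evidence for the News Query Classification component is how bursty a query's terms are in a recent stream of user-generated content compared with a longer background window. The sketch below computes that ratio with add-one smoothing and applies a hypothetical threshold; the tokenization, window contents, threshold, and function names are illustrative assumptions rather than the thesis's feature set.

```python
from collections import Counter


def term_counts(documents):
    """Term counts and total token count over a list of documents (lowercased whitespace tokens)."""
    counts = Counter(tok for doc in documents for tok in doc.lower().split())
    return counts, sum(counts.values())


def burst_score(query, recent_docs, background_docs):
    """Average ratio of smoothed recent vs background term probabilities over the query terms."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    recent, recent_total = term_counts(recent_docs)
    background, background_total = term_counts(background_docs)
    vocab = len(set(recent) | set(background)) or 1
    ratios = []
    for term in terms:
        # Add-one smoothing keeps unseen terms from giving zero or infinite ratios.
        p_recent = (recent[term] + 1) / (recent_total + vocab)
        p_background = (background[term] + 1) / (background_total + vocab)
        ratios.append(p_recent / p_background)
    return sum(ratios) / len(ratios)


def is_news_query(query, recent_docs, background_docs, threshold=2.0):
    """Hypothetical rule: flag the query as news-related when its terms are bursting."""
    return burst_score(query, recent_docs, background_docs) >= threshold


recent = ["earthquake hits city", "earthquake rescue under way", "big earthquake today"]
background = ["recipe for bread", "football scores", "weather is mild",
              "bread and butter", "city council meeting"]
print(is_news_query("earthquake", recent, background))    # bursting in the recent stream
print(is_news_query("bread recipe", recent, background))  # only in the background
```

In a real classifier such a score would be one feature among many, computed per stream (newswire, blogs, tweets) so a learner can weigh each source.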
Conclusions about whether user-generated content enhances each component are drawn using an appropriate measure: effectiveness when ranking events by their current importance/newsworthiness for the Top Events Identification component; classification accuracy over different types of query for the News Query Classification component; relevance of the returned documents for the Ranking News-Related Content component; and end-user preference for rankings that integrate user-generated content, compared with the unaltered Web search ranking, for the News-Related Content Integration component. We also discuss analyses of the proposed approaches themselves, the settings in which they can be deployed effectively, and insights into their behaviour. In particular, the evaluation of the Top Events Identification component examines how effectively events (represented by newswire articles) can be ranked by their importance using two different streams of user-generated content, namely blog posts and Twitter tweets. Evaluation of the proposed approaches for this component indicates that blog posts are an effective source of evidence for ranking events and that these approaches achieve state-of-the-art effectiveness. When the same approaches are instead driven by a stream of tweets, they provide story ranking performance that is significantly more effective than random, but not consistent across all of the datasets and approaches tested. We provide insights into the reasons for this with regard to the transient nature of discussion on Twitter. Through the evaluation of the News Query Classification component, we show that timely features extracted from different news and user-generated content sources can increase the accuracy of news query classification over relying upon newswire provider streams alone.
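Ranking events by their current newsworthiness from a stream of blog posts or tweets can be sketched as voting: each recent post that matches an event's article contributes a vote, discounted by its age. The matching rule (shared-term overlap), the exponential decay, and the half-life below are hypothetical choices made for illustration; they do not reproduce the thesis's own approaches.

```python
def tokens(text):
    """Lowercased set of whitespace-separated terms."""
    return set(text.lower().split())


def rank_events(events, posts, now, half_life=6.0, min_overlap=2):
    """Rank events by time-decayed votes from user-generated posts.

    events: {event_id: article_text}
    posts:  [(timestamp_in_hours, post_text)]
    A post votes for an event when they share at least `min_overlap` terms;
    each vote is weighted by 0.5 ** (age / half_life).
    """
    event_terms = {eid: tokens(text) for eid, text in events.items()}
    scores = {eid: 0.0 for eid in events}
    for ts, text in posts:
        age = now - ts
        if age < 0:
            continue  # ignore posts stamped after `now`
        weight = 0.5 ** (age / half_life)
        post_terms = tokens(text)
        for eid, terms in event_terms.items():
            if len(terms & post_terms) >= min_overlap:
                scores[eid] += weight
    # Highest score first; ties broken by event id for a stable order.
    return sorted(scores, key=lambda eid: (-scores[eid], eid))


events = {"quake": "strong earthquake hits coastal city",
          "vote": "parliament passes budget vote"}
posts = [(9.0, "earthquake hits my city right now"),
         (10.0, "coastal city earthquake damage"),
         (1.0, "budget vote passes parliament")]
print(rank_events(events, posts, now=10.0))
```

The half-life controls how quickly old discussion stops counting, which matters for a transient stream such as Twitter.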
Evidence also suggests that the usefulness of the user-generated content sources varies as news events mature, with some sources becoming more influential over time as new content is published, leading to an upward trend in classification accuracy. The evaluation of the Ranking News-Related Content component investigates how to effectively rank content from the blogosphere and Twitter for news-related user queries. Of the approaches tested, we show that learning-to-rank approaches using features specific to blog posts/tweets achieve state-of-the-art ranking effectiveness under real-time constraints. Finally, this thesis demonstrates that the majority of end-users prefer rankings integrated with user-generated content for news-related queries over rankings containing only Web search results or integrated only with newswire articles. Of the user-generated content sources tested, Twitter is shown to be the most popular, particularly for queries relating to breaking events. The central contributions of this thesis are the introduction of a news search framework, the approaches to each of the four components of the framework that integrate user-generated content, and their subsequent evaluation in a simulated real-time setting. This thesis draws insights from a broad range of experiments spanning the entire search process for news-related queries. The experiments reported in this thesis demonstrate the potential and scope for enhancements that can be brought about by leveraging user-generated content for real-time news search and related applications.

    Event Diffusion Patterns in Social Media

    This study focuses on real-world events and their reflections in the continuous stream of online discussions. Studying event diffusion on social media is important, as it tells us how a significant event (such as a natural disaster) spreads and evolves while interacting with other events, and who has helped spread the event. Tracking an ever-changing list of often unanticipated events is difficult, and most prior work has focused on specific event derivatives such as quotes or user-generated tags. In this paper, we propose a method for identifying real-world events on social media, and present observations about event diffusion patterns across diverse media types such as news, blogs, and social networking sites. We first construct an event registry based on the Wikipedia portal of global news events, and we represent each real-world event with entities that embody the 5W1H (e.g., organization, person name, place) used in news coverage. We then label each web document with the list of identified events based on the entity similarity between them. We analyze the ICWSM’11 Spinn3r dataset, containing over 60 million English documents. We observe surprising connections among the 161 events it covers, and find that over half of users (55%) link only to a small fraction of prolific users (1%), a notable departure from the balanced traditional bow-tie model of web content.
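Labeling each web document with events by entity similarity can be sketched with Jaccard similarity between a document's extracted entities and each event's 5W1H entity set. The entity sets are assumed to come from an upstream extractor (for example, a named-entity recognizer), and the event ids, toy entities, and 0.3 threshold are hypothetical values, not ones reported in the paper.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def label_document(doc_entities, event_registry, threshold=0.3):
    """Return ids of all events whose entity set is similar enough to the document's,
    most similar first.

    doc_entities:   set of entity strings extracted from one web document
    event_registry: {event_id: set of 5W1H entities describing the event}
    """
    scored = [(jaccard(doc_entities, ents), eid) for eid, ents in event_registry.items()]
    return [eid for score, eid in sorted(scored, reverse=True) if score >= threshold]


registry = {
    "quake_event": {"japan", "earthquake", "tsunami", "coast"},
    "protest_event": {"capital", "square", "protest", "government"},
}
print(label_document({"japan", "tsunami", "earthquake", "rescue"}, registry))
print(label_document({"football"}, registry))
```

Because a document may match several events, returning a list rather than a single label is what allows connections among events to be observed.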