Will This Video Go Viral? Explaining and Predicting the Popularity of Youtube Videos
What makes content go viral? Why do some videos become popular while others
don't? Such questions have elicited significant attention from both researchers
and industry, particularly in the context of online media. A range of models
has recently been proposed to explain and predict popularity; however,
practical tools that make these theoretical results accessible to regular
users remain in short supply. HIPie, an interactive visualization
system, is designed to fill this gap by enabling users to reason about the
virality and the popularity of online videos. It retrieves the metadata and the
past popularity series of Youtube videos, it employs Hawkes Intensity Process,
a state-of-the-art online popularity model for explaining and predicting video
popularity, and it presents videos comparatively in a series of interactive
plots. This system will help both content consumers and content producers in a
range of data-driven inquiries, such as to comparatively analyze videos and
channels, to explain and predict future popularity, to identify viral videos,
and to estimate the response to online promotion.
Comment: 4 pages
Cross-Partisan Discussions on YouTube: Conservatives Talk to Liberals but Liberals Don't Talk to Conservatives
We present the first large-scale measurement study of cross-partisan
discussions between liberals and conservatives on YouTube, based on a dataset
of 274,241 political videos from 973 channels of US partisan media and 134M
comments from 9.3M users over eight months in 2020. Contrary to a simple
narrative of echo chambers, we find a surprising amount of cross-talk: most
users with at least 10 comments posted at least once on both left-leaning and
right-leaning YouTube channels. Cross-talk, however, was not symmetric. Based
on the user leaning predicted by a hierarchical attention model, we find that
conservatives were much more likely to comment on left-leaning videos than
liberals on right-leaning videos. Secondly, YouTube's comment sorting algorithm
made cross-partisan comments modestly less visible; for example, comments from
conservatives made up 26.3% of all comments on left-leaning videos but only
just over 20% of those in the top 20 positions. Lastly, using
Perspective API's toxicity score as a measure of quality, we find that
conservatives were not significantly more toxic than liberals when users
directly commented on the content of videos. However, when users replied to
comments from other users, we find that cross-partisan replies were more toxic
than co-partisan replies on both left-leaning and right-leaning videos, with
cross-partisan replies being especially toxic on the replier's home turf.
Comment: Accepted into ICWSM 2021; the code and datasets are publicly
available at https://github.com/avalanchesiqi/youtube-crosstal
How to Train Your YouTube Recommender to Avoid Unwanted Videos
YouTube provides features for users to indicate disinterest when presented
with unwanted recommendations, such as the "Not interested" and "Don't
recommend channel" buttons. These buttons are purported to allow the user to
correct "mistakes" made by the recommendation system. Yet, relatively little is
known about the empirical efficacy of these buttons. Neither is much known
about users' awareness of and confidence in them. To address these gaps, we
simulated YouTube users with sock puppet agents. Each agent first executed a
"stain phase", where it watched many videos of one assigned topic; it then
executed a "scrub phase", where it tried to remove recommendations of the
assigned topic. Each agent repeatedly applied a single scrubbing strategy,
either indicating disinterest in one of the videos visited in the stain phase
(disliking it or deleting it from the watch history), or indicating disinterest
in a video recommended on the homepage (clicking the "not interested" or "don't
recommend channel" button or opening the video and clicking the dislike
button). We found that the stain phase significantly increased the fraction of
the recommended videos dedicated to the assigned topic on the user's homepage.
For the scrub phase, using the "Not interested" button worked best,
significantly reducing such recommendations in all topics tested, on average
removing 88% of them. Neither the stain phase nor the scrub phase, however, had
much effect on video-page recommendations. We also ran a survey (N = 300) asking
adult YouTube users in the US whether they were aware of and used these buttons
before, as well as how effective they found these buttons to be. We found that
44% of participants were not aware that the "Not interested" button existed.
However, those who were aware of this button often used it to remove unwanted
recommendations (82.8%) and found it to be modestly effective (3.42 out of 5).
Comment: Accepted into ICWSM 2024; the code is publicly available at
https://github.com/avliu-um/youtube-disinteres
Measuring Collective Attention in Online Content: Sampling, Engagement, and Network Effects
The production and consumption of online content have been increasing rapidly, whereas human attention is a scarce resource. Understanding how the content captures collective attention has become a challenge of growing importance. In this thesis, we tackle this challenge from three fronts -- quantifying sampling effects of social media data; measuring engagement behaviors towards online content; and estimating network effects induced by the recommender systems.
Data sampling is a fundamental problem. To obtain a list of items, one common method is to sample based on item prevalence in social media streams. However, social data is often noisy and incomplete, which may affect subsequent observations. For each item, user behaviors can be conceptualized as two steps: the first step relates to the content's appeal, measured by the number of clicks; the second step relates to the content's quality, measured by post-click metrics such as dwell time, likes, or comments. We therefore categorize online attention (behaviors) into two classes: popularity (clicking) and engagement (watching, liking, or commenting). Moreover, modern platforms use recommender systems to present users with a tailored content display that maximizes satisfaction. A recommendation alters the appeal of an item by changing its ranking, and consequently impacts its popularity.
Our research is enabled by data from the largest video hosting site, YouTube. We use YouTube URLs shared on Twitter as a sampling protocol to obtain a collection of videos, and we track their prevalence from 2015 to 2019. This method yields a longitudinal dataset of more than 5 billion tweets. Although this volume is substantial, we find that Twitter still subsamples the data: our dataset covers about 80% of all tweets with YouTube URLs. We present a comprehensive measurement study of the Twitter sampling effects across different timescales and different subjects. We find that the volume of missing tweets can be estimated from Twitter's rate limit messages, that true entity rankings can be inferred from sampled observations, and that sampling compromises the quality of network and diffusion models.
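The rate-limit estimation mentioned above can be sketched in a few lines. This is a hedged illustration, not the thesis's actual code: the message shape with a cumulative `limit.track` counter follows Twitter's public streaming API documentation, and the function names (`estimate_missing`, `sampling_rate`) are hypothetical.

```python
def estimate_missing(rate_limit_messages):
    """Estimate the number of tweets withheld over one stream connection.

    Each rate limit message looks like {"limit": {"track": N}}, where
    "track" is a cumulative count of withheld tweets since the connection
    opened, so the largest value seen estimates the total missing volume.
    """
    counters = [m["limit"]["track"] for m in rate_limit_messages]
    return max(counters) if counters else 0


def sampling_rate(n_collected, rate_limit_messages):
    """Estimated fraction of matching tweets that were actually delivered."""
    missing = estimate_missing(rate_limit_messages)
    return n_collected / (n_collected + missing)
```

For instance, 3,600 collected tweets alongside a final counter of 900 would imply a sampling rate of about 80%, in line with the ~80% coverage reported above.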
Next, we present the first large-scale measurement study of how users collectively engage with YouTube videos. We study how long, and what percentage of, each video is watched. We propose a duration-calibrated metric, called relative engagement, which is correlated with recognized notions of content quality, stable over time, and predictable even before a video is uploaded.
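A duration-calibrated, percentile-style score in the spirit of relative engagement could be sketched as below. This is an assumption-laden illustration: here we simply rank a video's average watch percentage among videos of similar duration (its "peers"); the thesis's exact calibration may differ, and the function name is hypothetical.

```python
import bisect


def relative_engagement(watch_pct, peer_watch_pcts):
    """Percentile rank of a video's average watch percentage among
    videos of comparable duration. Returns a value in [0, 1]:
    0 means every peer is watched more thoroughly, 1 means none is.
    """
    peers = sorted(peer_watch_pcts)
    return bisect.bisect_left(peers, watch_pct) / len(peers)
```

Calibrating against duration peers matters because short videos are naturally watched to a higher percentage than long ones, so raw watch percentage alone conflates length with quality.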
Lastly, we examine the network effects induced by the YouTube recommender system. We construct the recommendation network for 60,740 music videos from 4,435 professional artists, where an edge indicates that the target video is recommended on the webpage of the source video. We discover a popularity bias: recommendations disproportionately point towards more popular videos. We use the bow-tie structure to characterize the network and find that the largest strongly connected component contains 23.1% of the videos while capturing 82.6% of the attention. We also build models to estimate the latent influence between videos and artists. By taking the network structure into account, we can predict video popularity 9.7% better than competitive baselines.
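The core of the bow-tie analysis, finding the largest strongly connected component of the recommendation network, can be computed with a standard algorithm. Below is a hedged, stdlib-only sketch using Kosaraju's two-pass algorithm; the thesis does not specify its implementation, and `largest_scc` is a hypothetical helper.

```python
from collections import defaultdict


def largest_scc(edges):
    """Largest strongly connected component of a directed graph,
    given as (source, target) edge pairs. Kosaraju's algorithm:
    DFS finish order on the graph, then DFS on the reversed graph.
    """
    g, rg = defaultdict(list), defaultdict(list)
    nodes = set()
    for u, v in edges:
        g[u].append(v)
        rg[v].append(u)
        nodes.update((u, v))

    def dfs(start, adj, seen, out):
        # Iterative DFS appending nodes to `out` in finish order.
        seen.add(start)
        stack = [(start, iter(adj[start]))]
        while stack:
            node, it = stack[-1]
            for v in it:
                if v not in seen:
                    seen.add(v)
                    stack.append((v, iter(adj[v])))
                    break
            else:
                stack.pop()
                out.append(node)

    # Pass 1: record finish order on the original graph.
    order, seen = [], set()
    for u in nodes:
        if u not in seen:
            dfs(u, g, seen, order)

    # Pass 2: DFS on the reversed graph in reverse finish order;
    # each DFS tree found is one strongly connected component.
    seen, best = set(), []
    for u in reversed(order):
        if u not in seen:
            comp = []
            dfs(u, rg, seen, comp)
            if len(comp) > len(best):
                best = comp
    return set(best)
```

With the component in hand, the fraction of videos (and of view attention) inside it gives the bow-tie core statistics reported above.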
Altogether, we explore the collective consumption patterns of human attention towards online content. Methods and findings from this thesis can be used by content producers, hosting sites, and online users alike to improve content production, advertising strategies, and recommender systems. We expect our new metrics, methods, and observations to generalize to other multimedia platforms, such as the music streaming service Spotify.
NPRL2: A New Target In Breast Cancer Treatment
The birth of edge cities in China: measuring the spillover effects of industrial parks
Since its establishment as a high-tech science park in 1988, Zhongguancun has been transformed from a village into China’s “Silicon Valley”. Zhongguancun’s success has led many Chinese local governments to embrace ‘place-based’ investments and to support the building of industrial parks (special economic zones, SEZs). In fact, this is a growing global trend: a recent Economist article reported that there are more than 4,000 SEZs (industrial parks) around the world, ranging from basic export processing zones and science parks to more high-tech economic zones.