
    The Untold Story of the Clones: Content-agnostic Factors that Impact YouTube Video Popularity

    Video dissemination through sites such as YouTube can have widespread impacts on opinions, thoughts, and cultures. Not all videos will reach the same popularity and have the same impact. Popularity differences arise not only because of differences in video content, but also because of other "content-agnostic" factors. The latter factors are of considerable interest but it has been difficult to accurately study them. For example, videos uploaded by users with large social networks may tend to be more popular because they tend to have more interesting content, not because social network size has a substantial direct impact on popularity. In this paper, we develop and apply a methodology that is able to accurately assess, both qualitatively and quantitatively, the impacts of various content-agnostic factors on video popularity. When controlling for video content, we observe a strong linear "rich-get-richer" behavior, with the total number of previous views as the most important factor except for very young videos. The second most important factor is found to be video age. We analyze a number of phenomena that may contribute to rich-get-richer, including the first-mover advantage, and search bias towards popular videos. For young videos we find that factors other than the total number of previous views, such as uploader characteristics and number of keywords, become relatively more important. Our findings also confirm that inaccurate conclusions can be reached when not controlling for content.
    Comment: Dataset available at: http://www.ida.liu.se/~nikca/papers/kdd12.htm
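    The linear "rich-get-richer" behavior described in this abstract can be illustrated with a small sketch. It is not the paper's methodology: it only assumes a set of clones (videos with identical content) in which each new view goes to a clone with probability proportional to that clone's previous views. The function names and parameters are hypothetical.

```python
import random


def expected_new_views(prev_views, total_new):
    """Split total_new views in proportion to each clone's previous views.

    Assumption (not from the paper): a purely linear rich-get-richer rule.
    """
    total_prev = sum(prev_views)
    return [total_new * v / total_prev for v in prev_views]


def simulate_clone_views(prev_views, total_new, seed=0):
    """Assign views one at a time; each view picks a clone with probability
    proportional to its current view count (linear preferential attachment)."""
    rng = random.Random(seed)
    views = list(prev_views)
    for _ in range(total_new):
        winner = rng.choices(range(len(views)), weights=views)[0]
        views[winner] += 1
    return views


if __name__ == "__main__":
    # Three clones with identical content but different view histories.
    print(expected_new_views([10, 30, 60], 100))
    print(simulate_clone_views([10, 30, 60], 100))
```

    Under this rule, a clone that starts with more views is expected to gain more views even though the content is the same, which is the kind of content-agnostic effect the abstract refers to.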

    Popularity Characterization and Modelling for User-generated Videos

    User-generated content systems such as YouTube have become highly popular. It is difficult to understand and predict content popularity in such systems. Characterizing and modelling content popularity can provide deeper insights into system design trade-offs and enable prediction of system behaviour in advance. Borghol et al. collected two datasets of YouTube video weekly view counts over eight months in 2008/09, namely a “recently-uploaded” dataset and a “keyword-search” dataset, and analyzed the popularity characteristics of the videos in the recently-uploaded dataset including the video popularity evolution over time. Based on the observed characteristics, they developed a model that can generate synthetic video weekly view counts whose characteristics with respect to video popularity evolution match those observed in the recently-uploaded dataset. For this thesis, new weekly view count data was collected over two months in 2011 for the videos in the recently-uploaded and keyword-search datasets of Borghol et al. This data was used to evaluate the accuracy of the Borghol et al. model when used to generate synthetic view counts for a much longer time period than the eight-month period previously considered. Although the model yielded distributions of total (lifetime) video view counts that match the empirical distributions, significant differences between the model and empirical data were observed. These differences appear to arise because of particular popularity characteristics that change over time rather than being week-invariant as assumed in the model. This thesis also characterizes how video popularity evolves beyond the eight-month period considered by Borghol et al., and studies the characteristics of the keyword-search dataset with respect to content popularity, popularity evolution, and sampling biases. Finally, the thesis studies the popularity characteristics of the videos in the recently-uploaded and keyword-search datasets for which additional view count data could not be collected, owing to the removal of these videos from YouTube.

    A Vocabulary for Growth: Topic Modeling of Content Popularity Evolution

    In this paper, we present a novel method to predict the long-term popularity of user-generated content (UGC). First, the method uses a mixture model to cluster the dynamics of UGC popularity into a vocabulary of growth patterns (sequences). Next, the method assigns to each sequence a topic model that describes the dynamics of the sequence in a compact way. We then use this topic model to identify similar patterns of growth in popularity of newly observed UGC. The proposed method has two key features: first, it considers the historical dynamics of UGC popularity, and second, it provides long-term popularity prediction. Results on a real UGC dataset show that the proposed method is flexible and able to accurately forecast the complete growth in popularity of a given UGC.

    Temporal Locality in Today's Content Caching: Why it Matters and How to Model it

    The dimensioning of caching systems represents a difficult task in the design of infrastructures for content distribution in the current Internet. This paper addresses the problem of defining a realistic arrival process for the content requests generated by users, due to its critical importance for both analytical and simulative evaluations of the performance of caching systems. First, with the aid of YouTube traces collected inside operational residential networks, we identify the characteristics of real traffic that need to be considered or can be safely neglected in order to accurately predict the performance of a cache. Second, we propose a new parsimonious traffic model, named the Shot Noise Model (SNM), that enables users to natively capture the dynamics of content popularity, whilst still being sufficiently simple to be employed effectively for both analytical and scalable simulative studies of caching systems. Finally, our results show that the SNM presents a much better solution to account for the temporal locality observed in real traffic compared to existing approaches.
    Comment: 7 pages, 7 figures, Accepted for publication in ACM Computer Communication Review
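    The abstract does not spell out the Shot Noise Model, so the following is only a sketch of a generic shot-noise request process under stated assumptions: new contents appear as a Poisson process, and each content then attracts requests at a rate that decays exponentially with its age. The exponential profile, the function names, and every parameter value are assumptions made for illustration, not details taken from the paper.

```python
import math
import random


def content_requests(birth, volume, lifespan, rng):
    """Request times for one content (one "shot").

    Assumed rate at age a: (volume / lifespan) * exp(-a / lifespan), so the
    expected total number of requests is `volume`. Points of a unit-rate
    Poisson process on [0, volume) are mapped through the inverse of the
    cumulative rate, which yields the inhomogeneous process.
    """
    times = []
    s = rng.expovariate(1.0)
    while s < volume:
        age = -lifespan * math.log(1.0 - s / volume)
        times.append(birth + age)
        s += rng.expovariate(1.0)
    return times


def shot_noise_trace(horizon, arrival_rate, volume, lifespan, seed=0):
    """Time-ordered list of (time, content_id) requests within [0, horizon)."""
    rng = random.Random(seed)
    trace = []
    content_id = 0
    birth = rng.expovariate(arrival_rate)  # contents arrive as a Poisson process
    while birth < horizon:
        for t in content_requests(birth, volume, lifespan, rng):
            if t < horizon:
                trace.append((t, content_id))
        content_id += 1
        birth += rng.expovariate(arrival_rate)
    trace.sort()
    return trace


if __name__ == "__main__":
    for t, cid in shot_noise_trace(20.0, 0.5, 5.0, 2.0)[:10]:
        print(f"{t:8.3f}  content {cid}")
```

    Because each content's requests cluster shortly after it appears, a trace generated this way exhibits temporal locality, which a stationary popularity model with independent requests would not reproduce.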