
    An Approach to Model and Predict the Popularity of Online Contents with Explanatory Factors

    International audience
    In this paper, we propose a methodology to predict the popularity of online content. More precisely, rather than trying to infer the popularity of a content item itself, we infer the likelihood that the item will be popular. Our approach is rooted in survival analysis, in which predicting the precise lifetime of an individual is very hard, if not impossible, but predicting the likelihood that the individual survives longer than a threshold, or longer than another individual, is possible. We take the standpoint of an external observer who has to infer the popularity of content using only publicly observable metrics, such as the lifetime of a thread, the number of comments, and the number of views. Our goal is to infer these observable metrics from a set of explanatory factors that the external observer can also see, such as the number of comments and the number of links in the first hours after the content is published. We use a Cox proportional hazard regression model that divides the distribution function of the observable popularity metric into two components: a) one that can be explained by the given set of explanatory factors (called risk factors) and b) a baseline distribution function that integrates all the factors not taken into account. To validate our approach, we use data sets from two different online discussion forums: dpreview.com, one of the largest online discussion groups providing news and discussion forums about all kinds of digital cameras, and myspace.com, a representative online social networking service. On these two data sets we model two different popularity metrics, the lifetime of threads and the number of comments, and show that our approach can predict the lifetime of Dpreview threads by observing a thread during its first 5 to 6 days (Myspace threads: its first 24 hours) and the number of comments on Dpreview threads by observing a thread during its first 2 to 3 days.
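    A minimal sketch of how a fitted Cox model turns risk factors into the likelihood of popularity. Under proportional hazards the hazard of item i is h(t | x_i) = h_0(t) · exp(β·x_i), so its survival function is S(t | x_i) = S_0(t)^exp(β·x_i), and the hazard ratio of two items, exp(β·(x_a − x_b)), does not depend on t. The function names, the coefficients, and the baseline survival value below are hypothetical inputs for illustration, not estimates from the Dpreview or Myspace data.

```python
import math

def linear_predictor(beta, x):
    """beta . x: the part of the log-hazard explained by the risk factors."""
    return sum(b * xi for b, xi in zip(beta, x))

def hazard_ratio(beta, x_a, x_b):
    """Hazard of item a relative to item b; constant over time under proportional hazards."""
    return math.exp(linear_predictor(beta, x_a) - linear_predictor(beta, x_b))

def survival_prob(baseline_surv_t, beta, x):
    """P(T > t | x) = S0(t) ** exp(beta . x), where S0(t) is the baseline survival at t."""
    return baseline_surv_t ** math.exp(linear_predictor(beta, x))

# Hypothetical example: two risk factors (e.g. comments and links in the first
# hours) with made-up negative coefficients, so more early activity lowers the
# hazard of the thread ending and lengthens its expected lifetime.
beta = [-0.05, -0.2]
quiet_thread = [2, 0]
busy_thread = [30, 3]
s0_at_threshold = 0.4  # made-up baseline probability of outliving the threshold

print(survival_prob(s0_at_threshold, beta, quiet_thread))
print(survival_prob(s0_at_threshold, beta, busy_thread))
print(hazard_ratio(beta, busy_thread, quiet_thread))
```

    With these made-up numbers the busy thread is more likely than the quiet thread to outlive the threshold, and its hazard ratio relative to the quiet thread is below 1. An item whose risk factors are all zero simply follows the baseline survival S_0(t).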

    Modeling and Predicting the Popularity of Online Contents with Cox Proportional Hazard Regression Model

    Special Issue on Advances in Web Intelligence
    International audience
    We propose a general framework for modeling and predicting the popularity of online content. The aim of our modeling is not to infer the precise popularity value of a content item, but to infer the likelihood that the item will be popular. Our approach is rooted in survival analysis, which deals with the time until an event such as failure or death. Survival analysis assumes that predicting the precise lifetime of an instance is very hard, but that predicting the likelihood of its lifetime is possible based on its hazard distribution. Additionally, we take the standpoint of an external observer who has to model the popularity of content using only publicly available information. Thus, the goal of our methodology is to model a given popularity metric, such as the lifetime of a content item or the number of comments it receives, with a set of explanatory factors that are observable by the external observer. Among the various parametric and non-parametric approaches to survival analysis, we use the Cox proportional hazard regression model, which divides the distribution function of a popularity metric into two components: one explained by a set of explanatory factors, called risk factors, and a baseline survival distribution function, which integrates all the factors not taken into account. To validate our methodology, we use two datasets crawled from two different discussion forums, forum.dpreview.com and forums.myspace.com: respectively, one of the largest discussion forums dealing with various issues concerning digital cameras, and a discussion forum provided by a representative social network. We model two different popularity metrics, the lifetime of threads and the number of comments, and we show that the models can predict the lifetime of Dpreview threads by observing a thread during its first 5 to 6 days (Myspace threads: its first 24 hours) and the number of comments on Dpreview threads by observing a thread during its first 2 to 3 days.
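    The risk-factor coefficients of a Cox model are usually estimated by maximizing Cox's partial likelihood, which leaves the baseline hazard h_0(t) unspecified. The sketch below does this for a single risk factor with Newton's method, assuming that no two threads end at exactly the same time (full implementations handle ties, for example with Breslow's or Efron's approximation). Threads still active when the data were crawled can be entered as censored observations (event = 0). The function names and the toy data are hypothetical and do not come from the paper.

```python
import math

def cox_score_info(beta, times, events, x):
    """Score (first derivative) and observed information (minus the second
    derivative) of the Cox log partial likelihood for one covariate.
    Assumes no tied event times."""
    score, info = 0.0, 0.0
    for i in range(len(times)):
        if not events[i]:
            continue  # censored items only contribute through the risk sets
        # Risk set: items not yet ended or censored just before times[i]
        risk = [j for j in range(len(times)) if times[j] >= times[i]]
        w = [math.exp(beta * x[j]) for j in risk]
        s0 = sum(w)
        s1 = sum(wj * x[j] for wj, j in zip(w, risk))
        s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
        mean = s1 / s0  # hazard-weighted mean covariate in the risk set
        score += x[i] - mean
        info += s2 / s0 - mean ** 2
    return score, info

def fit_cox_1d(times, events, x, max_iter=50, tol=1e-10):
    """Maximize the log partial likelihood with Newton's method from beta = 0."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = cox_score_info(beta, times, events, x)
        if info <= 0:
            raise ValueError("covariate does not vary within any risk set")
        step = score / info  # Newton step for a concave log-likelihood
        beta += step
        if abs(step) < tol:
            break
    return beta

# Hypothetical toy data: thread lifetimes in days; event 1 = thread observed to
# end, 0 = still active at crawl time (censored); x = 1 if the thread had few
# comments in its first hours (a made-up binary risk factor).
times = [1, 2, 3, 4, 5, 6, 7]
events = [1, 1, 1, 1, 1, 1, 0]
x = [1, 1, 0, 1, 0, 0, 0]

beta_hat = fit_cox_1d(times, events, x)
print(beta_hat, math.exp(beta_hat))  # coefficient and hazard ratio
```

    A positive estimate means that, in this made-up data, threads with few early comments have a higher hazard of ending and therefore a shorter expected lifetime.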

    eJournal interface can influence usage statistics: implications for libraries, publishers, and Project COUNTER

    The design of a publisher's electronic interface can have a measurable effect on electronic journal usage statistics. A study of journal usage from six COUNTER-compliant publishers at thirty-two research institutions in the United States, the United Kingdom and Sweden indicates that the ratio of PDF to HTML views is not consistent across publisher interfaces, even after controlling for differences in publisher content. The number of full-text downloads may be artificially inflated when publishers require users to view the HTML version before accessing the PDF version, or when linking mechanisms such as CrossRef direct users to the full text, rather than the abstract, of each article. These results suggest that usage reports from COUNTER-compliant publishers are not directly comparable in their current form. One solution may be to modify each publisher's numbers with adjustment factors deemed to represent the benefit or disadvantage due to its interface. Standardization of some interface and linking protocols may obviate these differences and allow for more accurate cross-publisher comparisons.
    Comment: 22 pages, 5 figures. JASIST (in press, 2006)
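    A minimal sketch of the comparison and adjustment described above: compute each publisher's PDF-to-HTML ratio and scale its reported full-text total by a per-publisher adjustment factor. The class and function names, the counts, and the factor value are hypothetical; the study's own adjustment factors are not reproduced here.

```python
from dataclasses import dataclass

@dataclass
class PublisherUsage:
    name: str
    html_views: int
    pdf_views: int

    def pdf_to_html(self) -> float:
        """Ratio of PDF to HTML full-text views on this publisher's interface."""
        return self.pdf_views / self.html_views

    def fulltext(self) -> int:
        """Total full-text requests as reported (HTML + PDF)."""
        return self.html_views + self.pdf_views

def adjusted_fulltext(reports, factors):
    """Scale each publisher's full-text total by a supplied adjustment factor.
    Publishers without a factor are left unadjusted (factor 1.0)."""
    return {r.name: r.fulltext() * factors.get(r.name, 1.0) for r in reports}

# Made-up counts: publisher B's interface is assumed to force an HTML view
# before every PDF download.
reports = [PublisherUsage("A", 1000, 1500), PublisherUsage("B", 2400, 1600)]
for r in reports:
    print(r.name, round(r.pdf_to_html(), 2), r.fulltext())
print(adjusted_fulltext(reports, {"B": 0.6}))
```

    Here the hypothetical factor 0.6 for B assumes that each of its 1,600 PDF downloads was preceded by a forced HTML view, so only 800 HTML views were requested for their own sake, giving 2,400 distinct full-text uses out of the 4,000 reported.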

    The Pulse of News in Social Media: Forecasting Popularity

    News articles are extremely time-sensitive by nature. There is also intense competition among news items to propagate as widely as possible. Hence, the task of predicting the popularity of news items on the social web is both interesting and challenging. Prior research has dealt with predicting eventual online popularity based on early popularity. It is most desirable, however, to predict the popularity of items prior to their release, which makes it possible to decide how to modify an article and the manner of its publication. In this paper, we construct a multi-dimensional feature space derived from properties of an article and evaluate the efficacy of these features as predictors of online popularity. We examine both regression and classification algorithms and demonstrate that, despite randomness in human behavior, it is possible to predict ranges of popularity on Twitter with an overall accuracy of 84%. Our study also illustrates the differences between traditionally prominent sources and those that are immensely popular on the social web.
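    Predicting "ranges of popularity" turns the task into classification: each article's eventual popularity count is binned into classes, and a prediction counts as correct when it falls in the right class. The sketch below shows only that binning and the accuracy computation, including how a regression model's predicted counts can be scored the same way; the class boundaries, counts, and predictions are hypothetical and are not the paper's feature space, classes, or data.

```python
import bisect

# Hypothetical class boundaries: class 0 is a count below 20, class 1 is 20-99,
# class 2 is 100 or more.
BOUNDARIES = [20, 100]

def popularity_range(count, boundaries=BOUNDARIES):
    """Map a raw popularity count to the index of its range."""
    return bisect.bisect_right(boundaries, count)

def accuracy(predicted, actual):
    """Fraction of items whose predicted range equals the true range."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two equally long, non-empty sequences")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Made-up true counts and regression outputs, both binned into ranges.
true_counts = [3, 45, 250, 18, 120]
predicted_counts = [8, 60, 90, 25, 400]
true_ranges = [popularity_range(c) for c in true_counts]
pred_ranges = [popularity_range(c) for c in predicted_counts]
print(true_ranges, pred_ranges, accuracy(pred_ranges, true_ranges))
```

    A classifier that outputs range indices directly can be passed to `accuracy` without the binning step.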