34,510 research outputs found
An Approach to Model and Predict the Popularity of Online Contents with Explanatory Factors
In this paper, we propose a methodology to predict the popularity of online contents. More precisely, rather than trying to infer the popularity of a content itself, we infer the likelihood that a content will be popular. Our approach is rooted in survival analysis, where predicting the precise lifetime of an individual is very hard, if not impossible, but predicting the likelihood that an individual survives longer than a threshold, or longer than another individual, is possible. We position ourselves in the standpoint of an external observer who has to infer the popularity of a content only using publicly observable metrics, such as the lifetime of a thread, the number of comments, and the number of views. Our goal is to infer these observable metrics using a set of explanatory factors, such as the number of comments and the number of links in the first hours after the content publication, which are observable by the external observer. We use a Cox proportional hazard regression model that divides the distribution function of the observable popularity metric into two components: a) one that can be explained by the given set of explanatory factors (called risk factors) and b) a baseline distribution function that integrates all the factors not taken into account. To validate our proposed approach, we use data sets from two different online discussion forums: dpreview.com, one of the largest online discussion groups providing news and discussion forums about all kinds of digital cameras, and myspace.com, one of the representative online social networking services. On these two data sets we model two different popularity metrics, the lifetime of threads and the number of comments, and show that our approach can predict the lifetime of threads from Dpreview (Myspace) by observing a thread during the first 5∼6 days (24 hours, respectively) and the number of comments of Dpreview threads by observing a thread during the first 2∼3 days.
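The two-component split described in this abstract corresponds to the standard form of the Cox proportional hazards model. As a brief reminder, and not as the authors' own notation, it can be written as follows, where the symbols h_0 (baseline hazard), S_0 (baseline survival function), x (vector of risk factors) and β (regression coefficients) are assumed here for illustration:

```latex
\[
  h(t \mid x) = h_0(t)\,\exp\!\left(\beta^{\top} x\right),
  \qquad
  S(t \mid x) = S_0(t)^{\exp\left(\beta^{\top} x\right)}
\]
```

The factor exp(β^T x) is the part explained by the risk factors, while h_0(t) and S_0(t) absorb everything not taken into account.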
Modeling and Predicting the Popularity of Online Contents with Cox Proportional Hazard Regression Model
Special Issue on Advances in Web Intelligence. We propose a general framework which can be used for modeling and predicting the popularity of online contents. The aim of our modeling is not to infer the precise popularity value of a content, but to infer the likelihood that the content will be popular. Our approach is rooted in survival analysis, which deals with the survival time until an event such as a failure or death. Survival analysis assumes that predicting the precise lifetime of an instance is very hard, but that predicting the likelihood of the lifetime of an instance is possible based on its hazard distribution. Additionally, we position ourselves in the standpoint of an external observer who has to model the popularity of contents only with publicly available information. Thus, the goal of our proposed methodology is to model a certain popularity metric, such as the lifetime of a content or the number of comments which a content receives, with a set of explanatory factors which are observable by the external observer. Among various parametric and non-parametric approaches to survival analysis, we use the Cox proportional hazard regression model, which divides the distribution function of a certain popularity metric into two components: one which is explained by a set of explanatory factors, called risk factors, and another, a baseline survival distribution function, which integrates all the factors not taken into account. In order to validate our proposed methodology, we use two datasets crawled from two different discussion forums, forum.dpreview.com and forums.myspace.com, which are, respectively, one of the largest discussion forums dealing with various issues on digital cameras and a discussion forum provided by a representative social network. We model two different popularity metrics, the lifetime of threads and the number of comments, and we show that the models can predict the lifetime of threads from Dpreview (Myspace) by observing a thread during the first 5∼6 days (24 hours, respectively) and the number of comments of Dpreview threads by observing a thread during the first 2∼3 days.
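To make the Cox regression step concrete, the following is a minimal sketch, not the authors' implementation, that fits a single risk-factor coefficient by maximizing the Cox partial likelihood with Newton's method. The toy data, the variable names (`lifetimes`, `ended`, `early_comments`) and the choice of a single covariate are all assumptions made for illustration; the data do not come from Dpreview or Myspace.

```python
import math

# Hypothetical toy data (NOT from the paper): one entry per thread.
lifetimes = [2, 3, 5, 7, 8, 11, 14, 20]        # observed lifetime in days
ended = [1, 1, 1, 1, 0, 1, 1, 0]               # 1 = thread ended, 0 = censored
early_comments = [1, 0, 2, 1, 3, 0, 4, 3]      # risk factor: early comment count


def neg_log_partial_likelihood(beta, times, events, x):
    """Negative Cox partial log-likelihood for one covariate (no tied times)."""
    total = 0.0
    for i in range(len(times)):
        if not events[i]:
            continue  # censored threads only appear in risk sets
        risk_sum = sum(
            math.exp(beta * x[j]) for j in range(len(times)) if times[j] >= times[i]
        )
        total -= beta * x[i] - math.log(risk_sum)
    return total


def fit_cox_single_covariate(times, events, x, max_iter=100, tol=1e-10):
    """Newton's method with step halving on the single coefficient beta."""
    beta = 0.0
    for _ in range(max_iter):
        grad, info = 0.0, 0.0
        for i in range(len(times)):
            if not events[i]:
                continue
            risk_set = [j for j in range(len(times)) if times[j] >= times[i]]
            weights = [math.exp(beta * x[j]) for j in risk_set]
            s0 = sum(weights)
            s1 = sum(w * x[j] for w, j in zip(weights, risk_set))
            s2 = sum(w * x[j] ** 2 for w, j in zip(weights, risk_set))
            mean = s1 / s0
            grad += x[i] - mean            # score contribution
            info += s2 / s0 - mean ** 2    # weighted variance in the risk set
        if info <= 0.0:
            break
        step = grad / info
        current = neg_log_partial_likelihood(beta, times, events, x)
        # Halve the step while it fails to improve the objective.
        while (
            neg_log_partial_likelihood(beta + step, times, events, x) > current
            and abs(step) > tol
        ):
            step /= 2
        beta += step
        if abs(step) < tol:
            break
    return beta


beta_hat = fit_cox_single_covariate(lifetimes, ended, early_comments)
hazard_ratio = math.exp(beta_hat)  # multiplicative change in hazard per extra comment
print(beta_hat, hazard_ratio)
```

On this toy data the fitted coefficient is negative, so the hazard ratio is below 1: under the model's assumptions, each additional early comment is associated with a lower hazard of the thread ending. The baseline hazard is never estimated here, which is the usual appeal of the partial likelihood.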
eJournal interface can influence usage statistics: implications for libraries, publishers, and Project COUNTER
The design of a publisher's electronic interface can have a measurable effect
on electronic journal usage statistics. A study of journal usage from six
COUNTER-compliant publishers at thirty-two research institutions in the United
States, the United Kingdom and Sweden indicates that the ratio of PDF to HTML
views is not consistent across publisher interfaces, even after controlling for
differences in publisher content. The number of fulltext downloads may be
artificially inflated when publishers require users to view HTML versions
before accessing PDF versions or when linking mechanisms, such as CrossRef,
direct users to the full text, rather than the abstract, of each article. These
results suggest that usage reports from COUNTER-compliant publishers are not
directly comparable in their current form. One solution may be to modify
publisher numbers with adjustment factors deemed to be representative of the
benefit or disadvantage due to its interface. Standardization of some interface
and linking protocols may obviate these differences and allow for more accurate
cross-publisher comparisons. Comment: 22 pages, 5 figures. JASIST (in press, 2006)
The Pulse of News in Social Media: Forecasting Popularity
News articles are extremely time sensitive by nature. There is also intense
competition among news items to propagate as widely as possible. Hence, the
task of predicting the popularity of news items on the social web is both
interesting and challenging. Prior research has dealt with predicting eventual
online popularity based on early popularity. It is most desirable, however, to
predict the popularity of items prior to their release, fostering the
possibility of appropriate decision making to modify an article and the manner
of its publication. In this paper, we construct a multi-dimensional feature
space derived from properties of an article and evaluate the efficacy of these
features to serve as predictors of online popularity. We examine both
regression and classification algorithms and demonstrate that despite
randomness in human behavior, it is possible to predict ranges of popularity on
Twitter with an overall 84% accuracy. Our study also serves to illustrate the
differences between traditionally prominent sources and those immensely popular
on the social web.
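Predicting a range of popularity rather than an exact count can be illustrated with a small sketch. The bin edges, labels and tweet counts below are hypothetical, since the abstract does not state the ranges used; the code only shows how counts might be mapped to ranges and how classification accuracy over those ranges would be computed.

```python
from bisect import bisect_right

# Hypothetical bin edges in tweet counts (assumption, not from the abstract):
# below 20 -> "low", 20 to 99 -> "medium", 100 and above -> "high".
BIN_EDGES = [20, 100]
LABELS = ["low", "medium", "high"]


def popularity_range(tweet_count):
    """Map a raw tweet count to its popularity range label."""
    return LABELS[bisect_right(BIN_EDGES, tweet_count)]


def accuracy(predicted, actual):
    """Fraction of items whose predicted label matches the actual label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


# Hypothetical counts for six articles (illustrative only).
actual_counts = [3, 45, 250, 18, 99, 1200]
predicted_counts = [10, 30, 90, 25, 70, 400]

actual_ranges = [popularity_range(c) for c in actual_counts]
predicted_ranges = [popularity_range(c) for c in predicted_counts]
range_accuracy = accuracy(predicted_ranges, actual_ranges)
print(range_accuracy)
```

In this toy setup none of the predicted counts matches the actual count exactly, yet four of the six fall in the correct range, which is why range prediction is a more forgiving target than regression on the raw count.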