Early Prediction of Movie Box Office Success based on Wikipedia Activity Big Data

Author: A Halavais
A Ishii
A Spoerri
Attila Szolnoki
B Suh
C Castillo
CA Hidalgo
G Eysenbach
HS Moat
J Bollen
J Ginsberg
J Ratkiewicz
J Török
János Kertész
Márton Mestyán
R Kimmons
R Sharda
RK Pan
S Saavedra
S Sinha
S Sreenivasan
T Brody
T Holloway
T Preis
T Yasseri
Taha Yasseri
X Shuai
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2013

Use of socially generated "big data" to access information about collective states of the minds in human societies has become a new paradigm in the emerging field of computational social science. A natural application of this would be the prediction of the society's reaction to a new product in the sense of popularity and adoption rate. However, bridging the gap between "real time monitoring" and "early predicting" remains a big challenge. Here we report on an endeavor to build a minimalistic predictive model for the financial success of movies based on collective activity data of online users. We show that the popularity of a movie can be predicted much before its release by measuring and analyzing the activity level of editors and viewers of the corresponding entry to the movie in Wikipedia, the well-known online encyclopedia.
Comment: 13 pages, Including Supporting Information, 7 Figures, Download the dataset from: http://wwm.phy.bme.hu/SupplementaryDataS1.zi

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Aaltodoc Publication Archive

Oxford University Research Archive

FigShare

A Feature-Based Bayesian Method for Content Popularity Prediction in Edge-Caching Networks

Author: Chatzinotas Symeon
Mehrizi Sajad
Ottersten Bjorn
Tsakmalis Anestis
Publication date: 01/01/2019

Edge-caching is recognized as an efficient technique for future wireless cellular networks to improve network capacity and user-perceived quality of experience. Due to the random content requests and the limited cache memory, designing an efficient caching policy is a challenge. To enhance the performance of caching systems, an accurate content request prediction algorithm is essential. Here, we introduce a flexible model, a Poisson regressor based on a Gaussian process, for the content request distribution in stationary environments. Our proposed model can incorporate the content features as side information for prediction enhancement. In order to learn the model parameters, which yield the Poisson rates or alternatively content popularities, we invoke the Bayesian approach, which is very robust against over-fitting. However, the posterior distribution in the Bayes formula is analytically intractable to compute. To tackle this issue, we apply a Markov Chain Monte Carlo (MCMC) method to approximate the posterior distribution. Two types of predictive distributions are formulated for the requests of existing contents and for the requests of a newly-added content. Finally, simulation results are provided to confirm the accuracy of the developed content popularity learning approach.
Comment: arXiv admin note: substantial text overlap with arXiv:1903.0306
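The abstract above names two ingredients: a Poisson model for request counts and MCMC to approximate an intractable posterior. The following is a minimal sketch of that idea, not the authors' model. It assumes a plain log-linear Poisson regression with an independent Gaussian prior on the weights (a simplified stand-in for the Gaussian-process prior) and a random-walk Metropolis sampler. The names `log_posterior`, `metropolis`, `prior_var`, and `step` are hypothetical.

```python
import math
import random


def log_posterior(w, X, y, prior_var=10.0):
    """Unnormalised log posterior of a log-linear Poisson regression.

    Assumption: independent zero-mean Gaussian prior on each weight,
    used here in place of a Gaussian-process prior.
    """
    lp = -sum(wi * wi for wi in w) / (2.0 * prior_var)
    for xi, yi in zip(X, y):
        eta = sum(wj * xj for wj, xj in zip(w, xi))  # log of the Poisson rate
        lp += yi * eta - math.exp(eta)  # Poisson log-likelihood up to a constant
    return lp


def metropolis(X, y, n_samples=5000, step=0.05, seed=0):
    """Random-walk Metropolis sampler over the regression weights."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    current = log_posterior(w, X, y)
    samples = []
    for _ in range(n_samples):
        proposal = [wi + rng.gauss(0.0, step) for wi in w]
        candidate = log_posterior(proposal, X, y)
        # Accept with probability min(1, exp(candidate - current)).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            w, current = proposal, candidate
        samples.append(list(w))
    return samples


if __name__ == "__main__":
    # Toy data: one constant feature, so exp(w[0]) is the request rate.
    X = [[1.0]] * 20
    y = [4] * 20
    kept = metropolis(X, y)[1000:]  # discard burn-in
    rate = sum(math.exp(s[0]) for s in kept) / len(kept)
    print(f"posterior mean request rate: {rate:.2f}")
```

With an intercept-only design, the posterior mean of `exp(w[0])` should land near the empirical mean count, which gives a quick sanity check on the sampler.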

arXiv.org e-Print Archive

Crossref

Open Repository and Bibliography - Luxembourg

Tracking the History and Evolution of Entities: Entity-centric Temporal Analysis of Large Social Media Archives

Author: Fafalios Pavlos
Iosifidis Vasileios
Ntoutsi Eirini
Stefanidis Kostas
Publication date: 24/10/2018

How did the popularity of the Greek Prime Minister evolve in 2015? How did the predominant sentiment about him vary during that period? Were there any controversial sub-periods? What other entities were related to him during these periods? To answer these questions, one needs to analyze archived documents and data about the query entities, such as old news articles or social media archives. In particular, user-generated content posted in social networks, like Twitter and Facebook, can be seen as a comprehensive documentation of our society, and thus meaningful analysis methods over such archived data are of immense value for sociologists, historians and other interested parties who want to study the history and evolution of entities and events. To this end, in this paper we propose an entity-centric approach to analyze social media archives and we define measures that allow studying how entities were reflected in social media in different time periods and under different aspects, like popularity, attitude, controversiality, and connectedness with other entities. A case study using a large Twitter archive of four years illustrates the insights that can be gained by such an entity-centric and multi-aspect analysis.
Comment: This is a preprint of an article accepted for publication in the International Journal on Digital Libraries (2018

arXiv.org e-Print Archive

Trepo - Institutional Repository of Tampere University

A Bayesian Poisson-Gaussian Process Model for Popularity Learning in Edge-Caching Networks

Author: Chatzinotas Symeon
Mehrizi Sajad
Ottersten Bjorn
Tsakmalis Anestis
Publication date: 01/01/2019

Edge-caching is recognized as an efficient technique for future cellular networks to improve network capacity and user-perceived quality of experience. To enhance the performance of caching systems, designing an accurate content request prediction algorithm plays an important role. In this paper, we develop a flexible model, a Poisson regressor based on a Gaussian process, for the content request distribution. The first important advantage of the proposed model is that it encourages the already existing or seen contents with similar features to be correlated in the feature space, and therefore it acts as a regularizer for the estimation. Second, it allows predicting the popularities of newly-added or unseen contents whose statistical data is not available in advance. In order to learn the model parameters, which yield the Poisson arrival rates or alternatively the content popularities, we invoke the Bayesian approach, which is robust against over-fitting. However, the resulting posterior distribution is analytically intractable to compute. To tackle this, we apply a Markov Chain Monte Carlo (MCMC) method to approximate this distribution, which is also asymptotically exact. Nevertheless, MCMC is computationally demanding, especially when the number of contents is large. Thus, we employ the Variational Bayes (VB) method as an alternative low-complexity solution. More specifically, the VB method addresses the approximation of the posterior distribution through an optimization problem. Subsequently, we present a fast block-coordinate descent algorithm to solve this optimization problem. Finally, extensive simulation results both on synthetic and real-world datasets are provided to show the accuracy of our prediction algorithm and the cache hit ratio (CHR) gain compared to existing methods from the literature.
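The cache hit ratio (CHR) mentioned in the abstract above can be illustrated with a small sketch. It assumes the simplest possible policy, which is not taken from the paper: cache the `capacity` contents with the highest predicted request rates, then count the fraction of requests in a trace that the cache can serve. The function names and the toy data are hypothetical.

```python
def top_k_cache(predicted_rates, capacity):
    """Return the ids of the `capacity` contents with the highest predicted rates."""
    ranked = sorted(predicted_rates, key=predicted_rates.get, reverse=True)
    return set(ranked[:capacity])


def cache_hit_ratio(requests, cached):
    """Fraction of requests in the trace that are served from the cache."""
    if not requests:
        return 0.0
    hits = sum(1 for content_id in requests if content_id in cached)
    return hits / len(requests)


if __name__ == "__main__":
    # Hypothetical predicted popularities (requests per time slot) and a toy trace.
    predicted_rates = {"a": 5.2, "b": 0.7, "c": 3.1, "d": 1.4}
    cached = top_k_cache(predicted_rates, capacity=2)
    requests = ["a", "c", "b", "a", "d", "a", "c", "b"]
    print(cache_hit_ratio(requests, cached))  # prints 0.625
```

Here the cache holds `a` and `c`, which account for 5 of the 8 requests. A more accurate popularity predictor changes which contents are cached and therefore the resulting ratio.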

arXiv.org e-Print Archive

Open Repository and Bibliography - Luxembourg