Early Prediction of Movie Box Office Success based on Wikipedia Activity Big Data
Use of socially generated "big data" to access information about collective
states of mind in human societies has become a new paradigm in the emerging
field of computational social science. A natural application of this is
predicting society's reaction to a new product in terms of popularity and
adoption rate. However, bridging the gap between "real-time monitoring" and
"early prediction" remains a big challenge. Here we report on an endeavor to
build a minimalistic predictive model for the financial success of movies based
on collective activity data of online users. We show that the popularity of a
movie can be predicted well before its release by measuring and analyzing the
activity level of editors and viewers of the movie's entry in Wikipedia, the
well-known online encyclopedia.
Comment: 13 pages, including Supporting Information, 7 figures. Download the
dataset from: http://wwm.phy.bme.hu/SupplementaryDataS1.zi
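The abstract describes the predictor only at a high level, so the following is a minimal illustrative sketch (not the authors' model, features, or data) of how pre-release Wikipedia activity signals could feed a simple regression predictor; every feature name and number below is made up.

# Minimal sketch, assuming scikit-learn and invented activity features
# (edit count, number of distinct editors, page views before release);
# this is not the paper's model or dataset.
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data: one row per movie.
X_train = np.array([
    [120, 35, 50_000],     # edits, editors, page views
    [300, 80, 200_000],
    [45, 12, 8_000],
    [210, 60, 150_000],
])
# Hypothetical first-weekend box office revenue (USD).
y_train = np.array([12e6, 55e6, 2e6, 40e6])

model = LinearRegression().fit(X_train, y_train)

# Predict revenue for a new movie from its pre-release activity levels.
X_new = np.array([[180, 50, 120_000]])
print(model.predict(X_new))
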
Is the Web ready for HTTP/2 Server Push?
HTTP/2 supersedes HTTP/1.1 to tackle the performance challenges of the modern
Web. A highly anticipated feature is Server Push, enabling servers to send data
without explicit client requests, thus potentially saving time. Although
guidelines on how to use Server Push have emerged, measurements have shown that
it can easily be used in a suboptimal way and hurt rather than improve
performance. We thus tackle the question of whether the current Web can make better use
of Server Push. First, we enable real-world websites to be replayed in a
testbed to study the effects of different Server Push strategies. Using this,
we next revisit proposed guidelines to grasp their performance impact. Finally,
based on our results, we propose a novel strategy using an alternative server
scheduler that allows resources to be interleaved. This improves the visual
progress for some websites, with minor modifications to the deployment. Still,
our results highlight the limits of Server Push: a deep understanding of web
engineering is required to make optimal use of it, and not every site will
benefit.
Comment: More information available at https://push.netray.i
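For context, a common way a site opts into Server Push is to emit a "Link: ...; rel=preload" header and let an HTTP/2-capable front end (a reverse proxy or CDN) translate it into a PUSH_PROMISE. The sketch below is a generic Flask illustration of that mechanism under that assumption; it is not the strategy or scheduler proposed in the paper, and whether the header actually triggers a push depends on the server in front of the application (some servers have since dropped push support).

# Minimal Flask sketch: announce a pushable resource via a Link preload header.
from flask import Flask, make_response

app = Flask(__name__)

@app.route("/")
def index():
    resp = make_response("<link rel='stylesheet' href='/static/style.css'>")
    # Hint that /static/style.css should be pushed (or preloaded) alongside
    # this response; an HTTP/2 front end may turn this into a PUSH_PROMISE.
    resp.headers["Link"] = "</static/style.css>; rel=preload; as=style"
    return resp

if __name__ == "__main__":
    app.run()
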
XRay: Enhancing the Web's Transparency with Differential Correlation
Today's Web services - such as Google, Amazon, and Facebook - leverage user
data for varied purposes, including personalizing recommendations, targeting
advertisements, and adjusting prices. At present, users have little insight
into how their data is being used and hence cannot make informed choices
about the services they use. To increase transparency, we developed XRay,
the first fine-grained, robust, and scalable personal data tracking system for
the Web. XRay predicts which data in an arbitrary Web account (such as emails,
searches, or viewed products) is being used to target which outputs (such as
ads, recommended products, or prices). XRay's core functions are service
agnostic and easy to instantiate for new services, and they can track data
within and across services. To make predictions independent of the audited
service, XRay relies on the following insight: by comparing outputs from
different accounts with similar, but not identical, subsets of data, one can
pinpoint targeting through correlation. We show, both theoretically and through
experiments on Gmail, Amazon, and YouTube, that XRay achieves high precision
and recall by correlating data from a surprisingly small number of extra
accounts.
Comment: Extended version of a paper presented at the 23rd USENIX Security
Symposium (USENIX Security 14)
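As a rough illustration of the differential-correlation insight described above (a toy sketch, not XRay's actual algorithm or its accuracy guarantees), one can attribute an output to the data item whose presence across shadow accounts best predicts where that output appears; all account names, items, and outputs below are hypothetical.

# Toy differential correlation: each shadow account holds a different subset
# of the user's data items; an output (e.g. an ad) is attributed to the item
# whose presence best separates accounts that saw it from accounts that did not.
accounts = {
    "acct1": {"items": {"email_A", "email_B"}, "outputs": {"ad_X"}},
    "acct2": {"items": {"email_A", "email_C"}, "outputs": {"ad_X", "ad_Y"}},
    "acct3": {"items": {"email_B", "email_C"}, "outputs": {"ad_Y"}},
}

def attribute(output, accounts):
    items = set().union(*(a["items"] for a in accounts.values()))
    def score(item):
        with_item = [output in a["outputs"] for a in accounts.values() if item in a["items"]]
        without_item = [output in a["outputs"] for a in accounts.values() if item not in a["items"]]
        # Fraction of accounts holding the item that showed the output, minus
        # the fraction of accounts without the item that showed it.
        frac_with = sum(with_item) / len(with_item)
        frac_without = sum(without_item) / len(without_item) if without_item else 0.0
        return frac_with - frac_without
    return max(items, key=score)

print(attribute("ad_X", accounts))  # -> "email_A"
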
Normalized Web Distance and Word Similarity
There is a great deal of work in cognitive psychology, linguistics, and
computer science, about using word (or phrase) frequencies in context in text
corpora to develop measures for word similarity or word association, going back
to at least the 1960s. The goal of this chapter is to introduce the normalized
web distance (NWD) method to determine similarity between words and phrases. It
is a general way to tap the amorphous low-grade knowledge available for free on
the Internet, typed in by local users aiming at personal gratification of
diverse objectives, and yet globally achieving what is effectively the largest
semantic electronic database in the world. Moreover, this database is available
for all by using any search engine that can return aggregate page-count
estimates for a large range of search queries. In the paper introducing the NWD
it was called `normalized Google distance (NGD),' but since Google doesn't
allow computer searches anymore, we opt for the more neutral and descriptive
NWD.
Comment: LaTeX, 20 pages, 7 figures, to appear in: Handbook of Natural
Language Processing, Second Edition, Nitin Indurkhya and Fred J. Damerau
Eds., CRC Press, Taylor and Francis Group, Boca Raton, FL, 2010, ISBN
978-142008592
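The abstract does not restate the formula, but the standard NWD/NGD definition is computed from aggregate page counts f(x), f(y), f(x, y) and the index size N. The small sketch below implements that standard formula; the page counts and index size in the example call are merely illustrative numbers, not results from the chapter.

# NWD(x, y) = (max(log f(x), log f(y)) - log f(x, y))
#             / (log N - min(log f(x), log f(y)))
from math import log

def nwd(f_x, f_y, f_xy, n):
    # f_x, f_y: page counts for each term; f_xy: pages containing both;
    # n: total number of pages indexed by the search engine.
    return (max(log(f_x), log(f_y)) - log(f_xy)) / (log(n) - min(log(f_x), log(f_y)))

# Illustrative counts: "horse" on 46.7M pages, "rider" on 12.2M, both on 2.63M,
# with roughly 8 billion pages indexed.
print(nwd(46_700_000, 12_200_000, 2_630_000, 8_000_000_000))
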
Development of a land use regression model for black carbon using mobile monitoring data and its application to pollution-avoiding routing
Black carbon is often used as an indicator for combustion-related air pollution. In urban environments, on-road black carbon concentrations have a large spatial variability, suggesting that the personal exposure of a cyclist to black carbon can heavily depend on the route that is chosen to reach a destination. In this paper, we describe the development of a cyclist routing procedure that minimizes personal exposure to black carbon. Firstly, a land use regression model for predicting black carbon concentrations in an urban environment is developed using mobile monitoring data collected by cyclists. The optimal model is selected and validated using a spatially stratified cross-validation scheme. The resulting model is integrated into a dedicated routing procedure that minimizes personal exposure to black carbon during cycling. The best model obtains a coefficient of multiple correlation of R = 0.520. Simulations with the black carbon exposure minimizing routing procedure indicate that the inhaled amount of black carbon is reduced by 1.58% on average as compared to the shortest-path route, with extreme cases where a reduction of up to 13.35% is obtained. Moreover, we observe that the average exposure to black carbon and the exposure to local peak concentrations on a route are competing objectives, and propose a parametrized cost function for the routing problem that allows for a gradual transition from routes that minimize average exposure to routes that minimize peak exposure.
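The paper's exact cost function is not given in the abstract, so the following is only an assumed sketch of what a parametrized routing cost blending average and peak exposure could look like, using networkx and invented street-segment data; the parameter alpha trades off total inhaled dose against avoidance of local concentration peaks, and all names, thresholds, and concentrations are hypothetical.

# Sketch of a parametrized exposure-aware routing cost (an assumption about
# the general form, not the paper's actual cost function).
import networkx as nx

def edge_cost(length_m, bc_conc, alpha, peak_threshold):
    # Dose-like term (concentration x distance) blended with a penalty for
    # travelling through segments above a peak-concentration threshold.
    dose = bc_conc * length_m
    peak_penalty = max(bc_conc - peak_threshold, 0.0) * length_m
    return (1 - alpha) * dose + alpha * peak_penalty

G = nx.DiGraph()
# Hypothetical street segments: (from, to, length in metres, predicted BC in ug/m3).
segments = [("A", "B", 200, 2.0), ("B", "C", 300, 6.5), ("A", "C", 700, 1.5)]
alpha, peak_threshold = 0.5, 5.0
for u, v, length, bc in segments:
    G.add_edge(u, v, weight=edge_cost(length, bc, alpha, peak_threshold))

# alpha = 0 reproduces a pure minimum-dose route; alpha = 1 only penalizes peaks.
print(nx.shortest_path(G, "A", "C", weight="weight"))
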