3,651 research outputs found
Temporal characterization of the requests to Wikipedia
This paper presents an empirical study about the temporal patterns
characterizing the requests submitted by users to Wikipedia.
The study is based on the analysis of the log lines registered by the
Wikimedia Foundation Squid servers after having sent the appropriate
content in response to users' requests. The
analysis has been conducted regarding the ten most visited editions of
Wikipedia and has involved more than 14,000 million log lines
corresponding to the traffic of the entire year 2009. The conducted methodology
has mainly consisted in the parsing and filtering
of users' requests according to the study directives. As a result, relevant information
fields have been finally stored in a database for persistence and further
characterization. In this way, we, first, assessed, whether the traffic to Wikipedia could serve
as a reliable estimator of the overall traffic to all the Wikimedia Foundation
projects. Our subsequent analysis of the temporal evolutions corresponding to
the different types of requests to Wikipedia revealed interesting differences
and similarities among them that can be related to the users' attention to the Encyclopedia.
In addition, we have performed separated characterizations of each Wikipedia edition
to compare their respective evolutions over time
Why We Read Wikipedia
Wikipedia is one of the most popular sites on the Web, with millions of users
relying on it to satisfy a broad range of information needs every day. Although
it is crucial to understand what exactly these needs are in order to be able to
meet them, little is currently known about why users visit Wikipedia. The goal
of this paper is to fill this gap by combining a survey of Wikipedia readers
with a log-based analysis of user activity. Based on an initial series of user
surveys, we build a taxonomy of Wikipedia use cases along several dimensions,
capturing users' motivations to visit Wikipedia, the depth of knowledge they
are seeking, and their knowledge of the topic of interest prior to visiting
Wikipedia. Then, we quantify the prevalence of these use cases via a
large-scale user survey conducted on live Wikipedia with almost 30,000
responses. Our analyses highlight the variety of factors driving users to
Wikipedia, such as current events, media coverage of a topic, personal
curiosity, work or school assignments, or boredom. Finally, we match survey
responses to the respondents' digital traces in Wikipedia's server logs,
enabling the discovery of behavioral patterns associated with specific use
cases. For instance, we observe long and fast-paced page sequences across
topics for users who are bored or exploring randomly, whereas those using
Wikipedia for work or school spend more time on individual articles focused on
topics such as science. Our findings advance our understanding of reader
motivations and behavior on Wikipedia and can have implications for developers
aiming to improve Wikipedia's user experience, editors striving to cater to
their readers' needs, third-party services (such as search engines) providing
access to Wikipedia content, and researchers aiming to build tools such as
recommendation engines.Comment: Published in WWW'17; v2 fixes caption of Table
The DIGMAP geo-temporal web gazetteer service
This paper presents the DIGMAP geo-temporal Web gazetteer service, a system providing access to names of places, historical periods, and associated geo-temporal information. Within the DIGMAP project, this gazetteer serves as the unified repository of geographic and temporal information, assisting in the recognition and disambiguation of geo-temporal expressions over text, as well as in resource searching and indexing. We describe the data integration methodology, the handling of temporal information and some of the applications that use the gazetteer. Initial evaluation results show that the proposed system can adequately support several tasks related to geo-temporal information extraction and retrieval
Search strategies of Wikipedia readers
The quest for information is one of the most common activity of human beings. Despite the the impressive progress of search engines, not to miss the needed piece of information could be still very tough, as well as to acquire specific competences and knowledge by shaping and following the proper learning paths. Indeed, the need to find sensible paths in information networks is one of the biggest challenges of our societies and, to effectively address it, it is important to investigate the strategies adopted by human users to cope with the cognitive bottleneck of finding their way in a growing sea of information. Here we focus on the case of Wikipedia and investigate a recently released dataset about users’ click on the English Wikipedia, namely the English Wikipedia Clickstream. We perform a semantically charged analysis to uncover the general patterns followed by information seekers in the multi-dimensional space of Wikipedia topics/categories. We discover the existence of well defined strategies in which users tend to start from very general, i.e., semantically broad, pages and progressively narrow down the scope of their navigation, while keeping a growing semantic coherence. This is unlike strategies associated to tasks with predefined search goals, namely the case of the Wikispeedia game. In this case users first move from the ‘particular’ to the ‘universal’ before focusing down again to the required target. The clear picture offered here represents a very important stepping stone towards a better design of information networks and recommendation strategies, as well as the construction of radically new learning paths
Early Prediction of Movie Box Office Success based on Wikipedia Activity Big Data
Use of socially generated "big data" to access information about collective
states of the minds in human societies has become a new paradigm in the
emerging field of computational social science. A natural application of this
would be the prediction of the society's reaction to a new product in the sense
of popularity and adoption rate. However, bridging the gap between "real time
monitoring" and "early predicting" remains a big challenge. Here we report on
an endeavor to build a minimalistic predictive model for the financial success
of movies based on collective activity data of online users. We show that the
popularity of a movie can be predicted much before its release by measuring and
analyzing the activity level of editors and viewers of the corresponding entry
to the movie in Wikipedia, the well-known online encyclopedia.Comment: 13 pages, Including Supporting Information, 7 Figures, Download the
dataset from: http://wwm.phy.bme.hu/SupplementaryDataS1.zi
Characterization of the Wikipedia Traffic
Since its inception, Wikipedia has grown to a solid and stable project and turned into a mass collaboration tool that allows the sharing and distribution of knowledge. The wiki approach that basis this initiative promotes the participation and collaboration of users. In addition to visits for browsing its contents, Wikipedia also receives the contributions of users to improve them. In the past, researchers paid attention to different aspects concerning authoring and quality of contents. However, little effort has been made to study the nature of the visits that Wikipedia receives. We conduct such an study using a sample of users' requests provided by the Wikimedia Foundation in the form of Squid log lines. Our sample contains more that 14,000 million requests from users all around the world and directed to all the projects maintained by the Wikimedia Foundation, including different editions of Wikipedia. This papers describes the work made to characterize the traffic directed to Wikipedia and consisting of the requests sent by its users. Our main aim is to obtain a detailed description of its composition in terms of the percentages corresponding to the different types of requests making part of it. The benefits from our work may range from the prediction of traffic peaks to the determination of the kind of resources most often requested, which can be useful for scalability considerations
- …