3,651 research outputs found

    Temporal characterization of the requests to Wikipedia

    Get PDF
    This paper presents an empirical study about the temporal patterns characterizing the requests submitted by users to Wikipedia. The study is based on the analysis of the log lines registered by the Wikimedia Foundation Squid servers after having sent the appropriate content in response to users' requests. The analysis has been conducted regarding the ten most visited editions of Wikipedia and has involved more than 14,000 million log lines corresponding to the traffic of the entire year 2009. The conducted methodology has mainly consisted in the parsing and filtering of users' requests according to the study directives. As a result, relevant information fields have been finally stored in a database for persistence and further characterization. In this way, we, first, assessed, whether the traffic to Wikipedia could serve as a reliable estimator of the overall traffic to all the Wikimedia Foundation projects. Our subsequent analysis of the temporal evolutions corresponding to the different types of requests to Wikipedia revealed interesting differences and similarities among them that can be related to the users' attention to the Encyclopedia. In addition, we have performed separated characterizations of each Wikipedia edition to compare their respective evolutions over time

    Why We Read Wikipedia

    Get PDF
    Wikipedia is one of the most popular sites on the Web, with millions of users relying on it to satisfy a broad range of information needs every day. Although it is crucial to understand what exactly these needs are in order to be able to meet them, little is currently known about why users visit Wikipedia. The goal of this paper is to fill this gap by combining a survey of Wikipedia readers with a log-based analysis of user activity. Based on an initial series of user surveys, we build a taxonomy of Wikipedia use cases along several dimensions, capturing users' motivations to visit Wikipedia, the depth of knowledge they are seeking, and their knowledge of the topic of interest prior to visiting Wikipedia. Then, we quantify the prevalence of these use cases via a large-scale user survey conducted on live Wikipedia with almost 30,000 responses. Our analyses highlight the variety of factors driving users to Wikipedia, such as current events, media coverage of a topic, personal curiosity, work or school assignments, or boredom. Finally, we match survey responses to the respondents' digital traces in Wikipedia's server logs, enabling the discovery of behavioral patterns associated with specific use cases. For instance, we observe long and fast-paced page sequences across topics for users who are bored or exploring randomly, whereas those using Wikipedia for work or school spend more time on individual articles focused on topics such as science. Our findings advance our understanding of reader motivations and behavior on Wikipedia and can have implications for developers aiming to improve Wikipedia's user experience, editors striving to cater to their readers' needs, third-party services (such as search engines) providing access to Wikipedia content, and researchers aiming to build tools such as recommendation engines.Comment: Published in WWW'17; v2 fixes caption of Table

    The DIGMAP geo-temporal web gazetteer service

    Get PDF
    This paper presents the DIGMAP geo-temporal Web gazetteer service, a system providing access to names of places, historical periods, and associated geo-temporal information. Within the DIGMAP project, this gazetteer serves as the unified repository of geographic and temporal information, assisting in the recognition and disambiguation of geo-temporal expressions over text, as well as in resource searching and indexing. We describe the data integration methodology, the handling of temporal information and some of the applications that use the gazetteer. Initial evaluation results show that the proposed system can adequately support several tasks related to geo-temporal information extraction and retrieval

    Search strategies of Wikipedia readers

    Get PDF
    The quest for information is one of the most common activity of human beings. Despite the the impressive progress of search engines, not to miss the needed piece of information could be still very tough, as well as to acquire specific competences and knowledge by shaping and following the proper learning paths. Indeed, the need to find sensible paths in information networks is one of the biggest challenges of our societies and, to effectively address it, it is important to investigate the strategies adopted by human users to cope with the cognitive bottleneck of finding their way in a growing sea of information. Here we focus on the case of Wikipedia and investigate a recently released dataset about users’ click on the English Wikipedia, namely the English Wikipedia Clickstream. We perform a semantically charged analysis to uncover the general patterns followed by information seekers in the multi-dimensional space of Wikipedia topics/categories. We discover the existence of well defined strategies in which users tend to start from very general, i.e., semantically broad, pages and progressively narrow down the scope of their navigation, while keeping a growing semantic coherence. This is unlike strategies associated to tasks with predefined search goals, namely the case of the Wikispeedia game. In this case users first move from the ‘particular’ to the ‘universal’ before focusing down again to the required target. The clear picture offered here represents a very important stepping stone towards a better design of information networks and recommendation strategies, as well as the construction of radically new learning paths

    Early Prediction of Movie Box Office Success based on Wikipedia Activity Big Data

    Get PDF
    Use of socially generated "big data" to access information about collective states of the minds in human societies has become a new paradigm in the emerging field of computational social science. A natural application of this would be the prediction of the society's reaction to a new product in the sense of popularity and adoption rate. However, bridging the gap between "real time monitoring" and "early predicting" remains a big challenge. Here we report on an endeavor to build a minimalistic predictive model for the financial success of movies based on collective activity data of online users. We show that the popularity of a movie can be predicted much before its release by measuring and analyzing the activity level of editors and viewers of the corresponding entry to the movie in Wikipedia, the well-known online encyclopedia.Comment: 13 pages, Including Supporting Information, 7 Figures, Download the dataset from: http://wwm.phy.bme.hu/SupplementaryDataS1.zi

    Characterization of the Wikipedia Traffic

    Get PDF
    Since its inception, Wikipedia has grown to a solid and stable project and turned into a mass collaboration tool that allows the sharing and distribution of knowledge. The wiki approach that basis this initiative promotes the participation and collaboration of users. In addition to visits for browsing its contents, Wikipedia also receives the contributions of users to improve them. In the past, researchers paid attention to different aspects concerning authoring and quality of contents. However, little effort has been made to study the nature of the visits that Wikipedia receives. We conduct such an study using a sample of users' requests provided by the Wikimedia Foundation in the form of Squid log lines. Our sample contains more that 14,000 million requests from users all around the world and directed to all the projects maintained by the Wikimedia Foundation, including different editions of Wikipedia. This papers describes the work made to characterize the traffic directed to Wikipedia and consisting of the requests sent by its users. Our main aim is to obtain a detailed description of its composition in terms of the percentages corresponding to the different types of requests making part of it. The benefits from our work may range from the prediction of traffic peaks to the determination of the kind of resources most often requested, which can be useful for scalability considerations
    • …
    corecore