    Why We Read Wikipedia

    Wikipedia is one of the most popular sites on the Web, with millions of users relying on it to satisfy a broad range of information needs every day. Although it is crucial to understand what exactly these needs are in order to be able to meet them, little is currently known about why users visit Wikipedia. The goal of this paper is to fill this gap by combining a survey of Wikipedia readers with a log-based analysis of user activity. Based on an initial series of user surveys, we build a taxonomy of Wikipedia use cases along several dimensions, capturing users' motivations to visit Wikipedia, the depth of knowledge they are seeking, and their knowledge of the topic of interest prior to visiting Wikipedia. Then, we quantify the prevalence of these use cases via a large-scale user survey conducted on live Wikipedia with almost 30,000 responses. Our analyses highlight the variety of factors driving users to Wikipedia, such as current events, media coverage of a topic, personal curiosity, work or school assignments, or boredom. Finally, we match survey responses to the respondents' digital traces in Wikipedia's server logs, enabling the discovery of behavioral patterns associated with specific use cases. For instance, we observe long and fast-paced page sequences across topics for users who are bored or exploring randomly, whereas those using Wikipedia for work or school spend more time on individual articles focused on topics such as science. Our findings advance our understanding of reader motivations and behavior on Wikipedia and can have implications for developers aiming to improve Wikipedia's user experience, editors striving to cater to their readers' needs, third-party services (such as search engines) providing access to Wikipedia content, and researchers aiming to build tools such as recommendation engines. Comment: Published in WWW'17; v2 fixes caption of Table
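To make the contrast between browsing styles concrete, the sketch below computes two simple session-level measures from request timestamps: how many pages a session contains and the median gap between consecutive requests, used here as a rough proxy for time spent on each article. The helper name `session_features`, the gap-as-reading-time proxy and the sample timestamps are illustrative assumptions, not the features or log pipeline used in the paper.

```python
from datetime import datetime, timedelta
from statistics import median


def session_features(timestamps):
    """Summarize one reader session from its page-request timestamps.

    Hypothetical helper for illustration. `timestamps` is a list of datetime
    objects, one per page view, in any order. Returns a tuple
    (number_of_pages, median_seconds_between_requests); the median is None
    for single-page sessions, where no gap can be measured.
    """
    ts = sorted(timestamps)
    # Gap between consecutive requests, assumed here to approximate reading time.
    gaps = [(later - earlier).total_seconds() for earlier, later in zip(ts, ts[1:])]
    return len(ts), (median(gaps) if gaps else None)


# Usage: a long, fast-paced session versus a short, slow one (made-up timestamps).
start = datetime(2017, 1, 1, 12, 0, 0)
fast_session = [start + timedelta(seconds=20 * i) for i in range(6)]
slow_session = [start, start + timedelta(minutes=10)]
print(session_features(fast_session))  # (6, 20.0)
print(session_features(slow_session))  # (2, 600.0)
```

Under this proxy, the first session resembles the exploratory pattern described in the abstract and the second resembles focused reading. Real log analysis would also need session segmentation and some way to handle the last page in a session, whose reading time no timestamp gap can capture.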

    Multiple Texts as a Limiting Factor in Online Learning: Quantifying (Dis-)similarities of Knowledge Networks across Languages

    We test the hypothesis that the extent to which one obtains information on a given topic through Wikipedia depends on the language in which it is consulted. Controlling for the size factor, we investigate this hypothesis for 25 subject areas. Since Wikipedia is a central part of the web-based information landscape, such a dependence would indicate a language-related, linguistic bias. The article therefore asks whether or not Wikipedia exhibits this kind of linguistic relativity. From the perspective of educational science, the article develops a computational model of the information landscape from which multiple texts are drawn as typical input of web-based reading. For this purpose, it develops a hybrid model of intra- and intertextual similarity of different parts of the information landscape and tests this model on 35 languages and their corresponding Wikipedias. In this way, the article builds a bridge between reading research, educational science, Wikipedia research and computational linguistics. Comment: 40 pages, 13 figures, 5 tables

    Between news and history: Identifying networked topics of collective attention on Wikipedia

    The digital information landscape has introduced a new dimension to understanding how we collectively react to new information and preserve it at the societal level. This, together with the emergence of platforms such as Wikipedia, has challenged traditional views on the relationship between current events and historical accounts of events, with an ever-shrinking divide between "news" and "history". Wikipedia's place as the Internet's primary reference work thus raises the question of how it represents both traditional encyclopaedic knowledge and evolving, important news stories. In other words, how are information on, and attention towards, current events integrated into the existing topical structures of Wikipedia? To address this, we develop a temporal community detection approach to topic detection that takes into account both the short-term dynamics of attention and the long-term structure of the article network. We apply this method to a dataset of one year of current events on Wikipedia and identify clusters distinct from those that would be found from page view time series correlations or static network structure alone. By weighing the relative importance of collective attention dynamics against link structure, we are able to distinguish topics that more strongly reflect unfolding current events from those that reflect more established knowledge. We also identify and describe the emergent topics on Wikipedia. This work provides a means of distinguishing how these information and attention clusters relate to Wikipedia's twin faces of encyclopaedic knowledge and current events, which is crucial to understanding the production and consumption of knowledge in the digital age.
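One way to picture combining attention dynamics with link structure is to keep a hyperlink between two articles only when their daily page views rise and fall together, and then group the articles joined by the surviving links. The sketch below does this with a Pearson correlation threshold and union-find. It is a simplified stand-in chosen for illustration, not the temporal community detection method the paper develops; the function names, the `threshold` value and the sample page-view data are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length series; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def attention_weighted_clusters(links, views, threshold=0.5):
    """Group articles that are hyperlinked AND whose page views co-vary.

    links: iterable of (article_a, article_b) hyperlink pairs, treated as undirected.
    views: dict mapping article -> list of daily page-view counts (equal lengths).
    threshold: hypothetical minimum correlation for a link to be kept.
    Returns a sorted list of clusters, each a sorted list of article names.
    """
    parent = {a: a for a in views}

    def find(a):
        # Union-find root lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for a, b in links:
        # A link survives only if attention to both articles moves together.
        if a in views and b in views and pearson(views[a], views[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for a in views:
        clusters.setdefault(find(a), []).append(a)
    return sorted(sorted(c) for c in clusters.values())


# Usage with made-up data: two articles spike together, two stay flat.
views = {
    "Election": [10, 50, 200, 80, 20],
    "Candidate X": [5, 40, 180, 60, 10],
    "Democracy": [100, 102, 99, 101, 100],
    "Voting": [30, 31, 29, 30, 30],
}
links = [("Election", "Candidate X"), ("Election", "Democracy"), ("Democracy", "Voting")]
print(attention_weighted_clusters(links, views))
# [['Candidate X', 'Election'], ['Democracy', 'Voting']]
```

Requiring both signals is what separates this toy from clustering on page-view correlations alone, which would also join unlinked articles that merely spike together, or on links alone, which would keep the Election/Democracy link even though attention to the two articles is unrelated here.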

    The experience as a document: designing for the future of collaborative remembering in digital archives

    How does it feel when we remember together online? Who gets to say what is worth remembering? To understand how the user experience of participation affects the formation of collective memories in online environments, it is first important to consider how the notion of memory has been transformed under the influence of the digital revolution. I aim to contribute to the field of User Experience (UX) research by theorizing on the felt experience of users from a memory perspective, taking into consideration aspects linked to both personal and collective memories in connected environments. Harassment and hate speech in connected conversational environments are especially targeted at women and underprivileged communities, which has become a problem for digital archives of vernacular creativity (Burgess, J. E. 2007) such as YouTube, Twitter, Reddit and Wikipedia. An evaluation of the user experience of underprivileged communities in creative archives such as Wikipedia indicates the urgency of building a feminist space where women and queer folks can focus on knowledge production and learning without being harassed. The theoretical models and designs that I propose are the result of a series of prototype tests and case studies focused on cognitive tools for a mediated human memory operating inside transactive memory systems. With them, I aim to imagine the means by which feminist protocols for UX design and research can assist in building and maintaining the archive as a safe/brave space. Working with perspectives from media theory, memory theory and gender studies, and centering the user experience of participation for women, queer folks, people of colour (POC) and other vulnerable and underrepresented communities as the main focus of inquiry, my research takes an interdisciplinary approach to interrogate how online misogyny and other forms of abuse are perceived by communities placed outside the center of hegemonic normativity, and how the user experience of online abuse is affecting the formation of collective memories in online environments.

    Computational approaches to semantic change (Volume 6)

    Semantic change (how the meanings of words change over time) has preoccupied scholars since well before modern linguistics emerged in the late 19th and early 20th centuries, ushering in a new methodological turn in the study of language change. Since then, the study of semantic change has progressed steadily, accumulating a vast store of knowledge over more than a century and encompassing many languages and language families; even so, compared to changes in sound and grammar, semantic change remains the least understood. Historical linguists also realized early on the potential of computers as research tools, with papers at the very first international conferences in computational linguistics in the 1960s. Such early computational studies still tended to be small-scale, method-oriented, and qualitative. However, recent years have witnessed a sea change in this regard. Big-data empirical quantitative investigations are now coming to the forefront, enabled by enormous advances in storage capability and processing power. Diachronic corpora have grown beyond imagination, defying exploration by traditional manual qualitative methods, and language technology has become increasingly data-driven and semantics-oriented. These developments present a golden opportunity for the empirical study of semantic change over both long and short time spans.

    Geographic information extraction from texts

    A large volume of unstructured text containing valuable geographic information is available online. This information, provided implicitly or explicitly, is useful not only for scientific studies (e.g., spatial humanities) but also for many practical applications (e.g., geographic information retrieval). Although substantial progress has been made in geographic information extraction from texts, unsolved challenges and issues remain, ranging from methods, systems, and data to applications and privacy. This workshop will therefore provide a timely opportunity to discuss recent advances, new ideas, and concepts, and also to identify research gaps in geographic information extraction.

    The Future of Information Sciences: INFuture2011: Information Sciences and e-Society

    A treatise on Web 2.0 with a case study from the financial markets

    There has been much hype in vocational and academic circles surrounding the emergence of web 2.0 or social media; however, relatively little work has been dedicated to substantiating the actual concept of web 2.0. Many have dismissed it as undeserving of this new title, since the term web 2.0 assumes a certain interpretation of web history, including enough progress in a certain direction to trigger a succession [i.e. web 1.0 → web 2.0]. Others have provided arguments in support of this development, and there has been a considerable amount of enthusiasm in the literature. Much research has focused on evaluating current use of web 2.0 and analysing user-generated content, but an objective and thorough assessment of what web 2.0 really stands for has largely been overlooked. More recently, the idea of collective intelligence facilitated via web 2.0, and its potential applications, have attracted interest from researchers, yet a more unified approach to work on collective intelligence is needed. This thesis identifies and critically evaluates a wider context for the web 2.0 environment and what caused it to emerge, providing a rich literature review on the topic, a review of existing taxonomies, a quantitative and qualitative evaluation of the concept itself, and an investigation of the collective intelligence potential that emerges from application usage. Finally, a framework for harnessing collective intelligence in a more systematic manner is proposed. In addition to the presented results, novel methodologies are introduced throughout this work. To provide insight and to illustrate the analysis, a case study of the recent financial crisis is considered. Some interesting results relating to the crisis are revealed within user-generated content data, and relevant issues are discussed where appropriate.