
    Analysing Timelines of National Histories across Wikipedia Editions: A Comparative Computational Approach

    Portrayals of history are never complete, and each description inherently exhibits a specific viewpoint and emphasis. In this paper, we aim to automatically identify such differences by computing timelines and detecting temporal focal points of written history across languages on Wikipedia. In particular, we study articles related to the history of all UN member states and compare them in 30 language editions. We develop a computational approach that allows us to identify focal points quantitatively, and find that Wikipedia narratives about national histories (i) are skewed towards more recent events (recency bias) and (ii) are distributed unevenly across the continents with significant focus on the history of European countries (Eurocentric bias). We also establish that national historical timelines vary across language editions, although average interlingual consensus is rather high. We hope that this paper provides a starting point for a broader computational analysis of written history on Wikipedia and elsewhere.

    Towards Better Understanding Researcher Strategies in Cross-Lingual Event Analytics

    With an increasing amount of information on globally important events, there is a growing demand for efficient analytics of multilingual event-centric information. Such analytics is particularly challenging due to the large amount of content, the event dynamics and the language barrier. Although memory institutions increasingly collect event-centric Web content in different languages, very little is known about the strategies of researchers who conduct analytics of such content. In this paper, we present researchers' strategies for content, method and feature selection in the context of cross-lingual event-centric analytics, as observed in two case studies on multilingual Wikipedia. We discuss the factors influencing these strategies and the findings enabled by the adopted methods, along with the current limitations, and provide recommendations for services supporting researchers in cross-lingual event-centric analytics. Comment: In Proceedings of the International Conference on Theory and Practice of Digital Libraries 201

    DARIAH and the Benelux


    Mapping Articles on China in Wikipedia: An Inter-Language Semantic Network Analysis

    This article describes an inter-language semantic network analysis examining the differences between articles about China in the Chinese and English versions of Wikipedia. It explores the differences in the content of Wikipedia through (a) correlation analysis of semantic networks and (b) the salience of semantic concepts through their network centralities. The results suggest there is high dissimilarity between the semantic content of the English and Chinese versions of articles on China. While both pages focused on government, population, language, character, diplomatic relations, development of the economy, and science and technology, the Chinese-speaking and English-speaking contributors framed the article on China differently, according to dissimilarities in the cultures, values, interests, situations, and emotions of the different language groups. This research contributes to the literature and to the understanding of how the culture of different language groups influences the process of crowdsourcing knowledge on online collaboration platforms.

    Tracking Knowledge Propagation Across Wikipedia Languages

    In this paper, we present a dataset of inter-language knowledge propagation in Wikipedia. Covering all 309 language editions and 33M articles, the dataset aims to track the full propagation history of Wikipedia concepts and to allow follow-up research on building predictive models of that propagation. For this purpose, we align all the Wikipedia articles in a language-agnostic manner according to the concept they cover, which results in 13M propagation instances. To the best of our knowledge, this dataset is the first to explore the full inter-language propagation at a large scale. Together with the dataset, a holistic overview of the propagation and key insights about the underlying structural factors are provided to aid future research. For example, we find that although long cascades are unusual, the propagation tends to continue further once it reaches more than four language editions. We also find that the size of language editions is associated with the speed of propagation. We believe the dataset not only contributes to the prior literature on Wikipedia growth but also enables new use cases such as edit recommendation for addressing knowledge gaps, detection of disinformation, and cultural relationship analysis.

    From academic to media capital: To what extent does the scientific reputation of universities translate into Wikipedia attention?

    Wenceslao Arroyo-Machado is supported by an FPU Grant (FPU18/05835) from the Spanish Ministry of Universities and acknowledges funding from a project by MCIN (PID2019-109127RB-I00/SRA/10.13039/501100011033). Adrián A. Díaz-Faes acknowledges research project PID2020-112837RJ-I00 funded by MCIN/AEI/10.13039/501100011033. Rodrigo Costas is partially funded by the South African DSI-NRF Centre of Excellence in Scientometrics and Science, Technology and Innovation Policy (SciSTIP). A draft version of this paper was presented at the 26th STI Conference (Granada, 2022). We acknowledge support of the publication fee by the CSIC Open Access Publication Support Initiative through its Unit of Information Resources for Research (URICI).
    Universities face increasing demands to improve their visibility, public outreach, and online presence. There is a broad consensus that scientific reputation significantly increases the attention universities receive. However, in most cases estimates of scientific reputation are based on composite or weighted indicators and absolute positions in university rankings. In this study, we adopt a more granular approach to the assessment of universities' scientific performance, using a multidimensional set of indicators from the Leiden Ranking and testing their individual effects on university Wikipedia page views. We distinguish between international and local attention and find a positive association between research performance and Wikipedia attention which holds for regions and linguistic areas. Additional analysis shows that productivity, scientific impact, and international collaboration have a curvilinear effect on universities' Wikipedia attention. This finding suggests that there may be factors other than scientific reputation driving the general public's interest in universities. Our study adds to a growing stream of work which views altmetrics as tools to deepen science–society interactions rather than direct measures of impact and recognition of scientific outputs.

    Multiple Texts as a Limiting Factor in Online Learning: Quantifying (Dis-)similarities of Knowledge Networks across Languages

    We test the hypothesis that the extent to which one obtains information on a given topic through Wikipedia depends on the language in which it is consulted. Controlling for the size factor, we investigate this hypothesis for 25 subject areas. Since Wikipedia is a central part of the web-based information landscape, this indicates a language-related, linguistic bias. The article therefore deals with the question of whether Wikipedia exhibits this kind of linguistic relativity or not. From the perspective of educational science, the article develops a computational model of the information landscape from which multiple texts are drawn as typical input of web-based reading. For this purpose, it develops a hybrid model of intra- and intertextual similarity of different parts of the information landscape and tests this model on the example of 35 languages and corresponding Wikipedias. In this way the article builds a bridge between reading research, educational science, Wikipedia research and computational linguistics. Comment: 40 pages, 13 figures, 5 tables

    Structuring Wikipedia Articles with Section Recommendations

    Sections are the building blocks of Wikipedia articles. They enhance readability and can be used as a structured entry point for creating and expanding articles. Structuring a new or already existing Wikipedia article with sections is a hard task for humans, especially for newcomers or less experienced editors, as it requires significant knowledge about how a well-written article looks for each possible topic. Inspired by this need, the present paper defines the problem of section recommendation for Wikipedia articles and proposes several approaches for tackling it. Our systems can help editors by recommending what sections to add to already existing or newly created Wikipedia articles. Our basic paradigm is to generate recommendations by sourcing sections from articles that are similar to the input article. We explore several ways of defining similarity for this purpose (based on topic modeling, collaborative filtering, and Wikipedia's category system). We use both automatic and human evaluation approaches for assessing the performance of our recommendation system, concluding that the category-based approach works best, achieving precision@10 of about 80% in the human evaluation. Comment: SIGIR '18 camera-ready
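
    For readers unfamiliar with the metric named in the abstract above, precision@k is the fraction of the top k recommended items that are judged relevant. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function name `precision_at_k` and the section titles in the example are hypothetical.

```python
from typing import Sequence, Set


def precision_at_k(recommended: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Return the fraction of the top-k recommended items that are relevant.

    `recommended` is a ranked list (best first); `relevant` is the set of
    items judged relevant, for example by human evaluators.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    # Standard definition: divide by k, even if fewer than k items were returned.
    return hits / k


# Hypothetical ranked section recommendations for one article.
ranked_sections = [
    "History", "Geography", "Economy", "Demographics", "Culture",
    "Education", "Transport", "Sports", "Notable people", "Gallery",
]
# Hypothetical set of sections that evaluators accepted as relevant.
accepted = {
    "History", "Geography", "Economy", "Demographics",
    "Culture", "Education", "Transport", "Notable people",
}
score = precision_at_k(ranked_sections, accepted, k=10)
```

    In this made-up example, 8 of the 10 recommended sections are in the accepted set, so `score` is 0.8, which is the same order of magnitude as the figure the abstract reports.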

    Publication practices in motion: The benefits of open access publishing for the humanities

    The changes we have seen in recent years in the scholarly publishing world - including the growth of digital publishing and changes to the role and strategies of publishers and libraries alike - represent the most dramatic paradigm shift in scholarly communications in centuries. This volume brings together leading scholars from across the humanities to explore that transformation and consider the challenges and opportunities it brings.

    Structure and implications of the GLN

    Thesis (S.M.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2013. Cataloged from PDF version of thesis. Includes bibliographical references (p. 69-73).
    Languages vary enormously in global importance because of historical, demographic, political, and technological forces, and there has been much speculation about the current and future status of English as a global language. Yet there has been no rigorous way to define or quantify the relative global influence of languages. I propose that the structure of the network connecting multilingual speakers or translated texts, which I call the Global Language Network, provides a concept of language importance that is superior to simple economic or demographic measures. I map three independent global language networks (GLN) from millions of records of online and printed linguistic expressions taken from Wikipedia, Twitter, and UNESCO's database of book translations. I find that the structure of the three GLNs is hierarchically organized around English and a handful of hub languages, which include Spanish, German, French, Russian, Malay, and Portuguese, but not Chinese, Hindi or Arabic. Finally, I validate the measure of a language's centrality in the GLNs by showing that it correlates with measures of the number of illustrious people born in the countries associated with that language. I suggest that other phenomena of a language's present and future influence are systematically related to the structure of the global language networks.
    by Shahar Ronen. S.M.