Multilinguals and Wikipedia Editing
This article analyzes one month of edits to Wikipedia in order to examine the
role of users editing multiple language editions (referred to as multilingual
users). Such multilingual users may serve an important function in diffusing
information across different language editions of the encyclopedia, and prior
work has suggested this could reduce the level of self-focus bias in each
edition. This study finds multilingual users are much more active than their
single-edition (monolingual) counterparts. They are found in all language
editions, but smaller-sized editions with fewer users have a higher percentage
of multilingual users than larger-sized editions. About a quarter of
multilingual users always edit the same articles in multiple languages, while
just over 40% of multilingual users edit different articles in different
languages. When non-English users do edit a second language edition, that
edition is most frequently English. Nonetheless, several regional and
linguistic cross-editing patterns are also present.
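As a rough illustration of the definition used in this abstract (a multilingual user is one who edits more than one language edition), the per-edition share of multilingual users could be computed along the following lines. This is a minimal sketch, not the study's actual method: the `(user, edition)` input format, the function name `multilingual_share`, and the sample data are all hypothetical.

```python
from collections import defaultdict


def multilingual_share(edits):
    """Return {edition: fraction of its editors who edited more than one edition}.

    `edits` is an iterable of (user, edition) pairs; this input format is an
    assumption made for illustration only.
    """
    # Collect the set of language editions each user has edited.
    editions_by_user = defaultdict(set)
    for user, edition in edits:
        editions_by_user[user].add(edition)

    users = defaultdict(int)  # editors per edition
    multi = defaultdict(int)  # multilingual editors per edition
    for editions in editions_by_user.values():
        for edition in editions:
            users[edition] += 1
            if len(editions) > 1:
                multi[edition] += 1

    return {edition: multi[edition] / users[edition] for edition in users}


# Hypothetical sample: "ana" edits two editions, everyone else edits one.
sample_edits = [
    ("ana", "en"), ("ana", "es"),
    ("bo", "en"),
    ("chen", "en"),
    ("dai", "es"),
]
shares = multilingual_share(sample_edits)
```

In this toy sample the smaller edition ("es") ends up with the higher multilingual share, which mirrors the direction of the pattern the abstract reports but says nothing about its size in real data.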
Mapping bilateral information interests using the activity of Wikipedia editors
We live in a global village where electronic communication has eliminated the
geographical barriers of information exchange. The road is now open to
worldwide convergence of information interests, shared values, and
understanding. Nevertheless, interests still vary between countries around the
world. This raises important questions about what today's world map of
information interests actually looks like and what factors cause the barriers of
information exchange between countries. To quantitatively construct a world map
of information interests, we devise a scalable statistical model that
identifies countries with similar information interests and measures the
countries' bilateral similarities. From the similarities we connect countries
in a global network and find that countries can be mapped into 18 clusters with
similar information interests. Through regression we find that language and
religion best explain the strength of the bilateral ties and formation of
clusters. Our findings provide a quantitative basis for further studies to
better understand the complex interplay between shared interests and conflict
on a global scale. The methodology can also be extended to track changes over
time and capture important trends in global information exchange. Comment: 11 pages, 3 figures in Palgrave Communications 1 (2015)
Analysing Timelines of National Histories across Wikipedia Editions: A Comparative Computational Approach
Portrayals of history are never complete, and each description inherently
exhibits a specific viewpoint and emphasis. In this paper, we aim to
automatically identify such differences by computing timelines and detecting
temporal focal points of written history across languages on Wikipedia. In
particular, we study articles related to the history of all UN member states
and compare them in 30 language editions. We develop a computational approach
that identifies focal points quantitatively, and find that Wikipedia
narratives about national histories (i) are skewed towards more recent events
(recency bias) and (ii) are distributed unevenly across the continents with
significant focus on the history of European countries (Eurocentric bias). We
also establish that national historical timelines vary across language
editions, although average interlingual consensus is rather high. We hope that
this paper provides a starting point for a broader computational analysis of
written history on Wikipedia and elsewhere.
Cross-language Wikipedia Editing of Okinawa, Japan
This article analyzes users who edit Wikipedia articles about Okinawa, Japan,
in English and Japanese. It finds these users are among the most active and
dedicated users in their primary languages, where they make many large,
high-quality edits. However, when these users edit in their non-primary
languages, they tend to make edits of a different type that are overall smaller
in size and more often restricted to the narrow set of articles that exist in
both languages. Design changes to motivate wider contributions from users in
their non-primary languages and to encourage multilingual users to transfer
more information across language divides are presented. Comment: In Proceedings of the SIGCHI Conference on Human Factors in Computing
Systems, CHI 2015. AC
Tracking Knowledge Propagation Across Wikipedia Languages
In this paper, we present a dataset of inter-language knowledge propagation in Wikipedia. Covering all 309 language editions and 33M articles, the dataset aims to track the full propagation history of Wikipedia concepts and to allow follow-up research on building predictive models of that propagation. For this purpose, we align all the Wikipedia articles in a language-agnostic manner according to the concept they cover, which results in 13M propagation instances. To the best of our knowledge, this dataset is the first to explore the full inter-language propagation at a large scale. Together with the dataset, a holistic overview of the propagation and key insights about the underlying structural factors are provided to aid future research. For example, we find that although long cascades are unusual, the propagation tends to continue further once it reaches more than four language editions. We also find that the size of language editions is associated with the speed of propagation. We believe the dataset not only contributes to the prior literature on Wikipedia growth but also enables new use cases such as edit recommendation for addressing knowledge gaps, detection of disinformation, and cultural relationship analysis.
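The alignment step described in this abstract (grouping articles by the concept they cover, independent of language) can be sketched roughly as follows. This is an illustrative assumption, not the dataset's actual pipeline: the `(concept_id, edition, created)` tuple format, the function name `propagation_cascades`, and the idea of ordering a cascade by article creation date are all hypothetical choices made for the example.

```python
from collections import defaultdict


def propagation_cascades(articles):
    """Group articles by concept and order editions by creation date.

    `articles` is an iterable of (concept_id, edition, created) tuples, where
    `created` is an ISO-8601 date string. Both the input format and the use of
    creation date as the ordering key are assumptions for illustration.
    """
    by_concept = defaultdict(list)
    for concept, edition, created in articles:
        by_concept[concept].append((created, edition))

    # ISO-8601 date strings sort chronologically when compared as text.
    return {
        concept: [edition for _, edition in sorted(rows)]
        for concept, rows in by_concept.items()
    }


# Hypothetical sample: concept "Q1" appears in three editions, "Q2" in one.
sample_articles = [
    ("Q1", "de", "2004-03-01"),
    ("Q1", "en", "2003-01-15"),
    ("Q1", "fr", "2005-07-09"),
    ("Q2", "ja", "2010-02-02"),
]
cascades = propagation_cascades(sample_articles)
```

Under these assumptions, the length of each list gives the number of language editions a concept has reached, which is the quantity the abstract's remark about cascades passing four editions refers to.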
When humans and machines collaborate: Cross-lingual Label Editing in Wikidata
The quality and maintainability of a knowledge graph are determined by the process in which it is created. There are different approaches to such processes: extraction or conversion of data available on the web (for example, automated extraction of knowledge from Wikipedia, as in DBpedia), community-created knowledge graphs, often built by a group of experts, and hybrid approaches in which humans maintain the knowledge graph alongside bots. In this work we focus on the hybrid approach of human-edited knowledge graphs supported by automated tools. In particular, we analyse the editing of natural language data, i.e. labels. Labels are the entry point for humans to understand the information, and therefore need to be carefully maintained. We take a step toward understanding the collaborative editing of humans and automated tools across languages in a knowledge graph. We use Wikidata, as it has a large and active community of humans and bots working together and covers over 300 languages. We analyse the different editor groups and how they interact with the different language data to understand the provenance of the current label data.
Wikipedia and Westminster: Quality and Dynamics of Wikipedia Pages about UK Politicians
Wikipedia is a major source of information providing a large variety of
content online, trusted by readers from around the world. Readers go to
Wikipedia to get reliable information about different subjects, one of the most
popular being living people, and especially politicians. While a lot is known
about the general usage and information consumption on Wikipedia, less is known
about the life-cycle and quality of Wikipedia articles in the context of
politics. The aim of this study is to quantify and qualify content production
and consumption for articles about politicians, with a specific focus on UK
Members of Parliament (MPs). First, we analyze spatio-temporal patterns of
readers' and editors' engagement with MPs' Wikipedia pages, finding huge peaks
of attention during election times, related to signs of engagement on other
social media (e.g. Twitter). Second, we quantify editors' polarisation and find
that most editors specialize in a specific party and choose specific news
outlets as references. Finally, we observe that the average citation quality is
fairly high, with statements on 'Early life and career' missing citations most
often (18%). Comment: A preprint of accepted publication at the 31st ACM Conference on
Hypertext and Social Media (HT'20)
Does Astronomy research become too dated for the public? Wikipedia citations to Astronomy and Astrophysics journal articles 1996-2014
Astronomy is a natural science attracting substantial public interest. On a human scale, most individual celestial objects are essentially unchanging, but is the same true for interest in astronomy research? This article uses the popular online encyclopedia Wikipedia as a proxy for public interest in academic research and assesses the extent to which it cites astronomy and astrophysics articles published between 1996 and 2014. Automatic Bing searches in Webometric Analyst were used to count the number of citations to astronomy and astrophysics articles from Wikipedia. The results show that older papers from before 2008 are increasingly less likely to be cited. This is true overall and in most of the major language versions of Wikipedia, although it may reflect editors’ interests rather than the public’s interests. This is consistent with a moderate tendency towards obsolescence in public interest in research, although it is probably affected by the dates on which most Wikipedia content on the topic was created. Papers may become obsolete if they report evidence that is later superseded by improved data or if they propose a model that is later replaced.
The Influence of Multilingualism and Mutual Intelligibility on Wikipedia Reading Behaviour: A Research Proposal
Given the important role of Wikipedia in our everyday lives, a better understanding of how language skills affect Wikipedia usage is needed. If content is not available in a reader’s native language or a language that she can readily understand, access barriers and knowledge gaps are created, threatening Wikimedia’s goal to create knowledge equity among all its projects and their consumers. This article argues for research on the effects of multilingualism and mutual intelligibility on Wikipedia reading behaviour, focusing on the Nordic countries Denmark, Norway, and Sweden. Initial exploratory analysis shows that while residents of these countries use the native language editions quite frequently, they also rely strongly on English Wikipedia. Research questions and methods for future work in this area are presented.