
    First Women, Second Sex: Gender Bias in Wikipedia

    Contributing to history has never been as easy as it is today. Anyone with access to the Web can play a part on Wikipedia, an open and free encyclopedia. Wikipedia, available in many languages, is one of the most visited websites in the world and arguably one of the primary sources of knowledge on the Web. From a diversity standpoint, however, not everyone is contributing: several groups are severely underrepresented. One of those groups is women, who make up approximately 16% of the current contributor community, meaning that most of the content is written by men. In addition, although Wikipedia content must adhere to specific guidelines of verifiability, notability, and neutral point of view, these guidelines are supervised and enforced by men. In this paper, we propose that gender bias is not only about participation and representation, but also about the characterization of women. We approach the analysis of gender bias by defining a methodology for comparing the characterizations of men and women in biographies in three aspects: meta-data, language, and network structure. Our results show that there are indeed differences in characterization and structure. Some of these differences reflect the off-line world documented by Wikipedia, but others can be attributed to gender bias in Wikipedia content. We contextualize these differences in feminist theory and discuss their implications for Wikipedia policy. Comment: 10 pages, ACM style. Author's version of a paper to be presented at ACM Hypertext 201

    Analysing Timelines of National Histories across Wikipedia Editions: A Comparative Computational Approach

    Portrayals of history are never complete, and each description inherently exhibits a specific viewpoint and emphasis. In this paper, we aim to automatically identify such differences by computing timelines and detecting temporal focal points of written history across languages on Wikipedia. In particular, we study articles related to the history of all UN member states and compare them across 30 language editions. We develop a computational approach that allows us to identify focal points quantitatively, and find that Wikipedia narratives about national histories (i) are skewed towards more recent events (recency bias) and (ii) are distributed unevenly across the continents, with significant focus on the history of European countries (Eurocentric bias). We also establish that national historical timelines vary across language editions, although average interlingual consensus is rather high. We hope that this paper provides a starting point for a broader computational analysis of written history on Wikipedia and elsewhere.
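    The abstract above describes computing timelines and detecting temporal focal points in article text. As a rough illustration of that idea (not the paper's actual pipeline; the year-extraction regex, the decade bucketing, and the function name are assumptions), one can count year mentions and report which periods dominate:

```python
# Hypothetical sketch of "temporal focal points": pull year mentions out of
# article text, bucket them by decade, and return the most-mentioned periods.
import re
from collections import Counter

def focal_decades(text, top=3):
    """Return the `top` most frequently mentioned decades as (decade, count)."""
    years = [int(y) for y in re.findall(r"\b(1[0-9]{3}|20[0-2][0-9])\b", text)]
    decades = Counter((y // 10) * 10 for y in years)
    return decades.most_common(top)

sample = "Founded in 1871, reformed in 1919 and 1918, rebuilt after 1945."
print(focal_decades(sample))  # [(1910, 2), (1870, 1), (1940, 1)]
```

A recency-biased national-history article would, under this toy measure, concentrate its largest counts in the most recent decades.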

    It’s Not What You Think: Gender Bias in Information about Fortune 1000 CEOs on Wikipedia

    Increasingly, information generated by open collaboration communities is being trusted and used by individuals to make decisions and carry out work tasks. Little is known about the quality of this information or the bias it may contain. In this study we address the question: How is gender bias embedded in information about organizational leaders in an open collaboration community? To answer this question, we use the bias framework developed by Miranda and colleagues (2016) to study bias stemming from structural constraints and content restrictions in the open collaboration community Wikipedia. Comparison of Wikipedia profiles of Fortune 1000 CEOs reveals that selection, source, and influence bias stemming from structural constraints on Wikipedia advantage women and disadvantage men. This finding suggests that information developed by open collaboration communities may contain unexpected forms of bias.

    Why Wikipedia Often Overlooks Stories of Women in History

    Wikipedia's reliance on a volunteer editing base has resulted in a gender bias in both the quantity and quality of content about women. With less than 20% of Wikipedia's editors identifying as women, only 30% of biographical entries have been written about women, and entries on women tend to be shorter and more focused on relationships and family roles than entries on men. This article explores the causes of Wikipedia's gender bias and offers ways that both individuals and institutions can help improve Wikipedia's content about women.

    Wikipedia's Network Bias on Controversial Topics

    The most important feature of Wikipedia is the presence of hyperlinks in pages. Link placement is the product of people's collaboration; consequently, Wikipedia naturally inherits human bias. Because link placement strongly influences users' navigation sessions, one needs to verify that, given a controversial topic, the hyperlink network does not expose users to only one side of the subject. A Wikipedia topic-induced network that prevents users from discovering different facets of an issue suffers from structural bias. In this work, we define static structural bias, which indicates whether the strength of connections between pages of contrasting inclinations is the same, and dynamic structural bias, which quantifies the level of bias that users face over the course of their navigation sessions. Our measurements of structural bias on several controversial topics demonstrate its existence, revealing that users have a low likelihood of reaching pages of the opposing inclination from where they start, and that they navigate Wikipedia in a manner far more biased than expected from the baselines. Our findings underscore the relevance of the problem and pave the way for systems that automatically measure structural bias and propose hyperlink placements that minimize its presence and effects.
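    To make the notion of static structural bias concrete, here is a minimal toy sketch (the paper's actual definition is more involved; the function name, the two-label scheme, and the example graph are all illustrative assumptions): on a hyperlink graph whose pages carry an inclination label, measure what fraction of links between labeled pages stay on the same side.

```python
# Hypothetical toy measure of "static structural bias" on a labeled
# hyperlink graph: the share of labeled-to-labeled links that connect
# pages of the SAME inclination. A value near 1.0 means sides rarely link
# to each other; near 0.5 means connections cross sides freely.

def static_structural_bias(links, leaning):
    """links: dict page -> list of linked pages; leaning: dict page -> side."""
    same = cross = 0
    for src, targets in links.items():
        for dst in targets:
            if src in leaning and dst in leaning:
                if leaning[src] == leaning[dst]:
                    same += 1
                else:
                    cross += 1
    total = same + cross
    return same / total if total else 0.0

# Toy data: two "pro" pages and two "con" pages with mostly in-group links.
links = {
    "pro_1": ["pro_2", "pro_3"],
    "pro_2": ["pro_1"],
    "con_1": ["con_2", "pro_1"],
    "con_2": ["con_1"],
}
leaning = {"pro_1": "A", "pro_2": "A", "pro_3": "A", "con_1": "B", "con_2": "B"}
print(static_structural_bias(links, leaning))  # 5 same-side links out of 6
```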

    Methods for detecting and mitigating linguistic bias in text corpora

    As the Web continues to spread into all aspects of daily life, bias in the form of prejudice and hidden opinions is becoming an increasingly challenging problem. One widespread manifestation is bias in text data. To counteract this, the online encyclopedia Wikipedia introduced the principle of the Neutral Point of View (NPOV), which mandates the use of neutral language and the avoidance of one-sided or subjective wording. While studies have shown that the quality of Wikipedia articles is comparable to that of articles in traditional encyclopedias, research also shows that Wikipedia is susceptible to various types of NPOV violations. Identifying bias can be a challenging task, even for humans, and with millions of articles and a declining number of contributors, this task is becoming ever more difficult. If left unchecked, bias can not only lead to polarization and conflict between opinion groups, but can also negatively affect users in forming their opinions freely. Moreover, bias in texts and in ground-truth data can adversely affect machine learning models trained on such data, which can lead to discriminatory model behavior. In this thesis, we address bias by focusing on three central aspects: biased content in the form of written statements, bias of crowd workers during data annotation, and bias in word embedding representations. We present two approaches for identifying biased statements in text collections such as Wikipedia. Our feature-based approach uses bag-of-words features, including a list of bias words that we compiled by identifying clusters of bias words in the vector space of word embeddings. Our improved neural approach uses gated recurrent neural networks to capture context dependencies and further improve model performance. Our study on crowd worker bias reveals biased behavior by crowd workers with extreme opinions on a given topic and shows that this behavior influences the resulting ground-truth labels, which in turn affects the creation of datasets for tasks such as bias identification or sentiment analysis. We present approaches for mitigating worker bias that raise awareness among workers and employ the concept of social projection. Finally, we address the problem of bias in word embeddings, focusing on the example of varying sentiment scores for names. We show that bias in the training data is captured by the embeddings and passed on to downstream models. In this context, we present a debiasing approach that reduces the bias effect and has a positive impact on the labels produced by a downstream sentiment classifier.
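    The feature-based approach in the abstract above builds bag-of-words features around a lexicon of bias words learned from word-embedding clusters. A minimal sketch of the lexicon-matching step, with a tiny hand-picked word list standing in for the learned lexicon (the lexicon contents and function name are illustrative, not the thesis's actual resources):

```python
# Hypothetical sketch: flag a statement's overlap with a (toy) lexicon of
# bias-laden words, as a stand-in for the learned bias-word list described
# in the abstract. Real systems would use the full learned lexicon plus
# further bag-of-words features and, in the neural variant, a GRU.
BIAS_LEXICON = {"clearly", "obviously", "legendary", "notorious", "so-called"}

def bias_word_features(statement):
    """Return simple lexicon-overlap features for one statement."""
    tokens = [t.strip(".,") for t in statement.lower().split()]
    hits = [t for t in tokens if t in BIAS_LEXICON]
    return {"bias_word_count": len(hits), "bias_words": hits}

print(bias_word_features("He was clearly the most legendary editor."))
# {'bias_word_count': 2, 'bias_words': ['clearly', 'legendary']}
```

In practice such features would feed a classifier; the count alone already separates heavily loaded sentences from neutral ones.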

    Overrepresentation of the Underrepresented: Gender Bias in Wikipedia

    The goal of our research is to determine whether gender bias exists in Wikipedia. Wikipedia is a very large dataset that has been used to train artificial intelligence models. If a dataset used for this purpose is biased, then the artificial intelligence model trained on it will be biased as well, and will therefore make biased decisions. For this reason, it is important to examine large datasets for potential biases before they are used in machine learning. Since Wikipedia is ontologically structured, we used graph theory to create a network of all of the website's categories in order to examine the relationships between men-related and women-related categories with measures of shortest paths, successor intersections, and average betweenness centrality. We found an overexposure of categories that relate to men: they are far more central in Wikipedia and easier to reach than categories that relate to women. However, although women-related categories are not as central, there are about six times more categories that mention women in the title than men, which we consider overrepresentation. This is most likely due to women being considered an exception in many fields while men are considered the norm. Our methods can be used either to study gender bias in Wikipedia periodically, as its data changes relatively frequently, or to study other biases in Wikipedia or in other network-like datasets.
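    One of the measures named above, shortest paths between categories, can be sketched with a plain breadth-first search on a toy category graph (the graph, category names, and function name are illustrative assumptions, not the study's data; the actual analysis also computes successor intersections and betweenness centrality):

```python
# Hypothetical sketch: BFS shortest-path distance in an unweighted directed
# category graph, echoing the "how easy is a category to reach" question.
from collections import deque

def shortest_path_length(graph, source, target):
    """Return the BFS hop distance from source to target, or None if unreachable."""
    seen, frontier = {source}, deque([(source, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if node == target:
            return dist
        for nbr in graph.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, dist + 1))
    return None

# Toy category graph: the "women" branch takes an extra hop to the same target.
graph = {
    "Scientists": ["Physicists", "Women scientists"],
    "Physicists": ["Nobel laureates"],
    "Women scientists": ["Women physicists"],
    "Women physicists": ["Nobel laureates"],
}
print(shortest_path_length(graph, "Scientists", "Nobel laureates"))  # 2
```

Averaging such distances from a root category to men-related versus women-related categories gives one simple centrality-style comparison.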

    Wikipedia is pushing the boundaries of scholarly practice but the gender gap must be addressed

    Wikipedia is virtually uncontested as an instrumental conduit for global knowledge exchange. But who is creating and maintaining this knowledge, and does it adequately reflect the diversity of expertise and discourse? Adrianne Wadewitz explores the gender gap among Wikipedia editors and argues that, since approximately 90% of editors are male, every edit is inherently political and subject to problems of bias. If feminists and academics want the rules of Wikipedia to evolve in ways that reflect their expertise, they must participate in the conversation.