First Women, Second Sex: Gender Bias in Wikipedia
Contributing to history has never been as easy as it is today. Anyone with
access to the Web is able to play a part on Wikipedia, an open and free
encyclopedia. Wikipedia, available in many languages, is one of the most
visited websites in the world and arguably one of the primary sources of
knowledge on the Web. However, from a diversity standpoint, not everyone
contributes to Wikipedia; several groups are severely underrepresented. One of
those groups is women, who make up approximately 16% of the current contributor
community, meaning that most of the content is written by men. In addition,
although Wikipedia content must adhere to specific guidelines of verifiability,
notability, and neutral point of view, these guidelines are supervised and
enforced by men.
In this paper, we propose that gender bias is not about participation and
representation only, but also about characterization of women. We approach the
analysis of gender bias by defining a methodology for comparing the
characterizations of men and women in biographies in three aspects: meta-data,
language, and network structure. Our results show that, indeed, there are
differences in characterization and structure. Some of these differences are
reflected from the off-line world documented by Wikipedia, but other
differences can be attributed to gender bias in Wikipedia content. We
contextualize these differences in feminist theory and discuss their
implications for Wikipedia policy.
Comment: 10 pages, ACM style. Author's version of a paper to be presented at ACM Hypertext 201
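For the language aspect of such a comparison, one common building block is contrasting word usage between the two sets of biographies. Below is a minimal, hypothetical sketch using smoothed log-odds over toy word counts; it is not the authors' actual method, and all names and numbers are illustrative:

```python
from collections import Counter
from math import log

def log_odds(counts_a, counts_b, alpha=0.5):
    """Smoothed log-odds of each word in corpus A vs corpus B.
    Positive: word is more characteristic of A; negative: of B."""
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    vocab = set(counts_a) | set(counts_b)
    scores = {}
    for w in vocab:
        # alpha is a pseudo-count that keeps unseen words from blowing up the ratio
        pa = (counts_a[w] + alpha) / (total_a + alpha * len(vocab))
        pb = (counts_b[w] + alpha) / (total_b + alpha * len(vocab))
        scores[w] = log(pa / pb)
    return scores

# Toy word counts from hypothetical men's vs women's biographies.
men = Counter({"career": 9, "family": 1, "award": 5})
women = Counter({"career": 4, "family": 6, "award": 5})
scores = log_odds(men, women)
print(scores["family"] < 0 < scores["career"])  # True: "family" skews to the women set
```

Real analyses would add significance testing and much larger corpora; this only shows the direction of the comparison.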
Analysing Timelines of National Histories across Wikipedia Editions: A Comparative Computational Approach
Portrayals of history are never complete, and each description inherently
exhibits a specific viewpoint and emphasis. In this paper, we aim to
automatically identify such differences by computing timelines and detecting
temporal focal points of written history across languages on Wikipedia. In
particular, we study articles related to the history of all UN member states
and compare them in 30 language editions. We develop a computational approach
that allows us to identify focal points quantitatively, and find that Wikipedia
narratives about national histories (i) are skewed towards more recent events
(recency bias) and (ii) are distributed unevenly across the continents with
significant focus on the history of European countries (Eurocentric bias). We
also establish that national historical timelines vary across language
editions, although average interlingual consensus is rather high. We hope that
this paper provides a starting point for a broader computational analysis of
written history on Wikipedia and elsewhere
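A crude proxy for the temporal focal points described above is the distribution of year mentions in an article's text. The sketch below is hypothetical (not the authors' pipeline) and uses a simple regular expression over a toy snippet:

```python
import re
from collections import Counter

def year_histogram(text, start=1000, end=2029):
    """Count mentions of plausible year strings (1000-2029) in text."""
    years = [int(y) for y in re.findall(r"\b(1[0-9]{3}|20[0-2][0-9])\b", text)]
    return Counter(y for y in years if start <= y <= end)

def focal_points(hist, top=3):
    """Return the most-mentioned years as candidate temporal focal points."""
    return [y for y, _ in hist.most_common(top)]

sample = ("The revolution of 1789 reshaped the state; the wars of 1792 and the "
          "decade 1789-1799 dominated politics, and 1789 remains the pivotal year.")
hist = year_histogram(sample)
print(focal_points(hist, top=1))  # [1789]
```

A real implementation would need to handle date ranges, non-Gregorian calendars, and language-specific date formats across the 30 editions studied.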
It’s Not What You Think: Gender Bias in Information about Fortune 1000 CEOs on Wikipedia
Increasingly, information generated by open collaboration communities is being trusted and used by individuals to make decisions and carry out work tasks. Little is known about the quality of this information or the bias it may contain. In this study we address the question: How is gender bias embedded in information about organizational leaders in an open collaboration community? To answer this question, we use the bias framework developed by Miranda and colleagues (2016) to study bias stemming from structural constraints and content restrictions in the open collaboration community Wikipedia. Comparison of Wikipedia profiles of Fortune 1000 CEOs reveals that selection, source, and influence bias stemming from structural constraints on Wikipedia advantage women and disadvantage men. This finding suggests that information developed by open collaboration communities may contain unexpected forms of bias
Why Wikipedia Often Overlooks Stories of Women in History
Wikipedia's reliance on a volunteer editing base has resulted in a gender bias in both the quantity and quality of content about women. With less than 20% of Wikipedia's editors identifying as women, only 30% of biographical entries have been written about women, and entries on women tend to be shorter and more focused on relationships and family roles than entries on men. This article explores the causes of Wikipedia's gender bias and offers ways that both individuals and institutions can help improve Wikipedia's content around women
Wikipedia's Network Bias on Controversial Topics
The most important feature of Wikipedia is the presence of hyperlinks in
pages. Link placement is the product of people's collaboration, so Wikipedia
naturally inherits human bias. Because the arrangement of links strongly
influences users' navigation sessions, one needs to verify that, given a
controversial topic, the hyperlink network does not expose users to only
one side of the subject. A Wikipedia topic-induced network that prevents
users from discovering different facets of an issue suffers from structural
bias. In this work, we define static structural bias, which indicates whether
the strength of connections between pages of contrasting inclinations is the
same, and dynamic structural bias, which quantifies the level of bias that
users face over the course of their navigation sessions. Our measurements of
structural bias on several controversial topics demonstrate its existence,
revealing that users have a low likelihood of reaching pages of the opposing
inclination from where they start, and that they navigate Wikipedia in a much
more biased way than the baselines predict. Our findings underscore the
relevance of the problem and pave the way for developing systems that
automatically measure structural bias and propose hyperlink locations that
minimize its presence and effects
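One crude stand-in for the static notion of structural bias is the fraction of outgoing links from one side of a topic that land on the opposing side. The sketch below is hypothetical, with made-up page names and a toy link graph; it is not the paper's actual measure:

```python
def cross_side_link_fraction(links, side_a, side_b):
    """Fraction of outgoing links from side_a pages that reach side_b pages.
    A low value suggests readers rarely reach the opposing side in one click."""
    out = [(u, v) for u in side_a for v in links.get(u, ())]
    if not out:
        return 0.0
    crossing = sum(1 for _, v in out if v in side_b)
    return crossing / len(out)

# Toy link graph for a hypothetical controversial topic.
links = {
    "pro1": ["pro2", "pro3"],
    "pro2": ["pro1"],
    "pro3": ["con1"],
    "con1": ["con2"],
    "con2": ["con1"],
}
pro, con = {"pro1", "pro2", "pro3"}, {"con1", "con2"}
print(cross_side_link_fraction(links, pro, con))  # 0.25: 1 of 4 pro links crosses
print(cross_side_link_fraction(links, con, pro))  # 0.0: con pages never link back
```

The dynamic notion would additionally require simulating or logging multi-click navigation sessions over the same graph.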
Methods for detecting and mitigating linguistic bias in text corpora
As the Web continues to spread into all aspects of daily life, bias in the
form of prejudice and hidden opinions is becoming an increasingly challenging
problem. A widespread manifestation is bias in text data. To counteract this,
the online encyclopedia Wikipedia introduced the Neutral Point of View (NPOV)
principle, which prescribes the use of neutral language and the avoidance of
one-sided or subjective formulations. While studies have shown that the
quality of Wikipedia articles is comparable to that of articles in classical
encyclopedias, research also shows that Wikipedia is susceptible to various
types of NPOV violations. Identifying bias can be a challenging task, even for
humans, and with millions of articles and a declining number of contributors,
this task becomes increasingly difficult. If bias is not contained, it can not
only lead to polarization and conflict between opinion groups, but can also
negatively influence users in forming their own opinions. In addition, bias in
texts and in ground-truth data can adversely affect machine learning models
trained on that data, which can lead to discriminatory model behavior.
In this work, we address bias by focusing on three central aspects: biased
content in the form of written statements, bias of crowd workers during data
annotation, and bias in word embedding representations. We present two
approaches for identifying biased statements in text collections such as
Wikipedia. Our feature-based approach uses bag-of-words features, including a
list of bias words that we compiled by identifying clusters of bias words in
the vector space of word embeddings. Our improved neural approach uses gated
recurrent neural networks to capture context dependencies and further improve
model performance. Our study on crowd worker bias reveals biased behavior by
crowd workers holding extreme opinions on a given topic, and shows that this
behavior influences the resulting ground-truth labels, which in turn affects
the creation of datasets for tasks such as bias identification or sentiment
analysis. We present approaches for mitigating worker bias that raise
awareness among workers and use the concept of social projection.
Finally, we address the problem of bias in word embeddings, focusing on the
example of varying sentiment scores for names. We show that bias in the
training data is captured by the embeddings and passed on to downstream
models. In this context, we present a debiasing approach that reduces the bias
effect and has a positive impact on the labels produced by a downstream
sentiment classifier
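Compiling a bias-word lexicon from clusters in embedding space, as the feature-based approach above describes, amounts to expanding a seed list by nearest neighbours. The sketch below is a toy illustration with hand-made three-dimensional vectors standing in for real word embeddings; the threshold, words, and vectors are all assumptions, not the thesis's actual procedure:

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu, nv = sqrt(sum(a * a for a in u)), sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def expand_lexicon(seeds, vectors, threshold=0.9):
    """Grow a seed list of bias words by cosine-nearest neighbours."""
    lexicon = set(seeds)
    for word, vec in vectors.items():
        if word in lexicon:
            continue
        if any(cosine(vec, vectors[s]) >= threshold for s in seeds):
            lexicon.add(word)
    return lexicon

# Tiny hand-made vectors standing in for real word embeddings.
vectors = {
    "fortunately":   [0.9, 0.1, 0.0],
    "unfortunately": [0.8, 0.2, 0.1],
    "regrettably":   [0.85, 0.15, 0.05],
    "table":         [0.0, 0.1, 0.9],
}
lexicon = expand_lexicon(["fortunately"], vectors)
print(sorted(lexicon))  # the stance-laden words cluster; "table" does not
```

With real embeddings (hundreds of dimensions, large vocabularies) one would cluster with k-means or similar rather than a single similarity threshold.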
Overrepresentation of the Underrepresented: Gender Bias in Wikipedia
The goal of our research is to determine whether gender bias exists in Wikipedia. Wikipedia is a very large dataset that has been used to train artificial intelligence models. If a dataset used for this purpose is biased, then the artificial intelligence model trained on it will be biased as well, and will therefore make biased decisions. For this reason, it is important to explore large datasets for potential biases before they are used in machine learning. Since Wikipedia is ontologically structured, we used graph theory to create a network of all of the website's categories in order to examine the relationships between men-related and women-related categories with measures of shortest paths, successor intersections, and average betweenness centrality. We found that categories relating to men are overexposed: they are far more central in Wikipedia and easier to reach than categories relating to women. However, although women-related categories are less central, there are about six times more categories that mention women in the title than men, which we consider to be overrepresentation. This is most likely due to women being considered an exception in many fields while men are considered the norm. Our methods can be used to study gender bias in Wikipedia periodically, as its data changes relatively frequently, or to study other biases in Wikipedia or other network-like datasets
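The shortest-path comparison described here can be illustrated with a breadth-first search over a toy undirected category graph. The graph below is entirely hypothetical (category names and structure invented for illustration) and is not the authors' dataset or code:

```python
from collections import deque

def shortest_path_len(graph, src, dst):
    """BFS shortest-path length between two categories; None if unreachable."""
    if src == dst:
        return 0
    seen, frontier = {src}, deque([(src, 0)])
    while frontier:
        node, dist = frontier.popleft()
        for nxt in graph.get(node, ()):
            if nxt == dst:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return None

# Toy undirected category graph: women-related categories hang off a side branch.
g = {
    "People": ["Physicists"],
    "Physicists": ["People", "Scientists"],
    "Scientists": ["Physicists", "Women scientists"],
    "Women scientists": ["Scientists", "Women physicists"],
    "Women physicists": ["Women scientists"],
}
print(shortest_path_len(g, "People", "Physicists"))        # 1
print(shortest_path_len(g, "People", "Women physicists"))  # 4
```

Averaging such path lengths from hub categories to men-related versus women-related targets gives one concrete way to operationalize the centrality gap; betweenness centrality would require an all-pairs computation on top of this.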
Wikipedia is pushing the boundaries of scholarly practice but the gender gap must be addressed
Wikipedia is virtually uncontested as an instrumental conduit for global knowledge exchange. But who is creating and maintaining this knowledge and does it adequately reflect the diversity of expertise and discourse? Adrianne Wadewitz explores the gender gap in Wikipedia editors and argues that since approximately 90% of editors are male, every edit is inherently political and subject to problems of bias. If feminists and academics want the rules of Wikipedia to evolve in ways that reflect their expertise, they must participate in the conversation