Biased Embeddings from Wild Data: Measuring, Understanding and Removing
Many modern Artificial Intelligence (AI) systems make use of data embeddings,
particularly in the domain of Natural Language Processing (NLP). These
embeddings are learnt from data that has been gathered "from the wild" and have
been found to contain unwanted biases. In this paper we make three
contributions towards measuring, understanding and removing this problem. We
present a rigorous way to measure some of these biases, based on the use of
word lists created for social psychology applications; we observe how gender
bias in occupations reflects actual gender bias in the same occupations in the
real world; and finally we demonstrate how a simple projection can
significantly reduce the effects of embedding bias. All this is part of an
ongoing effort to understand how trust can be built into AI systems. Comment: Author's original version
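The projection idea mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not say how the bias direction is obtained, so this example assumes it is the difference between two word vectors; the toy 3-dimensional vectors and the names `remove_direction`, `he`, `she` and `engineer` are invented for illustration and are not taken from the paper.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def remove_direction(vector, direction):
    """Return `vector` minus its component along `direction`.

    This is an orthogonal projection onto the subspace perpendicular
    to `direction`.
    """
    norm_sq = dot(direction, direction)
    if norm_sq == 0:
        raise ValueError("direction must be non-zero")
    scale = dot(vector, direction) / norm_sq
    return [a - scale * b for a, b in zip(vector, direction)]


# Toy embeddings (hypothetical values, not real data).
he = [1.0, 0.2, 0.0]
she = [-1.0, 0.2, 0.0]
engineer = [0.6, 0.5, 0.3]

# Assumption: the bias direction is the difference of a word pair.
bias_direction = [a - b for a, b in zip(he, she)]

debiased = remove_direction(engineer, bias_direction)

# After projection the vector has no component along the bias direction.
print(dot(engineer, bias_direction))   # non-zero before
print(dot(debiased, bias_direction))   # approximately zero after
```

In practice the direction would more plausibly be estimated from many word pairs or word lists rather than a single pair; that choice is outside what the abstract specifies.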
History Playground: A Tool for Discovering Temporal Trends in Massive Textual Corpora
Recent studies have shown that macroscopic patterns of continuity and change
over the course of centuries can be detected through the analysis of time
series extracted from massive textual corpora. Similar data-driven approaches
have already revolutionised the natural sciences, and are widely believed to
hold similar potential for the humanities and social sciences, driven by the
mass-digitisation projects that are currently under way, and coupled with the
ever-increasing number of documents which are "born digital". As such, new
interactive tools are required to discover and extract macroscopic patterns
from these vast quantities of textual data. Here we present History Playground,
an interactive web-based tool for discovering trends in massive textual
corpora. The tool makes use of scalable algorithms to first extract trends from
textual corpora, before making them available for real-time search and
discovery, presenting users with an interface to explore the data. Included in
the tool are algorithms for standardization, regression, change-point detection
in the relative frequencies of ngrams, multi-term indices and comparison of
trends across different corpora.
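As a rough illustration of two of the steps named in the abstract above, the sketch below computes the relative frequency of an n-gram per time step and then standardizes the resulting series. The abstract does not state which standardization the tool uses, so a z-score (zero mean, unit variance) is assumed here; the counts are made-up numbers and the function names are hypothetical.

```python
from statistics import mean, pstdev


def relative_frequencies(counts, totals):
    """Divide each n-gram count by the total n-gram count of that time step."""
    return [c / t for c, t in zip(counts, totals)]


def standardize(series):
    """Z-score a series: subtract the mean, divide by the population std dev."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        # A constant series has no variation to rescale.
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


# Hypothetical yearly counts of one n-gram and of all n-grams in the corpus.
counts = [5, 10, 20, 40]
totals = [1000, 1000, 2000, 2000]

rel = relative_frequencies(counts, totals)
z = standardize(rel)

print(rel)
print(z)
```

Working with relative rather than raw frequencies keeps a trend from merely reflecting growth in corpus size, and standardizing puts series from different terms or corpora on a comparable scale.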
Women are seen more than heard in online newspapers
Feminist news media researchers have long contended that masculine news values shape journalists’ quotidian decisions about what is newsworthy. As a result, it is argued, topics and issues traditionally regarded as primarily of interest and relevance to women are routinely marginalised in the news, while men’s views and voices are given privileged space. When women do show up in the news, it is often as “eye candy,” thus reinforcing the idea that women’s value lies in being sources of visual pleasure rather than in the content of their views. To date, evidence to support such claims has tended to be based on small-scale, manual analyses of news content. In this article, we report on findings from our large-scale, data-driven study of gender representation in online English-language news media. We analysed both words and images so as to give a broader picture of how gender is represented in online news. The corpus of news content examined consists of 2,353,652 articles collected over a period of six months from more than 950 different news outlets. From this initial dataset, we extracted 2,171,239 references to named persons and 1,376,824 images, resolving the gender of names and faces using automated computational methods. We found that males were represented more often than females in both images and text, but in proportions that changed across topics, news outlets and mode. Moreover, the proportion of females was consistently higher in images than in text, for virtually all topics and news outlets; women were more likely to be represented visually than to be mentioned as a news actor or source. Our large-scale, data-driven analysis offers important empirical evidence of macroscopic patterns in news content concerning the way men and women are represented.