44 research outputs found

Biased Embeddings from Wild Data: Measuring, Understanding and Removing

Authors: Nello Cristianini, Thomas Lansdall-Welfare, Adam Sutton
Publication date: 16/06/2018

Many modern Artificial Intelligence (AI) systems make use of data embeddings, particularly in the domain of Natural Language Processing (NLP). These embeddings are learnt from data that has been gathered "from the wild" and have been found to contain unwanted biases. In this paper we make three contributions towards measuring, understanding and removing these biases. We present a rigorous way to measure some of these biases, based on word lists created for social psychology applications; we observe how gender bias in occupations reflects actual gender bias in the same occupations in the real world; and finally we demonstrate how a simple projection can significantly reduce the effects of embedding bias. All this is part of an ongoing effort to understand how trust can be built into AI systems.
Comment: Author's original version
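The abstract mentions measuring bias with word lists and reducing it with "a simple projection". As an illustration only, the sketch below assumes one common formulation, which is not necessarily the authors' exact method: the bias direction is the difference between the centroids of two word-list embeddings, bias is measured as cosine similarity with that direction, and debiasing subtracts each vector's component along it. The word lists and the toy 3-dimensional vectors are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))


def centroid(vectors):
    # Component-wise mean of a list of equal-length vectors.
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def bias_direction(group_a, group_b):
    # Assumed definition: difference between the two word-list centroids.
    return [x - y for x, y in zip(centroid(group_a), centroid(group_b))]


def project_out(w, v):
    # Subtract the component of w lying along v, leaving w orthogonal to v.
    scale = dot(w, v) / dot(v, v)
    return [x - scale * y for x, y in zip(w, v)]


# Hypothetical toy embeddings for two gendered word lists and one occupation word.
male_words = [[1.0, 0.2, 0.0], [0.9, 0.1, 0.1]]
female_words = [[-1.0, 0.2, 0.0], [-0.9, 0.1, 0.1]]
occupation = [0.5, 0.8, 0.3]

direction = bias_direction(male_words, female_words)
before = cosine(occupation, direction)
debiased = project_out(occupation, direction)
after = cosine(debiased, direction)
print(f"bias before: {before:.3f}, after: {after:.3f}")
```

Projecting out a single direction removes only the bias that lies along that direction; whether this matches the reduction reported in the paper depends on how the direction is actually chosen there.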

Sources: arXiv.org e-Print Archive; Explore Bristol Research

History Playground: A Tool for Discovering Temporal Trends in Massive Textual Corpora

Authors: Nello Cristianini, Thomas Lansdall-Welfare
Publication date: 04/06/2018

Recent studies have shown that macroscopic patterns of continuity and change over the course of centuries can be detected through the analysis of time series extracted from massive textual corpora. Similar data-driven approaches have already revolutionised the natural sciences and are widely believed to hold similar potential for the humanities and social sciences, driven by the mass-digitisation projects currently under way and by the ever-increasing number of documents that are "born digital". New interactive tools are therefore required to discover and extract macroscopic patterns from these vast quantities of textual data. Here we present History Playground, an interactive web-based tool for discovering trends in massive textual corpora. The tool uses scalable algorithms to first extract trends from textual corpora and then make them available for real-time search and discovery, presenting users with an interface to explore the data. Included in the tool are algorithms for standardisation, regression, change-point detection in the relative frequencies of n-grams, multi-term indices and comparison of trends across different corpora.
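The abstract lists standardisation and change-point detection on the relative frequencies of n-grams. The sketch below is a generic illustration under stated assumptions, not History Playground's actual implementation: relative frequency is taken as an n-gram's yearly count divided by the total token count for that year, standardisation is a z-score, and the change point is the single split that minimises the summed squared error of a two-segment mean fit. The counts, totals and years are hypothetical.

```python
from statistics import mean, pstdev


def relative_frequencies(counts, totals):
    # Share of all tokens in each time step taken by the n-gram.
    return [c / t for c, t in zip(counts, totals)]


def standardise(series):
    # z-score: zero mean, unit (population) standard deviation.
    mu, sd = mean(series), pstdev(series)
    if sd == 0:
        return [0.0 for _ in series]
    return [(x - mu) / sd for x in series]


def sse(segment):
    # Sum of squared deviations from the segment mean.
    m = mean(segment)
    return sum((x - m) ** 2 for x in segment)


def best_change_point(series):
    # Single mean-shift model: try every split k and keep the cheapest one.
    best_k, best_cost = None, float("inf")
    for k in range(1, len(series)):
        cost = sse(series[:k]) + sse(series[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k


# Hypothetical yearly counts for one n-gram and total tokens per year.
years = list(range(1900, 1906))
counts = [10, 12, 11, 30, 32, 31]
totals = [1000, 1000, 1000, 1000, 1000, 1000]

z = standardise(relative_frequencies(counts, totals))
k = best_change_point(z)
print(f"new regime starts in {years[k]}")
```

Because the z-score is an affine transform, it does not move the optimal split; standardising mainly makes trends for n-grams of very different frequency comparable on one axis.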

Sources: arXiv.org e-Print Archive; Explore Bristol Research

History Playground: A Tool for Discovering Temporal Trends in Massive Textual Corpora

Authors: Nello Cristianini, Thomas Lansdall-Welfare
Publication venue: Oxford University Press (OUP)
Publication date: 01/06/2019

Sources: Explore Bristol Research

Gender classification by deep learning on millions of weakly labelled images

Authors: Nello Cristianini, Sen Jia, Thomas Lansdall-Welfare
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/12/2016

Sources: Crossref; Explore Bristol Research

Large-scale content analysis of historical newspapers in the town of Gorizia 1873–1914

Authors: Nello Cristianini, Gaetano Dato, Thomas Lansdall-Welfare
Publication venue: Informa UK Limited
Publication date: 26/03/2018

Sources: Crossref; Explore Bristol Research

Biased Embeddings from Wild Data: Measuring, Understanding and Removing

Authors: Nello Cristianini, Thomas Lansdall-Welfare, Adam Sutton
Publication venue: Springer Science and Business Media LLC
Publication date: 05/10/2018

Sources: Explore Bristol Research

Women are seen more than heard in online newspapers

Authors: Cynthia Carter, Nello Cristianini, Sen Jia, Thomas Lansdall-Welfare, Saatviga Sudhahar
Publication venue: Public Library of Science (PLoS)
Publication date: 03/02/2016

Feminist news media researchers have long contended that masculine news values shape journalists’ quotidian decisions about what is newsworthy. As a result, it is argued, topics and issues traditionally regarded as primarily of interest and relevance to women are routinely marginalised in the news, while men’s views and voices are given privileged space. When women do show up in the news, it is often as “eye candy”, thus reinforcing the notion that women’s value lies in providing visual pleasure rather than in the content of their views. To date, evidence to support such claims has tended to be based on small-scale, manual analyses of news content. In this article, we report on findings from our large-scale, data-driven study of gender representation in online English-language news media. We analysed both words and images so as to give a broader picture of how gender is represented in online news. The corpus of news content examined consists of 2,353,652 articles collected over a period of six months from more than 950 different news outlets. From this initial dataset, we extracted 2,171,239 references to named persons and 1,376,824 images, and resolved the gender of names and faces using automated computational methods. We found that males were represented more often than females in both images and text, but in proportions that changed across topics, news outlets and mode. Moreover, the proportion of females was consistently higher in images than in text, for virtually all topics and news outlets; women were more likely to be represented visually than to be mentioned as a news actor or source. Our large-scale, data-driven analysis offers important empirical evidence of macroscopic patterns in news content concerning the way men and women are represented.

Sources: Crossref; Online Research @ Cardiff; Directory of Open Access Journals; PubMed Central; Explore Bristol Research; FigShare

Content analysis of 150 years of British periodicals

Author: Carpineto
Flaounas
Flaounas
Gibbs
James Thompson
Jia
Justin Lewis
Lehmann
Nello Cristianini
Nicholson
Saatviga Sudhahar
Suchanek
Thomas Lansdall-Welfare
Walker
Publication venue: 'Proceedings of the National Academy of Sciences'
Publication date
Field of study

Sources: Crossref