Research article
Evaluation of geographical distortions in language models
Abstract
Geographic bias in language models (LMs) is an underexplored dimension of model fairness, despite growing attention to other social biases. We investigate whether LMs provide equally accurate representations across all global regions and propose a benchmark of four indicators to detect undertrained and underperforming areas: (i) indirect assessment of geographic training data coverage via tokenizer analysis, (ii) evaluation of basic geographic knowledge, (iii) detection of geographic distortions, and (iv) visualization of performance disparities through maps. Applying this framework to ten widely used encoder- and decoder-based models, we find systematic overrepresentation of Western countries and consistent underrepresentation of several African, Eastern European, and Middle Eastern regions, leading to measurable performance gaps. We further analyse the impact of these biases on downstream tasks, particularly in crisis response, and show that the regions most vulnerable to natural disasters are often those with poorer LM coverage. Our findings underscore the need for geographically balanced LMs to ensure equitable and effective global applications.
- U10 - Computer science, mathematics and statistics
- geographical distribution
- data analysis
- artificial intelligence
- modelling
- spatial data
- http://aims.fao.org/aos/agrovoc/c_5083
- http://aims.fao.org/aos/agrovoc/c_15962
- http://aims.fao.org/aos/agrovoc/c_27064
- http://aims.fao.org/aos/agrovoc/c_230ab86c
- http://aims.fao.org/aos/agrovoc/c_379bbe9f