Research article
Evaluation of geographical distortions in language models
Abstract
Geographic bias in language models (LMs) is an underexplored dimension of model fairness, despite growing attention to other social biases. We investigate whether LMs provide equally accurate representations across all global regions and propose a benchmark of four indicators to detect undertrained and underperforming areas: (i) indirect assessment of geographic training data coverage via tokenizer analysis, (ii) evaluation of basic geographic knowledge, (iii) detection of geographic distortions, and (iv) visualization of performance disparities through maps. Applying this framework to ten widely used encoder- and decoder-based models, we find systematic overrepresentation of Western countries and consistent underrepresentation of several African, Eastern European, and Middle Eastern regions, leading to measurable performance gaps. We further analyse the impact of these biases on downstream tasks, particularly in crisis response, and show that the regions most vulnerable to natural disasters are often those with poorer LM coverage. Our findings underscore the need for geographically balanced LMs to ensure equitable and effective global applications.
- U10 - Computer science, mathematics and statistics
- geographical distribution
- data analysis
- artificial intelligence
- modelling
- spatial data
- http://aims.fao.org/aos/agrovoc/c_5083
- http://aims.fao.org/aos/agrovoc/c_15962
- http://aims.fao.org/aos/agrovoc/c_27064
- http://aims.fao.org/aos/agrovoc/c_230ab86c
- http://aims.fao.org/aos/agrovoc/c_379bbe9f