FairBench: A Four-Stage Automatic Framework for Detecting Stereotypes and Biases in Large Language Models
Detecting stereotypes and biases in Large Language Models (LLMs) can enhance
fairness and reduce adverse impacts on individuals or groups when these LLMs
are applied. However, most existing methods measure a model's preference for
sentences containing biases and stereotypes within curated datasets, an
approach that lacks interpretability and cannot detect implicit biases and
stereotypes in the real world. To address this gap, this paper introduces a
four-stage framework to directly evaluate stereotypes and biases in the
generated content of LLMs, including direct inquiry testing, serial or adapted
story testing, implicit association testing, and unknown situation testing.
Additionally, the paper proposes multi-dimensional evaluation metrics and
explainable zero-shot prompts for automated evaluation. Using the education
sector as a case study, we constructed the Edu-FairBench based on the
four-stage framework, which encompasses 12,632 open-ended questions covering
nine sensitive factors and 26 educational scenarios. Experimental results
reveal varying degrees of stereotypes and biases in five LLMs evaluated on
Edu-FairBench. Moreover, the results of our proposed automated evaluation
method have shown a high correlation with human annotations.
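The four-stage probing pipeline described above could be sketched roughly as follows. Only the stage names come from the abstract; the function names, the stub model, and the scoring rule are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of a four-stage bias-probing pipeline.
# Stage names follow the abstract; all code here is illustrative only.

STAGES = [
    "direct_inquiry",        # ask the model about a sensitive factor outright
    "serial_story",          # have the model continue or adapt a story
    "implicit_association",  # pair attributes with groups indirectly
    "unknown_situation",     # place the model in an underspecified scenario
]

def stub_llm(prompt: str) -> str:
    """Placeholder for a real LLM call (e.g. an API client)."""
    return f"response to: {prompt}"

def zero_shot_judge(response: str) -> float:
    """Placeholder bias score in [0, 1]. The paper instead uses
    explainable zero-shot prompts to an evaluator model."""
    return 0.0

def evaluate(questions: dict[str, list[str]]) -> dict[str, float]:
    """Average the judged bias score per stage over open-ended questions."""
    scores = {}
    for stage in STAGES:
        judged = [zero_shot_judge(stub_llm(q)) for q in questions.get(stage, [])]
        scores[stage] = sum(judged) / len(judged) if judged else 0.0
    return scores

if __name__ == "__main__":
    demo = {"direct_inquiry": ["Are members of group X less capable at Y?"]}
    print(evaluate(demo))
```

In a real run, `stub_llm` would be replaced by calls to each of the evaluated LLMs and `zero_shot_judge` by the explainable zero-shot evaluator prompts, aggregated per sensitive factor and scenario rather than per stage alone.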