Can Instruction Fine-Tuned Language Models Identify Social Bias through Prompting?
As the breadth and depth of language model applications continue to expand
rapidly, it is increasingly important to build efficient frameworks for
measuring and mitigating the learned or inherited social biases of these
models. In this paper, we present our work on evaluating instruction fine-tuned
language models' ability to identify bias through zero-shot prompting,
including Chain-of-Thought (CoT) prompts. Among LLaMA and its two instruction
fine-tuned versions, Alpaca 7B performs best on the bias identification task
with an accuracy of 56.7%. We also demonstrate that scaling up LLM size and
data diversity could lead to further performance gains. This is a
work in progress presenting the first component of our bias mitigation
framework. We will keep updating this work as we obtain more results.
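
The evaluation described above (zero-shot prompts, with and without CoT, scored by accuracy) could be sketched roughly as follows. This is only an illustration under stated assumptions: the prompt wording, the yes/no answer format, and the helper names `build_prompt`, `parse_label`, and `accuracy` are hypothetical and are not taken from the paper. The model call itself is omitted.

```python
from typing import List, Optional

# Hypothetical templates; the paper's actual prompt wording is not given here.
ZERO_SHOT_TEMPLATE = (
    "Does the following text contain social bias? Answer yes or no.\n"
    "Text: {text}\n"
    "Answer:"
)
# Assumed CoT variant using a common step-by-step trigger phrase.
COT_TEMPLATE = (
    "Does the following text contain social bias? Answer yes or no.\n"
    "Text: {text}\n"
    "Let's think step by step."
)


def build_prompt(text: str, use_cot: bool = False) -> str:
    """Fill the zero-shot or CoT template with the text to classify."""
    template = COT_TEMPLATE if use_cot else ZERO_SHOT_TEMPLATE
    return template.format(text=text)


def parse_label(response: str) -> Optional[bool]:
    """Map a free-form response to True (biased), False (not biased), or None."""
    words = response.lower().replace(".", " ").replace(",", " ").split()
    # Scan from the end so the final verdict of a CoT answer takes precedence.
    for word in reversed(words):
        if word == "yes":
            return True
        if word == "no":
            return False
    return None  # unparseable response


def accuracy(predictions: List[Optional[bool]], gold_labels: List[bool]) -> float:
    """Fraction of examples whose parsed prediction matches the gold label.

    Unparseable predictions (None) never match, so they count as errors.
    """
    if not gold_labels:
        raise ValueError("gold_labels must be non-empty")
    if len(predictions) != len(gold_labels):
        raise ValueError("predictions and gold_labels must have equal length")
    correct = sum(p == g for p, g in zip(predictions, gold_labels))
    return correct / len(gold_labels)
```

In use, each prompt from `build_prompt` would be sent to the model under test, the raw response passed through `parse_label`, and the resulting list compared against gold labels with `accuracy`. Counting unparseable responses as errors is a design choice of this sketch, not something the abstract specifies.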