Enhancing LLM robustness to perturbed instructions: an empirical study

Agrawal, Aryan; Alazraki, Lisa; Honarvar, Shahin; Rei, Marek


conference paper

oai:spiral.imperial.ac.uk:10044/1/125232

Publication date: 5 March 2025

Abstract

Large Language Models (LLMs) are highly vulnerable to input perturbations, as even a small prompt change may result in a substantially different output. Existing methods to enhance LLM robustness focus primarily on perturbed data samples, whereas improving resilience to perturbations of task-level instructions has remained relatively underexplored. In this work, we focus on character- and word-level edits of task-specific instructions, which substantially degrade downstream performance. We experiment with a variety of techniques to enhance the robustness of LLMs, including self-denoising and representation alignment, testing different models (Llama 3 and Flan-T5), datasets (CoLA, QNLI, SST-2) and instructions (both task-oriented and role-oriented). We find that, on average, self-denoising, whether performed by a frozen LLM or a fine-tuned model, achieves substantially higher performance gains than alternative strategies, including more complex baselines such as ensembling and supervised methods.
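
To make the notion of character- and word-level instruction edits concrete, the following is a minimal Python sketch. It is an illustration under stated assumptions, not the perturbation scheme used in the paper: the function names `char_perturb` and `word_perturb`, the choice of edit operations (swap, delete, insert, duplicate), and the example instruction are all hypothetical.

```python
import random


def char_perturb(instruction: str, n_edits: int, rng: random.Random) -> str:
    """Apply n_edits random character-level edits (swap, delete, or insert).

    The set of operations is an assumption made for illustration only.
    """
    chars = list(instruction)
    for _ in range(n_edits):
        if len(chars) < 2:
            break
        op = rng.choice(["swap", "delete", "insert"])
        i = rng.randrange(len(chars) - 1)
        if op == "swap":
            # Transpose two adjacent characters.
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
        elif op == "delete":
            del chars[i]
        else:
            # Insert a random lowercase letter.
            chars.insert(i, rng.choice("abcdefghijklmnopqrstuvwxyz"))
    return "".join(chars)


def word_perturb(instruction: str, n_edits: int, rng: random.Random) -> str:
    """Apply n_edits random word-level edits (delete, duplicate, or swap)."""
    words = instruction.split()
    for _ in range(n_edits):
        if len(words) < 2:
            break
        op = rng.choice(["delete", "duplicate", "swap"])
        i = rng.randrange(len(words) - 1)
        if op == "delete":
            del words[i]
        elif op == "duplicate":
            words.insert(i, words[i])
        else:
            # Transpose two adjacent words.
            words[i], words[i + 1] = words[i + 1], words[i]
    return " ".join(words)


# Hypothetical task-oriented instruction for a sentiment task such as SST-2.
clean = "Classify the sentiment of the following sentence as positive or negative."
rng = random.Random(0)  # fixed seed so the perturbations are reproducible
print(char_perturb(clean, 3, rng))
print(word_perturb(clean, 2, rng))
```

Only the instruction is perturbed here; the data sample that follows it would be left unchanged, which matches the task-level setting the abstract describes. A robustness method such as self-denoising would then try to recover the clean instruction from the perturbed one before the task is run.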

Last updated on 25/11/2025.

This paper was published in Spiral - Imperial College Digital Repository.

Licence: https://creativecommons.org/licenses/by/4.0/