Search CORE

60 research outputs found

Are You Sure? Challenging LLMs Leads to Performance Drops in The FlipFlop Experiment

Author: Laban Philippe
Murakhovs'ka Lidiya
Wu Chien-Sheng
Xiong Caiming
Publication venue
Publication date: 14/11/2023
Field of study

The interactive nature of Large Language Models (LLMs) theoretically allows models to refine and improve their answers, yet systematic analysis of the multi-turn behavior of LLMs remains limited. In this paper, we propose the FlipFlop experiment: in the first round of the conversation, an LLM responds to a prompt containing a classification task. In a second round, the LLM is challenged with a follow-up phrase like "Are you sure?", offering an opportunity for the model to reflect on its initial answer, and decide whether to confirm or flip its answer. A systematic study of nine LLMs on seven classification tasks reveals that models flip their answers on average 46% of the time and that all models see a deterioration of accuracy between their first and final prediction, with an average drop of 17%. The FlipFlop experiment illustrates the universality of sycophantic behavior in LLMs and provides a robust framework to analyze model behavior and evaluate potential solutions

arXiv.org e-Print Archive