Numerous works have been proposed to improve or evaluate the capabilities of large
language models (LLMs) to follow user instructions. However, they neglect the
possibility that user inputs may inherently contain incorrect information due
to users' false beliefs or malicious intent. Consequently, blindly adhering to
users' false claims can lead to deception and harm. To address this problem, we
propose a challenging benchmark of Inductive Instructions (INDust)
to evaluate whether LLMs can resist such instructions. INDust comprises
15K instructions across three categories: Fact-Checking Instructions, Questions
based on False Premises, and Creative Instructions based on False Premises. Our
experiments on several strong LLMs reveal that current LLMs can be easily
deceived by INDust into generating misleading and malicious statements. Hence,
we employ Self-Critique prompting, which encourages LLMs to critique not only
themselves, as in previous works, but also the users. This yields remarkable
improvements in handling inductive instructions under both zero-shot and
few-shot settings.