Large Language Models (LLMs) have shown consistent improvements in
task-specific performance, driven in large part by effective prompt design. While
recent research on prompting has enhanced the reasoning capabilities of LLMs, a
gap remains in improving their understanding abilities. In this study,
we introduce Metacognitive Prompting (MP), a strategy inspired by human
introspective reasoning processes. Using MP, LLMs undergo a systematic series
of structured, self-aware evaluations, drawing on both their vast inherent
knowledge and new insights. Our experiments involve five prevalent LLMs, namely
Llama2, Vicuna, PaLM, GPT-3.5, and GPT-4, evaluated across a range of general
natural language understanding (NLU) tasks drawn from the GLUE and SuperGLUE
benchmarks. Results indicate that, although GPT-4 consistently excels in most
tasks, PaLM, when equipped with MP, approaches its performance level.
Furthermore, across models and datasets, MP consistently outperforms existing
prompting methods, including standard and chain-of-thought prompting. This
study underscores the potential of MP to amplify the understanding abilities of LLMs
and highlights the benefits of mirroring human introspective reasoning in NLU
tasks.