Recent advances in Large Language Models (LLMs) have led to an emergent
ability of chain-of-thought (CoT) prompting, a prompting strategy that
inserts intermediate rationale steps between a question and its answer when constructing
prompts. Conditioned on these prompts, LLMs can effectively learn in context to
generate rationales that lead to more accurate answers than when answering the
same question directly. An important setting in designing LLM prompts, called
demonstration selection, concerns choosing the demonstrations from an example
bank. Existing methods use various heuristics for this selection, but for CoT
prompting, which uniquely involves rationales, it is essential to base the
selection on the intrinsic skills that these rationales require, for instance,
addition or subtraction for math word problems.
To address this requirement, we introduce a novel approach named Reasoning
Skill Discovery (RSD) that uses unsupervised learning to create a latent-space
representation of rationales, called a reasoning skill. Simultaneously, RSD
learns a reasoning policy to determine the required reasoning skill for a given
question. This can then guide the selection of examples that demonstrate the
required reasoning skills. Our approach offers several desirable properties: it
is (1) theoretically grounded, (2) sample-efficient, requiring no LLM inference
or manual prompt design, and (3) LLM-agnostic. Empirically, RSD outperforms
existing methods by up to 6% in answer accuracy across multiple reasoning tasks.
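
For concreteness, the following is a minimal Python sketch of the selection step described above, assuming a rationale encoder and a reasoning policy have already been trained. The function names (encode_rationale, skill_policy, select_demonstrations), the latent dimensionality, and the toy example bank are illustrative assumptions, not the paper's actual interfaces.

    # Illustrative sketch (not the paper's implementation): select CoT
    # demonstrations by matching a question's predicted reasoning skill to the
    # latent skills of an example bank. encode_rationale and skill_policy are
    # hypothetical stand-ins for a trained rationale encoder and reasoning policy.
    import numpy as np

    DIM = 8  # assumed dimensionality of the latent reasoning-skill space

    def encode_rationale(rationale: str) -> np.ndarray:
        """Stand-in for a learned encoder mapping a rationale to a skill vector."""
        local_rng = np.random.default_rng(abs(hash(rationale)) % (2**32))
        return local_rng.normal(size=DIM)

    def skill_policy(question: str) -> np.ndarray:
        """Stand-in for the learned policy predicting the skill a question needs."""
        local_rng = np.random.default_rng(abs(hash(question)) % (2**32))
        return local_rng.normal(size=DIM)

    def select_demonstrations(question, example_bank, k=2):
        """Rank bank examples by cosine similarity between the question's
        predicted skill and each example's rationale skill; return the top k."""
        target = skill_policy(question)
        target = target / np.linalg.norm(target)
        scores = []
        for ex in example_bank:
            z = encode_rationale(ex["rationale"])
            scores.append(float(z @ target / np.linalg.norm(z)))
        top = np.argsort(scores)[::-1][:k]
        return [example_bank[i] for i in top]

    # Toy example bank (hypothetical) and prompt assembly.
    bank = [
        {"question": "What is 17 + 25?", "rationale": "Add the ones, then the tens.", "answer": "42"},
        {"question": "What is 40 - 13?", "rationale": "Subtract 10, then 3 more.", "answer": "27"},
        {"question": "What is 6 * 7?", "rationale": "Six groups of seven make 42.", "answer": "42"},
    ]
    demos = select_demonstrations("What is 58 + 34?", bank, k=2)
    prompt = "\n\n".join(
        f"Q: {d['question']}\nReasoning: {d['rationale']}\nA: {d['answer']}" for d in demos
    )
    print(prompt)

In this sketch the similarity search over the example bank replaces surface-level heuristics (such as question similarity) with similarity in the assumed latent skill space, which is the intuition the abstract describes.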