Attention Is (not) All You Need for Commonsense Reasoning
The recently introduced BERT model exhibits strong performance on several
language understanding benchmarks. In this paper, we describe a simple
re-implementation of BERT for commonsense reasoning. We show that the
attentions produced by BERT can be directly utilized for tasks such as the
Pronoun Disambiguation Problem and Winograd Schema Challenge. Our proposed
attention-guided commonsense reasoning method is conceptually simple yet
empirically powerful. Experimental analysis on multiple datasets demonstrates
that our proposed system performs remarkably well in all cases, outperforming the
previously reported state of the art by a clear margin. While these results
suggest that BERT implicitly learns to establish complex relationships between
entities, solving commonsense reasoning tasks might require more than
unsupervised models trained on huge text corpora.

Comment: to appear at ACL 201
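The scoring rule behind attention-guided pronoun resolution can be sketched in a few lines: sum the attention mass the pronoun token assigns to each candidate antecedent's tokens, and pick the candidate with the larger mass. The function name, the toy attention matrix, and the single-matrix simplification are all illustrative assumptions; the paper aggregates over multiple heads and layers with masking.

```python
import numpy as np

def resolve_pronoun(attention, pronoun_idx, candidate_spans):
    """Score each candidate antecedent by the attention mass the pronoun
    token assigns to the candidate's tokens; return the best candidate.
    attention: [seq_len, seq_len] matrix (here a toy stand-in for an
    attention map averaged over heads/layers -- an assumption, not the
    paper's exact aggregation).
    candidate_spans: dict mapping candidate name -> list of token indices.
    """
    scores = {name: float(attention[pronoun_idx, idxs].sum())
              for name, idxs in candidate_spans.items()}
    return max(scores, key=scores.get), scores

# Toy example for "The trophy didn't fit in the suitcase because it was too big."
# The pronoun "it" (index 8) attends more to "trophy" (index 1) than to
# "suitcase" (index 6). These numbers are illustrative, not real BERT output.
att = np.zeros((10, 10))
att[8, 1] = 0.30   # it -> trophy
att[8, 6] = 0.10   # it -> suitcase
best, scores = resolve_pronoun(att, 8, {"trophy": [1], "suitcase": [6]})
```

With real model attentions, `att` would come from a forward pass that returns attention weights, and the spans from the tokenizer's offsets.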
Contrastive Decoding Improves Reasoning in Large Language Models
We demonstrate that Contrastive Decoding -- a simple, computationally light,
and training-free text generation method proposed by Li et al. (2022) -- achieves
large out-of-the-box improvements over greedy decoding on a variety of
reasoning tasks. Originally shown to improve the perceived quality of long-form
text generation, Contrastive Decoding searches for strings that maximize a
weighted difference in likelihood between strong and weak models. We show that
Contrastive Decoding leads LLaMA-65B to outperform LLaMA 2, GPT-3.5 and PaLM
2-L on the HellaSwag commonsense reasoning benchmark, and to outperform LLaMA
2, GPT-3.5 and PaLM-540B on the GSM8K math word reasoning benchmark, in
addition to improvements on a collection of other tasks. Analysis suggests that
Contrastive Decoding improves over existing methods by preventing some abstract
reasoning errors, as well as by avoiding simpler modes such as copying sections
of the input during chain-of-thought. Overall, Contrastive Decoding outperforms
nucleus sampling for long-form generation and greedy decoding for reasoning
tasks, making it a powerful general-purpose method for generating text from
language models.

Comment: 9 figures, 11 tables
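One decoding step of the search described above can be sketched as follows: keep only tokens the strong (expert) model finds plausible, then pick the token maximizing the expert-minus-amateur log-probability gap. This follows the plausibility-constrained formulation of Li et al. (2022); the function name, the `alpha` value, and the toy distributions are assumptions for illustration.

```python
import math

def contrastive_decode_step(expert_logprobs, amateur_logprobs, alpha=0.1):
    """One contrastive-decoding step over a shared vocabulary.
    Plausibility constraint: keep only tokens whose expert probability is
    at least alpha times the expert's best token's probability. Among the
    survivors, maximize expert - amateur log-probability.
    """
    cutoff = max(expert_logprobs) + math.log(alpha)
    best_tok, best_score = None, -math.inf
    for tok, (e, a) in enumerate(zip(expert_logprobs, amateur_logprobs)):
        if e < cutoff:          # implausible under the expert: skip
            continue
        score = e - a           # contrastive score
        if score > best_score:
            best_tok, best_score = tok, score
    return best_tok

# Toy 4-token vocabulary. Greedy decoding on the expert alone would pick
# token 0; contrastive decoding picks token 1, where the expert is far
# more confident than the amateur.
expert = [math.log(p) for p in (0.5, 0.3, 0.15, 0.05)]
amateur = [math.log(p) for p in (0.6, 0.1, 0.2, 0.1)]
choice = contrastive_decode_step(expert, amateur, alpha=0.1)
```

The plausibility cutoff matters: without it, the difference score can reward tokens that are merely very unlikely under the amateur, even when the expert barely supports them.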
Boosting Language Models Reasoning with Chain-of-Knowledge Prompting
Recently, Chain-of-Thought (CoT) prompting has delivered success on complex
reasoning tasks, which aims at designing a simple prompt like ``Let's think
step by step'' or multiple in-context exemplars with well-designed rationales
to elicit Large Language Models (LLMs) to generate intermediate reasoning
steps. However, the generated rationales often contain mistakes, producing
unfactual and unfaithful reasoning chains. To mitigate this brittleness, we
propose a novel Chain-of-Knowledge (CoK) prompting, where we aim at eliciting
LLMs to generate explicit pieces of knowledge evidence in the form of structured
triples. This is inspired by human behavior: before answering a complex
question, we can draw a mind map or knowledge map in our heads as reasoning
evidence. Benefiting from CoK, we additionally introduce an
F^2-Verification method to estimate the reliability of the reasoning chains in
terms of factuality and faithfulness. For unreliable responses, the wrong
evidence can be flagged to prompt the LLM to rethink. Extensive experiments
demonstrate that our method further improves performance on commonsense,
factual, symbolic, and arithmetic reasoning tasks.

Comment: Work in progress
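Assembling a prompt whose in-context exemplars carry explicit evidence triples, as CoK prompting describes, can be sketched as plain string construction. The exemplar template, field names, and sample triple below are hypothetical; the paper's actual prompt formats may differ.

```python
def build_cok_prompt(question, exemplars):
    """Assemble a Chain-of-Knowledge-style prompt: each in-context
    exemplar lists its evidence as (subject, relation, object) triples
    before the answer, nudging the model to emit explicit triples for
    the new question too. Template is a hypothetical sketch.
    """
    parts = []
    for ex in exemplars:
        triples = "; ".join(f"({s}, {r}, {o})" for s, r, o in ex["triples"])
        parts.append(
            f"Question: {ex['question']}\n"
            f"Evidence triples: {triples}\n"
            f"Answer: {ex['answer']}"
        )
    # The new question ends at the triples slot, so generation starts there.
    parts.append(f"Question: {question}\nEvidence triples:")
    return "\n\n".join(parts)

prompt = build_cok_prompt(
    "Where are grapes likely to be found before washing?",
    [{"question": "Where would you store milk to keep it fresh?",
      "triples": [("milk", "AtLocation", "refrigerator")],
      "answer": "refrigerator"}],
)
```

Because the generated triples are explicit and structured, a downstream verifier (such as the F^2-Verification step above) can check each one against a knowledge source and flag the wrong ones for a rethinking pass.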