Concept-aware Training Improves In-context Learning Ability of Language Models
Many recent language models (LMs) of the Transformer family exhibit the
so-called in-context learning (ICL) ability, manifested in the LMs' capacity
to adapt their behavior to a task described in a natural language input.
Previous work
curating these models assumes that ICL emerges from vast over-parametrization
or the scale of multi-task training. However, a complementary branch of recent
theoretical work attributes ICL emergence to specific properties of training
data and creates functional in-context learners in small-scale, synthetic
settings.
Inspired by recent findings on data properties driving the emergence of ICL,
we propose a method to create LMs able to better utilize in-context
information, by constructing training scenarios where it is beneficial for the
LM to capture analogical reasoning concepts. We find that the data sampling
of Concept-aware Training (CoAT) consistently improves models' reasoning
ability. As a result, the in-context learners trained with CoAT on only two
datasets of a single (QA) task perform comparably to larger models trained on
1600+ tasks.
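The abstract gives no implementation detail, but the core sampling idea, choosing in-context demonstrations that share a latent reasoning concept with the training target so the model benefits from attending to them, can be sketched as follows. This is a minimal illustration assuming concept-annotated data; the function and field names are hypothetical, not the authors' code.

```python
import random
from collections import defaultdict

def build_coat_batch(examples, k_demos=3):
    """Illustrative sketch of Concept-aware Training (CoAT) sampling.

    Assumes each example is a dict with 'input', 'output', and a
    'concept' label (an annotated reasoning concept). For every
    training target, the in-context demonstrations are drawn from
    examples sharing its concept, so capturing the concept from
    context is beneficial rather than ignorable.
    """
    by_concept = defaultdict(list)
    for ex in examples:
        by_concept[ex["concept"]].append(ex)

    batch = []
    for target in examples:
        pool = [e for e in by_concept[target["concept"]] if e is not target]
        demos = random.sample(pool, min(k_demos, len(pool)))
        prompt = "\n".join(f"Q: {d['input']}\nA: {d['output']}" for d in demos)
        batch.append({"prompt": f"{prompt}\nQ: {target['input']}\nA:",
                      "label": target["output"]})
    return batch
```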
Publication/Citation: A Proof-Theoretic Approach to Mathematical Knowledge Management
There are many real-life examples of formal systems that support
constructions or proofs, but that do not provide direct support for remembering them so that they can be recalled and reused in the future. In this paper we examine the operations of publication (remembering a proof) and citation (recalling a proof for reuse), regarding them as forms of common subexpression elimination on proof terms. We then develop this idea from a proof-theoretic perspective, describing a simple complete proof system for universal Horn equational logic using three new proof rules: publish, cite, and forget. These rules can provide a proof-theoretic infrastructure for proof reuse in any system.
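Reading publication and citation as common subexpression elimination on proof terms suggests a simple memoization picture. The Python sketch below is an illustrative analogy only, with proof terms modeled as strings; it is not the paper's proof calculus for universal Horn equational logic.

```python
class ProofLibrary:
    """Toy analogue of the publish/cite/forget rules: a named store of
    proof terms, treating publication as common subexpression
    elimination. Proof terms are modeled as plain strings here."""

    def __init__(self):
        self._published = {}

    def publish(self, name, proof_term):
        # publish: remember a proof under a citable name
        self._published[name] = proof_term
        return name

    def cite(self, name):
        # cite: recall a previously published proof for reuse
        if name not in self._published:
            raise KeyError(f"no published proof named {name!r}")
        return self._published[name]

    def forget(self, name):
        # forget: discard a publication that is no longer needed
        self._published.pop(name, None)

lib = ProofLibrary()
lib.publish("comm_add", "proof that x + y = y + x")
reused = lib.cite("comm_add")  # reuse instead of re-deriving
lib.forget("comm_add")
```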
T-SciQ: Teaching Multimodal Chain-of-Thought Reasoning via Large Language Model Signals for Science Question Answering
Large Language Models (LLMs) have recently demonstrated exceptional
performance in various Natural Language Processing (NLP) tasks. They have also
shown the ability to perform chain-of-thought (CoT) reasoning to solve complex
problems. Recent studies have explored CoT reasoning in complex multimodal
scenarios, such as the science question answering task, by fine-tuning
multimodal models with high-quality human-annotated CoT rationales. However,
collecting high-quality CoT rationales is usually time-consuming and costly.
Moreover, the annotated rationales are often inaccurate because essential
external information is missing. To address these issues, we propose a novel
method termed T-SciQ that aims at teaching science question answering with
LLM signals. The T-SciQ approach generates high-quality CoT rationales as
teaching signals and uses them to train much smaller models to perform CoT
reasoning in complex modalities. Additionally, we introduce a novel data mixing
strategy that produces more effective teaching data samples for both simple
and complex science question answering problems. Extensive experimental results
show that our T-SciQ method achieves a new state-of-the-art performance on the
ScienceQA benchmark, with an accuracy of 96.18%. Moreover, our approach
outperforms the most powerful fine-tuned baseline by 4.5%.
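At a high level the pipeline has two stages: a large teacher LLM generates CoT rationales as teaching signals, and a mixing policy decides which samples get full rationales before a smaller student is fine-tuned. The sketch below assumes hypothetical `query_llm` and `finetune` callables and an invented mixing heuristic; it is not the released T-SciQ code.

```python
def generate_teaching_signal(question, options, query_llm):
    """Ask a large teacher LLM for a CoT rationale plus answer.
    query_llm is a hypothetical callable: prompt str -> completion str."""
    prompt = (f"Question: {question}\nOptions: {options}\n"
              "Explain your reasoning step by step, then give the answer.")
    return query_llm(prompt)

def mix_teaching_data(samples, rationales, is_complex):
    """Toy stand-in for the data-mixing policy: pair complex problems
    with full CoT rationales and keep simple ones as plain QA, so the
    student sees CoT supervision where it helps most."""
    mixed = []
    for s, r in zip(samples, rationales):
        target = r if is_complex(s) else s["answer"]
        mixed.append({"input": s["question"], "target": target})
    return mixed

# Distillation stage: fine-tune a much smaller multimodal student on
# the mixed data (finetune is a placeholder for a standard training loop).
# student = finetune(small_model, mix_teaching_data(samples, rationales,
#                                                   is_complex))
```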
Self-Explanation Prompting Improves Dialogue Understanding in Large Language Models
Task-oriented dialogue (TOD) systems help users accomplish various
activities through multi-turn dialogues, but Large Language Models (LLMs) often
struggle to comprehend these intricate contexts. In this study, we propose a
novel "Self-Explanation" prompting strategy to enhance the comprehension
abilities of LLMs in multi-turn dialogues. This task-agnostic approach requires
the model to analyze each dialogue utterance before task execution, thereby
improving performance across various dialogue-centric tasks. Experimental
results from six benchmark datasets confirm that our method consistently
outperforms other zero-shot prompts and matches or exceeds the efficacy of
few-shot prompts, demonstrating its potential as a powerful tool in enhancing
LLMs' comprehension in complex dialogue tasks.
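Since the method is purely a prompting change, it can be sketched directly: the prompt asks the model to explain each utterance before performing the task. A minimal sketch follows; the exact template wording is an assumption, not the paper's.

```python
def self_explanation_prompt(dialogue_turns, task_instruction):
    """Build a Self-Explanation-style zero-shot prompt: the model is
    asked to analyze each utterance of the multi-turn dialogue before
    executing the downstream task. Wording is illustrative."""
    history = "\n".join(f"Turn {i + 1} ({t['speaker']}): {t['text']}"
                        for i, t in enumerate(dialogue_turns))
    return (f"{history}\n\n"
            "First, explain what each turn above is trying to accomplish, "
            "one explanation per turn.\n"
            f"Then, using your explanations, {task_instruction}")

turns = [{"speaker": "user", "text": "I need a table for two tonight."},
         {"speaker": "system", "text": "Which area do you prefer?"}]
print(self_explanation_prompt(turns, "identify the user's goal and slots."))
```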
Can In-context Learners Learn a Reasoning Concept from Demonstrations?
Large language models show an emergent ability to learn a new task from a
small number of input-output demonstrations. However, recent work shows that
in-context learners largely rely on their pre-trained knowledge, such as the
sentiment of the labels, instead of finding new associations in the input.
However, the commonly used few-shot evaluation settings, which select
in-context demonstrations at random, cannot disentangle models' ability to
learn a new skill from demonstrations, as most of the randomly selected
demonstrations do not present relations informative for prediction beyond
exposing the new task distribution.
To disentangle models' in-context learning ability from their memorized
knowledge, we introduce a Conceptual few-shot learning method that selects
demonstrations sharing a possibly informative concept with the predicted
sample. We extract a set of such concepts from annotated explanations and
measure how much models can benefit from presenting these concepts in few-shot
demonstrations.
We find that smaller models are more sensitive to the presented concepts.
While some of the models are able to benefit from concept-presenting
demonstrations for each assessed concept, we find that none of the assessed
in-context learners can benefit from all presented reasoning concepts
consistently, leaving in-context concept learning an open challenge.
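A minimal sketch of the selection step, assuming each candidate demonstration carries a set of concepts extracted from its annotated explanation; the scoring rule (overlap count) and all names are illustrative, not the authors' implementation.

```python
def select_conceptual_demos(target_concepts, candidates, k=4):
    """Pick the k demonstrations sharing the most concepts with the
    predicted sample, instead of sampling at random. Each candidate is
    a dict with 'input', 'output', and 'concepts' (a set of strings
    extracted from its annotated explanation)."""
    scored = sorted(candidates,
                    key=lambda c: len(target_concepts & c["concepts"]),
                    reverse=True)
    return scored[:k]

demos = select_conceptual_demos(
    target_concepts={"negation", "comparison"},
    candidates=[
        {"input": "...", "output": "...", "concepts": {"negation"}},
        {"input": "...", "output": "...", "concepts": {"temporal"}},
    ],
    k=1,
)
```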
Automatic Chain of Thought Prompting in Large Language Models
Large language models (LLMs) can perform complex reasoning by generating
intermediate reasoning steps. Providing these steps for prompting
demonstrations is called chain-of-thought (CoT) prompting. CoT prompting has
two major paradigms. One leverages a simple prompt like "Let's think step by
step" to facilitate step-by-step thinking before answering a question. The
other uses a few manual demonstrations one by one, each composed of a question
and a reasoning chain that leads to an answer. The superior performance of the
second paradigm hinges on the hand-crafting of task-specific demonstrations one
by one. We show that such manual efforts may be eliminated by leveraging LLMs
with the "Let's think step by step" prompt to generate reasoning chains for
demonstrations one by one, i.e., let's think not just step by step, but also
one by one. However, these generated chains often come with mistakes. To
mitigate the effect of such mistakes, we find that diversity matters for
automatically constructing demonstrations. We propose an automatic CoT
prompting method: Auto-CoT. It samples questions with diversity and generates
reasoning chains to construct demonstrations. On ten public benchmark reasoning
tasks with GPT-3, Auto-CoT consistently matches or exceeds the performance of
the CoT paradigm that requires manual designs of demonstrations. Code is
available at https://github.com/amazon-research/auto-cot
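Auto-CoT's two steps, diversity-based question sampling and zero-shot chain generation, can be sketched as below. The `embed` and `query_llm` callables are hypothetical placeholders; k-means clustering follows the paper's diversity idea, but the details here are assumptions rather than the released implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def auto_cot_demos(questions, embed, query_llm, n_clusters=8):
    """Sketch of Auto-CoT demonstration construction.

    embed: hypothetical callable, list[str] -> np.ndarray of embeddings.
    query_llm: hypothetical callable, prompt str -> completion str.
    Questions are clustered for diversity; one representative per
    cluster is answered with the zero-shot CoT trigger, and the
    resulting (question, chain) pairs become the few-shot demos.
    """
    vectors = embed(questions)
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(vectors)

    demos = []
    for c in range(n_clusters):
        idx = np.where(km.labels_ == c)[0]
        # pick the question closest to the cluster centroid
        center = km.cluster_centers_[c]
        rep = idx[np.argmin(np.linalg.norm(vectors[idx] - center, axis=1))]
        chain = query_llm(f"Q: {questions[rep]}\nA: Let's think step by step.")
        demos.append((questions[rep], chain))
    return demos
```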
EarthVQA: Towards Queryable Earth via Relational Reasoning-Based Remote Sensing Visual Question Answering
Earth vision research typically focuses on extracting geospatial object
locations and categories but neglects the exploration of relations between
objects and comprehensive reasoning. Based on city planning needs, we develop a
multi-modal multi-task VQA dataset (EarthVQA) to advance relational
reasoning-based judging, counting, and comprehensive analysis. The EarthVQA
dataset contains 6000 images, corresponding semantic masks, and 208,593 QA
pairs with urban and rural governance requirements embedded. As objects are the
basis for complex relational reasoning, we propose a Semantic OBject Awareness
framework (SOBA) to advance VQA in an object-centric way. To preserve refined
spatial locations and semantics, SOBA leverages a segmentation network for
object semantics generation. The object-guided attention aggregates object
interior features via pseudo masks, and bidirectional cross-attention further
models object external relations hierarchically. To optimize object counting,
we propose a numerical difference loss that dynamically adds difference
penalties, unifying the classification and regression tasks. Experimental
results show that SOBA outperforms both advanced general and remote sensing
methods. We believe this dataset and framework provide a strong benchmark for
Earth vision's complex analysis. The project page is at
https://Junjue-Wang.github.io/homepage/EarthVQA.
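The numerical difference loss is only described qualitatively here; one plausible reading, sketched below, augments cross-entropy over count classes with a penalty that grows with the gap between the expected and true count, coupling the classification and regression views. The weighting `alpha` and the expectation-based count estimate are assumptions, not the published formulation.

```python
import torch
import torch.nn.functional as F

def numerical_difference_loss(logits, target, alpha=1.0):
    """Illustrative counting loss in the spirit of EarthVQA's numerical
    difference loss: cross-entropy over count classes plus a penalty
    proportional to |expected count - true count|, so answers that are
    numerically further off are punished more.

    logits: (batch, n_counts) scores over count classes 0..n_counts-1
    target: (batch,) integer ground-truth counts
    """
    ce = F.cross_entropy(logits, target)
    counts = torch.arange(logits.size(1), dtype=logits.dtype,
                          device=logits.device)
    expected = (logits.softmax(dim=-1) * counts).sum(dim=-1)
    diff_penalty = (expected - target.to(logits.dtype)).abs().mean()
    return ce + alpha * diff_penalty
```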