Calc-X: Enriching Arithmetical Chain-of-Thoughts Datasets by Interaction with Symbolic Systems
This report overviews our ongoing work on enriching chain-of-thoughts
datasets that require arithmetical reasoning by integrating non-parametric
components, such as a calculator. We conduct an analysis of
prominent relevant datasets such as GSM8K, Ape210K, AQuA-RAT, and MathQA and
propose a machine-processable HTML-like format specifically tailored for
working with semi-structured chains. By converting the datasets into this
unified format, we enable the effective integration of large language models
and symbolic systems, empowering them to tackle arithmetical reasoning tasks
more efficiently.
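
As an illustration of how such an HTML-like chain format and a symbolic system
could interact, here is a minimal Python sketch; the tag names (<gadget>,
<output>) and the toy evaluator are assumptions made for this example, not
necessarily the exact Calc-X schema.

import re

# A toy chain-of-thought annotated with an HTML-like calculator call.
chain = ('She bought 3 packs of 12 eggs. '
         '<gadget id="calculator">3 * 12</gadget><output></output> '
         'So she has 36 eggs in total.')

def fill_calculator_outputs(text: str) -> str:
    """Evaluate every <gadget> expression with a symbolic system (plain Python
    arithmetic here) and write the result into the adjacent <output> tag."""
    def evaluate(match: re.Match) -> str:
        expression = match.group(1)
        result = eval(expression, {"__builtins__": {}})  # toy arithmetic evaluator
        return (f'<gadget id="calculator">{expression}</gadget>'
                f'<output>{result}</output>')
    pattern = r'<gadget id="calculator">(.*?)</gadget><output>.*?</output>'
    return re.sub(pattern, evaluate, text)

print(fill_calculator_outputs(chain))
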
Concept-aware Training Improves In-context Learning Ability of Language Models
Many recent language models (LMs) of the Transformer family exhibit a
so-called in-context learning (ICL) ability, manifested in the LMs' ability to
modulate
their function by a task described in a natural language input. Previous work
curating these models assumes that ICL emerges from vast over-parametrization
or the scale of multi-task training. However, a complementary branch of recent
theoretical work attributes ICL emergence to specific properties of training
data and creates functional in-context learners in small-scale, synthetic
settings.
Inspired by recent findings on data properties driving the emergence of ICL,
we propose a method to create LMs able to better utilize in-context
information by constructing training scenarios in which it is beneficial for
the LM to capture analogical reasoning concepts. We find that the data
sampling of Concept-aware Training (CoAT) consistently improves models'
reasoning ability. As a result, in-context learners trained with CoAT on only
two datasets of a single (QA) task perform comparably to larger models trained
on 1600+ tasks.
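
A minimal sketch of the concept-aware sampling idea, assuming every training
example is annotated with the reasoning concept it exercises; the dataset
fields and concept labels below are hypothetical illustrations, not the CoAT
implementation itself.

import random
from collections import defaultdict

# Hypothetical QA examples annotated with the concept they exercise.
examples = [
    {"question": "Is 17 prime?", "answer": "yes", "concept": "primality"},
    {"question": "Is 21 prime?", "answer": "no", "concept": "primality"},
    {"question": "What is 4 + 5?", "answer": "9", "concept": "addition"},
    {"question": "What is 7 + 8?", "answer": "15", "concept": "addition"},
]

by_concept = defaultdict(list)
for example in examples:
    by_concept[example["concept"]].append(example)

def build_concept_aware_prompt(target: dict, k: int = 2) -> str:
    """Pick k demonstrations sharing the target's concept, so attending to the
    in-context examples is genuinely informative for predicting the answer."""
    pool = [ex for ex in by_concept[target["concept"]] if ex is not target]
    demonstrations = random.sample(pool, min(k, len(pool)))
    lines = [f"Q: {d['question']}\nA: {d['answer']}" for d in demonstrations]
    lines.append(f"Q: {target['question']}\nA:")
    return "\n\n".join(lines)

print(build_concept_aware_prompt(examples[0], k=1))
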
Can In-context Learners Learn a Reasoning Concept from Demonstrations?
Large language models show an emergent ability to learn a new task from a
small number of input-output demonstrations. However, recent work shows that
in-context learners largely rely on their pre-trained knowledge, such as the
sentiment of the labels, instead of finding new associations in the input.
Moreover, the commonly used few-shot evaluation settings, which rely on a
random selection of in-context demonstrations, cannot disentangle models'
ability to learn a new skill from demonstrations, as most randomly selected
demonstrations do not present relations informative for prediction beyond
exposing the new task distribution.
To disentangle models' in-context learning ability from their memorized
knowledge, we introduce a Conceptual few-shot learning method that selects
demonstrations sharing a possibly informative concept with the predicted
sample. We extract a set of such concepts from annotated explanations and
measure how much models can benefit from presenting these concepts in few-shot
demonstrations.
We find that smaller models are more sensitive to the presented concepts.
While some models benefit from concept-presenting demonstrations for each
assessed concept, none of the assessed in-context learners can benefit from
all presented reasoning concepts consistently, leaving in-context concept
learning an open challenge.
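
To make the selection step concrete, the sketch below contrasts random and
concept-sharing demonstration selection; the concept annotations and example
fields are placeholders rather than the paper's actual data or pipeline.

import random

def select_demonstrations(target, pool, k, conceptual=True):
    """Pick k demonstrations; with conceptual=True, keep only candidates that
    share at least one annotated concept with the predicted sample."""
    candidates = pool
    if conceptual:
        candidates = [ex for ex in pool if ex["concepts"] & target["concepts"]]
    return random.sample(candidates, min(k, len(candidates)))

# Hypothetical pool of examples with concepts extracted from explanations.
pool = [
    {"text": "example A", "label": "entailment", "concepts": {"negation"}},
    {"text": "example B", "label": "contradiction", "concepts": {"numerals"}},
    {"text": "example C", "label": "entailment", "concepts": {"negation", "numerals"}},
]
target = {"text": "predicted sample", "concepts": {"negation"}}

conceptual_demos = select_demonstrations(target, pool, k=2, conceptual=True)
random_demos = select_demonstrations(target, pool, k=2, conceptual=False)
print([d["text"] for d in conceptual_demos], [d["text"] for d in random_demos])
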
A Whisper transformer for audio captioning trained with synthetic captions and transfer learning
The field of audio captioning has seen significant advancements in recent
years, driven by the availability of large-scale audio datasets and advances
in deep learning techniques. In this technical report, we present
our approach to audio captioning, focusing on the use of a pretrained
speech-to-text Whisper model and pretraining on synthetic captions. We discuss
our training procedures and present the results of our experiments, which
cover variations in model size, dataset mixtures, and other hyperparameters. Our
findings demonstrate the impact of different training strategies on the
performance of the audio captioning model. Our code and trained models are
publicly available on GitHub and the Hugging Face Hub.
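
For context on how such a model is typically invoked, here is a minimal sketch
of running a Whisper sequence-to-sequence checkpoint on an audio clip with the
Hugging Face transformers library; "openai/whisper-small" is the generic
speech-to-text checkpoint, used here only to illustrate the architecture, while
the fine-tuned captioning checkpoints mentioned above are published separately.

import torch
from transformers import WhisperProcessor, WhisperForConditionalGeneration

# Generic speech-to-text checkpoint, standing in for a captioning-tuned model.
processor = WhisperProcessor.from_pretrained("openai/whisper-small")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

# A 5-second dummy waveform at 16 kHz stands in for a real audio clip.
waveform = torch.zeros(16000 * 5)

# Convert the waveform to log-mel input features and generate an output text.
inputs = processor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
generated_ids = model.generate(inputs.input_features, max_new_tokens=50)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
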