Ever since the development of GPT-3 in the natural language processing (NLP)
field, in-context learning (ICL) has played an important role in utilizing
large language models (LLMs). By presenting the LM utterance-label
demonstrations at the input, the LM can accomplish few-shot learning without
relying on gradient descent or requiring explicit modification of its
parameters. This enables the LM to learn and adapt in a black-box manner.
Despite the success of ICL in NLP, little work is exploring the possibility of
ICL in speech processing. This study proposes the first exploration of ICL with
a speech LM without text supervision. We first show that the current speech LM
does not have the ICL capability. With the proposed warmup training, the speech
LM can, therefore, perform ICL on unseen tasks. In this work, we verify the
feasibility of ICL for speech LM on speech classification tasks.Comment: The first two authors contributed equall