Prompt tuning (PT), where a small amount of trainable soft (continuous)
prompt vectors is affixed to the input of language models (LM), has shown
promising results across various tasks and models for parameter-efficient
fine-tuning (PEFT). PT stands out from other PEFT approaches because it
maintains competitive performance with fewer trainable parameters and does not
drastically scale up its parameters as the model size expands. However, PT
introduces additional soft prompt tokens, leading to longer input sequences,
which significantly impacts training and inference time and memory usage due to
the Transformer's quadratic complexity. Particularly concerning for Large
Language Models (LLMs) that face heavy daily querying. To address this issue,
we propose Decomposed Prompt Tuning (DePT), which decomposes the soft prompt
into a shorter soft prompt and a pair of low-rank matrices that are then
optimised with two different learning rates. This allows DePT to achieve better
performance while saving over 20% memory and time costs compared to vanilla PT
and its variants, without changing trainable parameter sizes. Through extensive
experiments on 23 natural language processing (NLP) and vision-language (VL)
tasks, we demonstrate that DePT outperforms state-of-the-art PEFT approaches,
including the full fine-tuning baseline in some scenarios. Additionally, we
empirically show that DEPT grows more efficient as the model size increases.
Our further study reveals that DePT integrates seamlessly with
parameter-efficient transfer learning in the few-shot learning setting and
highlights its adaptability to various model architectures and sizes.Comment: Code is available at https://github.com/ZhengxiangShi/DeP