Transformers have shown dominant performance across a range of domains
including language and vision. However, their computational cost grows
quadratically with the sequence length, making their usage prohibitive for
resource-constrained applications. To counter this, we divide the sequence into
segments and apply attention within each segment. We propose a segmented
recurrent transformer (SRformer) that combines
segmented (local) attention with recurrent attention. The loss caused by
reducing the attention window length is compensated for by aggregating information
across segments with recurrent attention. SRformer leverages Recurrent
Accumulate-and-Fire (RAF) neurons' inherent memory to update the cumulative
product of keys and values. The segmented attention and lightweight RAF neurons
ensure the efficiency of the proposed transformer. Such an approach leads to
models with sequential processing capability at a lower computation/memory
cost. We apply the proposed method to T5 and BART transformers. The modified
models are tested on summarization datasets including CNN-dailymail, XSUM,
ArXiv, and MediaSUM. Notably, using segmented inputs of varied sizes, the
proposed model achieves 6-22% higher ROUGE-1 scores than a segmented
transformer and outperforms other recurrent transformer approaches.
Furthermore, compared to full attention, the proposed model reduces the
computational complexity of cross attention by around 40%.

Comment: EMNLP 2023 Findings
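
Below is a minimal sketch of the segmented-attention-plus-recurrent-accumulation idea described above, assuming a single attention head and a linear-attention-style running sum of key-value products as a stand-in for the paper's RAF-neuron memory; the function and variable names are illustrative and do not come from the authors' code.

import torch
import torch.nn.functional as F

def segmented_recurrent_attention(q, k, v, segment_len):
    """q, k, v: (seq_len, d) tensors; returns a (seq_len, d) output."""
    seq_len, d = q.shape
    state = torch.zeros(d, d)   # running sum of K^T V over past segments (stand-in for RAF memory)
    num_past = 0                # number of tokens summarized in `state`
    outputs = []
    for start in range(0, seq_len, segment_len):
        qs = q[start:start + segment_len]
        ks = k[start:start + segment_len]
        vs = v[start:start + segment_len]
        # Local (segmented) softmax attention restricted to the current segment.
        local = F.scaled_dot_product_attention(
            qs.unsqueeze(0), ks.unsqueeze(0), vs.unsqueeze(0)
        ).squeeze(0)
        # Recurrent term: queries read the accumulated summary of earlier segments.
        out_seg = local if num_past == 0 else local + (qs @ state) / num_past
        outputs.append(out_seg)
        # Update the running key-value summary before moving to the next segment.
        state = state + ks.transpose(0, 1) @ vs
        num_past += ks.shape[0]
    return torch.cat(outputs, dim=0)

# Example: 16 tokens of dimension 8, processed in segments of 4.
q, k, v = (torch.randn(16, 8) for _ in range(3))
out = segmented_recurrent_attention(q, k, v, segment_len=4)
print(out.shape)  # torch.Size([16, 8])

Each segment attends only within itself, so the quadratic cost is bounded by the segment length, while the accumulated key-value summary carries information forward across segments, approximating the role the abstract attributes to the recurrent attention and RAF neurons.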