The Expressibility of Polynomial-Based Attention Scheme

Abstract

Large language models (LLMs) have significantly improved various aspects of our daily lives. These models have impacted numerous domains, from healthcare to education, enhancing productivity, decision-making, and accessibility, and in doing so have reshaped, to some extent, how people live and work. However, the quadratic complexity of attention in transformer architectures poses a challenge when scaling these models to long textual contexts, making it impractical to train very large models on lengthy texts or to use them efficiently during inference. While a recent study by [KMZ23] introduced a technique that replaces the softmax with a polynomial function and uses polynomial sketching to speed up attention, the theoretical properties of this approach are not yet well understood. In this paper, we offer a theoretical analysis of the expressive capabilities of polynomial attention. Our study reveals a disparity between the expressive power of high-degree and low-degree polynomial attention. Specifically, we construct two carefully designed datasets, $\mathcal{D}_0$ and $\mathcal{D}_1$, where $\mathcal{D}_1$ includes a feature with a significantly larger value than any in $\mathcal{D}_0$. We demonstrate that with a sufficiently high degree $\beta$, a single-layer polynomial attention network can distinguish between $\mathcal{D}_0$ and $\mathcal{D}_1$, whereas with a low degree $\beta$ the network cannot effectively separate the two datasets. This analysis underscores the greater effectiveness of high-degree polynomials in amplifying large values and distinguishing between datasets. Our analysis offers insight into the representational capacity of polynomial attention and provides a rationale for incorporating higher-degree polynomials in attention mechanisms to capture intricate linguistic correlations.

Comment: arXiv admin note: substantial text overlap with arXiv:2310.1168
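To make the mechanism discussed above concrete, the following is a minimal sketch of a single-layer polynomial attention network, assuming the softmax is replaced entry-wise by the map $x \mapsto x^{\beta}$ followed by row normalization. The function name `poly_attention`, the weight shapes, and the normalization choice are illustrative assumptions, not the exact construction from [KMZ23] or from this paper.

```python
# A minimal sketch of single-layer polynomial attention (illustrative, not the
# authors' exact construction): softmax(QK^T) is replaced entry-wise by x**beta,
# then normalized row-wise.
import numpy as np

def poly_attention(X, W_Q, W_K, W_V, beta):
    """X: (n, d) token embeddings; W_Q, W_K, W_V: (d, d); beta: polynomial degree."""
    Q, K, V = X @ W_Q, X @ W_K, X @ W_V
    scores = (Q @ K.T) ** beta                        # entry-wise degree-beta polynomial
    denom = scores.sum(axis=1, keepdims=True) + 1e-9  # row normalization (assumed)
    return (scores / denom) @ V

# A higher beta amplifies the largest score far more than the rest, which is the
# intuition behind the separation between high- and low-degree polynomial attention.
rng = np.random.default_rng(0)
n, d = 4, 8
X = rng.standard_normal((n, d))
W = [rng.standard_normal((d, d)) for _ in range(3)]
out_low = poly_attention(X, *W, beta=2)   # even degrees keep scores nonnegative
out_high = poly_attention(X, *W, beta=8)
print(out_low.shape, out_high.shape)      # (4, 8) (4, 8)
```

Even degrees are used here so the polynomial scores stay nonnegative and the row normalization is well defined; this is a simplifying assumption of the sketch.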
