Large language models have become ubiquitous in modern life, finding
applications in various domains such as natural language processing, language
translation, and speech recognition. Recently, a breakthrough work [Zhao,
Panigrahi, Ge, and Arora, arXiv 2023] explained the attention model from the
perspective of probabilistic context-free grammars (PCFGs). One of the central
computational tasks in computing probabilities in a PCFG can be formulated as a
particular tensor low-rank approximation problem, which we call tensor cycle
rank. Given an $n \times n \times n$ third-order tensor $A$, we say that $A$
has cycle rank $k$ if there exist three $n \times k^2$ matrices $U$, $V$, and
$W$ such that for all $a \in [n]$, $b \in [n]$, $c \in [n]$,
\begin{align*}
A_{a,b,c} = \sum_{i=1}^k \sum_{j=1}^k \sum_{l=1}^k
U_{a,i+k(j-1)} \otimes V_{b, j + k(l-1)} \otimes W_{c, l + k(i-1)}.
\end{align*}
For the classical tensor rank, Tucker rank, and tensor train rank, low-rank
approximation has been well studied in [Song, Woodruff, Zhong SODA 2019]. In
this paper, we generalize the ``rotation and sketch'' technique on page 186 of
[Song, Woodruff, Zhong SODA 2019] and show an input-sparsity-time algorithm for
cycle rank.
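
As an illustration of the cycle-rank definition above, the following minimal
NumPy sketch (our own illustrative example, with hypothetical dimensions
$n = 4$ and $k = 2$, not taken from the paper) builds a tensor of cycle rank at
most $k$ from factor matrices $U$, $V$, $W$ and checks it against the
entrywise formula; the einsum contraction makes the cyclic index pattern
explicit.

\begin{verbatim}
import numpy as np

# Minimal sketch of the cycle-rank construction; n and k are
# illustrative choices, not values from the paper.
n, k = 4, 2

rng = np.random.default_rng(0)
# U, V, W are n x k^2 factor matrices, as in the definition.
U = rng.standard_normal((n, k * k))
V = rng.standard_normal((n, k * k))
W = rng.standard_normal((n, k * k))

# The column index i + k(j-1) (1-based) corresponds to reshaping each
# n x k^2 matrix into an n x k x k tensor in column-major (Fortran)
# order, so that U3[a, i, j] = U[a, i + k*j] with 0-based i, j.
U3 = U.reshape(n, k, k, order="F")
V3 = V.reshape(n, k, k, order="F")
W3 = W.reshape(n, k, k, order="F")

# Cycle contraction: the second factor index of U pairs with the first
# of V, the second of V with the first of W, and the second of W closes
# the cycle with the first of U.
A = np.einsum("aij,bjl,cli->abc", U3, V3, W3)

# Brute-force check of the entrywise formula (0-based indices).
A_check = np.zeros((n, n, n))
for a in range(n):
    for b in range(n):
        for c in range(n):
            A_check[a, b, c] = sum(
                U[a, i + k * j] * V[b, j + k * l] * W[c, l + k * i]
                for i in range(k) for j in range(k) for l in range(k)
            )

assert np.allclose(A, A_check)
\end{verbatim}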