How can we efficiently and accurately analyze an irregular tensor in a
dual-way streaming setting where the sizes of two dimensions of the tensor
increase over time? What types of anomalies are there in the dual-way streaming
setting? An irregular tensor is a collection of matrices that share the same
number of columns but have different numbers of rows. In a dual-way streaming
setting, both new rows of existing matrices and new matrices arrive over time.
PARAFAC2 decomposition is a crucial tool for analyzing irregular tensors.
Although real-time analysis is necessary in the dual-way streaming setting,
static PARAFAC2 decomposition methods fail to work efficiently there because
they recompute the decomposition of the entire accumulated tensor whenever new
data arrive. Existing streaming PARAFAC2 decomposition methods work in a limited
setting and fail to handle new rows of matrices efficiently. In this paper, we
propose Dash, an efficient and accurate PARAFAC2 decomposition method working
in the dual-way streaming setting. When new data arrive, Dash updates the
decomposition efficiently by separating the terms that involve old data from
those that involve new data and avoiding naive recomputation over the old data.
In addition, a forgetting factor lets Dash track recent movements in the data.
Extensive experiments show that Dash is up to 14.0x faster than existing
PARAFAC2 decomposition methods when processing newly arrived data. We also
present discoveries from detecting anomalies in real-world datasets, including the Subprime
Mortgage Crisis and COVID-19.

Comment: 12 pages, accepted to the 29th ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining (KDD) 202