We present a technique for significantly speeding up Alternating Least
Squares (ALS) and Gradient Descent (GD), two widely used algorithms for tensor
factorization. By exploiting properties of the Khatri-Rao product, we show how
to efficiently address a computationally challenging sub-step of both
algorithms. Our algorithm, DFacTo, requires only two sparse matrix-vector
products and is easy to parallelize. DFacTo is not only scalable but also on
average 4 to 10 times faster than competing algorithms on a variety of
datasets. For instance, DFacTo takes only 480 seconds on 4 machines to perform
one iteration of the ALS algorithm and 1,143 seconds to perform one iteration
of the GD algorithm on a 6.5 million x 2.5 million x 1.5 million dimensional
tensor with 1.2 billion non-zero entries.
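To make the two matrix-vector-product claim concrete, below is a minimal sketch of the sub-step in question: the mode-1 matricized-tensor times Khatri-Rao product (MTTKRP), computed one column at a time as two matrix-vector products. This is an illustration of the contraction pattern only, not the paper's implementation; the function name `mttkrp_mode1`, the COO-style input layout, and the use of SciPy are our own assumptions, and the dense intermediate used here would need to be kept sparse to reach the tensor sizes quoted above.

```python
import numpy as np
import scipy.sparse as sp

def mttkrp_mode1(coords, vals, dims, B, C):
    """Compute M = X_(1) (C kr B), the mode-1 matricized-tensor times
    Khatri-Rao product, one column at a time.

    coords : (nnz, 3) int array of (i, j, k) indices of the sparse tensor X
    vals   : (nnz,) array of the corresponding tensor values
    dims   : (I, J, K) tensor dimensions
    B, C   : J x R and K x R factor matrices
    (Hypothetical interface, for illustration only.)
    """
    I, J, K = dims
    R = B.shape[1]
    # Flatten each (i, k) index pair into a single row index so that
    # contracting mode 2 becomes one sparse matrix-vector product.
    rows = coords[:, 0] * K + coords[:, 2]
    X_ik_j = sp.csr_matrix((vals, (rows, coords[:, 1])), shape=(I * K, J))
    M = np.empty((I, R))
    for r in range(R):
        t = X_ik_j @ B[:, r]   # first product:  t[i,k] = sum_j X[i,j,k] B[j,r]
        T = t.reshape(I, K)    # reinterpret the length-I*K vector as I x K
        M[:, r] = T @ C[:, r]  # second product: M[i,r] = sum_k T[i,k] C[k,r]
    return M
```

Note that the intermediate t has nonzeros only at (i, k) pairs that actually occur in the tensor, so a scalable version can store it sparsely and keep both products sparse; this sparsity is what the abstract's two sparse matrix-vector products rely on, though DFacTo's actual decomposition and data layout differ in detail.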