SWIFT: Scalable Wasserstein Factorization for Sparse Nonnegative Tensors
Existing tensor factorization methods assume that the input tensor follows
a specific distribution (e.g., Poisson, Bernoulli, or Gaussian) and solve
the factorization by minimizing an empirical loss function derived from
that distribution. However, this approach suffers from several drawbacks:
1) in reality, the underlying distributions are complicated and unknown,
making them hard to approximate with a simple parametric distribution;
2) the correlation across dimensions of the input tensor is not well
utilized, leading to sub-optimal performance. Although heuristics have been
proposed to incorporate such correlation as side information under the
Gaussian distribution, they cannot easily be generalized to other
distributions. Thus, a more principled way of utilizing correlation in
tensor factorization models remains an open challenge. Without assuming any
explicit distribution, we formulate tensor factorization as an optimal
transport problem with a Wasserstein distance, which can handle
non-negative inputs.
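For intuition, a generic form of such an objective (a sketch of standard optimal transport, not necessarily the exact N-th order tensor loss defined in the paper) treats the input tensor \(\mathcal{X}\) and its reconstruction \(\hat{\mathcal{X}}\) as nonnegative mass distributions and minimizes the total transport cost under a ground-cost matrix \(M\):

```latex
% Illustrative optimal-transport objective; the paper's exact
% N-th order tensor Wasserstein loss may differ in form.
W(\mathcal{X}, \hat{\mathcal{X}})
  \;=\; \min_{T \in \Pi(\mathcal{X}, \hat{\mathcal{X}})} \langle T, M \rangle,
\qquad
\Pi(\mathcal{X}, \hat{\mathcal{X}})
  = \bigl\{\, T \ge 0 \,:\, T\mathbf{1} = \operatorname{vec}(\mathcal{X}),\;
     T^{\top}\mathbf{1} = \operatorname{vec}(\hat{\mathcal{X}}) \,\bigr\}
```

Here \(T\) is a transport plan whose marginals match the vectorized input and reconstruction, so the loss depends on where mass moves rather than on entry-wise residuals, which is what removes the need for an explicit distributional assumption.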
We introduce SWIFT, which minimizes the Wasserstein distance between the
input tensor and its reconstruction. In particular, we define the N-th
order tensor Wasserstein loss for the widely used CP factorization and
derive an optimization algorithm that minimizes it. By leveraging the
sparsity structure and several equivalent formulations for computational
efficiency, SWIFT is as scalable as other well-known CP algorithms.
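As a concrete illustration (a minimal sketch, not the authors' SWIFT algorithm; `cp_reconstruct`, `sinkhorn_loss`, the toy ground cost, `eps`, and `n_iters` are all assumptions for this example), an entropic-regularized Wasserstein loss between a nonnegative tensor and its CP reconstruction can be computed with Sinkhorn iterations:

```python
# Illustrative sketch only: entropic Wasserstein loss between a tensor
# and its CP reconstruction, both flattened into histograms. This is a
# generic Sinkhorn computation, not the paper's SWIFT algorithm.
import numpy as np

def cp_reconstruct(factors):
    """Reconstruct a 3rd-order tensor from CP factor matrices A, B, C."""
    A, B, C = factors
    # Sum of rank-one components, expressed as an einsum over rank r.
    return np.einsum('ir,jr,kr->ijk', A, B, C)

def sinkhorn_loss(p, q, M, eps=0.1, n_iters=100):
    """Entropic-regularized OT cost between histograms p and q with
    ground-cost matrix M, via alternating Sinkhorn scaling updates."""
    K = np.exp(-M / eps)              # Gibbs kernel
    u = np.ones_like(p)
    for _ in range(n_iters):
        v = q / (K.T @ u)             # column scaling
        u = p / (K @ v)               # row scaling
    T = u[:, None] * K * v[None, :]   # resulting transport plan
    return np.sum(T * M)              # transport cost <T, M>

# Toy usage: compare a random nonnegative tensor with its CP model.
rng = np.random.default_rng(0)
shape, rank = (4, 5, 6), 3
X = rng.random(shape)
factors = [rng.random((d, rank)) for d in shape]
X_hat = cp_reconstruct(factors)

# Normalize both tensors into histograms over all entries; the ground
# cost on flattened indices here is arbitrary and purely for the demo.
p = X.ravel() / X.sum()
q = X_hat.ravel() / X_hat.sum()
idx = np.arange(p.size)
M = np.abs(idx[:, None] - idx[None, :]).astype(float)
print(sinkhorn_loss(p, q, M, eps=1.0))
```

A real ground cost would encode meaningful similarity between tensor entries (e.g., feature correlations along each mode) rather than index distance, and exploiting sparsity in these updates is what the abstract credits for SWIFT's scalability.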
Using the factor matrices as features, SWIFT achieves up to 9.65% and
11.31% relative improvement over baselines on downstream prediction tasks.
Under noisy conditions, SWIFT achieves up to 15% and 17% relative
improvement over the best competitors on the prediction tasks.

Comment: Accepted by AAAI-2