Applications of optimal transport have recently gained remarkable attention
thanks to the computational advantages of entropic regularization. However, in
most situations the Sinkhorn approximation of the Wasserstein distance is
replaced by a regularized version that is less accurate but easy to
differentiate. In this work we characterize the differential properties of the
original Sinkhorn distance, proving that it enjoys the same smoothness as its
regularized version, and we explicitly provide an efficient algorithm to compute
its gradient. We show that this result benefits both theory and applications:
on the one hand, high-order smoothness confers statistical guarantees on learning
with Wasserstein approximations; on the other hand, the gradient formula allows
us to efficiently solve learning and optimization problems in practice.
Promising preliminary experiments complement our analysis.
Comment: 26 pages, 4 figures
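
To make the distinction between the two quantities concrete, the following is a minimal NumPy sketch, not the authors' implementation. It runs standard Sinkhorn iterations on a toy problem and evaluates both the original Sinkhorn value (the transport cost of the entropic plan) and its regularized counterpart (the same cost minus a scaled entropy term). The function names, the toy histograms, the parameter `eps`, and the entropy convention H(T) = -sum T log T are assumptions made for illustration. The sketch does not compute the gradient discussed above.

```python
import numpy as np


def sinkhorn_plan(a, b, C, eps=0.1, n_iters=500):
    """Approximate the entropic optimal transport plan between histograms a and b.

    Hypothetical helper: plain Sinkhorn scaling without log-domain stabilization,
    so it is only suitable for moderate values of C / eps.
    """
    K = np.exp(-C / eps)              # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)               # rescale to match row marginals
        v = b / (K.T @ u)             # rescale to match column marginals
    return u[:, None] * K * v[None, :]


def sharp_and_regularized(a, b, C, eps=0.1, n_iters=500):
    """Return (original Sinkhorn value, regularized value) for the same plan."""
    T = sinkhorn_plan(a, b, C, eps, n_iters)
    transport_cost = float(np.sum(T * C))
    # Assumed convention: H(T) = -sum T log T; the tiny offset guards log(0).
    entropy = float(-np.sum(T * np.log(T + 1e-300)))
    return transport_cost, transport_cost - eps * entropy


# Toy problem: two histograms on five equally spaced points in [0, 1],
# with squared Euclidean ground cost.
x = np.linspace(0.0, 1.0, 5)
C = (x[:, None] - x[None, :]) ** 2
a = np.full(5, 0.2)
b = np.array([0.1, 0.1, 0.2, 0.3, 0.3])

sharp, regularized = sharp_and_regularized(a, b, C)
print("original Sinkhorn value:", sharp)
print("regularized value:", regularized)
```

Under the entropy convention assumed here, the regularized value is never larger than the original one, because every entry of the plan lies in (0, 1] and the entropy term is therefore nonnegative.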