Learning with a Wasserstein loss
Learning to predict multi-label outputs is challenging, but in many problems there is a natural metric on the outputs that can be used to improve predictions. In this paper we develop a loss function for multi-label learning, based on the Wasserstein distance. The Wasserstein distance provides a natural notion of dissimilarity for probability measures. Although optimizing with respect to the exact Wasserstein distance is costly, recent work has described a regularized approximation that is efficiently computed. We describe an efficient learning algorithm based on this regularization, as well as a novel extension of the Wasserstein distance from probability measures to unnormalized measures. We also describe a statistical learning bound for the loss. The Wasserstein loss can encourage smoothness of the predictions with respect to a chosen metric on the output space. We demonstrate this property on a real-data tag prediction problem, using the Yahoo Flickr Creative Commons dataset, outperforming a baseline that doesn't use the metric.
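The regularized approximation mentioned in this abstract is commonly computed with Sinkhorn iterations on an entropy-regularized transport problem. The NumPy sketch below illustrates that general technique only; it is not the paper's learning algorithm. The function name `sinkhorn_distance`, the parameters `reg` and `n_iters`, and the three-label toy metric are assumptions made for illustration.

```python
import numpy as np

def sinkhorn_distance(a, b, cost, reg=0.1, n_iters=200):
    """Entropy-regularized transport cost between histograms a and b.

    a, b : 1-D arrays of nonnegative weights, each summing to 1.
    cost : 2-D array, cost[i, j] = ground distance between outputs i and j.
    reg  : regularization strength (illustrative default, not from the paper).
    """
    K = np.exp(-cost / reg)              # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):             # alternate scaling to match both marginals
        v = b / (K.T @ u)
        u = a / (K @ v)
    plan = u[:, None] * K * v[None, :]   # approximate transport plan
    return float(np.sum(plan * cost)), plan

# Toy output space: three labels on a line with ground metric |i - j|.
labels = np.arange(3, dtype=float)
cost = np.abs(labels[:, None] - labels[None, :])

target = np.array([1.0, 0.0, 0.0])
near = np.array([0.0, 1.0, 0.0])   # prediction one step from the target
far = np.array([0.0, 0.0, 1.0])    # prediction two steps from the target

d_near, plan_near = sinkhorn_distance(target, near, cost)
d_far, plan_far = sinkhorn_distance(target, far, cost)
```

In this toy case the prediction that is closer under the ground metric receives the smaller loss, which is the kind of metric-aware behavior the abstract describes.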
A Quasi-Wasserstein Loss for Learning Graph Neural Networks
When learning graph neural networks (GNNs) in node-level prediction tasks,
most existing loss functions are applied for each node independently, even if
node embeddings and their labels are non-i.i.d. because of their graph
structures. To eliminate such inconsistency, in this study we propose a novel
Quasi-Wasserstein (QW) loss with the help of the optimal transport defined on
graphs, leading to new learning and prediction paradigms of GNNs. In
particular, we design a "Quasi-Wasserstein" distance between the observed
multi-dimensional node labels and their estimations, optimizing the label
transport defined on graph edges. The estimations are parameterized by a GNN in
which the optimal label transport may determine the graph edge weights
optionally. By reformulating the strict constraint of the label transport to a
Bregman divergence-based regularizer, we obtain the proposed Quasi-Wasserstein
loss associated with two efficient solvers learning the GNN together with
optimal label transport. When predicting node labels, our model combines the
output of the GNN with the residual component provided by the optimal label
transport, leading to a new transductive prediction paradigm. Experiments show
that the proposed QW loss applies to various GNNs and helps to improve their
performance in node-level classification and regression tasks.
Differentially Private Sliced Wasserstein Distance
Developing machine learning methods that are privacy preserving is today a
central topic of research, with huge practical impacts. Among the numerous ways
to address privacy-preserving learning, we here take the perspective of
computing the divergences between distributions under the Differential Privacy
(DP) framework -- being able to compute divergences between distributions is
pivotal for many machine learning problems, such as learning generative models
or domain adaptation problems. Instead of resorting to the popular
gradient-based sanitization method for DP, we tackle the problem at its roots
by focusing on the Sliced Wasserstein Distance and seamlessly making it
differentially private. Our main contribution is as follows: we analyze the
property of adding a Gaussian perturbation to the intrinsic randomized
mechanism of the Sliced Wasserstein Distance, and we establish the
sensitivity of the resulting differentially private mechanism. One of our
important findings is that this DP mechanism transforms the Sliced Wasserstein
distance into another distance, that we call the Smoothed Sliced Wasserstein
Distance. This new differentially private distribution distance can be plugged
into generative models and domain adaptation algorithms in a transparent way,
and we empirically show that it yields highly competitive performance compared
with gradient-based DP approaches from the literature, with almost no loss in
accuracy for the domain adaptation problems that we consider.
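As a rough illustration of the idea in this abstract, the sketch below estimates a sliced Wasserstein distance from random one-dimensional projections and optionally adds Gaussian noise to the projected samples. It is a minimal sketch under stated assumptions: the function name, the parameters `n_projections` and `sigma`, and the requirement of equal sample sizes are illustrative choices, and `sigma` is not calibrated to any privacy budget, so this code provides no differential privacy guarantee by itself.

```python
import numpy as np

def sliced_wasserstein(x, y, n_projections=50, sigma=0.0, rng=None):
    """Monte Carlo estimate of the sliced 2-Wasserstein distance.

    x, y  : arrays of shape (n, d) with the same number of samples n.
    sigma : std of Gaussian noise added to the projections
            (hypothetical knob; not tied to a privacy budget here).
    """
    rng = np.random.default_rng(rng)
    d = x.shape[1]
    # Random unit directions, one per column.
    theta = rng.normal(size=(d, n_projections))
    theta /= np.linalg.norm(theta, axis=0, keepdims=True)
    px, py = x @ theta, y @ theta
    if sigma > 0:
        # Gaussian perturbation of the projected samples.
        px = px + rng.normal(scale=sigma, size=px.shape)
        py = py + rng.normal(scale=sigma, size=py.shape)
    # In one dimension, optimal transport matches sorted samples.
    px, py = np.sort(px, axis=0), np.sort(py, axis=0)
    return float(np.sqrt(np.mean((px - py) ** 2)))

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2))
same = sliced_wasserstein(x, x, rng=1)
shifted = sliced_wasserstein(x, x + 3.0, rng=1)
noisy = sliced_wasserstein(x, x + 3.0, sigma=0.5, rng=1)
```

With `sigma=0` the function reduces to the usual sliced Wasserstein estimate; with `sigma > 0` it compares noise-smoothed projections, in the spirit of the smoothed distance the abstract names.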
Learning Generative Models across Incomparable Spaces
Generative Adversarial Networks have shown remarkable success in learning a
distribution that faithfully recovers a reference distribution in its entirety.
However, in some cases, we may want to only learn some aspects (e.g., cluster
or manifold structure), while modifying others (e.g., style, orientation or
dimension). In this work, we propose an approach to learn generative models
across such incomparable spaces, and demonstrate how to steer the learned
distribution towards target properties. A key component of our model is the
Gromov-Wasserstein distance, a notion of discrepancy that compares
distributions relationally rather than absolutely. While this framework
subsumes current generative models in identically reproducing distributions,
its inherent flexibility allows application to tasks in manifold learning,
relational learning and cross-domain learning.
Comment: International Conference on Machine Learning (ICML
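For orientation, the Gromov-Wasserstein discrepancy between two metric measure spaces is usually written as below. This is the standard textbook form, not a formula quoted from the paper; the symbols $d_X$, $d_Y$, $\mu$, $\nu$, $\pi$ and the exponent $p$ are notation assumed here.

```latex
\mathrm{GW}_p(\mu, \nu)^p
  = \min_{\pi \in \Pi(\mu, \nu)}
    \int\!\!\int
    \bigl| d_X(x, x') - d_Y(y, y') \bigr|^p
    \, \mathrm{d}\pi(x, y) \, \mathrm{d}\pi(x', y')
```

Here $\Pi(\mu, \nu)$ is the set of couplings with marginals $\mu$ and $\nu$. Only distances within each space enter the objective, which is why the two distributions can live in spaces of different dimension.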
A Principled Approach for Learning Task Similarity in Multitask Learning
Multitask learning aims at solving a set of related tasks simultaneously, by
exploiting the shared knowledge for improving the performance on individual
tasks. Hence, an important aspect of multitask learning is to understand the
similarities within a set of tasks. Previous works have incorporated this
similarity information explicitly (e.g., weighted loss for each task) or
implicitly (e.g., adversarial loss for feature adaptation), for achieving good
empirical performances. However, the theoretical motivations for adding task
similarity knowledge are often missing or incomplete. In this paper, we give a
different perspective from a theoretical point of view to understand this
practice. We first provide an upper bound on the generalization error of
multitask learning, showing the benefit of explicit and implicit task
similarity knowledge. We systematically derive the bounds based on two distinct
task similarity metrics: H divergence and Wasserstein distance. From these
theoretical results, we revisit the Adversarial Multi-task Neural Network,
proposing a new training algorithm to learn the task relation coefficients and
neural network parameters iteratively. We assess our new algorithm empirically
on several benchmarks, showing not only that we find interesting and robust
task relations, but that the proposed approach outperforms the baselines,
reaffirming the benefits of theoretical insight in algorithm design.