Learning Latent Permutations with Gumbel-Sinkhorn Networks
Permutations and matchings are core building blocks in a variety of latent
variable models, as they allow us to align, canonicalize, and sort data.
Learning in such models is difficult, however, because exact marginalization
over these combinatorial objects is intractable. In response, this paper
introduces a collection of new methods for end-to-end learning in such models
that approximate discrete maximum-weight matching using the continuous Sinkhorn
operator. Sinkhorn iteration is attractive because it functions as a simple,
easy-to-implement analog of the softmax operator. With this, we can define the
Gumbel-Sinkhorn method, an extension of the Gumbel-Softmax method (Jang et
al., 2016; Maddison et al., 2016) to distributions over latent matchings. We
demonstrate the effectiveness of our method by outperforming competitive
baselines on a range of qualitatively different tasks: sorting numbers, solving
jigsaw puzzles, and identifying neural signals in worms.
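The Sinkhorn operator described above is short enough to sketch directly: alternately normalizing the rows and columns of exp(X) (here in the log domain for stability) yields an approximately doubly stochastic matrix, and adding Gumbel noise before the iteration gives a rough sample in the spirit of Gumbel-Sinkhorn. A minimal numpy sketch, not the authors' implementation:

```python
import numpy as np

def sinkhorn(log_alpha, n_iters=50):
    """Sinkhorn operator: alternately normalize rows and columns of
    exp(log_alpha) in the log domain; the result is approximately
    doubly stochastic, a continuous relaxation of a permutation matrix."""
    for _ in range(n_iters):
        log_alpha = log_alpha - np.logaddexp.reduce(log_alpha, axis=1, keepdims=True)
        log_alpha = log_alpha - np.logaddexp.reduce(log_alpha, axis=0, keepdims=True)
    return np.exp(log_alpha)

rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 4))                        # stand-in for learned matching scores
gumbel = -np.log(-np.log(rng.uniform(size=(4, 4))))     # Gumbel(0, 1) noise
tau = 1.0                                               # temperature
P = sinkhorn((scores + gumbel) / tau)
print(P.sum(axis=0))  # each column sums to ~1
print(P.sum(axis=1))  # each row sums to ~1
```

Lowering tau pushes P toward a hard permutation matrix, which is how the relaxation connects back to discrete maximum-weight matching.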
Modeling Orders of User Behaviors via Differentiable Sorting: A Multi-task Framework to Predicting User Post-click Conversion
User post-click conversion prediction is of high interest to researchers and
developers. Recent studies employ multi-task learning to tackle the selection
bias and data sparsity problem, two severe challenges in post-click behavior
prediction, by incorporating click data. However, prior works have mainly focused
on pointwise learning, and the order of labels (i.e., click and post-click) is
not well explored, which naturally poses a listwise learning problem. Inspired
by recent advances in differentiable sorting, in this paper we propose a novel
multi-task framework that leverages the order of user behaviors to predict user
post-click conversion in an end-to-end manner. Specifically, we define an
aggregation operator that combines the predicted outputs of different tasks into a
unified score; we then use the computed scores to model the label relations via
differentiable sorting. Extensive experiments on public and industrial datasets
show the superiority of our proposed model against competitive baselines.
Comment: The paper is accepted as a short research paper by SIGIR 202
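As a rough illustration of the listwise idea (not the paper's exact operator, which the abstract does not spell out), one common differentiable-sorting surrogate replaces hard ranks with sums of sigmoids; a unified per-item score can then be soft-ranked and compared against the desired label order. The aggregation weights and probabilities below are hypothetical:

```python
import numpy as np

def soft_rank(scores, tau=0.05):
    """Differentiable surrogate for descending ranks (1 = highest score):
    rank_i ~= 0.5 + sum_j sigmoid((s_j - s_i) / tau)."""
    diff = scores[None, :] - scores[:, None]      # diff[i, j] = s_j - s_i
    return 0.5 + (1.0 / (1.0 + np.exp(-diff / tau))).sum(axis=1)

# Hypothetical per-item task outputs: click prob. and post-click conversion prob.
p_click = np.array([0.9, 0.6, 0.2])
p_conv  = np.array([0.5, 0.1, 0.05])

# A simple (assumed) aggregation operator: weighted sum of the task outputs.
unified = 0.5 * p_click + 0.5 * p_conv

ranks = soft_rank(unified)
print(ranks)   # close to the hard ranks [1, 2, 3]
```

Because the soft ranks are smooth in the scores, a listwise loss on them backpropagates through both tasks, which is the mechanism end-to-end order modeling relies on.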
Sinkhorn Transformations for Single-Query Postprocessing in Text-Video Retrieval
A recent trend in multimodal retrieval is to postprocess test-set results
via the dual-softmax loss (DSL). While this approach can bring
significant improvements, it usually presumes that an entire matrix of test
samples is available as DSL input. This work introduces a new postprocessing
approach based on Sinkhorn transformations that outperforms DSL. Further, we
propose a new postprocessing setting that does not require access to multiple
test queries. We show that our approach can significantly improve the results
of state-of-the-art models such as CLIP4Clip, BLIP, X-CLIP, and DRL, thus
achieving a new state of the art on several standard text-video retrieval
datasets, both with access to the entire test set and in the single-query
setting.
Comment: SIGIR 202
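A sketch of the full-matrix variant (with access to an entire query-video similarity matrix; the paper's single-query setting needs more machinery than shown here). Raw argmax can send two queries to the same "hub" video, whereas alternating row/column normalization of exp(sim / tau) discourages such collisions. The numbers are made up:

```python
import numpy as np

def sinkhorn_postprocess(sim, tau=0.05, n_iters=100):
    """Alternately normalize rows and columns of exp(sim / tau)."""
    K = np.exp(sim / tau)
    for _ in range(n_iters):
        K = K / K.sum(axis=1, keepdims=True)
        K = K / K.sum(axis=0, keepdims=True)
    return K

# Toy query-by-video similarities: video 0 is a "hub" both queries prefer.
sim = np.array([[0.9, 0.8],
                [0.9, 0.2]])

raw_top  = np.argmax(sim, axis=1)                        # [0, 0]: a collision
sink_top = np.argmax(sinkhorn_postprocess(sim), axis=1)  # [1, 0]: resolved
print(raw_top, sink_top)
```

The Sinkhorn-transformed matrix favors the assignment with higher total similarity (0.8 + 0.9 over 0.9 + 0.2), which is why query 0 is rerouted to video 1.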
Sinkhorn-Flow: Predicting Probability Mass Flow in Dynamical Systems Using Optimal Transport
Predicting how distributions over discrete variables vary over time is a
common task in time series forecasting. But whereas most approaches focus on
merely predicting the distribution at subsequent time steps, a crucial piece of
information in many settings is to determine how this probability mass flows
between the different elements over time. We propose a new approach to
predicting such mass flow over time using optimal transport. Specifically, we
propose a generic approach to predicting transport matrices in end-to-end deep
learning systems, replacing the standard softmax operation with Sinkhorn
iterations. We apply our approach to the task of predicting how communities
will evolve over time in social network settings, and show that the approach
improves substantially over alternative prediction methods. We specifically
highlight results on the task of predicting faction evolution in Ukrainian
parliamentary voting.
Comment: A prior version of the work appeared in the Optimal Transport
Workshop at NeurIPS 201
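The core scaling step, sketched below, is classic Sinkhorn-Knopp with prescribed marginals: given predicted affinities between states at time t and t+1, rescale rows and columns until the row sums match the current distribution and the column sums match the next one. How a network would produce the affinities is omitted, and all names are illustrative:

```python
import numpy as np

def transport_from_logits(logits, p_now, p_next, n_iters=200):
    """Scale exp(logits) so that row sums equal p_now and column sums
    equal p_next: T[i, j] is the mass flowing from state i to state j."""
    T = np.exp(logits)
    for _ in range(n_iters):
        T *= (p_now / T.sum(axis=1))[:, None]
        T *= (p_next / T.sum(axis=0))[None, :]
    return T

# Three hypothetical communities; affinities biased toward staying put.
p_now  = np.array([0.5, 0.3, 0.2])
p_next = np.array([0.4, 0.4, 0.2])
logits = 2.0 * np.eye(3)

T = transport_from_logits(logits, p_now, p_next)
print(T)                                # mass flow between communities
print(T.sum(axis=1), T.sum(axis=0))     # match p_now and p_next
```

Unlike a per-row softmax, which only predicts the next-step distribution, the transport matrix T also says where each unit of probability mass went.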
Second-order Democratic Aggregation
Aggregated second-order features extracted from deep convolutional networks
have been shown to be effective for texture generation, fine-grained
recognition, material classification, and scene understanding. In this paper,
we study a class of orderless aggregation functions designed to minimize
interference or equalize contributions in the context of second-order features.
We show that they can be computed just as efficiently as their first-order
counterparts and that they have favorable properties over aggregation by summation.
Another line of work has shown that matrix power normalization after
aggregation can significantly improve the generalization of second-order
representations. We show that matrix power normalization implicitly equalizes
contributions during aggregation thus establishing a connection between matrix
normalization techniques and prior work on minimizing interference. Based on
the analysis we present {\gamma}-democratic aggregators that interpolate
between sum ({\gamma}=1) and democratic pooling ({\gamma}=0) outperforming both
on several classification tasks. Moreover, unlike power normalization, the
{\gamma}-democratic aggregations can be computed in a low-dimensional space via
sketching, which allows the use of very high-dimensional second-order features.
This results in state-of-the-art performance on several datasets.
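To make the interpolation concrete, here is one plausible reading of {\gamma}-democratic weights sketched in numpy: solve for per-feature weights a so that each feature's weighted contribution under the second-order kernel k(x_i, x_j) = (x_i . x_j)^2 equals its sum-pooling contribution raised to the power {\gamma}, so that {\gamma}=1 gives a = 1 (sum pooling) and {\gamma}=0 equalizes all contributions (democratic pooling). This is a reconstruction from the abstract, not the authors' code; the damped square-root update is a standard trick for such scaling problems:

```python
import numpy as np

def gamma_democratic_weights(X, gamma, n_iters=200):
    """Find weights a with a_i * (K a)_i = (K 1)_i ** gamma, where
    K_ij = (x_i . x_j)^2 is the kernel induced by second-order features.
    gamma=1 -> a = 1 (sum pooling); gamma=0 -> equalized contributions."""
    K = (X @ X.T) ** 2
    target = np.maximum(K.sum(axis=1), 1e-12) ** gamma
    a = np.ones(len(X))
    for _ in range(n_iters):
        contrib = a * (K @ a)                       # each feature's contribution
        a *= np.sqrt(target / np.maximum(contrib, 1e-12))  # damped correction
    return a

rng = np.random.default_rng(0)
X = np.abs(rng.normal(size=(6, 8)))   # nonnegative features -> positive kernel

a_dem = gamma_democratic_weights(X, gamma=0.0)
a_sum = gamma_democratic_weights(X, gamma=1.0)
contrib = a_dem * ((X @ X.T) ** 2 @ a_dem)
print(contrib)   # roughly equal across features (democratic)
print(a_sum)     # all ones (sum pooling)
```

Intermediate {\gamma} values trade off between letting frequent, bursty features dominate (sum) and forcing every feature to count equally (democratic).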