DeepPermNet: Visual Permutation Learning
We present a principled approach to uncover the structure of visual data by
solving a novel deep learning task coined visual permutation learning. The goal
of this task is to find the permutation that recovers the structure of data
from shuffled versions of it. In the case of natural images, this task boils
down to recovering the original image from patches shuffled by an unknown
permutation matrix. Unfortunately, permutation matrices are discrete, thereby
posing difficulties for gradient-based methods. To this end, we resort to a
continuous approximation of these matrices using doubly-stochastic matrices
which we generate from standard CNN predictions using Sinkhorn iterations.
Unrolling these iterations in a Sinkhorn network layer, we propose DeepPermNet,
an end-to-end CNN model for this task. The utility of DeepPermNet is
demonstrated on two challenging computer vision problems, namely, (i) relative
attributes learning and (ii) self-supervised representation learning. Our
results show state-of-the-art performance on the Public Figures and OSR
benchmarks for (i) and on the classification and segmentation tasks on the
PASCAL VOC dataset for (ii).
Comment: Accepted at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017.
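The Sinkhorn layer at the heart of DeepPermNet is simple to sketch: repeated row and column normalization turns a matrix of non-negative scores into an approximately doubly-stochastic one, and unrolling the iterations keeps everything differentiable. Below is a minimal PyTorch sketch; the log-space formulation, iteration count, and the toy 4x4 example are illustrative choices, not taken from the paper.

```python
import torch

def sinkhorn(log_alpha, n_iters=20):
    """Project a square score matrix toward the doubly-stochastic set by
    alternating row/column normalization in log space (Sinkhorn-Knopp)."""
    for _ in range(n_iters):
        log_alpha = log_alpha - torch.logsumexp(log_alpha, dim=-1, keepdim=True)
        log_alpha = log_alpha - torch.logsumexp(log_alpha, dim=-2, keepdim=True)
    return log_alpha.exp()

# toy usage: scores for 4 shuffled patches -> soft permutation matrix
scores = torch.randn(4, 4, requires_grad=True)
P = sinkhorn(scores)
print(P.sum(dim=0), P.sum(dim=1))  # both rows and columns sum to ~1
P.trace().backward()               # gradients flow through the iterations
```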
Learning Combinatorial Embedding Networks for Deep Graph Matching
Graph matching refers to finding node correspondences between graphs such
that the affinity of corresponding nodes and edges is maximized. Beyond its
NP-complete nature, another important challenge is effectively modeling the
node-wise and structure-wise affinities across graphs, and the resulting
objective, so that the matching procedure is guided toward the true matching
in the presence of noise. To this end, this paper devises an end-to-end
differentiable deep network pipeline to learn the affinities for graph
matching. It employs a supervised permutation loss over node correspondences
to capture the combinatorial nature of graph matching. Meanwhile, deep graph
embedding models are adopted to parameterize both intra-graph and cross-graph
affinity functions, replacing the traditional shallow and simple parametric
forms, e.g. a Gaussian kernel. The embeddings can also capture higher-order
structure beyond second-order edges. The permutation loss is agnostic to the
number of nodes, and the embedding model is shared among nodes, so the network
allows varying numbers of nodes in graphs for training and inference. Moreover,
our network is class-agnostic, with some generalization capability across
different categories. All these features are welcome in real-world
applications. Experiments show its superiority over state-of-the-art graph
matching learning methods.
Comment: ICCV 2019 oral. Code available at
https://github.com/Thinklab-SJTU/PCA-G
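One common form of the supervised permutation loss described above is an element-wise cross entropy between a doubly-stochastic prediction (e.g. a Sinkhorn output) and the 0/1 ground-truth permutation matrix. The sketch below follows that form; the normalization choice and the toy example are illustrative assumptions, and the released code may differ in details.

```python
import torch

def permutation_loss(pred, gt_perm, eps=1e-8):
    """Element-wise cross entropy between a doubly-stochastic prediction
    and a 0/1 ground-truth permutation matrix."""
    pred = pred.clamp(eps, 1.0 - eps)
    loss = -(gt_perm * pred.log() + (1.0 - gt_perm) * (1.0 - pred).log())
    return loss.sum() / gt_perm.sum()  # normalize by number of matched nodes

# toy usage: 3-node ground-truth matching vs. a random soft prediction
gt = torch.eye(3)[torch.tensor([2, 0, 1])]       # permutation matrix
pred = torch.softmax(torch.randn(3, 3), dim=-1)  # stand-in soft matching
print(permutation_loss(pred, gt))
```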
Domain Generalization by Solving Jigsaw Puzzles
Human adaptability relies crucially on the ability to learn and merge
knowledge from both supervised and unsupervised learning: parents point out
a few important concepts, but the children then fill in the gaps on their own.
This is particularly effective because supervised learning can never be
exhaustive, so learning autonomously lets the learner discover invariances and
regularities that help it generalize. In this paper we propose to apply a
similar approach to the task of object recognition across domains: our model
learns the semantic labels in a supervised fashion, and broadens its
understanding of the data by learning from self-supervised signals how to solve
a jigsaw puzzle on the same images. This secondary task helps the network to
learn the concepts of spatial correlation while acting as a regularizer for the
classification task. Multiple experiments on the PACS, VLCS, Office-Home and
digits datasets confirm our intuition and show that this simple method
outperforms previous domain generalization and adaptation solutions. An
ablation study further illustrates the inner workings of our approach.
Comment: Accepted at CVPR 2019 (oral).
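The multi-task setup described above can be sketched as a shared backbone with two heads: one over object classes and one over a fixed set of allowed patch permutations, trained with a weighted sum of two cross-entropies. The head names, toy backbone, and weight value below are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JigsawAuxNet(nn.Module):
    """Shared backbone with two heads: object classes and jigsaw
    permutation indices (one class per allowed patch shuffle)."""
    def __init__(self, backbone, feat_dim, n_classes, n_permutations):
        super().__init__()
        self.backbone = backbone
        self.cls_head = nn.Linear(feat_dim, n_classes)
        self.jig_head = nn.Linear(feat_dim, n_permutations)

    def forward(self, x):
        f = self.backbone(x)
        return self.cls_head(f), self.jig_head(f)

def joint_loss(cls_logits, jig_logits, y_cls, y_perm, alpha=0.7):
    # supervised objective plus weighted self-supervised jigsaw objective
    return F.cross_entropy(cls_logits, y_cls) + alpha * F.cross_entropy(jig_logits, y_perm)

# toy usage with a flattening "backbone" and fake labels
net = JigsawAuxNet(nn.Flatten(), feat_dim=3 * 32 * 32, n_classes=7, n_permutations=30)
c, j = net(torch.randn(4, 3, 32, 32))
loss = joint_loss(c, j, torch.randint(7, (4,)), torch.randint(30, (4,)))
```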
Learning Visual Representations for Transfer Learning by Suppressing Texture
Recent literature has shown that features obtained from supervised training
of CNNs may over-emphasize texture rather than encoding high-level information.
In self-supervised learning in particular, texture as a low-level cue may
provide shortcuts that prevent the network from learning higher-level
representations. To address these problems, we propose to use classic methods
based on anisotropic diffusion to augment training with texture-suppressed
images. This simple method suppresses texture while retaining important edge
information. We empirically show that our method achieves state-of-the-art
results on object detection and image classification across eight diverse
datasets, in both supervised and self-supervised learning tasks such as MoCoV2
and Jigsaw. Our method is particularly effective for transfer learning, where
we observed improved performance on five standard transfer learning datasets.
The large improvements (up to 11.49%) on the Sketch-ImageNet and DTD datasets,
together with additional visual analyses using saliency maps, suggest that our
approach helps learn representations that transfer better.
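The texture suppression step builds on classic anisotropic diffusion. A minimal NumPy sketch of Perona-Malik diffusion is below; the iteration count, conductance parameter kappa, and step size gamma are illustrative defaults rather than the paper's settings, and boundaries wrap for brevity.

```python
import numpy as np

def perona_malik(img, n_iters=15, kappa=30.0, gamma=0.15):
    """Perona-Malik anisotropic diffusion on a 2D grayscale array:
    iteratively smooths texture while an edge-stopping conductance
    g(d) = exp(-(d/kappa)^2) preserves strong edges."""
    u = img.astype(np.float64).copy()
    for _ in range(n_iters):
        # differences to the four neighbors (boundaries wrap, for brevity)
        dn = np.roll(u, -1, axis=0) - u
        ds = np.roll(u, 1, axis=0) - u
        de = np.roll(u, -1, axis=1) - u
        dw = np.roll(u, 1, axis=1) - u
        # conductance-weighted diffusion update
        u += gamma * (np.exp(-(dn / kappa) ** 2) * dn
                      + np.exp(-(ds / kappa) ** 2) * ds
                      + np.exp(-(de / kappa) ** 2) * de
                      + np.exp(-(dw / kappa) ** 2) * dw)
    return u

# toy usage: suppress noise-like texture while keeping a strong edge
img = np.zeros((64, 64)); img[:, 32:] = 255.0
img += np.random.default_rng(0).normal(0, 20, img.shape)  # texture stand-in
smoothed = perona_malik(img)
```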
GANzzle: Reframing jigsaw puzzle solving as a retrieval task using a generative mental image
Puzzle solving is a combinatorial challenge due to the difficulty of matching
adjacent pieces. Instead, we infer a mental image from all pieces, against
which any given piece can then be matched, avoiding the combinatorial
explosion. Exploiting advances in generative adversarial methods, we learn to
reconstruct the image from a set of unordered pieces, allowing the model to
learn a joint embedding space that matches an encoding of each piece to the
corresponding cropped layer of the generator. We therefore frame the problem
as an R@1 retrieval task and then solve the linear assignment using
differentiable Hungarian attention, making the process end-to-end. In doing
so, our model is puzzle-size agnostic, in contrast to prior deep learning
methods, which handle a single size. We evaluate on two new large-scale
datasets, where our model is on par with deep learning methods while
generalizing to multiple puzzle sizes.
Comment: Accepted at the International Conference on Image Processing (ICIP22).
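At inference time, the retrieval-plus-assignment step can be sketched with a hard assignment solver: embed each piece, embed each candidate location crop from the generated mental image, and solve the resulting linear assignment. The SciPy solver below is a non-differentiable stand-in for the paper's differentiable Hungarian attention; all names and sizes are illustrative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def assign_pieces(piece_emb, slot_emb):
    """Match N piece embeddings to N location ('slot') embeddings taken
    from a generated mental image: cosine similarity + optimal assignment."""
    p = piece_emb / np.linalg.norm(piece_emb, axis=1, keepdims=True)
    s = slot_emb / np.linalg.norm(slot_emb, axis=1, keepdims=True)
    sim = p @ s.T                             # N x N similarity matrix
    rows, cols = linear_sum_assignment(-sim)  # maximize total similarity
    return cols                               # cols[i] = slot for piece i

# toy usage: 9 pieces of a 3x3 puzzle with random embeddings
rng = np.random.default_rng(0)
print(assign_pieces(rng.normal(size=(9, 16)), rng.normal(size=(9, 16))))
```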
ADEPT: Automatic Differentiable DEsign of Photonic Tensor Cores
Photonic tensor cores (PTCs) are essential building blocks for optical
artificial intelligence (AI) accelerators based on programmable photonic
integrated circuits. PTCs can achieve ultra-fast and efficient tensor
operations for neural network (NN) acceleration. Current PTC designs are
either manually constructed or based on matrix decomposition theory, which
lacks the adaptability to meet various hardware constraints and device
specifications. To the best of our knowledge, an automatic PTC design
methodology is still unexplored. It would be promising to move beyond the
manual design paradigm and "nurture" photonic neurocomputing with AI and
design automation. Therefore, in this work,
for the first time, we propose a fully differentiable framework, dubbed ADEPT,
that can efficiently search PTC designs adaptive to various circuit footprint
constraints and foundry PDKs. Extensive experiments show the superior
flexibility and effectiveness of the proposed ADEPT framework in exploring a
large PTC design space. On various NN models and benchmarks, our searched PTC
topology outperforms prior manually designed structures with competitive
matrix representability, 2-30x higher footprint compactness, and better noise
robustness, demonstrating a new paradigm in photonic neural chip design. The
code of ADEPT is available at https://github.com/JeremieMelo/ADEPT using the
https://github.com/JeremieMelo/pytorch-onn (TorchONN) library.
Comment: Accepted to the ACM/IEEE Design Automation Conference (DAC), 2022.
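ADEPT's search space is photonic-specific, but the underlying idea of differentiable design search can be sketched generically: relax a discrete choice among candidate building blocks into a softmax mixture so that architecture parameters receive gradients alongside network weights (a DARTS-style relaxation). Everything below, including the candidate blocks, is an illustrative stand-in rather than ADEPT's actual parameterization.

```python
import torch
import torch.nn as nn

class DifferentiableBlockChoice(nn.Module):
    """DARTS-style relaxation: a softmax mixture over candidate blocks
    lets a discrete design choice be optimized by gradient descent."""
    def __init__(self, candidates):
        super().__init__()
        self.candidates = nn.ModuleList(candidates)
        self.arch_logits = nn.Parameter(torch.zeros(len(candidates)))

    def forward(self, x):
        w = torch.softmax(self.arch_logits, dim=0)
        # weighted sum of all candidates; discretize (argmax) after search
        return sum(wi * m(x) for wi, m in zip(w, self.candidates))

# toy usage: choose among three same-shape linear "blocks"
choice = DifferentiableBlockChoice([nn.Linear(8, 8) for _ in range(3)])
out = choice(torch.randn(4, 8))
out.sum().backward()  # gradients reach both block weights and arch_logits
```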