Ensemble of Loss Functions to Improve Generalizability of Deep Metric Learning methods
Deep Metric Learning (DML) learns a non-linear semantic embedding of input
data that brings similar pairs together while keeping dissimilar data apart.
To this end, many methods have been proposed over the last decade, with
promising results in various applications. The success of a DML algorithm
depends greatly on its loss function. However, no loss function is perfect,
and each addresses only certain aspects of an optimal similarity embedding.
Moreover, the generalizability of DML to categories unseen at test time is
an important issue that existing loss functions do not consider. To
address these challenges, we propose novel approaches to combine different
losses built on top of a shared deep feature extractor. The proposed ensemble
of losses enforces the deep model to extract features that are consistent with
all losses. Since the selected losses are diverse and each emphasizes different
aspects of an optimal semantic embedding, our effective combining methods yield
a considerable improvement over any individual loss and generalize well on
unseen categories. There is no restriction on the choice of loss functions,
and our methods can work with any set of existing ones. Moreover, they can optimize
each loss function as well as its weight in an end-to-end paradigm with no need
to adjust any hyper-parameter. We evaluate our methods on some popular datasets
from the machine vision domain in conventional Zero-Shot-Learning (ZSL)
settings. The results are very encouraging and show that our methods outperform
all baseline losses by a large margin on all datasets.
Comment: 27 pages, 12 figures
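The combining idea above can be sketched minimally: several diverse metric-learning losses share one embedding, and their weights are learnable parameters rather than tuned hyper-parameters. The specific losses and the softmax weighting below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def contrastive_loss(a, b, same, margin=1.0):
    # Pull same-class pairs together; push different-class pairs beyond a margin.
    d = np.linalg.norm(a - b)
    return d ** 2 if same else max(0.0, margin - d) ** 2

def triplet_loss(anchor, pos, neg, margin=0.2):
    # Anchor should be closer to the positive than to the negative by a margin.
    return max(0.0, np.linalg.norm(anchor - pos)
                    - np.linalg.norm(anchor - neg) + margin)

def ensemble_loss(losses, log_weights):
    # Softmax over learnable log-weights keeps the combination normalized.
    # In an end-to-end setup both the network and log_weights receive gradients,
    # so no loss-weight hyper-parameter needs manual tuning.
    w = np.exp(log_weights) / np.exp(log_weights).sum()
    return float(np.dot(w, losses))

# Toy embeddings standing in for outputs of a shared feature extractor.
a, p, n = np.array([0.0, 0.0]), np.array([0.1, 0.0]), np.array([1.0, 1.0])
losses = [contrastive_loss(a, p, same=True), triplet_loss(a, p, n)]
total = ensemble_loss(losses, np.zeros(2))  # equal weights initially
```

With equal initial weights the ensemble is simply the mean of the individual losses; training then reweights them jointly with the embedding.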
Shared Microexponents: A Little Shifting Goes a Long Way
This paper introduces Block Data Representations (BDR), a framework for
exploring and evaluating a wide spectrum of narrow-precision formats for deep
learning. It enables comparison of popular quantization standards, and through
BDR, new formats based on shared microexponents (MX) are identified, which
outperform other state-of-the-art quantization approaches, including
narrow-precision floating-point and block floating-point. MX utilizes multiple
levels of quantization scaling with ultra-fine scaling factors based on shared
microexponents in the hardware. The effectiveness of MX is demonstrated on
real-world models including large-scale generative pretraining and inferencing,
and production-scale recommendation systems.
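The baseline that MX refines can be sketched as classic block floating-point: one shared power-of-two exponent per block of values, with each value reduced to a narrow mantissa. MX adds a second, finer level of shared microexponents per sub-block; the single-level sketch below only illustrates the shared-exponent idea, not the MX format itself.

```python
import numpy as np

def block_quantize(x, block_size=8, mantissa_bits=4):
    # Block floating-point: each block of values shares one exponent chosen to
    # cover the block's largest magnitude; individual values keep only a
    # narrow integer mantissa. MX layers ultra-fine shared microexponents on
    # top of this scheme (not shown here).
    x = x.reshape(-1, block_size)
    exp = np.ceil(np.log2(np.abs(x).max(axis=1, keepdims=True) + 1e-30))
    scale = 2.0 ** (exp - mantissa_bits)   # shared per-block scale
    q = np.round(x / scale)                # narrow-precision mantissas
    return (q * scale).reshape(-1)         # dequantized approximation

x = np.array([0.5, -0.25, 0.125, 1.0, 0.9, -0.7, 0.3, 0.05])
xq = block_quantize(x, block_size=8)
```

Values that are exact multiples of the shared scale survive unchanged; the rest incur at most half a scale step of rounding error.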
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
The rising popularity of intelligent mobile devices and the daunting
computational cost of deep learning-based models call for efficient and
accurate on-device inference schemes. We propose a quantization scheme that
allows inference to be carried out using integer-only arithmetic, which can be
implemented more efficiently than floating point inference on commonly
available integer-only hardware. We also co-design a training procedure to
preserve end-to-end model accuracy post-quantization. As a result, the proposed
quantization scheme improves the tradeoff between accuracy and on-device
latency. The improvements are significant even on MobileNets, a model family
known for run-time efficiency, and are demonstrated in ImageNet classification
and COCO detection on popular CPUs.
Comment: 14 pages, 12 figures
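The integer-only scheme described above rests on affine quantization: a real value x is represented as scale * (q - zero_point) with q an 8-bit integer, and the zero point is chosen so that real zero is representable exactly. A minimal sketch of that mapping:

```python
import numpy as np

def quantize_params(xmin, xmax, num_bits=8):
    # Affine quantization: x ~= scale * (q - zero_point), q an unsigned int.
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (xmax - xmin) / (qmax - qmin)
    # Choosing zero_point this way makes real 0.0 exactly representable,
    # which matters for zero-padding and ReLU outputs.
    zero_point = int(np.clip(round(qmin - xmin / scale), qmin, qmax))
    return scale, zero_point

def quantize(x, scale, zero_point, num_bits=8):
    q = np.round(x / scale) + zero_point
    return np.clip(q, 0, 2 ** num_bits - 1).astype(np.uint8)

def dequantize(q, scale, zero_point):
    return scale * (q.astype(np.int32) - zero_point)

x = np.array([-1.0, 0.0, 0.5, 1.0])
s, z = quantize_params(-1.0, 1.0)
q = quantize(x, s, z)
```

At inference time only the integer tensors q and the (precomputed) scale and zero point are needed, so the arithmetic inside a layer can stay integer-only.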
DeepPermNet: Visual Permutation Learning
We present a principled approach to uncover the structure of visual data by
solving a novel deep learning task coined visual permutation learning. The goal
of this task is to find the permutation that recovers the structure of data
from shuffled versions of it. In the case of natural images, this task boils
down to recovering the original image from patches shuffled by an unknown
permutation matrix. Unfortunately, permutation matrices are discrete, thereby
posing difficulties for gradient-based methods. To this end, we resort to a
continuous approximation of these matrices using doubly-stochastic matrices
which we generate from standard CNN predictions using Sinkhorn iterations.
Unrolling these iterations in a Sinkhorn network layer, we propose DeepPermNet,
an end-to-end CNN model for this task. The utility of DeepPermNet is
demonstrated on two challenging computer vision problems, namely, (i) relative
attributes learning and (ii) self-supervised representation learning. Our
results show state-of-the-art performance on the Public Figures and OSR
benchmarks for (i) and on the classification and segmentation tasks on the
PASCAL VOC dataset for (ii).
Comment: Accepted in IEEE International Conference on Computer Vision and Pattern Recognition CVPR 201
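The Sinkhorn iterations mentioned above are simple to state: starting from a positive matrix, alternately normalize rows and columns until the matrix is (approximately) doubly stochastic. DeepPermNet unrolls a fixed number of these steps as a differentiable layer; the standalone numpy sketch below shows the iteration itself.

```python
import numpy as np

def sinkhorn(logits, n_iters=50):
    # Sinkhorn-Knopp: exponentiate to get a positive matrix, then alternately
    # normalize rows and columns. The iterates converge to a doubly-stochastic
    # matrix, a continuous relaxation of a permutation matrix.
    p = np.exp(logits)
    for _ in range(n_iters):
        p = p / p.sum(axis=1, keepdims=True)  # row normalization
        p = p / p.sum(axis=0, keepdims=True)  # column normalization
    return p

rng = np.random.default_rng(0)
p = sinkhorn(rng.normal(size=(4, 4)))
```

Because every step is differentiable, gradients flow through the unrolled iterations back into the CNN that produced the logits.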
The Euclidean Space is Evil: Hyperbolic Attribute Editing for Few-shot Image Generation
Few-shot image generation is a challenging task since it aims to generate
diverse new images for an unseen category with only a few images. Existing
methods suffer from the trade-off between the quality and diversity of
generated images. To tackle this problem, we propose Hyperbolic Attribute
Editing (HAE), a simple yet effective method. Unlike other methods that work in
Euclidean space, HAE captures the hierarchy among images using data from seen
categories in hyperbolic space. Given a well-trained HAE, images of unseen
categories can be generated by moving the latent code of a given image along
meaningful directions in the Poincaré disk while keeping a fixed radius. Most
importantly, the hyperbolic space allows us to control the semantic diversity
of the generated images by setting different radii in the disk. Extensive
experiments and visualizations demonstrate that HAE is capable of not only
generating images with promising quality and diversity using limited data but
also achieving a highly controllable and interpretable editing process.
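The fixed-radius editing step can be sketched geometrically: move the latent code along a semantic direction, then rescale it back onto a circle of the chosen radius inside the Poincaré disk (all points have norm < 1). This is a hypothetical illustration of the geometry, not HAE's actual editing operator.

```python
import numpy as np

def edit_at_radius(z, direction, step, radius):
    # Hypothetical sketch: nudge the latent code along a semantic direction,
    # then project onto the circle of the given radius in the Poincaré disk.
    # Smaller radii lie nearer the hierarchy's root, so (per the abstract)
    # they correspond to more diverse, more generic generations.
    assert 0.0 < radius < 1.0, "Poincaré disk points have norm < 1"
    z_new = z + step * direction / np.linalg.norm(direction)
    return radius * z_new / np.linalg.norm(z_new)

z = np.array([0.3, 0.0])                       # latent code of a seed image
z_edit = edit_at_radius(z, np.array([0.0, 1.0]), step=0.2, radius=0.5)
```

Sweeping the radius argument while holding the direction fixed is how the abstract's diversity control would look in this sketch.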
Entropy-driven Sampling and Training Scheme for Conditional Diffusion Generation
The Denoising Diffusion Probabilistic Model (DDPM) can perform flexible
conditional image generation from prior noise to real data by introducing an
independent noise-aware classifier that provides conditional gradient guidance
at each time step of the denoising process. However, because the classifier
can easily discriminate an incompletely generated image from its high-level
structure alone, the gradient, which carries the class-guidance signal, tends
to vanish early, causing the conditional generation process to collapse into
the unconditional process. To address this problem, we propose two simple
but effective approaches from two perspectives. For sampling procedure, we
introduce the entropy of predicted distribution as the measure of guidance
vanishing level and propose an entropy-aware scaling method to adaptively
recover the conditional semantic guidance. For the training stage, we propose
entropy-aware optimization objectives to alleviate overconfident predictions
on noisy data. On ImageNet1000 256x256, with our proposed sampling scheme and
trained classifier, the pretrained conditional and unconditional DDPM model can
achieve 10.89% (4.59 to 4.09) and 43.5% (12 to 6.78) FID improvement
respectively. The code is available at https://github.com/ZGCTroy/ED-DPM.
Comment: 24 pages, 8 figures
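The sampling-side idea can be sketched as follows: measure the entropy of the classifier's predicted distribution and rescale the guidance gradient accordingly, boosting it when overconfident (low-entropy) predictions would make it vanish. The abstract does not give the exact scaling rule, so the inverse-entropy form and its clipping below are illustrative assumptions, not the paper's formula.

```python
import numpy as np

def entropy(probs):
    # Shannon entropy of the classifier's predicted class distribution.
    return float(-(probs * np.log(probs + 1e-12)).sum())

def entropy_scaled_gradient(gradient, probs, num_classes,
                            base_scale=1.0, max_scale=10.0):
    # Hypothetical entropy-aware scaling: when the prediction is overconfident
    # (entropy far below its maximum log(num_classes)), amplify the classifier
    # gradient to recover the vanishing conditional guidance; clip the factor
    # so near-zero entropy cannot blow the gradient up.
    h = entropy(probs)
    h_max = np.log(num_classes)
    scale = min(max_scale, base_scale * h_max / (h + 1e-6))
    return scale * gradient

g = np.array([1.0])
uniform = np.ones(4) / 4                      # max-entropy prediction
confident = np.array([0.97, 0.01, 0.01, 0.01])  # overconfident prediction
g_uniform = entropy_scaled_gradient(g, uniform, 4)     # scale ~ 1, unchanged
g_confident = entropy_scaled_gradient(g, confident, 4)  # amplified
```

A uniform prediction leaves the gradient essentially untouched, while a near-collapsed prediction gets a larger (but clipped) boost.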