16,607 research outputs found
Data-Side Efficiencies for Lightweight Convolutional Neural Networks
We examine how the choice of data-side attributes for two important visual
tasks of image classification and object detection can aid in the choice or
design of lightweight convolutional neural networks. We show by experimentation
how four data attributes - number of classes, object color, image resolution,
and object scale affect neural network model size and efficiency. Intra- and
inter-class similarity metrics, based on metric learning, are defined to guide
the evaluation of these attributes toward achieving lightweight models.
Evaluations made using these metrics are shown to require 30x less computation
than running full inference tests. We provide, as an example, applying the
metrics and methods to choose a lightweight model for a robot path planning
application and achieve computation reduction of 66% and accuracy gain of 3.5%
over the pre-method model.Comment: 10 pages, 5 figures, 6 table
LightCLIP: Learning Multi-Level Interaction for Lightweight Vision-Language Models
Vision-language pre-training like CLIP has shown promising performance on
various downstream tasks such as zero-shot image classification and image-text
retrieval. Most of the existing CLIP-alike works usually adopt relatively large
image encoders like ResNet50 and ViT, while the lightweight counterparts are
rarely discussed. In this paper, we propose a multi-level interaction paradigm
for training lightweight CLIP models. Firstly, to mitigate the problem that
some image-text pairs are not strictly one-to-one correspondence, we improve
the conventional global instance-level alignment objective by softening the
label of negative samples progressively. Secondly, a relaxed bipartite matching
based token-level alignment objective is introduced for finer-grained alignment
between image patches and textual words. Moreover, based on the observation
that the accuracy of CLIP model does not increase correspondingly as the
parameters of text encoder increase, an extra objective of masked language
modeling (MLM) is leveraged for maximizing the potential of the shortened text
encoder. In practice, an auxiliary fusion module injecting unmasked image
embedding into masked text embedding at different network stages is proposed
for enhancing the MLM. Extensive experiments show that without introducing
additional computational cost during inference, the proposed method achieves a
higher performance on multiple downstream tasks
Lightweight Probabilistic Deep Networks
Even though probabilistic treatments of neural networks have a long history,
they have not found widespread use in practice. Sampling approaches are often
too slow already for simple networks. The size of the inputs and the depth of
typical CNN architectures in computer vision only compound this problem.
Uncertainty in neural networks has thus been largely ignored in practice,
despite the fact that it may provide important information about the
reliability of predictions and the inner workings of the network. In this
paper, we introduce two lightweight approaches to making supervised learning
with probabilistic deep networks practical: First, we suggest probabilistic
output layers for classification and regression that require only minimal
changes to existing networks. Second, we employ assumed density filtering and
show that activation uncertainties can be propagated in a practical fashion
through the entire network, again with minor changes. Both probabilistic
networks retain the predictive power of the deterministic counterpart, but
yield uncertainties that correlate well with the empirical error induced by
their predictions. Moreover, the robustness to adversarial examples is
significantly increased.Comment: To appear at CVPR 201
- …