Learning Sparse & Ternary Neural Networks with Entropy-Constrained Trained Ternarization (EC2T)
Deep neural networks (DNNs) have shown remarkable success in a variety of
machine learning applications. The capacity of these models (i.e., their number
of parameters) endows them with expressive power and allows them to reach the
desired performance. In recent years, there has been increasing interest in
deploying DNNs on resource-constrained devices (e.g., mobile devices) with
limited energy, memory, and computational budgets. To address this problem, we
propose Entropy-Constrained Trained Ternarization (EC2T), a general framework
to create sparse and ternary neural networks which are efficient in terms of
storage (e.g., at most two binary-masks and two full-precision values are
required to save a weight matrix) and computation (e.g., MAC operations are
reduced to a few accumulations plus two multiplications). This approach
consists of two steps. First, a super-network is created by scaling the
dimensions of a pre-trained model (i.e., its width and depth). Subsequently,
this super-network is simultaneously pruned (using an entropy constraint) and
quantized (that is, ternary values are assigned layer-wise) in a training
process, resulting in a sparse and ternary network representation. We validate
the proposed approach on the CIFAR-10, CIFAR-100, and ImageNet datasets,
showing its effectiveness in image classification tasks.
Comment: Proceedings of the CVPR'20 Joint Workshop on Efficient Deep Learning
in Computer Vision. Code is available at
https://github.com/d-becking/efficientCNN
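The storage scheme described above (at most two binary masks and two full-precision values per weight matrix) can be sketched in a few lines. This is a minimal illustration, not the authors' code; the function names and the already-ternarized matrix are assumptions:

```python
import numpy as np

def encode_ternary(W, w_pos, w_neg):
    """Encode a ternary weight matrix as two binary masks plus two scalars.

    Assumes W already contains only values in {+w_pos, 0, -w_neg}, as an
    EC2T-style training process would produce.
    """
    m_pos = W > 0  # binary mask of positive entries
    m_neg = W < 0  # binary mask of negative entries
    return m_pos, m_neg, w_pos, w_neg

def decode_ternary(m_pos, m_neg, w_pos, w_neg):
    """Reconstruct the ternary matrix from the compact representation."""
    return w_pos * m_pos.astype(np.float32) - w_neg * m_neg.astype(np.float32)

W = np.array([[0.3, 0.0, -0.5],
              [0.0, 0.3, 0.0]], dtype=np.float32)
packed = encode_ternary(W, 0.3, 0.5)
assert np.allclose(decode_ternary(*packed), W)
```

Because a mask costs one bit per weight and zeros are sparse, this packing is far cheaper to store than the full-precision matrix.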
Basic Binary Convolution Unit for Binarized Image Restoration Network
Lighter and faster image restoration (IR) models are crucial for deployment
on resource-limited devices. The binary neural network (BNN), one of the most
promising model compression methods, can dramatically reduce the computations
and parameters of full-precision convolutional neural networks (CNNs). However,
BNNs and full-precision CNNs have different properties, so the experience of
designing CNNs transfers poorly to developing BNNs. In
this study, we reconsider components in binary convolution, such as residual
connection, BatchNorm, activation function, and structure, for IR tasks. We
conduct systematic analyses to explain each component's role in binary
convolution and discuss the pitfalls. Specifically, we find that residual
connection can reduce the information loss caused by binarization; BatchNorm
can bridge the value-range gap between the residual connection and the binary
convolution; and the position of the activation function dramatically affects
the performance of a BNN. Based on our findings and analyses, we design a simple yet
efficient basic binary convolution unit (BBCU). Furthermore, we divide IR
networks into four parts and specially design variants of BBCU for each part to
explore the benefit of binarizing these parts. We conduct experiments on
different IR tasks, and our BBCU significantly outperforms other BNNs and
lightweight models, which shows that BBCU can serve as a basic unit for
binarized IR networks. All code and models will be released.
Comment: ICLR 2023; code is available at https://github.com/Zj-BinXia/BBC
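The findings above can be illustrated with a toy binary block. This sketch is not the authors' exact BBCU: the 1x1 "convolution" is a plain matrix product for brevity, and the per-sample rescale stands in for BatchNorm. It shows the three ingredients the abstract highlights, in order: binarization, a value-range rescale, and a residual connection:

```python
import numpy as np

def sign_binarize(x):
    """Binarize to {-1, +1}; the zero case maps to +1 by convention."""
    return np.where(x >= 0, 1.0, -1.0)

def bbcu_like_block(x, w, gamma, beta):
    """Illustrative binary-conv block with a residual connection.

    x: (C,) activation vector; w: (C, C) full-precision weight matrix.
    """
    xb = sign_binarize(x)   # binarize activations
    wb = sign_binarize(w)   # binarize weights
    y = wb @ xb             # binary "convolution" (a matmul here)
    # BatchNorm-style rescale bridging the value-range gap between the
    # integer-valued binary output and the full-precision residual branch
    y = gamma * (y - y.mean()) / (y.std() + 1e-5) + beta
    return y + x            # residual keeps full-precision information

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
w = rng.standard_normal((8, 8))
y = bbcu_like_block(x, w, gamma=1.0, beta=0.0)
```

Without the rescale, the binary output (which sums C terms of ±1) would dwarf the residual branch; normalizing first keeps the two on a comparable scale.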
xUnit: Learning a Spatial Activation Function for Efficient Image Restoration
In recent years, deep neural networks (DNNs) have achieved unprecedented
performance in many low-level vision tasks. However, state-of-the-art results
are typically achieved by very deep networks, which can reach tens of layers
with tens of millions of parameters. To make DNNs implementable on platforms
with limited resources, it is necessary to ease the tradeoff between
performance and efficiency. In this paper, we propose a new activation unit,
which is particularly suitable for image restoration problems. In contrast to
the widespread per-pixel activation units, like ReLUs and sigmoids, our unit
implements a learnable nonlinear function with spatial connections. This
enables the net to capture much more complex features, thus requiring a
significantly smaller number of layers in order to reach the same performance.
We illustrate the effectiveness of our units through experiments with
state-of-the-art nets for denoising, de-raining, and super-resolution, which
are already considered to be very small. With our approach, we are able to
further reduce these models by nearly 50% without incurring any degradation in
performance.
Comment: Conference on Computer Vision and Pattern Recognition (CVPR), 201
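In the spirit of the spatial activation described above, the following 1-D sketch gates the input by a Gaussian of a spatially filtered copy of itself. The fixed blur kernel stands in for the unit's learnable depthwise convolution, so this is an illustration of the idea, not the xUnit implementation:

```python
import numpy as np

def xunit_like(x, kernel):
    """Spatial (rather than per-pixel) activation sketch.

    Each output is the input multiplied by a gate in (0, 1] that depends on a
    spatial neighborhood, not just the pixel itself.
    """
    z = np.convolve(x, kernel, mode="same")  # spatial context via correlation
    g = np.exp(-z ** 2)                      # Gaussian gate in (0, 1]
    return x * g

x = np.array([0.0, 1.0, -2.0, 3.0])
blur = np.array([0.25, 0.5, 0.25])  # illustrative fixed kernel
y = xunit_like(x, blur)
```

Because the gate sees neighboring pixels, a single such unit can express context-dependent nonlinearities that a per-pixel ReLU would need extra layers to approximate.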
Binarized Spectral Compressive Imaging
Existing deep learning models for hyperspectral image (HSI) reconstruction
achieve good performance but require powerful hardware with enormous memory
and computational resources. Consequently, these methods can hardly be deployed
on resource-limited mobile devices. In this paper, we propose a novel method,
Binarized Spectral-Redistribution Network (BiSRNet), for efficient and
practical HSI restoration from compressed measurement in snapshot compressive
imaging (SCI) systems. Firstly, we redesign a compact and easy-to-deploy base
model to be binarized. Then we present the basic unit, Binarized
Spectral-Redistribution Convolution (BiSR-Conv). BiSR-Conv can adaptively
redistribute the HSI representations before binarizing the activations, and
uses a scalable hyperbolic tangent function to more closely approximate the
Sign function in
backpropagation. Based on our BiSR-Conv, we customize four binarized
convolutional modules to address the dimension mismatch and propagate
full-precision information throughout the whole network. Finally, our BiSRNet
is derived by using the proposed techniques to binarize the base model.
Comprehensive quantitative and qualitative experiments manifest that our
proposed BiSRNet outperforms state-of-the-art binarization methods and achieves
comparable performance with full-precision algorithms. Code and models are
publicly available at https://github.com/caiyuanhao1998/BiSCI and
https://github.com/caiyuanhao1998/MST
Comment: NeurIPS 2023; the first work to study the binarized spectral
compressive imaging reconstruction problem
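The scalable hyperbolic tangent surrogate for Sign can be sketched directly: as a scale factor grows, tanh approaches the step shape of sign(x), giving usable gradients early in training and a tight approximation later. The particular scale values below are an assumption for illustration, not the paper's schedule:

```python
import numpy as np

def scaled_tanh(x, alpha):
    """Differentiable surrogate for Sign: tanh(alpha * x) -> sign(x) as alpha grows."""
    return np.tanh(alpha * x)

x = np.linspace(-1.0, 1.0, 9)
x = x[x != 0.0]  # sign(0) = 0 is matched trivially, so exclude it
# The worst-case gap to the true Sign function shrinks as alpha increases.
gaps = [np.max(np.abs(scaled_tanh(x, a) - np.sign(x))) for a in (1.0, 5.0, 50.0)]
assert gaps[0] > gaps[1] > gaps[2]
```

In backpropagation, the derivative of the surrogate replaces the zero-almost-everywhere derivative of Sign, which is what makes end-to-end training of the binarized network possible.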
STEFANN: Scene Text Editor using Font Adaptive Neural Network
Textual information in a captured scene plays an important role in scene
interpretation and decision making. Though there exist methods that can
successfully detect and interpret complex text regions present in a scene, to
the best of our knowledge, there is no significant prior work that aims to
modify the textual information in an image. The ability to edit text directly
on images has several advantages including error correction, text restoration
and image reusability. In this paper, we propose a method to modify text in an
image at the character level. We approach the problem in two stages. First, the
unobserved (target) character is generated from the observed (source) character
being modified. We propose two neural network architectures: (a)
FANnet to achieve structural consistency with source font and (b) Colornet to
preserve source color. Next, we replace the source character with the generated
character maintaining both geometric and visual consistency with neighboring
characters. Our method works as a unified platform for modifying text in
images. We demonstrate the effectiveness of our method on the COCO-Text and
ICDAR datasets, both qualitatively and quantitatively.
Comment: Accepted at the IEEE Conference on Computer Vision and Pattern
Recognition (CVPR) 202
PBGen: Partial Binarization of Deconvolution-Based Generators for Edge Intelligence
This work explores the binarization of the deconvolution-based generator in a
GAN for memory saving and speedup of image construction. Our study suggests
that, unlike convolutional neural networks (including the discriminator), where
all layers can be binarized, only some of the layers in the generator can
be binarized without significant performance loss. Supported by theoretical
analysis and verified by experiments, a direct metric based on the dimension of
deconvolution operations is established, which can be used to quickly decide
which layers in the generator can be binarized. Our results also indicate that
both the generator and the discriminator should be binarized simultaneously for
balanced competition and better performance. Experimental results based on
CelebA suggest that directly applying state-of-the-art binarization techniques
to all the layers of the generator will lead to a 2.83% performance loss
measured by sliced Wasserstein distance compared with the original generator,
while applying them to selected layers only can yield up to a 25.81% saving in
memory consumption, and 1.96× and 1.32× speedups in inference and training,
respectively, with little performance loss.
Comment: 17 pages, paper re-organized
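To make partial binarization concrete, here is an illustrative sketch: layers are picked for binarization by comparing a per-layer deconvolution dimension against a threshold, and selected weights are binarized XNOR-style (sign plus a mean-absolute scale). The threshold rule and all names are assumptions for the sketch; the paper derives the actual dimension-based metric:

```python
import numpy as np

def select_binarizable(layer_dims, threshold):
    """Pick generator layers whose deconvolution dimension is large enough to
    tolerate binarization (hypothetical rule standing in for the paper's metric)."""
    return [i for i, d in enumerate(layer_dims) if d >= threshold]

def binarize_weights(w):
    """Standard XNOR-Net-style weight binarization: sign plus a scalar scale
    equal to the mean absolute value, minimizing the L2 reconstruction error."""
    return np.sign(w), float(np.mean(np.abs(w)))

# Hypothetical per-layer deconvolution dimensions of a small generator.
dims = [4, 8, 16, 32]
chosen = select_binarizable(dims, threshold=16)
b, scale = binarize_weights(np.array([-0.2, 0.4]))
```

The point of the selection step is that binarizing every layer costs accuracy, while binarizing only the dimension-qualified layers keeps most of the memory and speed benefit at little performance cost.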