LW-ISP: A Lightweight Model with ISP and Deep Learning
Deep learning (DL)-based methods for low-level tasks have many advantages
over the traditional camera in terms of hardware prospects, error accumulation,
and imaging effects. Recently, applications of deep learning that replace the
image signal processing (ISP) pipeline have appeared one after another; however,
they are still a long way from practical deployment. In this paper, we show
that a learning-based method can achieve real-time, high-performance
processing in the ISP pipeline. We propose LW-ISP, a novel architecture
designed to implicitly learn the image mapping from RAW data to RGB image.
Based on U-Net architecture, we propose the fine-grained attention module and a
plug-and-play upsampling block suitable for low-level tasks. In particular, we
design a heterogeneous distillation algorithm to distill the implicit features
and reconstruction information of the clean image, so as to guide the learning
of the student model. Our experiments demonstrate that LW-ISP achieves a
0.38 dB improvement in PSNR over the previous best method, while the
model parameters and computation are reduced by factors of 23 and 81,
respectively. Inference is accelerated by at least 15 times. Without
bells and whistles, LW-ISP achieves quite competitive results in ISP
subtasks including image denoising and enhancement.
Comment: 16 pages, accepted as a conference paper at BMVC 202
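To put the 0.38 dB figure in the LW-ISP abstract in context, the sketch below computes PSNR from its standard definition, 10 * log10(MAX^2 / MSE), and converts a PSNR gain into the equivalent reduction in mean squared error. The function names and the 8-bit peak value of 255 are assumptions made for illustration; the abstract does not state the evaluation protocol.

```python
import math


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    max_value is the largest possible pixel value (255 assumes 8-bit images).
    """
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """Factor by which MSE is multiplied when PSNR rises by gain_db."""
    return 10.0 ** (-gain_db / 10.0)
```

Under this definition, a 0.38 dB gain corresponds to the MSE falling to roughly 92% of its previous value, since `mse_ratio_for_gain(0.38)` is about 0.916.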
Chiplet Actuary: A Quantitative Cost Model and Multi-Chiplet Architecture Exploration
Multi-chip integration is widely recognized as the extension of Moore's Law.
Cost-saving is a frequently mentioned advantage, but previous works rarely
present quantitative demonstrations on the cost superiority of multi-chip
integration over monolithic SoC. In this paper, we build a quantitative cost
model and put forward an analytical method for multi-chip systems based on
three typical multi-chip integration technologies to analyze the cost benefits
from yield improvement, chiplet and package reuse, and heterogeneity. We
re-examine the actual cost of multi-chip systems from various perspectives and
show how to reduce the total cost of a VLSI system through an appropriate
multi-chiplet architecture.
Comment: Accepted by and to be presented at DAC 202
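The yield argument in the Chiplet Actuary abstract can be illustrated with a toy calculation. The sketch below is not the paper's cost model: it assumes a negative-binomial die-yield model, known-good-die testing, and an even split of the logic across dies, and it ignores packaging, die-to-die interface area, bonding yield, and NRE. All function names, parameters, and numbers are hypothetical.

```python
def die_yield(area_mm2, defects_per_mm2, cluster=3.0):
    """Negative-binomial yield model (assumed): Y = (1 + D*A/c) ** (-c)."""
    return (1.0 + defects_per_mm2 * area_mm2 / cluster) ** (-cluster)


def silicon_cost_per_good_system(total_area_mm2, n_dies, defects_per_mm2,
                                 cost_per_mm2=1.0):
    """Silicon cost of one working system split evenly into n_dies dies.

    Assumes each die is tested before assembly, so a defect discards
    only the die it lands on rather than the whole system.
    """
    die_area = total_area_mm2 / n_dies
    good_die_cost = cost_per_mm2 * die_area / die_yield(die_area, defects_per_mm2)
    return n_dies * good_die_cost


# Hypothetical 800 mm^2 design at 0.001 defects per mm^2.
monolithic = silicon_cost_per_good_system(800.0, 1, 0.001)
four_chiplets = silicon_cost_per_good_system(800.0, 4, 0.001)
```

With these made-up inputs the four-die split needs less silicon per good system than the monolithic die, because smaller dies yield better. A fuller model would weigh this against the costs the sketch leaves out.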
Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self Distillation
Convolutional neural networks have been widely deployed in various
application scenarios. In order to extend the applications' boundaries to some
accuracy-crucial domains, researchers have been investigating approaches to
boost accuracy through either deeper or wider network structures, which bring
with them an exponential increase in computational and storage cost and
longer response times. In this paper, we propose a general training
framework named self distillation, which notably enhances the performance
(accuracy) of convolutional neural networks through shrinking the size of the
network rather than enlarging it. Different from traditional knowledge
distillation - a knowledge transformation methodology among networks, which
forces student neural networks to approximate the softmax layer outputs of
pre-trained teacher neural networks, the proposed self distillation framework
distills knowledge within the network itself. The network is first divided into
several sections. Then the knowledge in the deeper portion of the network is
squeezed into the shallow ones. Experiments further demonstrate the generality
of the proposed self distillation framework: accuracy improves by 2.65% on
average, ranging from 0.61% for ResNeXt to 4.07% for VGG19. In addition, it
also provides the flexibility of depth-wise scalable inference on
resource-limited edge devices. Our code will be released on GitHub soon.
Comment: 10 pages
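The idea of squeezing knowledge from the deeper portion of a network into its shallower sections can be sketched as a loss over per-section classifier outputs. This is a minimal illustration, not the authors' exact objective: the weighting `alpha`, the softmax `temperature`, and the use of one classifier per section are assumptions, and any feature-level terms are omitted.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with optional temperature scaling."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross-entropy of the softmax of logits against an integer label."""
    return -math.log(softmax(logits)[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def self_distillation_loss(section_logits, label, alpha=0.3, temperature=3.0):
    """section_logits holds one logit vector per section, shallowest first.

    The deepest classifier is trained on the label only; each shallower
    classifier mixes the label loss with a KL term that pulls it toward
    the deepest classifier's softened output.
    """
    deepest = section_logits[-1]
    teacher = softmax(deepest, temperature)
    loss = cross_entropy(deepest, label)
    for logits in section_logits[:-1]:
        student = softmax(logits, temperature)
        loss += (1.0 - alpha) * cross_entropy(logits, label)
        loss += alpha * kl_divergence(teacher, student)
    return loss
```

In an actual training loop the teacher distribution would normally be treated as a constant, so that gradients from the KL term update only the shallow sections.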
Multi-Glimpse Network: A Robust and Efficient Classification Architecture based on Recurrent Downsampled Attention
Most feedforward convolutional neural networks spend roughly the same effort
on each pixel. Yet human visual recognition is an interaction between eye
movements and spatial attention, in which we take several glimpses of
different regions of an object. Inspired by this observation, we propose an
end-to-end trainable Multi-Glimpse Network (MGNet), which uses a recurrent
downsampled attention mechanism to tackle the challenges of high computation
and lack of robustness. Specifically, MGNet sequentially selects
task-relevant regions of an image to focus on and then adaptively combines all
collected information for the final prediction. MGNet shows strong
resistance to adversarial attacks and common corruptions with less
computation. Also, MGNet is inherently more interpretable as it explicitly
informs us where it focuses during each iteration. Our experiments on
ImageNet100 demonstrate the potential of recurrent downsampled attention
mechanisms to improve on a single feedforward pass. For example, MGNet improves
accuracy by 4.76% on average under common corruptions at only 36.9% of the
computational cost. Moreover, with a ResNet-50 backbone, the baseline's
accuracy drops to 7.6% under a PGD attack, whereas MGNet maintains 44.2%
accuracy at the same attack strength. Our code is available at
https://github.com/siahuat0727/MGNet.
Comment: Accepted at BMVC 202
Point-GCC: Universal Self-supervised 3D Scene Pre-training via Geometry-Color Contrast
Geometry and color information provided by point clouds are both crucial
for 3D scene understanding. These two kinds of information characterize
different aspects of point clouds, but existing methods lack an elaborate
design for their discrimination and relevance. Hence we explore a 3D
self-supervised paradigm that can better utilize the relations of point cloud
information. Specifically, we propose a universal 3D scene pre-training
framework via Geometry-Color Contrast (Point-GCC), which aligns geometry and
color information using a Siamese network. To address practical application
tasks, we design (i) hierarchical supervision, with point-level contrast and
reconstruction and object-level contrast based on a novel deep clustering
module, to close the gap between pre-training and downstream tasks; and (ii) an
architecture-agnostic backbone that adapts to various downstream models.
Benefiting from the object-level representation associated with downstream
tasks, Point-GCC can directly evaluate model performance, and the results
demonstrate the effectiveness of our method. Transfer learning results on a
wide range of tasks also show consistent improvements across all datasets,
e.g., new state-of-the-art object detection results on the SUN RGB-D and S3DIS
datasets. Code will be released at https://github.com/Asterisci/Point-GCC
Learn from Unpaired Data for Image Restoration: A Variational Bayes Approach
Collecting paired training data is difficult in practice, but unpaired
samples are widely available. Current approaches aim at generating synthesized
training data from the unpaired samples by exploring the relationship between
the corrupted and clean data. This work proposes LUD-VAE, a deep generative
method to learn the joint probability density function from data sampled from
marginal distributions. Our approach is based on a carefully designed
probabilistic graphical model in which the clean and corrupted data domains are
conditionally independent. Using variational inference, we maximize the
evidence lower bound (ELBO) to estimate the joint probability density function.
Furthermore, we show that the ELBO is computable without paired samples under
the inference invariant assumption. This property provides the mathematical
rationale of our approach in the unpaired setting. Finally, we apply our method
to real-world image denoising and super-resolution tasks and train the models
using the synthetic data generated by the LUD-VAE. Experimental results
validate the advantages of our method over other learnable approaches.
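For orientation, the generic evidence lower bound from variational inference is written out below. This is the standard single-variable form, not the bound derived in the paper: the symbols x (observed data), z (latent variable), and the parameters θ and φ are conventional notation assumed here, and the abstract does not give the joint model over the clean and corrupted domains.

```latex
\[
\log p_\theta(x) \;\ge\;
\mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]
\;-\; \mathrm{KL}\big(q_\phi(z \mid x) \,\|\, p(z)\big)
\]
```

Maximizing the right-hand side trades reconstruction quality (the expectation term) against how far the approximate posterior drifts from the prior (the KL term).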
Contrastive Deep Supervision
The success of deep learning is usually accompanied by growth in neural
network depth. However, the traditional training method only supervises the
neural network at its last layer and propagates the supervision layer-by-layer,
which makes the intermediate layers hard to optimize. Recently, deep
supervision has been proposed to add auxiliary classifiers to the intermediate
layers of deep neural networks. By optimizing these auxiliary classifiers with
the supervised task loss, the supervision can be applied to the shallow layers
directly. However, deep supervision conflicts with the well-known observation
that the shallow layers learn low-level features instead of task-biased
high-level semantic features. To address this issue, this paper proposes a
novel training framework named Contrastive Deep Supervision, which supervises
the intermediate layers with augmentation-based contrastive learning.
Experimental results on nine popular datasets with eleven models demonstrate
its effectiveness on general image classification, fine-grained image
classification, and object detection in supervised learning, semi-supervised
learning, and knowledge distillation. Code has been released on GitHub.
Comment: Accepted in ECCV202
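Augmentation-based contrastive learning of the kind described in the Contrastive Deep Supervision abstract is commonly trained with an InfoNCE-style loss, sketched below for a single anchor. The choice of InfoNCE, cosine similarity, and the `temperature` value are assumptions for illustration; the abstract does not specify the loss applied to the intermediate layers.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor embedding.

    positive is the embedding of another augmentation of the same image;
    negatives are embeddings of other images. The loss is the negative
    log-softmax of the positive pair's similarity among all pairs.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[0]
```

The loss approaches zero when the anchor is much closer to its positive than to every negative, and grows when a negative is the closer match.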