Semi-supervised Transfer Learning for Image Rain Removal
Single image rain removal is a typical inverse problem in computer vision. Deep learning techniques have been verified to be effective for this task and achieve state-of-the-art performance. However, previous deep learning methods need to pre-collect a large set of image pairs with/without synthesized rain for training, which tends to bias the neural network toward learning the specific patterns of the synthesized rain and leaves it less able to generalize to real test samples whose rain types differ from those in the training data. To address this issue, this paper proposes a semi-supervised learning paradigm for this task. Unlike traditional deep learning methods, which use only supervised image pairs with/without synthesized rain, we additionally feed real rainy images, without the need for their clean counterparts, into the network training process. This is realized by elaborately formulating the residual between an input rainy image and its expected network output (the clear image without rain) as a specific parametrized rain streak distribution. The network is therefore trained to adapt to diverse real rain types by transferring from the supervised synthesized rain, so both the shortage-of-training-samples and bias-to-supervised-samples issues are evidently alleviated. Experiments on synthetic and real data verify the superiority of our model compared to the state of the art.
Comment: 10 pages
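To make the unsupervised branch concrete, the sketch below (a minimal PyTorch-style illustration, not the paper's exact formulation) pairs a standard supervised reconstruction loss on synthetic pairs with a likelihood term that treats the residual of a real rainy image as a zero-mean Gaussian with a learnable scale; the paper's actual rain streak parametrization is more elaborate.

    import torch
    import torch.nn.functional as F

    def semi_supervised_rain_loss(net, synth_rainy, synth_clean,
                                  real_rainy, log_sigma, unsup_weight=0.1):
        # Supervised branch: reconstruction loss on synthesized pairs.
        sup = F.mse_loss(net(synth_rainy), synth_clean)
        # Unsupervised branch: the residual of a real rainy image is
        # modeled as a parametrized rain-streak distribution (here
        # N(0, sigma^2), a deliberate simplification for illustration).
        residual = real_rainy - net(real_rainy)
        nll = 0.5 * (residual.pow(2) / torch.exp(2 * log_sigma) + 2 * log_sigma)
        return sup + unsup_weight * nll.mean()

Here log_sigma would be a learnable torch.nn.Parameter optimized jointly with the network, so the residual model adapts to the real rain types seen during training.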
Restoring Images Captured in Arbitrary Hybrid Adverse Weather Conditions in One Go
Real-world adverse conditions typically involve stochastic hybrid weather degradations (e.g., a rainy and hazy night), while existing image restoration algorithms assume that weather degradations occur independently and thus may fail to handle such complicated real-world scenarios. Besides, supervised training is not feasible due to the lack of a comprehensive paired dataset characterizing hybrid conditions. To this end, we address the aforementioned limitations with two tactics: framework and data. First, we present a novel unified framework, dubbed RAHC, to Restore Arbitrary Hybrid adverse weather Conditions in one go. Specifically, RAHC leverages a multi-head aggregation architecture to learn multiple degradation representation subspaces and then constrains the network, through a discrimination mechanism in the output space, to flexibly handle multiple hybrid adverse weather conditions in a unified paradigm. Furthermore, we devise a reconstruction-vector-aided scheme that provides auxiliary visual content cues for reconstruction and can therefore comfortably cope with hybrid scenarios in which few image constituents remain intact. Second, we construct a new dataset, termed HAC, for learning and benchmarking arbitrary Hybrid Adverse Conditions restoration. HAC contains 31 scenarios composed of arbitrary combinations of five common weather types, with a total of ~316K adverse-weather/clean pairs. Extensive experiments yield superior results and establish a new state of the art on both HAC and conventional datasets.
Comment: In submission
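As a rough illustration of the multi-head aggregation idea, the sketch below projects shared features into several degradation subspaces and fuses them with input-conditioned soft weights; the class name, head design, and gating-based fusion are assumptions for illustration, not the actual RAHC architecture.

    import torch.nn as nn

    class MultiHeadDegradationAggregator(nn.Module):
        # Each head projects shared features into its own degradation
        # subspace; a pooled gate predicts per-head fusion weights.
        def __init__(self, channels, num_heads=5):
            super().__init__()
            self.heads = nn.ModuleList(
                nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1),
                              nn.ReLU(inplace=True))
                for _ in range(num_heads))
            self.gate = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(channels, num_heads, 1),
                nn.Softmax(dim=1))

        def forward(self, x):
            w = self.gate(x)                      # (B, num_heads, 1, 1)
            return sum(w[:, i:i + 1] * head(x)    # weighted subspace fusion
                       for i, head in enumerate(self.heads))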
OSIC: A New One-Stage Image Captioner Coined
Mainstream image captioning models are usually two-stage captioners, i.e., they compute object features with a pre-trained detector and feed them into a language model to generate text descriptions. However, such a pipeline introduces a task-based information gap that degrades performance, since object features extracted for the detection task are a suboptimal representation and cannot provide all the information necessary for subsequent text generation. Besides, object features are usually taken from the last layer, which loses the local details of input images. In this paper, we propose a novel One-Stage Image Captioner (OSIC) with dynamic multi-sight learning, which directly transforms an input image into descriptive sentences in a single stage. As a result, the task-based information gap can be greatly reduced. To obtain rich features, we use the Swin Transformer to compute multi-level features and then feed them into a novel dynamic multi-sight embedding module to exploit both the global structure and the local texture of input images. To enhance the encoder's global modeling for captioning, we propose a new dual-dimensional refining module that non-locally models the interactions of the embedded features. Finally, OSIC can obtain rich and useful information to improve the image captioning task. Extensive comparisons on the benchmark MS-COCO dataset verify the superior performance of our method.
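To give a flavor of how hierarchical backbone features (e.g., the four Swin stages) could be turned into a single token sequence for a captioning decoder, here is a minimal sketch; the projection-and-concatenation design and all names are illustrative assumptions, not OSIC's actual dynamic multi-sight module.

    import torch
    import torch.nn as nn

    class MultiSightEmbedding(nn.Module):
        # Projects each pyramid level to a shared width and concatenates
        # the tokens: coarse levels carry global structure, fine levels
        # carry local texture.
        def __init__(self, in_channels, embed_dim=512):
            super().__init__()
            self.proj = nn.ModuleList(nn.Conv2d(c, embed_dim, 1)
                                      for c in in_channels)

        def forward(self, feats):
            tokens = [p(f).flatten(2).transpose(1, 2)  # (B, H*W, embed_dim)
                      for f, p in zip(feats, self.proj)]
            return torch.cat(tokens, dim=1)

    # Dummy stage outputs shaped like Swin-T at 224x224 input:
    feats = [torch.randn(1, c, s, s)
             for c, s in [(96, 56), (192, 28), (384, 14), (768, 7)]]
    emb = MultiSightEmbedding([96, 192, 384, 768])(feats)  # (1, 4165, 512)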
A Survey on Deep Neural Network Pruning: Taxonomy, Comparison, Analysis, and Recommendations
Modern deep neural networks, particularly recent large language models, come
with massive model sizes that require significant computational and storage
resources. To enable the deployment of modern models in resource-constrained environments and to accelerate inference, researchers have increasingly
explored pruning techniques as a popular research direction in neural network
compression. However, there is a dearth of up-to-date comprehensive review
papers on pruning. To address this issue, in this survey, we provide a
comprehensive review of existing research works on deep neural network pruning
in a taxonomy of 1) universal/specific speedup, 2) when to prune, 3) how to
prune, and 4) fusion of pruning and other compression techniques. We then
provide a thorough comparative analysis of seven pairs of contrasting settings for
pruning (e.g., unstructured/structured) and explore emerging topics, including
post-training pruning, different levels of supervision for pruning, and broader
applications (e.g., adversarial robustness) to shed light on the commonalities
and differences of existing methods and lay the foundation for further method
development. To facilitate future research, we build a curated collection of
datasets, networks, and evaluations on different applications. Finally, we
provide some valuable recommendations on selecting pruning methods and outline
promising research directions. We build a repository at
https://github.com/hrcheng1066/awesome-pruning
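As a concrete instance of the unstructured/structured contrast analyzed in the survey, the short example below uses PyTorch's built-in pruning utilities on toy layers; the layer sizes and pruning amounts are arbitrary choices for illustration.

    import torch.nn as nn
    import torch.nn.utils.prune as prune

    # Unstructured: zero the 50% of individual weights with smallest |w|.
    layer = nn.Linear(64, 32)
    prune.l1_unstructured(layer, name="weight", amount=0.5)

    # Structured: remove whole output rows (dim=0) by L2 norm instead,
    # yielding regular sparsity that hardware can exploit for speedup.
    layer2 = nn.Linear(64, 32)
    prune.ln_structured(layer2, name="weight", amount=0.25, n=2, dim=0)

    # Make the pruning permanent (folds the mask into the weight tensor).
    prune.remove(layer, "weight")
    prune.remove(layer2, "weight")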
Salient Object Detection Techniques in Computer Vision: A Survey
Detection and localization of the regions of an image that attract immediate human visual attention is currently an intensive area of research in computer vision. The capability to automatically identify and segment such salient image regions has immediate consequences for applications in computer vision, computer graphics, and multimedia. A large number of salient object detection (SOD) methods have been devised to effectively mimic the capability of the human visual system to detect salient regions in images. These methods can be broadly divided into two categories based on their feature engineering mechanism: conventional and deep learning-based. In this survey, most of the influential advances in image-based SOD from both the conventional and deep learning-based categories are reviewed in detail. Relevant saliency modeling trends, with key issues, core techniques, and the scope for future research, are discussed in the context of difficulties often faced in salient object detection. Results are presented for various challenging cases on some large-scale public datasets. Different metrics considered for assessing the performance of state-of-the-art salient object detection models are also covered. Some future directions for SOD are presented towards the end.
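To ground the "conventional" category in something concrete, below is a minimal NumPy sketch of one classic representative, spectral-residual saliency (Hou & Zhang, 2007); the filter size and normalization here are illustrative choices, not a prescription from the survey.

    import numpy as np

    def spectral_residual_saliency(gray):
        # Salient regions correspond to deviations of the log-amplitude
        # spectrum from its local average. `gray` is a 2-D float array;
        # the method is usually run on a small downsampled image.
        f = np.fft.fft2(gray.astype(np.float64))
        log_amp = np.log(np.abs(f) + 1e-8)
        phase = np.angle(f)
        h, w = log_amp.shape
        pad = np.pad(log_amp, 1, mode="edge")    # 3x3 box filter
        avg = sum(pad[i:i + h, j:j + w]
                  for i in range(3) for j in range(3)) / 9.0
        residual = log_amp - avg                 # the "spectral residual"
        sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
        return sal / sal.max()                   # typically smoothed after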