
    Semi-supervised Transfer Learning for Image Rain Removal

    Single image rain removal is a typical inverse problem in computer vision. Deep learning techniques have been shown to be effective for this task and have achieved state-of-the-art performance. However, previous deep learning methods need to pre-collect a large set of image pairs with/without synthesized rain for training, which tends to bias the neural network toward learning the specific patterns of the synthesized rain while leaving it less able to generalize to real test samples whose rain types differ from those in the training data. To address this issue, this paper first proposes a semi-supervised learning paradigm for this task. Different from traditional deep learning methods, which only use supervised image pairs with/without synthesized rain, we further put real rainy images, without the need for their clean counterparts, into the network training process. This is realized by elaborately formulating the residual between an input rainy image and its expected network output (the clear image without rain) as a specific parametrized rain streak distribution. The network is therefore trained to adapt to diverse real, unsupervised rain types by transferring from the supervised synthesized rain, so both the shortage-of-training-samples and bias-toward-supervised-samples issues can be evidently alleviated. Experiments on synthetic and real data verify the superiority of our model compared to state-of-the-art methods. Comment: 10 pages
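
    A minimal PyTorch-style sketch of the idea described above: a supervised loss on synthetic rainy/clean pairs combined with an unsupervised term that models the residual of real rainy images as a parametrized rain distribution. The toy network, the zero-mean Gaussian stand-in for the rain streak model, and all hyperparameters are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class DerainNet(nn.Module):
    """Toy network that predicts the rain streak residual of a rainy image."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )

    def forward(self, rainy):
        rain = self.body(rainy)        # estimated rain streak layer
        return rainy - rain, rain      # (derained image, residual)

def semi_supervised_loss(net, synth_rainy, synth_clean, real_rainy,
                         sigma=0.1, lam=0.05):
    # Supervised term: synthetic pairs with ground-truth clean images.
    pred_clean, _ = net(synth_rainy)
    supervised = nn.functional.mse_loss(pred_clean, synth_clean)

    # Unsupervised term: on real rainy images (no clean targets), push the
    # residual toward a parametrized rain distribution; a zero-mean Gaussian
    # negative log-likelihood here stands in for the paper's richer model.
    _, real_residual = net(real_rainy)
    unsupervised = (real_residual ** 2).mean() / (2 * sigma ** 2)

    return supervised + lam * unsupervised

# Example: loss = semi_supervised_loss(DerainNet(), torch.rand(2, 3, 64, 64),
#                                      torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64))
```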

    Restoring Images Captured in Arbitrary Hybrid Adverse Weather Conditions in One Go

    Adverse conditions typically involve stochastic hybrid weather degradations (e.g., a rainy and hazy night), while existing image restoration algorithms assume that weather degradations occur independently and thus may fail to handle complicated real-world scenarios. Besides, supervised training is not feasible due to the lack of a comprehensive paired dataset characterizing hybrid conditions. To this end, we address the aforementioned limitations with two tactics: framework and data. First, we present a novel unified framework, dubbed RAHC, to Restore Arbitrary Hybrid adverse weather Conditions in one go. Specifically, RAHC leverages a multi-head aggregation architecture to learn multiple degradation representation subspaces and then constrains the network to flexibly handle multiple hybrid adverse weather conditions in a unified paradigm through a discrimination mechanism in the output space. Furthermore, we devise a reconstruction-vector-aided scheme to provide auxiliary visual content cues for reconstruction, so the model can comfortably cope with hybrid scenarios in which few image constituents remain intact. Second, we construct a new dataset, termed HAC, for learning and benchmarking arbitrary Hybrid Adverse Conditions restoration. HAC contains 31 scenarios composed of arbitrary combinations of five common weather types, with a total of ~316K adverse-weather/clean pairs. Extensive experiments yield superior results and establish new state-of-the-art performance on both HAC and conventional datasets. Comment: In submission
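
    The sketch below illustrates, under assumptions, what a multi-head aggregation block with per-image soft gating over degradation subspaces could look like; the number of heads, layer sizes, and gating scheme are illustrative choices, not RAHC's actual architecture.

```python
import torch
import torch.nn as nn

class MultiHeadAggregation(nn.Module):
    """Aggregates several degradation-specific heads with learned soft weights."""
    def __init__(self, channels=64, num_heads=5):
        super().__init__()
        # One lightweight head per assumed degradation subspace.
        self.heads = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
                nn.Conv2d(channels, channels, 3, padding=1),
            )
            for _ in range(num_heads)
        )
        # Per-image soft weights over heads, so hybrid degradations can mix subspaces.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, num_heads, 1),
            nn.Softmax(dim=1),
        )

    def forward(self, feats):
        weights = self.gate(feats)                                       # (B, H, 1, 1)
        head_outs = torch.stack([h(feats) for h in self.heads], dim=1)   # (B, H, C, H', W')
        return (weights.unsqueeze(2) * head_outs).sum(dim=1) + feats     # residual fusion
```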

    OSIC: A New One-Stage Image Captioner Coined

    Mainstream image captioning models are usually two-stage captioners, i.e., they calculate object features with a pre-trained detector and feed them into a language model to generate text descriptions. However, such an operation causes a task-based information gap that degrades performance, since the object features from the detection task are a suboptimal representation and cannot provide all the information necessary for subsequent text generation. Besides, object features are usually represented by the last-layer features, which lose the local details of input images. In this paper, we propose a novel One-Stage Image Captioner (OSIC) with dynamic multi-sight learning, which directly transforms an input image into descriptive sentences in one stage. As a result, the task-based information gap can be greatly reduced. To obtain rich features, we use the Swin Transformer to calculate multi-level features and then feed them into a novel dynamic multi-sight embedding module to exploit both the global structure and local texture of input images. To enhance the global modeling of the encoder for captioning, we propose a new dual-dimensional refining module to non-locally model the interaction of the embedded features. Finally, OSIC can obtain rich and useful information to improve the image captioning task. Extensive comparisons on the benchmark MS-COCO dataset verify the superior performance of our method.
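
    A rough skeleton of a one-stage captioner in PyTorch is sketched below: multi-level visual features are flattened into tokens and decoded directly into words, with no separate detection stage. A small convolutional pyramid stands in for the Swin Transformer, and a standard TransformerDecoder stands in for the dynamic multi-sight embedding and dual-dimensional refining modules; all names and sizes are assumptions.

```python
import torch
import torch.nn as nn

class OneStageCaptioner(nn.Module):
    def __init__(self, vocab_size=10000, d_model=256):
        super().__init__()
        # Two-level visual pyramid as a stand-in for multi-level backbone features.
        self.stage1 = nn.Sequential(nn.Conv2d(3, d_model, 4, stride=4), nn.ReLU())
        self.stage2 = nn.Sequential(nn.Conv2d(d_model, d_model, 2, stride=2), nn.ReLU())
        self.embed = nn.Embedding(vocab_size, d_model)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=2,
        )
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, image, tokens):
        f1 = self.stage1(image)                  # finer-grained level
        f2 = self.stage2(f1)                     # coarser level

        def flatten(f):                          # (B, C, H, W) -> (B, H*W, C)
            return f.flatten(2).transpose(1, 2)

        # Decode caption tokens directly against the multi-level visual tokens.
        memory = torch.cat([flatten(f1), flatten(f2)], dim=1)
        tgt = self.embed(tokens)
        causal = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        return self.out(self.decoder(tgt, memory, tgt_mask=causal))

# Example: OneStageCaptioner()(torch.rand(2, 3, 64, 64), torch.zeros(2, 12, dtype=torch.long))
```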

    A Survey on Deep Neural Network Pruning: Taxonomy, Comparison, Analysis, and Recommendations

    Modern deep neural networks, particularly recent large language models, come with massive model sizes that require significant computational and storage resources. To enable the deployment of modern models in resource-constrained environments and accelerate inference time, researchers have increasingly explored pruning techniques as a popular research direction in neural network compression. However, there is a dearth of up-to-date comprehensive review papers on pruning. To address this issue, in this survey, we provide a comprehensive review of existing research works on deep neural network pruning in a taxonomy of 1) universal/specific speedup, 2) when to prune, 3) how to prune, and 4) fusion of pruning and other compression techniques. We then provide a thorough comparative analysis of seven pairs of contrast settings for pruning (e.g., unstructured/structured) and explore emerging topics, including post-training pruning, different levels of supervision for pruning, and broader applications (e.g., adversarial robustness) to shed light on the commonalities and differences of existing methods and lay the foundation for further method development. To facilitate future research, we build a curated collection of datasets, networks, and evaluations on different applications. Finally, we provide some valuable recommendations on selecting pruning methods and highlight promising research directions. We build a repository at https://github.com/hrcheng1066/awesome-pruning
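
    As a concrete illustration of the unstructured/structured contrast discussed in the survey, the snippet below applies PyTorch's built-in pruning utilities to two toy linear layers; the layers and sparsity amounts are arbitrary illustrative choices.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

unstructured_layer = nn.Linear(128, 64)
structured_layer = nn.Linear(128, 64)

# Unstructured: zero the 30% of individual weights with the smallest |w|.
prune.l1_unstructured(unstructured_layer, name="weight", amount=0.3)

# Structured: remove 25% of whole output rows (dim=0) by L2 norm, the kind of
# sparsity that translates more directly into hardware speedup.
prune.ln_structured(structured_layer, name="weight", amount=0.25, n=2, dim=0)

# Fold the masks into the weight tensors permanently and report sparsity.
for layer in (unstructured_layer, structured_layer):
    prune.remove(layer, "weight")
    print(f"zeroed fraction: {(layer.weight == 0).float().mean().item():.2f}")
```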

    Salient Object Detection Techniques in Computer Vision: A Survey

    Detection and localization of image regions that attract immediate human visual attention is currently an intensive area of research in computer vision. The capability of automatically identifying and segmenting such salient image regions has immediate consequences for applications in the fields of computer vision, computer graphics, and multimedia. A large number of salient object detection (SOD) methods have been devised to effectively mimic the capability of the human visual system to detect salient regions in images. These methods can be broadly divided into two categories based on their feature-engineering mechanism: conventional and deep learning-based. In this survey, most of the influential advances in image-based SOD from both the conventional and deep learning-based categories are reviewed in detail. Relevant saliency modeling trends, with key issues, core techniques, and the scope for future research work, are discussed in the context of difficulties often faced in salient object detection. Results are presented for various challenging cases on some large-scale public datasets. Different metrics considered for assessing the performance of state-of-the-art salient object detection models are also covered. Some future directions for SOD are presented towards the end.
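
    For the performance metrics mentioned above, the snippet below sketches two measures commonly reported in SOD evaluations, mean absolute error (MAE) and the F-measure with the customary beta^2 = 0.3, implemented in NumPy for a single saliency map; the threshold and epsilon values are illustrative.

```python
import numpy as np

def mae(pred, gt):
    """Mean absolute error between a [0, 1] saliency map and a binary mask."""
    return np.abs(pred.astype(np.float64) - gt.astype(np.float64)).mean()

def f_measure(pred, gt, threshold=0.5, beta2=0.3):
    """F-measure of the binarized saliency map against the ground-truth mask."""
    binary = pred >= threshold
    tp = np.logical_and(binary, gt).sum()
    precision = tp / (binary.sum() + 1e-8)
    recall = tp / (gt.sum() + 1e-8)
    return (1 + beta2) * precision * recall / (beta2 * precision + recall + 1e-8)

# Example: f_measure(np.random.rand(256, 256), np.random.rand(256, 256) > 0.5)
```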