Scene Text Image Super-Resolution in the Wild
Low-resolution text images are often seen in natural scenes such as documents
captured by mobile phones. Recognizing low-resolution text images is
challenging because they lose detailed content information, leading to poor
recognition accuracy. An intuitive solution is to introduce super-resolution
(SR) techniques as pre-processing. However, previous single image
super-resolution (SISR) methods are trained on synthetic low-resolution images
(e.g. bicubic down-sampling), which is overly simple and not suitable for real
low-resolution text recognition. To this end, we propose a real scene text SR
dataset, termed TextZoom. It contains paired real low-resolution and
high-resolution images which are captured by cameras with different focal
length in the wild. It is more authentic and challenging than synthetic data,
as shown in Fig. 1. We argue that improving the recognition accuracy is the
ultimate goal for Scene Text SR. To this end, a new Text Super-Resolution
Network, termed TSRN, with three novel modules is developed. (1) A sequential
residual block is proposed to extract the sequential information of the text
images. (2) A boundary-aware loss is designed to sharpen the character
boundaries. (3) A central alignment module is proposed to relieve the
misalignment problem in TextZoom. Extensive experiments on TextZoom demonstrate
that our TSRN largely improves the recognition accuracy of CRNN by over 13%, and
of ASTER and MORAN by nearly 9.0%, compared to synthetic SR data. Furthermore,
our TSRN clearly outperforms 7 state-of-the-art SR methods in boosting the
recognition accuracy of LR images in TextZoom. For example, it outperforms
LapSRN by over 5% and 8% in the recognition accuracy of ASTER and CRNN. Our
results suggest that low-resolution text recognition in the wild is far from
being solved, and thus more research effort is needed.
Comment: Accepted by ECCV 2020
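The boundary-aware loss idea can be illustrated with a minimal NumPy sketch: weight a pixel-wise reconstruction loss by an edge map of the high-resolution image, so that errors on character boundaries are penalised more heavily. This is a hedged stand-in, not the paper's actual formulation; the function names, the `alpha` weight, and the use of plain MSE are all assumptions.

```python
import numpy as np

def gradient_magnitude(img):
    """Finite-difference gradient magnitude of a 2-D image."""
    gy, gx = np.gradient(img.astype(float))
    return np.sqrt(gx ** 2 + gy ** 2)

def boundary_weighted_mse(sr, hr, alpha=1.0):
    """MSE weighted by an edge map of the HR image, so that errors on
    character boundaries cost more than errors in flat regions."""
    w = 1.0 + alpha * gradient_magnitude(hr)
    return float(np.mean(w * (sr - hr) ** 2))
```

With this weighting, the same-magnitude pixel error is more expensive when it sits on a character edge than inside a uniform background region.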
Retinal Microaneurysms Detection using Local Convergence Index Features
Retinal microaneurysms are the earliest clinical sign of diabetic retinopathy
disease. Detection of microaneurysms is crucial for the early diagnosis of
diabetic retinopathy and prevention of blindness. In this paper, a novel and
reliable method for automatic detection of microaneurysms in retinal images is
proposed. In the first stage of the proposed method, several preliminary
microaneurysm candidates are extracted using a gradient weighting technique and
an iterative thresholding approach. In the next stage, in addition to intensity
and shape descriptors, a new set of features based on local convergence index
filters is extracted for each candidate. Finally, the collective set of
features is fed to a hybrid sampling/boosting classifier to discriminate the
MAs from non-MA candidates. The method is evaluated on images with different
resolutions and modalities (RGB and SLO) using five publicly available datasets
including the Retinopathy Online Challenge's dataset. The proposed method
achieves an average sensitivity score of 0.471 on the ROC dataset outperforming
state-of-the-art approaches in an extensive comparison. The experimental
results on the other four datasets demonstrate the effectiveness and robustness
of the proposed microaneurysms detection method regardless of different image
resolutions and modalities.
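The candidate-extraction stage can be sketched as a simple iterative thresholding loop over a pixel score map: the threshold is lowered step by step until enough preliminary candidates are selected. This is an illustrative sketch only; the function name, the step size, and the stopping rule are assumptions, not the paper's exact procedure.

```python
import numpy as np

def iterative_threshold_candidates(score_map, target, t0=0.9, step=0.05, t_min=0.0):
    """Lower the threshold until at least `target` candidate pixels are
    selected (or the floor t_min is reached); return the mask and threshold."""
    t = t0
    mask = score_map >= t
    while mask.sum() < target and t - step >= t_min:
        t -= step
        mask = score_map >= t
    return mask, t
```

In a full pipeline, the surviving pixels would then be grouped into candidate regions and passed to the feature-extraction and classification stages.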
Joint Maximum Purity Forest with Application to Image Super-Resolution
In this paper, we propose a novel random-forest scheme, namely Joint Maximum
Purity Forest (JMPF), for classification, clustering, and regression tasks. In
the JMPF scheme, the original feature space is transformed into a compactly
pre-clustered feature space, via a trained rotation matrix. The rotation matrix
is obtained through an iterative quantization process, where the input data
belonging to different classes are clustered to the respective vertices of the
new feature space with maximum purity. In the new feature space, orthogonal
hyperplanes, which are employed at the split-nodes of decision trees in random
forests, can tackle the clustering problems effectively. We evaluated our
proposed method on public benchmark datasets for regression and classification
tasks, and experiments showed that JMPF remarkably outperforms other
state-of-the-art random-forest-based approaches. Furthermore, we applied JMPF
to image super-resolution, because the transformed, compact features are more
discriminative to the clustering-regression scheme. Experiment results on
several public benchmark datasets also showed that the JMPF-based image
super-resolution scheme is consistently superior to recent state-of-the-art
image super-resolution algorithms.
Comment: 18 pages, 7 figures
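The iterative quantization that produces the rotation matrix can be sketched in the style of ITQ-like alternation: snap the rotated points to the nearest hypercube vertices, then solve an orthogonal Procrustes problem for the rotation. A minimal NumPy version, under the assumptions of zero-centred input and sign-based vertex assignment (illustrative, not the paper's exact procedure):

```python
import numpy as np

def iterative_quantization(X, n_iter=20, seed=0):
    """Learn an orthogonal rotation R that moves zero-centred data X
    toward the nearest hypercube vertices sign(X @ R)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # random orthogonal initialisation
    R, _ = np.linalg.qr(rng.standard_normal((d, d)))
    for _ in range(n_iter):
        B = np.sign(X @ R)                 # assign each point to a vertex
        U, _, Vt = np.linalg.svd(X.T @ B)  # orthogonal Procrustes update
        R = U @ Vt
    return R
```

Because each update is the closed-form Procrustes solution, R stays orthogonal throughout, which is what lets axis-aligned split hyperplanes in the rotated space act like pre-clustered boundaries.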
Missing Data Reconstruction in Remote Sensing image with a Unified Spatial-Temporal-Spectral Deep Convolutional Neural Network
Because of the internal malfunction of satellite sensors and poor atmospheric
conditions such as thick cloud, the acquired remote sensing data often suffer
from missing information, i.e., the data usability is greatly reduced. In this
paper, a novel method of missing information reconstruction in remote sensing
images is proposed. The unified spatial-temporal-spectral framework based on a
deep convolutional neural network (STS-CNN) employs a unified deep
convolutional neural network combined with spatial-temporal-spectral
supplementary information. In addition, to address the fact that most methods
can only deal with a single missing information reconstruction task, the
proposed approach can solve three typical missing information reconstruction
tasks: 1) dead lines in Aqua MODIS band 6; 2) the Landsat ETM+ Scan Line
Corrector (SLC)-off problem; and 3) thick cloud removal. It should be noted
that the proposed model can use multi-source data (spatial, spectral, and
temporal) as the input of the unified framework. The results of both simulated
and real-data experiments demonstrate that the proposed model exhibits high
effectiveness in the three missing information reconstruction tasks listed
above.
Comment: To be published in IEEE Transactions on Geoscience and Remote Sensing
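The compositing idea behind spatial-temporal-spectral supplementation can be sketched very simply: keep the observed pixels and fill only the pixels flagged as missing (dead lines, SLC-off gaps, thick cloud) from an auxiliary temporal or spectral source. The sketch below deliberately omits the CNN and only shows the masking logic; the function and its arguments are illustrative assumptions.

```python
import numpy as np

def reconstruct_missing(observed, mask, auxiliary):
    """Fill pixels flagged True in `mask` from an auxiliary
    (temporal/spectral) image; keep observed pixels unchanged."""
    out = observed.copy()
    out[mask] = auxiliary[mask]
    return out
```

In the actual framework, the values written into the masked region would come from the network's prediction conditioned on the multi-source input, not from a raw copy of the auxiliary image.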
Deep Laplacian Pyramid Network for Text Images Super-Resolution
Convolutional neural networks have recently demonstrated interesting results
for single image super-resolution. However, these networks were trained to deal
with the super-resolution problem on natural images. In this paper, we adapt a
deep network, which was proposed for natural image super-resolution, to single text
image super-resolution. To evaluate the network, we present our database for
single text image super-resolution. Moreover, we propose to combine Gradient
Difference Loss (GDL) with L1/L2 loss to enhance edges in super-resolution
image. Quantitative and qualitative evaluations on our dataset show that adding
the GDL improves the super-resolution results.
Comment: paper, 6 pages
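The Gradient Difference Loss can be written down concretely: it compares the absolute image gradients of the prediction and the target, and is added to the L1/L2 reconstruction term. A minimal NumPy sketch, where the exponent `alpha` and the weighting `lam` are assumptions rather than the paper's settings:

```python
import numpy as np

def gradient_difference_loss(pred, target, alpha=1.0):
    """GDL: penalise mismatches between the image gradients of the
    prediction and the target, encouraging sharper edges."""
    dx_p = np.abs(np.diff(pred, axis=1))
    dx_t = np.abs(np.diff(target, axis=1))
    dy_p = np.abs(np.diff(pred, axis=0))
    dy_t = np.abs(np.diff(target, axis=0))
    return float(np.sum(np.abs(dx_p - dx_t) ** alpha)
                 + np.sum(np.abs(dy_p - dy_t) ** alpha))

def combined_loss(pred, target, lam=1.0):
    """L2 reconstruction loss plus weighted GDL, mirroring the paper's
    combination of GDL with an L1/L2 term."""
    l2 = float(np.sum((pred - target) ** 2))
    return l2 + lam * gradient_difference_loss(pred, target)
```

A uniformly blurred prediction can have a modest L2 error while losing all edge contrast; the GDL term specifically charges for that lost contrast.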
Chaining Identity Mapping Modules for Image Denoising
We propose to learn a fully-convolutional network model that consists of a
Chain of Identity Mapping Modules (CIMM) for image denoising. The CIMM
structure possesses two distinctive features that are important for the noise
removal task. Firstly, each residual unit employs identity mappings as the skip
connections and receives pre-activated input in order to preserve the gradient
magnitude propagated in both the forward and backward directions. Secondly, by
utilizing dilated kernels for the convolution layers in the residual branch, in
other words within an identity mapping module, each neuron in the last
convolution layer can observe the full receptive field of the first layer.
After being trained on the BSD400 dataset, the proposed network produces
remarkably higher numerical accuracy and better visual image quality than the
state-of-the-art when being evaluated on conventional benchmark images and the
BSD68 dataset.
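The receptive-field claim can be checked with a one-line recurrence: for stride-1 convolutions, each layer adds `dilation * (kernel - 1)` to the receptive field. A small helper (illustrative, not from the paper):

```python
def receptive_field(kernel_sizes, dilations):
    """Receptive field of a stack of stride-1 conv layers: each layer
    grows the field by dilation * (kernel_size - 1)."""
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += d * (k - 1)
    return rf
```

Three 3x3 layers with dilations 1, 2, 4 see 15 input pixels, versus 7 for three undilated layers, which is how the last convolution of a module can cover the full extent seen by its first layer.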
SegNet: A Deep Convolutional Encoder-Decoder Architecture for Robust Semantic Pixel-Wise Labelling
We propose a novel deep architecture, SegNet, for semantic pixel wise image
labelling. SegNet has several attractive properties; (i) it only requires
forward evaluation of a fully learnt function to obtain smooth label
predictions, (ii) with increasing depth, a larger context is considered for
pixel labelling which improves accuracy, and (iii) it is easy to visualise the
effect of feature activation(s) in the pixel label space at any depth. SegNet
is composed of a stack of encoders followed by a corresponding decoder stack
which feeds into a soft-max classification layer. The decoders help map low
resolution feature maps at the output of the encoder stack to full input image
size feature maps. This addresses an important drawback of recent deep learning
approaches which have adopted networks designed for object categorization for
pixel wise labelling. These methods lack a mechanism to map deep layer feature
maps to input dimensions. They resort to ad hoc methods to upsample features,
e.g. by replication. This results in noisy predictions and also restricts the
number of pooling layers in order to avoid too much upsampling and thus reduces
spatial context. SegNet overcomes these problems by learning to map encoder
outputs to image pixel labels. We test the performance of SegNet on outdoor RGB
scenes from CamVid, KITTI and indoor scenes from the NYU dataset. Our results
show that SegNet achieves state-of-the-art performance even without use of
additional cues such as depth, video frames or post-processing with CRF models.
Comment: This version was first submitted to CVPR'15 on November 14, 2014
with paper Id 1468. A similar architecture was proposed more recently on May
17, 2015; see http://arxiv.org/pdf/1505.04366.pd
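One concrete way to map low-resolution encoder outputs back to input size without replication, adopted by later SegNet variants, is to record the argmax positions during max-pooling and place values back at exactly those positions when upsampling. The NumPy sketch below illustrates that mechanism; it is not claimed to be the decoder of this particular version.

```python
import numpy as np

def max_pool_with_indices(x, size=2):
    """Non-overlapping max-pool that records each window's argmax
    position as a flat index into the input."""
    h, w = x.shape
    ph, pw = h // size, w // size
    out = np.zeros((ph, pw))
    idx = np.zeros((ph, pw), dtype=int)
    for i in range(ph):
        for j in range(pw):
            win = x[i * size:(i + 1) * size, j * size:(j + 1) * size]
            k = int(np.argmax(win))
            out[i, j] = win.flat[k]
            idx[i, j] = (i * size + k // size) * w + (j * size + k % size)
    return out, idx

def unpool_with_indices(pooled, idx, shape):
    """Sparse upsampling: place each pooled value back at its recorded
    argmax location, zeros elsewhere (instead of replicating values)."""
    out = np.zeros(shape)
    out.flat[idx.ravel()] = pooled.ravel()
    return out
```

The sparse map produced by unpooling would then be densified by learned decoder convolutions, rather than by ad hoc replication.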
Deep Convolutional Framelet Denoising for Low-Dose CT via Wavelet Residual Network
Model based iterative reconstruction (MBIR) algorithms for low-dose X-ray CT
are computationally expensive. To address this problem, we recently proposed a
deep convolutional neural network (CNN) for low-dose X-ray CT and won the
second place in the 2016 AAPM Low-Dose CT Grand Challenge. However, some of the
textures were not fully recovered. To address this problem, here we propose a
novel framelet-based denoising algorithm using wavelet residual network which
synergistically combines the expressive power of deep learning and the
performance guarantee from the framelet-based denoising algorithms. The new
algorithms were inspired by the recent interpretation of the deep convolutional
neural network (CNN) as a cascaded convolution framelet signal representation.
Extensive experimental results confirm that the proposed networks have
significantly improved performance and preserve the detailed texture of the
original images.
Comment: This will appear in IEEE Transactions on Medical Imaging, a special
issue on Machine Learning for Image Reconstruction
Hybrid Channel Based Pedestrian Detection
Pedestrian detection has achieved great improvements with the help of
Convolutional Neural Networks (CNNs). CNN can learn high-level features from
input images, but the insufficient spatial resolution of CNN feature channels
(feature maps) may cause a loss of information, which is harmful especially to
small instances. In this paper, we propose a new pedestrian detection
framework, which extends the successful RPN+BF framework to combine handcrafted
features and CNN features. RoI-pooling is used to extract features from both
handcrafted channels (e.g. HOG+LUV, CheckerBoards or RotatedFilters) and CNN
channels. Since handcrafted channels always have higher spatial resolution than
CNN channels, we apply RoI-pooling with larger output resolution to handcrafted
channels to keep more detailed information. Our ablation experiments show that
the developed handcrafted features can reach better detection accuracy than the
CNN features extracted from the VGG-16 net, and a performance gain can be
achieved by combining them. Experimental results on Caltech pedestrian dataset
with the original annotations and the improved annotations demonstrate the
effectiveness of the proposed approach. When using a more advanced RPN in our
framework, our approach can be further improved and get competitive results on
both benchmarks.
Comment: 9 pages, 4 figures, submitted to Neurocomputing. The 5th line of
Table 3 contained an error; the data have been corrected, and the related
descriptions in Section 4.4 have also been revised accordingly. Typos
corrected, references corrected
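The resolution-dependent RoI-pooling can be sketched directly: the same region of interest is max-pooled into grids of different output sizes, and a larger grid over a higher-resolution handcrafted channel keeps more spatial detail. A minimal NumPy sketch with illustrative names and a simplified integer RoI format:

```python
import numpy as np

def roi_pool(feature, roi, out_size):
    """Max-pool the RoI (x0, y0, x1, y1) of a 2-D feature map into an
    out_size x out_size grid; a larger out_size keeps more detail."""
    x0, y0, x1, y1 = roi
    patch = feature[y0:y1, x0:x1]
    h, w = patch.shape
    out = np.zeros((out_size, out_size))
    ys = np.linspace(0, h, out_size + 1).astype(int)
    xs = np.linspace(0, w, out_size + 1).astype(int)
    for i in range(out_size):
        for j in range(out_size):
            cell = patch[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
            out[i, j] = cell.max() if cell.size else 0.0
    return out
```

Pooling the handcrafted channel with a larger `out_size` than the CNN channel is the mechanism by which its higher spatial resolution is preserved before the features are concatenated.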
A Survey of the Recent Architectures of Deep Convolutional Neural Networks
Deep Convolutional Neural Network (CNN) is a special type of Neural Networks,
which has shown exemplary performance on several competitions related to
Computer Vision and Image Processing. Some of the exciting application areas of
CNN include Image Classification and Segmentation, Object Detection, Video
Processing, Natural Language Processing, and Speech Recognition. The powerful
learning ability of deep CNN is primarily due to the use of multiple feature
extraction stages that can automatically learn representations from the data.
The availability of a large amount of data and improvements in hardware
technology have accelerated the research in CNNs, and recently interesting deep
CNN architectures have been reported. Several inspiring ideas to bring
advancements in CNNs have been explored, such as the use of different
activation and loss functions, parameter optimization, regularization, and
architectural innovations. However, the significant improvement in the
representational capacity of the deep CNN is achieved through architectural
innovations. Notably, the ideas of exploiting spatial and channel information,
depth and width of architecture, and multi-path information processing have
gained substantial attention. Similarly, the idea of using a block of layers as
a structural unit is also gaining popularity. This survey thus focuses on the
intrinsic taxonomy present in the recently reported deep CNN architectures and,
consequently, classifies the recent innovations in CNN architectures into seven
different categories. These seven categories are based on spatial exploitation,
depth, multi-path, width, feature-map exploitation, channel boosting, and
attention. Additionally, the elementary understanding of CNN components,
current challenges, and applications of CNN are also provided.
Comment: Number of Pages: 70, Number of Figures: 11, Number of Tables: 11.
Artif Intell Rev (2020)