Scene Text Image Super-Resolution in the Wild
Low-resolution text images are often seen in natural scenes such as documents
captured by mobile phones. Recognizing low-resolution text images is
challenging because they lose detailed content information, leading to poor
recognition accuracy. An intuitive solution is to introduce super-resolution
(SR) techniques as pre-processing. However, previous single image
super-resolution (SISR) methods are trained on synthetic low-resolution images
(e.g. bicubic down-sampling), which is overly simple and not suitable for real
low-resolution text recognition. To this end, we propose a real scene text SR
dataset, termed TextZoom. It contains paired real low-resolution and
high-resolution images which are captured by cameras with different focal
length in the wild. It is more authentic and challenging than synthetic data,
as shown in Fig. 1. We argue that improving the recognition accuracy is the
ultimate goal for Scene Text SR. To this end, a new Text Super-Resolution
Network, termed TSRN, with three novel modules is developed. (1) A sequential
residual block is proposed to extract the sequential information of the text
images. (2) A boundary-aware loss is designed to sharpen the character
boundaries. (3) A central alignment module is proposed to relieve the
misalignment problem in TextZoom. Extensive experiments on TextZoom demonstrate
that our TSRN largely improves the recognition accuracy of CRNN by over 13%, and
of ASTER and MORAN by nearly 9.0%, compared to synthetic SR data. Furthermore,
our TSRN clearly outperforms 7 state-of-the-art SR methods in boosting the
recognition accuracy of LR images in TextZoom. For example, it outperforms
LapSRN by over 5% and 8% in the recognition accuracy of ASTER and CRNN. Our
results suggest that low-resolution text recognition in the wild is far from
being solved, and thus more research effort is needed.
Comment: Accepted by ECCV 2020
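The boundary-aware loss idea can be illustrated with a minimal NumPy sketch: weight a pixel-wise reconstruction loss by an edge map of the high-resolution image, so that errors on character boundaries are penalised more heavily. This is a hedged stand-in, not the paper's actual formulation; the function names, the `alpha` weight, and the use of plain MSE are all assumptions.

```python
import numpy as np

def gradient_magnitude(img):
    """Finite-difference gradient magnitude of a 2-D image."""
    gy, gx = np.gradient(img.astype(float))
    return np.sqrt(gx ** 2 + gy ** 2)

def boundary_weighted_mse(sr, hr, alpha=1.0):
    """MSE weighted by an edge map of the HR image, so that errors on
    character boundaries cost more than errors in flat regions."""
    w = 1.0 + alpha * gradient_magnitude(hr)
    return float(np.mean(w * (sr - hr) ** 2))
```

With this weighting, the same-magnitude pixel error is more expensive when it sits on a character edge than inside a uniform background region.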
Retinal Microaneurysms Detection using Local Convergence Index Features
Retinal microaneurysms are the earliest clinical sign of diabetic retinopathy
disease. Detection of microaneurysms is crucial for the early diagnosis of
diabetic retinopathy and prevention of blindness. In this paper, a novel and
reliable method for automatic detection of microaneurysms in retinal images is
proposed. In the first stage of the proposed method, several preliminary
microaneurysm candidates are extracted using a gradient weighting technique and
an iterative thresholding approach. In the next stage, in addition to intensity
and shape descriptors, a new set of features based on local convergence index
filters is extracted for each candidate. Finally, the collective set of
features is fed to a hybrid sampling/boosting classifier to discriminate the
MAs from non-MA candidates. The method is evaluated on images with different
resolutions and modalities (RGB and SLO) using five publicly available datasets
including the Retinopathy Online Challenge's dataset. The proposed method
achieves an average sensitivity score of 0.471 on the ROC dataset outperforming
state-of-the-art approaches in an extensive comparison. The experimental
results on the other four datasets demonstrate the effectiveness and robustness
of the proposed microaneurysms detection method regardless of different image
resolutions and modalities.
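The candidate-extraction stage can be sketched as a simple iterative thresholding loop over a pixel score map: the threshold is lowered step by step until enough preliminary candidates are selected. This is an illustrative sketch only; the function name, the step size, and the stopping rule are assumptions, not the paper's exact procedure.

```python
import numpy as np

def iterative_threshold_candidates(score_map, target, t0=0.9, step=0.05, t_min=0.0):
    """Lower the threshold until at least `target` candidate pixels are
    selected (or the floor t_min is reached); return the mask and threshold."""
    t = t0
    mask = score_map >= t
    while mask.sum() < target and t - step >= t_min:
        t -= step
        mask = score_map >= t
    return mask, t
```

In a full pipeline, the surviving pixels would then be grouped into candidate regions and passed to the feature-extraction and classification stages.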
Joint Maximum Purity Forest with Application to Image Super-Resolution
In this paper, we propose a novel random-forest scheme, namely Joint Maximum
Purity Forest (JMPF), for classification, clustering, and regression tasks. In
the JMPF scheme, the original feature space is transformed into a compactly
pre-clustered feature space, via a trained rotation matrix. The rotation matrix
is obtained through an iterative quantization process, where the input data
belonging to different classes are clustered to the respective vertices of the
new feature space with maximum purity. In the new feature space, orthogonal
hyperplanes, which are employed at the split-nodes of decision trees in random
forests, can tackle the clustering problems effectively. We evaluated our
proposed method on public benchmark datasets for regression and classification
tasks, and experiments showed that JMPF remarkably outperforms other
state-of-the-art random-forest-based approaches. Furthermore, we applied JMPF
to image super-resolution, because the transformed, compact features are more
discriminative to the clustering-regression scheme. Experiment results on
several public benchmark datasets also showed that the JMPF-based image
super-resolution scheme is consistently superior to recent state-of-the-art
image super-resolution algorithms.
Comment: 18 pages, 7 figures
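The iterative quantization that produces the rotation matrix can be sketched in the style of ITQ-like alternation: snap the rotated points to the nearest hypercube vertices, then solve an orthogonal Procrustes problem for the rotation. A minimal NumPy version, under the assumptions of zero-centred input and sign-based vertex assignment (illustrative, not the paper's exact procedure):

```python
import numpy as np

def iterative_quantization(X, n_iter=20, seed=0):
    """Learn an orthogonal rotation R that moves zero-centred data X
    toward the nearest hypercube vertices sign(X @ R)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # random orthogonal initialisation
    R, _ = np.linalg.qr(rng.standard_normal((d, d)))
    for _ in range(n_iter):
        B = np.sign(X @ R)                 # assign each point to a vertex
        U, _, Vt = np.linalg.svd(X.T @ B)  # orthogonal Procrustes update
        R = U @ Vt
    return R
```

Because each update is the closed-form Procrustes solution, R stays orthogonal throughout, which is what lets axis-aligned split hyperplanes in the rotated space act like pre-clustered boundaries.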
Missing Data Reconstruction in Remote Sensing image with a Unified Spatial-Temporal-Spectral Deep Convolutional Neural Network
Because of the internal malfunction of satellite sensors and poor atmospheric
conditions such as thick cloud, the acquired remote sensing data often suffer
from missing information, i.e., the data usability is greatly reduced. In this
paper, a novel method of missing information reconstruction in remote sensing
images is proposed. The unified spatial-temporal-spectral framework based on a
deep convolutional neural network (STS-CNN) employs a unified deep
convolutional neural network combined with spatial-temporal-spectral
supplementary information. In addition, to address the fact that most methods
can only deal with a single missing information reconstruction task, the
proposed approach can solve three typical missing information reconstruction
tasks: 1) dead lines in Aqua MODIS band 6; 2) the Landsat ETM+ Scan Line
Corrector (SLC)-off problem; and 3) thick cloud removal. It should be noted
that the proposed model can use multi-source data (spatial, spectral, and
temporal) as the input of the unified framework. The results of both simulated
and real-data experiments demonstrate that the proposed model exhibits high
effectiveness in the three missing information reconstruction tasks listed
above.
Comment: To be published in IEEE Transactions on Geoscience and Remote Sensing
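The compositing idea behind spatial-temporal-spectral supplementation can be sketched very simply: keep the observed pixels and fill only the pixels flagged as missing (dead lines, SLC-off gaps, thick cloud) from an auxiliary temporal or spectral source. The sketch below deliberately omits the CNN and only shows the masking logic; the function and its arguments are illustrative assumptions.

```python
import numpy as np

def reconstruct_missing(observed, mask, auxiliary):
    """Fill pixels flagged True in `mask` from an auxiliary
    (temporal/spectral) image; keep observed pixels unchanged."""
    out = observed.copy()
    out[mask] = auxiliary[mask]
    return out
```

In the actual framework, the values written into the masked region would come from the network's prediction conditioned on the multi-source input, not from a raw copy of the auxiliary image.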
Deep Laplacian Pyramid Network for Text Images Super-Resolution
Convolutional neural networks have recently demonstrated interesting results
for single image super-resolution. However, these networks were trained to deal
with the super-resolution problem on natural images. In this paper, we adapt a
deep network, which was proposed for natural image super-resolution, to single text
image super-resolution. To evaluate the network, we present our database for
single text image super-resolution. Moreover, we propose to combine Gradient
Difference Loss (GDL) with L1/L2 loss to enhance edges in super-resolution
image. Quantitative and qualitative evaluations on our dataset show that adding
the GDL improves the super-resolution results.
Comment: paper, 6 pages
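The Gradient Difference Loss can be written down concretely: it compares the absolute image gradients of the prediction and the target, and is added to the L1/L2 reconstruction term. A minimal NumPy sketch, where the exponent `alpha` and the weighting `lam` are assumptions rather than the paper's settings:

```python
import numpy as np

def gradient_difference_loss(pred, target, alpha=1.0):
    """GDL: penalise mismatches between the image gradients of the
    prediction and the target, encouraging sharper edges."""
    dx_p = np.abs(np.diff(pred, axis=1))
    dx_t = np.abs(np.diff(target, axis=1))
    dy_p = np.abs(np.diff(pred, axis=0))
    dy_t = np.abs(np.diff(target, axis=0))
    return float(np.sum(np.abs(dx_p - dx_t) ** alpha)
                 + np.sum(np.abs(dy_p - dy_t) ** alpha))

def combined_loss(pred, target, lam=1.0):
    """L2 reconstruction loss plus weighted GDL, mirroring the paper's
    combination of GDL with an L1/L2 term."""
    l2 = float(np.sum((pred - target) ** 2))
    return l2 + lam * gradient_difference_loss(pred, target)
```

A uniformly blurred prediction can have a modest L2 error while losing all edge contrast; the GDL term specifically charges for that lost contrast.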
Chaining Identity Mapping Modules for Image Denoising
We propose to learn a fully-convolutional network model that consists of a
Chain of Identity Mapping Modules (CIMM) for image denoising. The CIMM
structure possesses two distinctive features that are important for the noise
removal task. Firstly, each residual unit employs identity mappings as the skip
connections and receives pre-activated input in order to preserve the gradient
magnitude propagated in both the forward and backward directions. Secondly, by
utilizing dilated kernels for the convolution layers in the residual branch, in
other words within an identity mapping module, each neuron in the last
convolution layer can observe the full receptive field of the first layer.
After being trained on the BSD400 dataset, the proposed network produces
remarkably higher numerical accuracy and better visual image quality than the
state-of-the-art when being evaluated on conventional benchmark images and the
BSD68 dataset.
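The receptive-field claim can be checked with a one-line recurrence: for stride-1 convolutions, each layer adds `dilation * (kernel - 1)` to the receptive field. A small helper (illustrative, not from the paper):

```python
def receptive_field(kernel_sizes, dilations):
    """Receptive field of a stack of stride-1 conv layers: each layer
    grows the field by dilation * (kernel_size - 1)."""
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += d * (k - 1)
    return rf
```

Three 3x3 layers with dilations 1, 2, 4 see 15 input pixels, versus 7 for three undilated layers, which is how the last convolution of a module can cover the full extent seen by its first layer.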
SegNet: A Deep Convolutional Encoder-Decoder Architecture for Robust Semantic Pixel-Wise Labelling
We propose a novel deep architecture, SegNet, for semantic pixel wise image
labelling. SegNet has several attractive properties; (i) it only requires
forward evaluation of a fully learnt function to obtain smooth label
predictions, (ii) with increasing depth, a larger context is considered for
pixel labelling which improves accuracy, and (iii) it is easy to visualise the
effect of feature activation(s) in the pixel label space at any depth. SegNet
is composed of a stack of encoders followed by a corresponding decoder stack
which feeds into a soft-max classification layer. The decoders help map low
resolution feature maps at the output of the encoder stack to full input image
size feature maps. This addresses an important drawback of recent deep learning
approaches which have adopted networks designed for object categorization for
pixel wise labelling. These methods lack a mechanism to map deep layer feature
maps to input dimensions. They resort to ad hoc methods to upsample features,
e.g. by replication. This results in noisy predictions and also restricts the
number of pooling layers in order to avoid too much upsampling and thus reduces
spatial context. SegNet overcomes these problems by learning to map encoder
outputs to image pixel labels. We test the performance of SegNet on outdoor RGB
scenes from CamVid, KITTI and indoor scenes from the NYU dataset. Our results
show that SegNet achieves state-of-the-art performance even without use of
additional cues such as depth, video frames or post-processing with CRF models.
Comment: This version was first submitted to CVPR'15 on November 14, 2014
with paper Id 1468. A similar architecture was proposed more recently on May
17, 2015; see http://arxiv.org/pdf/1505.04366.pd
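One concrete way to map low-resolution encoder outputs back to input size without replication, adopted by later SegNet variants, is to record the argmax positions during max-pooling and place values back at exactly those positions when upsampling. The NumPy sketch below illustrates that mechanism; it is not claimed to be the decoder of this particular version.

```python
import numpy as np

def max_pool_with_indices(x, size=2):
    """Non-overlapping max-pool that records each window's argmax
    position as a flat index into the input."""
    h, w = x.shape
    ph, pw = h // size, w // size
    out = np.zeros((ph, pw))
    idx = np.zeros((ph, pw), dtype=int)
    for i in range(ph):
        for j in range(pw):
            win = x[i * size:(i + 1) * size, j * size:(j + 1) * size]
            k = int(np.argmax(win))
            out[i, j] = win.flat[k]
            idx[i, j] = (i * size + k // size) * w + (j * size + k % size)
    return out, idx

def unpool_with_indices(pooled, idx, shape):
    """Sparse upsampling: place each pooled value back at its recorded
    argmax location, zeros elsewhere (instead of replicating values)."""
    out = np.zeros(shape)
    out.flat[idx.ravel()] = pooled.ravel()
    return out
```

The sparse map produced by unpooling would then be densified by learned decoder convolutions, rather than by ad hoc replication.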
Deep Convolutional Framelet Denoising for Low-Dose CT via Wavelet Residual Network
Model based iterative reconstruction (MBIR) algorithms for low-dose X-ray CT
are computationally expensive. To address this problem, we recently proposed a
deep convolutional neural network (CNN) for low-dose X-ray CT and won the
second place in the 2016 AAPM Low-Dose CT Grand Challenge. However, some of the
textures were not fully recovered. To address this problem, here we propose a
novel framelet-based denoising algorithm using wavelet residual network which
synergistically combines the expressive power of deep learning and the
performance guarantee from the framelet-based denoising algorithms. The new
algorithms were inspired by the recent interpretation of the deep convolutional
neural network (CNN) as a cascaded convolution framelet signal representation.
Extensive experimental results confirm that the proposed networks have
significantly improved performance and preserve the detailed texture of the
original images.
Comment: This will appear in IEEE Transactions on Medical Imaging, a special
issue on Machine Learning for Image Reconstruction
Hybrid Channel Based Pedestrian Detection
Pedestrian detection has achieved great improvements with the help of
Convolutional Neural Networks (CNNs). CNN can learn high-level features from
input images, but the insufficient spatial resolution of CNN feature channels
(feature maps) may cause a loss of information, which is harmful especially to
small instances. In this paper, we propose a new pedestrian detection
framework, which extends the successful RPN+BF framework to combine handcrafted
features and CNN features. RoI-pooling is used to extract features from both
handcrafted channels (e.g. HOG+LUV, CheckerBoards or RotatedFilters) and CNN
channels. Since handcrafted channels always have higher spatial resolution than
CNN channels, we apply RoI-pooling with larger output resolution to handcrafted
channels to keep more detailed information. Our ablation experiments show that
the developed handcrafted features can reach better detection accuracy than the
CNN features extracted from the VGG-16 net, and a performance gain can be
achieved by combining them. Experimental results on Caltech pedestrian dataset
with the original annotations and the improved annotations demonstrate the
effectiveness of the proposed approach. When using a more advanced RPN in our
framework, our approach can be further improved and get competitive results on
both benchmarks.
Comment: 9 pages, 4 figures, submitted to Neurocomputing. The 5th line of
Table 3 contained an error; the data have been corrected, and the related
descriptions in Section 4.4 have also been revised accordingly. Typos
corrected, references corrected
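The resolution-dependent RoI-pooling can be sketched directly: the same region of interest is max-pooled into grids of different output sizes, and a larger grid over a higher-resolution handcrafted channel keeps more spatial detail. A minimal NumPy sketch with illustrative names and a simplified integer RoI format:

```python
import numpy as np

def roi_pool(feature, roi, out_size):
    """Max-pool the RoI (x0, y0, x1, y1) of a 2-D feature map into an
    out_size x out_size grid; a larger out_size keeps more detail."""
    x0, y0, x1, y1 = roi
    patch = feature[y0:y1, x0:x1]
    h, w = patch.shape
    out = np.zeros((out_size, out_size))
    ys = np.linspace(0, h, out_size + 1).astype(int)
    xs = np.linspace(0, w, out_size + 1).astype(int)
    for i in range(out_size):
        for j in range(out_size):
            cell = patch[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
            out[i, j] = cell.max() if cell.size else 0.0
    return out
```

Pooling the handcrafted channel with a larger `out_size` than the CNN channel is the mechanism by which its higher spatial resolution is preserved before the features are concatenated.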
A Survey of the Recent Architectures of Deep Convolutional Neural Networks
Deep Convolutional Neural Network (CNN) is a special type of Neural Networks,
which has shown exemplary performance on several competitions related to
Computer Vision and Image Processing. Some of the exciting application areas of
CNN include Image Classification and Segmentation, Object Detection, Video
Processing, Natural Language Processing, and Speech Recognition. The powerful
learning ability of deep CNN is primarily due to the use of multiple feature
extraction stages that can automatically learn representations from the data.
The availability of a large amount of data and improvements in hardware
technology have accelerated the research in CNNs, and recently interesting deep
CNN architectures have been reported. Several inspiring ideas to bring
advancements in CNNs have been explored, such as the use of different
activation and loss functions, parameter optimization, regularization, and
architectural innovations. However, the significant improvement in the
representational capacity of the deep CNN is achieved through architectural
innovations. Notably, the ideas of exploiting spatial and channel information,
depth and width of architecture, and multi-path information processing have
gained substantial attention. Similarly, the idea of using a block of layers as
a structural unit is also gaining popularity. This survey thus focuses on the
intrinsic taxonomy present in the recently reported deep CNN architectures and,
consequently, classifies the recent innovations in CNN architectures into seven
different categories. These seven categories are based on spatial exploitation,
depth, multi-path, width, feature-map exploitation, channel boosting, and
attention. Additionally, the elementary understanding of CNN components,
current challenges, and applications of CNN are also provided.
Comment: Number of Pages: 70, Number of Figures: 11, Number of Tables: 11.
Artif Intell Rev (2020)