DPW-SDNet: Dual Pixel-Wavelet Domain Deep CNNs for Soft Decoding of JPEG-Compressed Images
JPEG is one of the widely used lossy compression methods. JPEG-compressed
images usually suffer from compression artifacts including blocking and
blurring, especially at low bit-rates. Soft decoding is an effective solution
to improve the quality of compressed images without changing the codec or
introducing extra coding bits. Inspired by the excellent performance of the
deep convolutional neural networks (CNNs) on both low-level and high-level
computer vision problems, we develop a dual pixel-wavelet domain deep
CNNs-based soft decoding network for JPEG-compressed images, namely DPW-SDNet.
The pixel domain deep network takes the four downsampled versions of the
compressed image to form a 4-channel input and outputs a pixel domain
prediction, while the wavelet domain deep network uses the 1-level discrete
wavelet transformation (DWT) coefficients to form a 4-channel input to produce
a DWT domain prediction. The pixel domain and wavelet domain estimates are
combined to generate the final soft decoded result. Experimental results
demonstrate the superiority of the proposed DPW-SDNet over several
state-of-the-art compression artifact reduction algorithms.
Comment: CVPRW 201
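The abstract does not spell out how the two 4-channel inputs are formed; a minimal sketch of one plausible reading (polyphase downsampling for the pixel branch and an unnormalized 1-level Haar DWT for the wavelet branch, both assumptions rather than the paper's exact formulation) is:

```python
import numpy as np

def pixel_domain_input(img):
    # Polyphase split: the four 2x-downsampled versions of the compressed
    # image, stacked as a 4-channel array (one plausible reading of the
    # paper's pixel-domain input; the channel ordering is assumed).
    return np.stack([img[0::2, 0::2], img[0::2, 1::2],
                     img[1::2, 0::2], img[1::2, 1::2]])

def wavelet_domain_input(img):
    # 1-level Haar DWT (unnormalized average/difference variant): the LL,
    # LH, HL, HH subbands stacked as a 4-channel array.
    a, b = img[0::2, :], img[1::2, :]          # vertical pass
    lo, hi = (a + b) / 2, (a - b) / 2
    def hsplit(x):                              # horizontal pass
        return (x[:, 0::2] + x[:, 1::2]) / 2, (x[:, 0::2] - x[:, 1::2]) / 2
    ll, lh = hsplit(lo)
    hl, hh = hsplit(hi)
    return np.stack([ll, lh, hl, hh])

img = np.arange(16, dtype=float).reshape(4, 4)
px = pixel_domain_input(img)       # shape (4, 2, 2)
wv = wavelet_domain_input(img)     # shape (4, 2, 2)
```

Either stacking halves the spatial resolution while preserving all pixel information, which lets both branches share one network input shape.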
A Review of Feature and Data Fusion with Medical Images
The fusion techniques that utilize multiple feature sets to form new features
that are often more robust and contain useful information for future processing
are referred to as feature fusion. The term data fusion is applied to the class
of techniques used for combining decisions obtained from multiple feature sets
to form global decisions. Feature and data fusion interchangeably represent two
important classes of techniques that have proved to be of practical importance
in a wide range of medical imaging problems.
Comment: Multisensor Data Fusion: From Algorithm and Architecture Design to
Applications, CRC Press, 2015. arXiv admin note: substantial text overlap
with arXiv:1401.016
Microarrays denoising via smoothing of coefficients in wavelet domain
We describe a novel method for removing noise (in wavelet domain) of unknown
variance from microarrays. The method is based on a smoothing of the
coefficients of the highest subbands. Specifically, we decompose the noisy
microarray into wavelet subbands, apply smoothing within each highest subband,
and reconstruct a microarray from the modified wavelet coefficients. This
process is applied a single time, and exclusively to the first level of
decomposition; i.e., in most cases a multiresolution analysis is not
necessary. Denoising results compare favorably to most methods currently in
use.
Comment: 8 pages, 4 figures, 2 tables. arXiv admin note: text overlap with
arXiv:1611.02302, arXiv:1607.03105; text overlap with arXiv:1212.0291 by
other authors
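A compact sketch of the pipeline described above, using an unnormalized Haar transform and a 3×3 moving average as an illustrative smoother (the paper's exact wavelet and smoothing operator are not given in the abstract), might look like:

```python
import numpy as np

def haar2(img):
    # 1-level 2-D Haar decomposition (average/difference variant).
    a, b = img[0::2, :], img[1::2, :]
    lo, hi = (a + b) / 2, (a - b) / 2
    def h(x):
        return (x[:, 0::2] + x[:, 1::2]) / 2, (x[:, 0::2] - x[:, 1::2]) / 2
    (ll, lh), (hl, hh) = h(lo), h(hi)
    return ll, lh, hl, hh

def ihaar2(ll, lh, hl, hh):
    # Exact inverse of haar2.
    def g(lo, hi):
        x = np.empty((lo.shape[0], lo.shape[1] * 2))
        x[:, 0::2], x[:, 1::2] = lo + hi, lo - hi
        return x
    lo, hi = g(ll, lh), g(hl, hh)
    img = np.empty((lo.shape[0] * 2, lo.shape[1]))
    img[0::2, :], img[1::2, :] = lo + hi, lo - hi
    return img

def smooth(c):
    # 3x3 moving-average smoothing of a detail subband (an illustrative
    # choice; the paper's smoothing operator is not specified here).
    pad = np.pad(c, 1, mode='edge')
    return sum(pad[i:i + c.shape[0], j:j + c.shape[1]]
               for i in range(3) for j in range(3)) / 9.0

def denoise(img):
    # One level only: smooth just the highest-frequency subbands, keep LL.
    ll, lh, hl, hh = haar2(img)
    return ihaar2(ll, smooth(lh), smooth(hl), smooth(hh))

rng = np.random.default_rng(0)
noisy = rng.normal(size=(8, 8))
denoised = denoise(noisy)
```

Because only the detail subbands are altered, a noise-free flat region passes through unchanged.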
Hierarchical method for cataract grading based on retinal images using improved Haar wavelet
Cataracts, which are lenticular opacities that may occur at different lens
locations, are the leading cause of visual impairment worldwide. Accurate and
timely diagnosis can improve the quality of life of cataract patients. In this
paper, a feature extraction-based method for grading cataract severity using
retinal images is proposed. To obtain more appropriate features for the
automatic grading, the Haar wavelet is improved according to the
characteristics of retinal images. Retinal images of non-cataract, as well as
mild, moderate, and severe cataracts, are automatically recognized using the
improved Haar wavelet. A hierarchical strategy is used to transform the
four-class classification problem into three adjacent two-class classification
problems. Three sets of two-class classifiers based on a neural network are
trained individually and integrated together to establish a complete
classification system. The accuracies of the two-class classification (cataract
and non-cataract) and four-class classification are 94.83% and 85.98%,
respectively. The performance analysis demonstrates that the improved Haar
wavelet feature achieves higher accuracy than the original Haar wavelet
feature, and the fusion of three sets of two-class classifiers is superior to a
simple four-class classifier. The discussion indicates that the retinal
image-based method offers significant potential for cataract detection.
Comment: Under Review by Information Fusion (Elsevier)
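The hierarchical strategy above can be sketched as a cascade of three binary decisions; the exact splits and the neural-network classifiers are assumptions here, represented by stand-in callables:

```python
def hierarchical_grade(score, clf_a, clf_b, clf_c):
    # Cascade of three binary classifiers -- one plausible arrangement of
    # the "three adjacent two-class problems" (the exact splits are assumed):
    #   clf_a: non-cataract (0) vs. cataract (1)
    #   clf_b: mild (0) vs. {moderate, severe} (1)
    #   clf_c: moderate (0) vs. severe (1)
    if clf_a(score) == 0:
        return "non-cataract"
    if clf_b(score) == 0:
        return "mild"
    return "moderate" if clf_c(score) == 0 else "severe"

# Toy classifiers thresholding a scalar severity score (purely illustrative;
# the paper uses neural networks on improved-Haar-wavelet features).
clf_a = lambda s: int(s > 0.25)
clf_b = lambda s: int(s > 0.50)
clf_c = lambda s: int(s > 0.75)
grades = [hierarchical_grade(s, clf_a, clf_b, clf_c)
          for s in (0.1, 0.4, 0.6, 0.9)]
```

Each classifier sees only the cases that reach its level, which is why three two-class models can cover all four grades.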
Using Wavelets to Analyze Similarities in Image-Classification Datasets
Deep learning image classifiers usually rely on huge training sets, and their
training process can be described as learning the similarities and differences
among training images. However, images in large training sets are not usually
studied from this perspective, and fine-level similarities and differences
among images are usually overlooked. This is due to the lack of fast and
efficient computational methods to analyze the contents of these datasets. Some studies
aim to identify the influential and redundant training images, but such methods
require a model that is already trained on the entire training set. Here, using
image processing and numerical analysis tools, we develop a practical and fast
method to analyze the similarities in image classification datasets. We show
that such analysis can provide valuable insights about the datasets and the
classification task at hand, prior to training a model. Our method uses wavelet
decomposition of images and other numerical analysis tools, with no need for a
pre-trained model. Interestingly, the results we obtain corroborate the
previous results in the literature that analyzed the similarities using
pre-trained CNNs. We show that similar images in standard datasets (such as
CIFAR) can be identified in a few seconds, a significant speed-up compared to
alternative methods in the literature. By removing the computational speed
obstacle, it becomes practical to gain new insights about the contents of
datasets and the models trained on them. We show that similarities between
training and testing images may provide insights about the generalization of
models. Finally, we investigate the similarities between images in relation to
decision boundaries of a trained model.
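One way such a wavelet-based comparison could work is to correlate coarse (low-frequency) coefficients of two images; this is a minimal sketch under that assumption, not the paper's actual algorithm:

```python
import numpy as np

def haar_ll(img, levels=2):
    # Repeated Haar approximation (LL) subband: a compact, fast summary of
    # an image's coarse content.
    for _ in range(levels):
        img = (img[0::2, :] + img[1::2, :]) / 2
        img = (img[:, 0::2] + img[:, 1::2]) / 2
    return img

def wavelet_similarity(a, b, levels=2):
    # Normalized correlation between coarse wavelet coefficients of two
    # images (an illustrative similarity score in [-1, 1]).
    x, y = haar_ll(a, levels).ravel(), haar_ll(b, levels).ravel()
    x, y = x - x.mean(), y - y.mean()
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y) + 1e-12))

rng = np.random.default_rng(1)
a = rng.normal(size=(8, 8))
sim_self = wavelet_similarity(a, a)    # near 1
sim_neg = wavelet_similarity(a, -a)    # near -1
```

Working on coarse coefficients is what makes a pass over a dataset like CIFAR fast: each comparison touches far fewer numbers than a pixel-level comparison.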
An ANN-based Method for Detecting Vocal Fold Pathology
There are different algorithms for vocal fold pathology diagnosis. These
algorithms usually have three stages: feature extraction, feature reduction,
and classification. While the third stage implies a choice among a variety of
machine learning methods, the first and second stages play a critical role in
the performance and accuracy of the classification system. In this paper, we
present an initial study of feature extraction and feature reduction for the
task of vocal fold pathology diagnosis. A new type of feature vector, based on
wavelet packet decomposition and Mel-Frequency Cepstral Coefficients (MFCCs),
is proposed. Principal Component Analysis (PCA) is also used for feature
reduction. An Artificial Neural Network is used as a classifier to evaluate
the performance of our proposed method.
Comment: 4 pages, 3 figures, Published with International Journal of Computer
Applications (IJCA)
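The PCA feature-reduction stage can be sketched with a standard SVD-based projection; the feature matrix below is a random stand-in, since the wavelet-packet/MFCC extraction itself is not reproduced here:

```python
import numpy as np

def pca_reduce(X, k):
    # PCA via SVD: center the feature vectors (rows of X) and project them
    # onto the top-k principal components, mirroring the paper's
    # feature-reduction stage.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

# Stand-in feature matrix: 40 recordings x 12 features (illustrative only).
rng = np.random.default_rng(0)
feats = rng.normal(size=(40, 12))
reduced = pca_reduce(feats, 4)     # shape (40, 4)
```

The projected features are decorrelated, which is one reason PCA is a common preprocessing step before a neural-network classifier.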
dipIQ: Blind Image Quality Assessment by Learning-to-Rank Discriminable Image Pairs
Objective assessment of image quality is fundamentally important in many
image processing tasks. In this work, we focus on learning blind image quality
assessment (BIQA) models which predict the quality of a digital image with no
access to its original pristine-quality counterpart as reference. One of the
biggest challenges in learning BIQA models is the conflict between the gigantic
image space (which is in the dimension of the number of image pixels) and the
extremely limited reliable ground truth data for training. Such data are
typically collected via subjective testing, which is cumbersome, slow, and
expensive. Here we first show that a vast amount of reliable training data in
the form of quality-discriminable image pairs (DIP) can be obtained
automatically at low cost by exploiting large-scale databases with diverse
image content. We then learn an opinion-unaware BIQA (OU-BIQA, meaning that no
subjective opinions are used for training) model using RankNet, a pairwise
learning-to-rank (L2R) algorithm, from millions of DIPs, each associated with a
perceptual uncertainty level, leading to a DIP inferred quality (dipIQ) index.
Extensive experiments on four benchmark IQA databases demonstrate that dipIQ
outperforms state-of-the-art OU-BIQA models. The robustness of dipIQ is also
significantly improved as confirmed by the group MAximum Differentiation (gMAD)
competition method. Furthermore, we extend the proposed framework by learning
models with ListNet (a listwise L2R algorithm) on quality-discriminable image
lists (DIL). The resulting DIL Inferred Quality (dilIQ) index achieves an
additional performance gain.
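The core of the pairwise L2R objective used by RankNet can be sketched as follows; the per-pair perceptual uncertainty weighting described above is omitted for brevity:

```python
import numpy as np

def ranknet_loss(s_pos, s_neg):
    # RankNet pairwise cross-entropy for one DIP: s_pos and s_neg are the
    # model's predicted quality scores for the preferred and non-preferred
    # images, and the target probability of the correct ordering is 1.
    # P(pos > neg) = sigmoid(s_pos - s_neg); loss = -log P.
    return float(np.log1p(np.exp(-(s_pos - s_neg))))

# The loss shrinks as the model separates the pair in the right direction.
losses = [ranknet_loss(d, 0.0) for d in (-1.0, 0.0, 1.0, 3.0)]
```

Because the loss depends only on the score difference, training needs relative quality labels (which DIPs provide cheaply), not absolute subjective scores.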
Image Quality Assessment: Unifying Structure and Texture Similarity
Objective measures of image quality generally operate by comparing pixels of
a "degraded" image to those of the original. Relative to human observers, these
measures are overly sensitive to resampling of texture regions (e.g., replacing
one patch of grass with another). Here, we develop the first full-reference
image quality model with explicit tolerance to texture resampling. Using a
convolutional neural network, we construct an injective and differentiable
function that transforms images to multi-scale overcomplete representations. We
demonstrate empirically that the spatial averages of the feature maps in this
representation capture texture appearance, in that they provide a set of
sufficient statistical constraints to synthesize a wide variety of texture
patterns. We then describe an image quality method that combines correlations
of these spatial averages ("texture similarity") with correlations of the
feature maps ("structure similarity"). The parameters of the proposed measure
are jointly optimized to match human ratings of image quality, while minimizing
the reported distances between subimages cropped from the same texture images.
Experiments show that the optimized method explains human perceptual scores,
both on conventional image quality databases, as well as on texture databases.
The measure also offers competitive performance on related tasks such as
texture classification and retrieval. Finally, we show that our method is
relatively insensitive to geometric transformations (e.g., translation and
dilation), without use of any specialized training or data augmentation. Code
is available at https://github.com/dingkeyan93/DISTS
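In simplified form, the two kinds of terms can be illustrated on a single pair of feature maps; the actual measure applies such terms across multi-scale CNN feature maps with jointly optimized weights, so this is only a sketch:

```python
import numpy as np

def texture_structure_sim(fx, fy, c=1e-6):
    # For one pair of feature maps: the "texture" term compares spatial
    # means, and the "structure" term compares the zero-mean maps via a
    # normalized correlation (SSIM-like terms; the constant c and the use of
    # a single map pair are simplifications).
    mx, my = fx.mean(), fy.mean()
    texture = (2 * mx * my + c) / (mx**2 + my**2 + c)
    ax, ay = fx - mx, fy - my
    structure = (2 * (ax * ay).mean() + c) / ((ax**2).mean() + (ay**2).mean() + c)
    return texture, structure

rng = np.random.default_rng(2)
fx = rng.normal(size=(16, 16))
t_same, s_same = texture_structure_sim(fx, fx)   # both 1 for identical maps
```

Separating the mean-based term from the correlation-based term is what gives tolerance to resampled texture: two grass patches can share spatial averages (texture) while differing pixel-by-pixel (structure).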
Adaptive Transform Domain Image Super-resolution Via Orthogonally Regularized Deep Networks
Deep learning methods, in particular, trained Convolutional Neural Networks
(CNN) have recently been shown to produce compelling results for single image
Super-Resolution (SR). Invariably, a CNN is trained to map the Low Resolution
(LR) image to its corresponding High Resolution (HR) version in the spatial
domain. We propose a novel network structure for learning the SR mapping
function in an image transform domain, specifically the Discrete Cosine
Transform (DCT). As the first contribution, we show that DCT can be integrated
into the network structure as a Convolutional DCT (CDCT) layer. With the CDCT
layer, we construct the DCT Deep SR (DCT-DSR) network. We further extend the
DCT-DSR to allow the CDCT layer to become trainable (i.e., optimizable).
Because this layer represents an image transform, we enforce pairwise
orthogonality constraints and newly formulated complexity order constraints on
the individual basis functions/filters. This Orthogonally Regularized Deep SR
network (ORDSR) simplifies the SR task by taking advantage of image transform
domain while adapting the design of transform basis to the training image set.
Experimental results show ORDSR achieves state-of-the-art SR image quality with
fewer parameters than most of the deep CNN methods. A particular success of
ORDSR is in overcoming the artifacts introduced by bicubic interpolation. A
key burden of deep SR has been identified as the requirement for abundant LR
and HR training image pairs; ORDSR exhibits much more graceful degradation as
the training set size is reduced, with significant benefits in the regime of limited
training. Analysis of memory and computation requirements confirms that ORDSR
can allow for a more efficient network with faster inference.
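The pairwise orthogonality constraint on the transform filters can be illustrated with a 1-D DCT-II basis; this 1-D sketch omits the 2-D CDCT layer and the complexity-order constraints:

```python
import numpy as np

def dct_basis(n):
    # Orthonormal 1-D DCT-II basis: row k is the k-th basis function.
    # ORDSR's CDCT layer starts from such a basis and regularizes its
    # learned filters toward pairwise orthogonality.
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    B = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    B[0] *= np.sqrt(1.0 / n)
    B[1:] *= np.sqrt(2.0 / n)
    return B

B = dct_basis(8)
# The orthogonality constraint asks that B @ B.T stay close to the identity.
ortho_err = float(np.abs(B @ B.T - np.eye(8)).max())
```

Keeping the learned filters near an orthogonal basis preserves invertibility of the transform while still letting the basis adapt to the training images.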
TRLF: An Effective Semi-fragile Watermarking Method for Tamper Detection and Recovery based on LWT and FNN
This paper proposes a novel method for tamper detection and recovery using
semi-fragile data hiding, based on Lifting Wavelet Transform (LWT) and
Feed-Forward Neural Network (FNN). In TRLF, first, the host image is decomposed
up to one level using LWT, and the Discrete Cosine Transform (DCT) is applied
to each 2×2 block of the diagonal details. Next, a random binary sequence is
embedded in each block as the watermark by correlating coefficients. In the
authentication stage, the watermarked image geometry is first reconstructed
using the Speeded Up Robust Features (SURF) algorithm, and the watermark bits
are extracted using the FNN. Afterward, a logical exclusive-or operation
between the original and extracted watermarks is applied to detect tampered
regions. Eventually, in the recovery stage, tampered regions are recovered
using an image digest generated by an inverse halftoning technique. The
performance and efficiency of TRLF and its robustness against various
geometric, non-geometric, and hybrid attacks are reported. The experimental
results show that TRLF is superior to state-of-the-art fragile and
semi-fragile watermarking methods in terms of robustness and in the quality of
the digest and watermarked images. In addition, imperceptibility is improved
by using different correlation steps as the gain factor for flat (smooth) and
textured (rough) blocks.
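The exclusive-or authentication step reduces to a per-block comparison; a minimal sketch (the array shapes and bit layout are illustrative, not the paper's):

```python
import numpy as np

def tamper_map(embedded_bits, extracted_bits):
    # XOR of the embedded vs. extracted watermark bits, one bit per block:
    # a 1 flags a block whose watermark no longer matches, i.e., a
    # candidate tampered region.
    return np.bitwise_xor(embedded_bits, extracted_bits)

embedded = np.array([[0, 1, 1, 0], [1, 0, 0, 1]])
extracted = np.array([[0, 1, 0, 0], [1, 0, 0, 0]])
flags = tamper_map(embedded, extracted)   # blocks (0, 2) and (1, 3) flagged
```

The flagged blocks are then the ones handed to the recovery stage for replacement from the image digest.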