Texture Classification Using Pair-wise Difference Pooling Based Bilinear Convolutional Neural Networks
Texture is normally represented by aggregating local features based on the assumption of spatial homogeneity. Effective texture features have always been a research focus, even though both hand-crafted and deep learning approaches have been extensively investigated. Motivated by the success of Bilinear Convolutional Neural Networks (BCNNs) in fine-grained image recognition, we propose to incorporate the BCNN with Pair-wise Difference Pooling (i.e. BCNN-PDP) for texture classification. The BCNN-PDP is built on top of a set of feature maps extracted at a convolutional layer of a pre-trained CNN. Compared with the outer product used by the original BCNN feature set, the pair-wise difference not only captures the pair-wise relationship between two sets of features but also encodes the difference between each pair of features. Considering the importance of gradient data to the representation of image structures, we further generalise the BCNN-PDP feature set to two sets of feature maps computed from the original image and its gradient magnitude map respectively, i.e. the Fused BCNN-PDP (F-BCNN-PDP) feature set. In addition, the BCNN-PDP can be applied to two different CNNs, which is referred to as the Asymmetric BCNN-PDP (A-BCNN-PDP). The three PDP-based BCNN feature sets can also be extracted at multiple scales. Since the dimensionality of the BCNN feature vectors is very high, we propose a new yet simple Block-wise PCA (BPCA) method in order to derive more compact feature vectors. The proposed methods are tested on seven different datasets along with 21 baseline feature sets. The results show that the proposed feature sets are superior, or at least comparable, to their counterparts across different datasets.
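To make the contrast with the outer product concrete, here is a minimal sketch of the two pooling schemes, assuming PDP aggregates squared channel-pair differences over spatial locations; the paper's exact PDP formulation may differ, so treat this only as an illustration of the idea.

```python
import numpy as np

def bilinear_outer(F):
    # F: (C, N) conv feature map flattened over N spatial locations.
    # Original BCNN descriptor: location-wise outer product, average-pooled.
    C, N = F.shape
    return (F @ F.T) / N                       # (C, C) correlation-style descriptor

def pairwise_difference_pooling(F):
    # Assumed PDP sketch: for every channel pair (i, j), pool the squared
    # difference of their activations over all locations. This encodes how
    # each pair of channels differs rather than how they correlate.
    diff = F[:, None, :] - F[None, :, :]       # (C, C, N) pair-wise differences
    return (diff ** 2).mean(axis=2)            # (C, C) pooled descriptor

# Toy usage: a 64-channel feature map over a 7x7 spatial grid.
F = np.random.randn(64, 7 * 7).astype(np.float32)
print(bilinear_outer(F).shape, pairwise_difference_pooling(F).shape)  # (64, 64) (64, 64)
```

Either descriptor is then flattened into a very long vector, which is why a dimensionality-reduction step such as the proposed Block-wise PCA becomes necessary.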
Monocular Visual-IMU Odometry: A Comparative Evaluation of Detector-Descriptor-Based Methods
Monocular visual-IMU (Inertial Measurement Unit) odometry has been widely used in various intelligent vehicles. As a popular technique, detector-descriptor based visual-IMU odometry is effective and efficient, since local descriptors are robust against occlusions, background clutter and abrupt content changes. However, to our knowledge, there has been no comprehensive comparative evaluation of the performance of different combinations of recently developed detectors and descriptors. In order to bridge this gap, we conduct such a comparative study in a unified framework. In particular, six typical routes with different lengths, shapes and road scenes are selected from the well-known KITTI dataset. We first evaluate the performance of different combinations of salient point detectors and local descriptors on the six routes. We then tune the parameters of the best detector or descriptor obtained for each route to achieve better results. This study provides not only comprehensive benchmarks for assessing various algorithms, but also instructive guidelines and insights for developing detectors and descriptors to handle different road scenes.
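A minimal sketch of one detector-descriptor pairing of the kind such a study benchmarks, using OpenCV: detect salient points in consecutive frames, match descriptors, and feed the correspondences to the pose-estimation step of the odometry front end. File names and camera intrinsics are KITTI-like placeholders, and any OpenCV detector/descriptor pair (ORB, AKAZE, BRISK, SIFT, ...) can be swapped in.

```python
import cv2
import numpy as np

img1 = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)  # placeholder paths
img2 = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

detector = cv2.ORB_create(nfeatures=2000)     # one detector + binary descriptor
kp1, des1 = detector.detectAndCompute(img1, None)
kp2, des2 = detector.detectAndCompute(img2, None)

# Brute-force Hamming matching with cross-check, suitable for binary descriptors.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

# Correspondences feed the essential-matrix / relative-pose step;
# RANSAC rejects the remaining outliers. Intrinsics below are placeholders.
E, inlier_mask = cv2.findEssentialMat(pts1, pts2, focal=718.856,
                                      pp=(607.19, 185.22), method=cv2.RANSAC)
```

Swapping the detector, descriptor, or matcher norm and re-running the pipeline over each route is essentially what a unified evaluation framework automates.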
Deep Color-Corrected Multi-scale Retinex Network for Underwater Image Enhancement
The acquisition of high-quality underwater images is of great importance to ocean exploration activities. However, images captured in the underwater environment often suffer from degradation due to complex imaging conditions, leading to various issues such as color cast, low contrast and low visibility. Although many traditional methods have been used to address these issues, they usually lack robustness in diverse underwater scenes. On the other hand, deep learning techniques struggle to generalize to unseen images, due to the challenge of learning the complicated degradation process. Inspired by the success achieved by Retinex-based methods, we decompose the Underwater Image Enhancement (UIE) task into two consecutive procedures, namely color correction and visibility enhancement, and introduce a novel deep Color-Corrected Multi-scale Retinex Network (CCMSR-Net). Corresponding to the two procedures, this network comprises a Color Correction subnetwork (CC-Net) and a Multi-scale Retinex subnetwork (MSR-Net), both built on top of the Hybrid Convolution-Axial Attention Block (HCAAB) that we design. Thanks to this block, the CCMSR-Net is able to efficiently capture local characteristics and the global context. Experimental results show that the CCMSR-Net outperforms, or at least performs comparably to, 11 baselines across five test sets. We believe that these promising results are due to the effective combination of color correction methods and the multi-scale Retinex model, achieved by jointly exploiting Convolutional Neural Networks (CNNs) and Transformers.
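For context, below is a sketch of the classical multi-scale Retinex operation that Retinex-based enhancement builds on: the illumination is estimated with Gaussian surrounds at several scales and subtracted in the log domain. This is the traditional formulation, not the learned MSR-Net, and the file path and scales are illustrative.

```python
import cv2
import numpy as np

def multi_scale_retinex(img, sigmas=(15, 80, 250)):
    # Classical MSR: average, over several Gaussian scales, of
    # log(image) - log(illumination estimate at that scale).
    img = img.astype(np.float32) + 1.0         # avoid log(0)
    log_img = np.log(img)
    msr = np.zeros_like(log_img)
    for sigma in sigmas:
        illumination = cv2.GaussianBlur(img, (0, 0), sigma)
        msr += log_img - np.log(illumination)
    msr /= len(sigmas)
    # Rescale to a displayable 8-bit range.
    msr = (msr - msr.min()) / (msr.max() - msr.min() + 1e-8)
    return (msr * 255).astype(np.uint8)

# Usage on an underwater frame (placeholder path). In the two-stage
# pipeline the paper describes, a color-correction step would precede
# this visibility-enhancement step.
enhanced = multi_scale_retinex(cv2.imread("underwater.png"))
```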
Automatic Chinese Postal Address Block Location Using Proximity Descriptors and Cooperative Profit Random Forests
Locating the destination address block is key to the automated sorting of mail. Given the characteristics of Chinese envelopes used in mainland China, we exploit proximity cues to describe the investigated regions on envelopes. We propose two proximity descriptors that encode the spatial distributions of the connected components obtained from binary envelope images. To locate the destination address block, these descriptors are used together with Cooperative Profit Random Forests (CPRFs). Experimental results show that the proposed proximity descriptors are superior to two component descriptors, which exploit only the shape characteristics of the individual components, and that the CPRF classifier produces higher recall values than seven state-of-the-art classifiers. These promising results are due to the fact that the proposed descriptors encode the proximity characteristics of the binary envelope images, and the CPRF classifier uses an effective tree node split approach.
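A hypothetical sketch of a proximity-style feature: extract connected components from a binarized envelope image and describe each component by distances to its nearest neighbours. The paper's actual descriptors encode richer spatial distributions; this only illustrates describing a region by its surroundings rather than by its own shape, and the file path is a placeholder.

```python
import cv2
import numpy as np

binary = cv2.imread("envelope.png", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(binary, 0, 255,
                          cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# Connected components of the binarized envelope (index 0 is the background).
n, labels, stats, centroids = cv2.connectedComponentsWithStats(binary)
centroids = centroids[1:]

def proximity_feature(i, k=4):
    # Distances from component i to its k nearest components,
    # skipping the zero self-distance.
    d = np.linalg.norm(centroids - centroids[i], axis=1)
    return np.sort(d)[1:k + 1]

features = np.stack([proximity_feature(i) for i in range(len(centroids))])
# 'features' would then be fed to a classifier such as a random forest.
```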
A Perception-Inspired Deep Learning Framework for Predicting Perceptual Texture Similarity
Similarity learning plays a fundamental role in the fields of multimedia retrieval and pattern recognition. Prediction of perceptual similarity is a challenging task, as in most cases we lack human-labeled ground-truth data and robust models to mimic human visual perception. Although some studies in the literature have been dedicated to similarity learning, they mainly focus on evaluating whether or not two images are similar, rather than predicting perceptual similarity that is consistent with human perception. Inspired by the human visual perception mechanism, we propose a novel framework for predicting the perceptual similarity between two texture images. The proposed framework is built on top of Convolutional Neural Networks (CNNs) and considers both powerful features and the perceptual characteristics of contours extracted from the images. The similarity value is computed by aggregating the resemblances between the corresponding convolutional layer activations of the two texture maps. Experimental results show that the predicted similarity values are consistent with the human-perceived similarity data.
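A minimal sketch of the aggregation idea: compare two texture images by the cosine resemblance of their activations at corresponding layers of a pre-trained CNN, then average across layers. The layer choices and the unweighted average are illustrative assumptions, not the paper's exact configuration, which also incorporates contour cues.

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

# Pre-trained VGG-16 as a fixed feature extractor.
vgg = models.vgg16(weights="IMAGENET1K_V1").features.eval()
layer_ids = {3, 8, 15, 22}   # relu1_2, relu2_2, relu3_3, relu4_3 (assumed choice)

def layer_activations(x):
    acts = []
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in layer_ids:
            acts.append(x.flatten(1))          # (1, C*H*W) per chosen layer
    return acts

@torch.no_grad()
def perceptual_similarity(img_a, img_b):
    # Aggregate per-layer cosine resemblances into one similarity value.
    sims = [F.cosine_similarity(a, b).item()
            for a, b in zip(layer_activations(img_a), layer_activations(img_b))]
    return sum(sims) / len(sims)

# Toy usage with random 224x224 RGB tensors standing in for texture images;
# real inputs would be normalized with the ImageNet statistics.
a, b = torch.rand(1, 3, 224, 224), torch.rand(1, 3, 224, 224)
print(perceptual_similarity(a, b))
```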