Robust object representation by boosting-like deep learning architecture
This paper presents a new deep learning architecture for robust object representation that efficiently combines the proposed synchronized multi-stage feature (SMF) with a boosting-like algorithm. The SMF structure captures a variety of characteristics of the input object by fusing handcrafted features with deep learned features. With the proposed boosting-like algorithm, training the multi-layer network on boosted samples yields greater convergence stability. We show the generality of our object representation architecture by applying it to various tasks, i.e., pedestrian detection and action recognition. Our approach reduces the average miss rate by 15.89% and 3.85% compared with ACF and JointDeep, respectively, on the largest Caltech dataset, and achieves competitive results on the MSRAction3D dataset.
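The boosting-like idea of re-emphasising hard samples between training stages can be sketched roughly as follows. This is a hypothetical AdaBoost-style weight update, not the paper's exact algorithm; the function name `boost_weights` and the rate `lr` are illustrative assumptions.

```python
import numpy as np

def boost_weights(weights, losses, lr=0.5):
    """AdaBoost-style reweighting: samples with higher loss receive
    larger weights in the next training stage.
    (Illustrative sketch, not the paper's exact update rule.)"""
    w = weights * np.exp(lr * losses)   # emphasise hard samples
    return w / w.sum()                  # renormalise to a distribution

# toy usage: three samples, the last one is hardest
w = np.ones(3) / 3
losses = np.array([0.1, 0.2, 0.9])
w = boost_weights(w, losses)
```

Stages trained on samples drawn from such a distribution see the hard examples more often, which is one plausible reading of the "boosted samples" used in the paper.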
Binary Patterns Encoded Convolutional Neural Networks for Texture Recognition and Remote Sensing Scene Classification
Designing discriminative powerful texture features robust to realistic
imaging conditions is a challenging computer vision problem with many
applications, including material recognition and analysis of satellite or
aerial imagery. In the past, most texture description approaches were based on
dense orderless statistical distribution of local features. However, most
recent approaches to texture recognition and remote sensing scene
classification are based on Convolutional Neural Networks (CNNs). The de facto
practice when learning these CNN models is to use RGB patches as input with
training performed on large amounts of labeled data (ImageNet). In this paper,
we show that Binary Patterns encoded CNN models, codenamed TEX-Nets, trained
using mapped coded images with explicit texture information provide
complementary information to the standard RGB deep models. Additionally, two
deep architectures, namely early and late fusion, are investigated to combine
the texture and color information. To the best of our knowledge, we are the
first to investigate Binary Patterns encoded CNNs and different deep network
fusion architectures for texture recognition and remote sensing scene
classification. We perform comprehensive experiments on four texture
recognition datasets and four remote sensing scene classification benchmarks:
UC-Merced with 21 scene categories, WHU-RS19 with 19 scene classes, RSSCN7 with
7 categories and the recently introduced large scale aerial image dataset (AID)
with 30 aerial scene types. We demonstrate that TEX-Nets provide complementary
information to standard RGB deep model of the same network architecture. Our
late fusion TEX-Net architecture always improves the overall performance
compared to the standard RGB network on both recognition problems. Our final
combination outperforms the state-of-the-art without employing fine-tuning or
ensemble of RGB network architectures. (Comment: To appear in ISPRS Journal of Photogrammetry and Remote Sensing.)
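The binary-pattern coding that such mapped coded images build on can be illustrated with a minimal basic LBP map. This is a sketch only: TEX-Nets are trained on mapped LBP variants, and this is not necessarily the authors' exact preprocessing.

```python
import numpy as np

def lbp_map(img):
    """Basic 8-neighbour Local Binary Pattern: for each interior pixel,
    neighbours >= centre contribute one bit of an 8-bit code. Such a code
    image could serve as a texture-coded input channel alongside RGB.
    (Illustrative sketch of the basic LBP, not a mapped variant.)"""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2), dtype=np.uint8)
    # neighbour offsets in clockwise order starting from top-left
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    centre = img[1:h-1, 1:w-1]
    for bit, (dy, dx) in enumerate(offsets):
        neigh = img[1+dy:h-1+dy, 1+dx:w-1+dx]
        out |= ((neigh >= centre).astype(np.uint8) << bit)
    return out

codes = lbp_map(np.arange(16, dtype=np.uint8).reshape(4, 4))
```

For production use, library implementations such as scikit-image's `local_binary_pattern` provide rotation-invariant and uniform mappings of this basic code.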
Dynamic fast local Laplacian completed local ternary pattern (dynamic FLapCLTP) for face recognition
Face recognition has become one of the typical biometric authentication technologies for high-security systems. Feature extraction is one of the most important steps in a face recognition system: the important and interesting parts of the image are represented as a compact feature vector. Many features, such as texture, colour and shape, have been proposed in image processing, and they can be classified as global or local depending on the image area from which they are extracted. Texture descriptors have recently played a crucial role as local descriptors. Different types of texture descriptors, such as the local binary pattern (LBP), local ternary pattern (LTP), completed local binary pattern (CLBP) and completed local ternary pattern (CLTP), have been proposed and utilised for face recognition tasks, and all have achieved good recognition accuracy. Although the LBP performs well in different tasks, it has two limitations: it is sensitive to noise, and it occasionally fails to distinguish between two different texture patterns that share the same LBP code. Most texture descriptors inherit these limitations from LBP. CLTP was proposed to overcome them and performs well on different image processing tasks, such as image classification and face recognition. However, CLTP suffers from two limitations of its own that may affect its performance: the threshold value used during CLTP extraction is fixed regardless of the dataset or system, and the CLTP histogram is longer than that of previous descriptors. This study focuses on the first limitation, threshold selection.
Firstly, a new texture descriptor is proposed by integrating the fast local Laplacian filter with the CLTP descriptor, namely the fast local Laplacian CLTP (FLapCLTP). The fast local Laplacian filter can improve the performance of CLTP through its extensive detail enhancement and tone mapping. A dynamic FLapCLTP is then proposed to address the fixed-threshold issue: instead of using one fixed threshold value for all datasets, a dynamic value is selected based on the image pixel values, so each texture pattern has its own threshold for extracting FLapCLTP. This dynamic value is selected automatically according to the centre value of the texture pattern. Finally, the proposed FLapCLTP and dynamic FLapCLTP are evaluated for facial recognition using the ORL Faces, Sheffield Face, Collection Facial Images, Georgia Tech Face, Caltech Pedestrian Faces 1999, JAFFE, FEI Face and YALE datasets. The results show the superiority of the proposed descriptors over previous texture descriptors: the dynamic FLapCLTP achieves the highest recognition accuracy rates of 100%, 99.96%, 99.75%, 99.69%, 94.86%, 90.33%, 86.86% and 82.43% on the UMIST, Collection Facial Images, JAFFE, ORL, Georgia Tech, YALE, Caltech 1999 and FEI datasets, respectively.
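The dynamic-threshold idea, deriving the ternary band from the centre value of each pattern rather than using one global constant, can be sketched as follows. The `alpha`-scaling rule here is a hypothetical stand-in for the paper's actual selection formula.

```python
import numpy as np

def ternary_codes(neighbours, centre, alpha=0.1):
    """Ternary encoding in the spirit of LTP/CLTP: each neighbour is
    coded +1 / 0 / -1 depending on whether it lies above, within, or
    below a band of half-width t around the centre pixel. Here t is
    derived from the centre value (dynamic, per pattern) instead of
    being a fixed constant as in classical LTP/CLTP; `alpha` is a
    hypothetical scaling factor, not the paper's exact rule."""
    t = alpha * centre                       # dynamic, per-pattern threshold
    codes = np.zeros_like(neighbours, dtype=np.int8)
    codes[neighbours >= centre + t] = 1
    codes[neighbours <= centre - t] = -1
    return codes

# toy usage: centre 120 gives a band of [108, 132]
codes = ternary_codes(np.array([100, 118, 125, 140]), centre=120.0)
```

A fixed threshold would use the same band width for every pattern in every dataset; tying `t` to the centre value makes the tolerance scale with local brightness, which is the intuition behind the dynamic variant.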
Remote Sensing Image Scene Classification: Benchmark and State of the Art
Remote sensing image scene classification plays an important role in a wide
range of applications and hence has been receiving remarkable attention. During
the past years, significant efforts have been made to develop various datasets
or present a variety of approaches for scene classification from remote sensing
images. However, a systematic review of the literature concerning datasets and
methods for scene classification is still lacking. In addition, almost all
existing datasets have a number of limitations, including the small scale of
scene classes and the image numbers, the lack of image variations and
diversity, and the saturation of accuracy. These limitations severely limit the
development of new approaches especially deep learning-based methods. This
paper first provides a comprehensive review of the recent progress. Then, we
propose a large-scale dataset, termed "NWPU-RESISC45", which is a publicly
available benchmark for REmote Sensing Image Scene Classification (RESISC),
created by Northwestern Polytechnical University (NWPU). This dataset contains
31,500 images, covering 45 scene classes with 700 images in each class. The
proposed NWPU-RESISC45 (i) is large-scale on the scene classes and the total
image number, (ii) holds big variations in translation, spatial resolution,
viewpoint, object pose, illumination, background, and occlusion, and (iii) has
high within-class diversity and between-class similarity. The creation of this
dataset will enable the community to develop and evaluate various data-driven
algorithms. Finally, several representative methods are evaluated using the
proposed dataset and the results are reported as a useful baseline for future
research. (Comment: This manuscript is the accepted version for Proceedings of the IEEE.)
Very High Resolution (VHR) Satellite Imagery: Processing and Applications
Recently, growing interest has appeared in the use of remote sensing imagery to provide synoptic maps of water quality parameters in coastal and inland water ecosystems; to monitor complex land ecosystems for biodiversity conservation; for precision agriculture in the management of soils, crops and pests; for urban planning; for disaster monitoring; etc. However, for these maps to achieve their full potential, periodic monitoring and analysis of multi-temporal changes are essential. In this context, very high resolution (VHR) satellite-based optical, infrared and radar imaging instruments provide reliable information for implementing spatially based conservation actions. Moreover, they enable observation of environmental parameters at broader spatial and finer temporal scales than field observation alone allows. Recent VHR satellite technologies and image processing algorithms thus present the opportunity to develop quantitative techniques with the potential to improve on traditional techniques in terms of cost, mapping fidelity and objectivity. Typical applications include multi-temporal classification, recognition and tracking of specific patterns, multisensor data fusion, analysis of land/marine ecosystem processes and environment monitoring. This book aims to collect new developments, methodologies and applications of very high resolution satellite data for remote sensing. The selected works provide the research community with the most recent advances in all aspects of VHR satellite remote sensing.
Deep learning for land cover and land use classification
Recent advances in sensor technologies have produced a vast amount of very fine spatial resolution (VFSR) remotely sensed imagery collected on a daily basis. These VFSR images present fine spatial details that are spectrally and spatially complex, posing huge challenges for automatic land cover (LC) and land use (LU) classification. Deep learning has reignited the pursuit of artificial intelligence towards a general-purpose machine able to perform human-level tasks in an automated fashion. This is largely driven by its ability to model high-level abstractions through hierarchical feature representations without human-designed features or rules, which shows great potential for identifying and characterising LC and LU patterns from VFSR imagery. In this thesis, a set of novel deep learning methods is developed for LC and LU image classification, using deep convolutional neural networks (CNNs) as an example. Several difficulties, however, are encountered when applying the standard pixel-wise CNN to LC and LU classification with VFSR images, including geometric distortions, boundary uncertainties and huge computational redundancy. These technical challenges for LC classification were solved either by rule-based decision fusion or through uncertainty modelling using rough set theory. For land use, an object-based CNN method was proposed, in which each segmented object (a group of homogeneous pixels) is sampled and predicted by a CNN using both within-object and between-object information, so LU is classified with high accuracy and efficiency. LC and LU form a hierarchical ontology over the same geographical space; these representations are modelled by their joint distribution, in which LC and LU are classified simultaneously through iteration.
These deep learning techniques achieved by far the highest classification accuracy for both LC and LU, up to around 90%, about 5% higher than existing deep learning methods and 10% greater than traditional pixel-based and object-based approaches. This research makes a significant contribution to LC and LU classification through deep learning based innovations and has great potential utility in a wide range of geospatial applications.
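The object-based prediction step, classifying each segmented object with a patch classifier instead of every pixel, might be sketched like this. It is an illustrative simplification with a toy stand-in for the trained CNN, not the thesis implementation; the names `classify_objects` and `predict_patch` are assumptions.

```python
import numpy as np

def classify_objects(image, segments, predict_patch, size=8):
    """Object-based classification sketch: for each segmented object
    (a group of homogeneous pixels), crop a patch around its centroid
    and let a patch classifier assign one label to the whole object.
    `predict_patch` stands in for a trained CNN.
    (Illustrative simplification of the object-based CNN idea.)"""
    labels = {}
    half = size // 2
    for obj_id in np.unique(segments):
        ys, xs = np.nonzero(segments == obj_id)
        cy, cx = int(ys.mean()), int(xs.mean())
        # clamp the crop so it stays inside the image
        y0 = min(max(cy - half, 0), image.shape[0] - size)
        x0 = min(max(cx - half, 0), image.shape[1] - size)
        labels[obj_id] = predict_patch(image[y0:y0+size, x0:x0+size])
    return labels

# toy usage: mean brightness as a stand-in "CNN"
img = np.zeros((16, 16)); img[:, 8:] = 1.0
segs = np.zeros((16, 16), dtype=int); segs[:, 8:] = 1
labels = classify_objects(img, segs, lambda p: int(p.mean() > 0.5))
```

Predicting once per object rather than once per pixel is what yields the efficiency gain the thesis reports, since a segmented scene typically contains orders of magnitude fewer objects than pixels.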