
    Robust object representation by boosting-like deep learning architecture

    This paper presents a new deep learning architecture for robust object representation that efficiently combines the proposed synchronized multi-stage feature (SMF) with a boosting-like algorithm. The SMF structure captures a variety of characteristics from the input object by fusing handcrafted features with deep learned features. The proposed boosting-like algorithm yields greater convergence stability when training the multi-layer network by learning from boosted samples. We demonstrate the generality of our object representation architecture by applying it to several tasks, namely pedestrian detection and action recognition. Our approach achieves 15.89% and 3.85% reductions in average miss rate compared with ACF and JointDeep on the largest Caltech dataset, and obtains competitive results on the MSRAction3D dataset.
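
    The abstract names two ingredients, handcrafted/deep feature fusion and boosting-like sample reweighting, without giving the exact update rule, so the following is only a minimal sketch of those ideas; the function names and the AdaBoost-style exponential update are illustrative assumptions, not the paper's method:

```python
import numpy as np

def fuse_features(handcrafted, deep):
    """Concatenate handcrafted (e.g. HOG/LBP) and deep feature vectors."""
    return np.concatenate([handcrafted, deep], axis=1)

def boost_weights(sample_losses, weights, lr=0.5):
    """AdaBoost-style reweighting: emphasise hard samples, renormalise."""
    weights = weights * np.exp(lr * sample_losses)
    return weights / weights.sum()

# Toy run: four samples with per-sample losses from the current network.
w = np.full(4, 0.25)
losses = np.array([0.1, 0.9, 0.4, 0.7])
for _ in range(3):
    w = boost_weights(losses, w)
print(w.round(3))  # the hard samples (losses 0.9 and 0.7) now dominate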

    Binary Patterns Encoded Convolutional Neural Networks for Texture Recognition and Remote Sensing Scene Classification

    Designing powerful texture features that are discriminative yet robust to realistic imaging conditions is a challenging computer vision problem with many applications, including material recognition and analysis of satellite or aerial imagery. In the past, most texture description approaches were based on dense orderless statistical distributions of local features. However, most recent approaches to texture recognition and remote sensing scene classification are based on Convolutional Neural Networks (CNNs). The de facto practice when learning these CNN models is to use RGB patches as input, with training performed on large amounts of labeled data (ImageNet). In this paper, we show that Binary Patterns encoded CNN models, codenamed TEX-Nets, trained using mapped coded images with explicit texture information, provide complementary information to the standard RGB deep models. Additionally, two deep architectures, namely early and late fusion, are investigated to combine the texture and color information. To the best of our knowledge, we are the first to investigate Binary Patterns encoded CNNs and different deep network fusion architectures for texture recognition and remote sensing scene classification. We perform comprehensive experiments on four texture recognition datasets and four remote sensing scene classification benchmarks: UC-Merced with 21 scene categories, WHU-RS19 with 19 scene classes, RSSCN7 with 7 categories, and the recently introduced large-scale aerial image dataset (AID) with 30 aerial scene types. We demonstrate that TEX-Nets provide complementary information to a standard RGB deep model of the same network architecture. Our late fusion TEX-Net architecture always improves the overall performance compared to the standard RGB network on both recognition problems, and our final combination outperforms the state of the art without employing fine-tuning or an ensemble of RGB network architectures. Comment: To appear in ISPRS Journal of Photogrammetry and Remote Sensing.
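
    As a rough illustration of the TEX-Net input and late fusion ideas, the sketch below computes an LBP code image as one plausible "mapped coded image" and fuses the class posteriors of an RGB stream and a texture stream by convex combination; the exact coding and fusion layers used by TEX-Nets are defined in the paper, and all values here are stand-ins:

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_coded_image(gray, P=8, R=1.0):
    """Map a grayscale image to an LBP code image (one code per pixel),
    which can be fed to the texture stream instead of RGB."""
    return local_binary_pattern(gray, P, R, method="uniform")

def late_fusion(p_rgb, p_tex, alpha=0.5):
    """Late fusion as a convex combination of the two streams' posteriors."""
    return alpha * p_rgb + (1 - alpha) * p_tex

gray = np.random.randint(0, 256, (64, 64)).astype(np.uint8)  # stand-in image
tex_input = lbp_coded_image(gray)           # input to the texture network
p_rgb = np.array([0.6, 0.3, 0.1])           # hypothetical softmax outputs
p_tex = np.array([0.2, 0.7, 0.1])
print(late_fusion(p_rgb, p_tex).argmax())   # fused class prediction
```

    Early fusion would instead stack the coded image with the RGB channels at the network input rather than combining the two streams' outputs.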

    Dynamic fast local Laplacian completed local ternary pattern (dynamic FLapCLTP) for face recognition

    Face recognition has become one of the most widely used biometric authentication methods for high-security systems. Feature extraction is one of the most important steps in a face recognition system: the important and distinctive parts of the image are represented as a compact feature vector. Many features, such as texture, colour and shape, have been proposed in the image processing field, and they can be classified as global or local depending on the image area from which they are extracted. Texture descriptors have recently played a crucial role as local descriptors. Different types of texture descriptors, such as the local binary pattern (LBP), local ternary pattern (LTP), completed local binary pattern (CLBP) and completed local ternary pattern (CLTP), have been proposed and utilised for face recognition tasks, and all have achieved good recognition accuracy. Although LBP performs well in different tasks, it has two limitations: it is sensitive to noise, and it occasionally fails to distinguish between two different texture patterns that yield the same LBP code. Most texture descriptors inherited these limitations from LBP. CLTP was proposed to overcome them and performs well in image processing tasks such as image classification and face recognition. However, CLTP suffers from two limitations that may affect its performance: the threshold used during the extraction process is fixed regardless of the dataset or system, and the CLTP histogram is longer than those of previous descriptors. This study focuses on the first limitation, threshold selection. Firstly, a new texture descriptor is proposed by integrating the fast local Laplacian filter with the CLTP descriptor, namely the fast local Laplacian CLTP (FLapCLTP). The fast local Laplacian filter can increase the performance of CLTP through its extensive detail enhancement and tone mapping, although this gain is still constrained by the constant threshold value used in CLTP. A dynamic FLapCLTP is then proposed to address this issue: instead of using a fixed threshold value for all datasets, a dynamic value is selected based on the image pixel values, so that each texture pattern has its own threshold for extracting FLapCLTP. This dynamic value is selected automatically according to the centre value of the texture pattern. Finally, the proposed FLapCLTP and dynamic FLapCLTP are evaluated for facial recognition using the ORL Faces, Sheffield (UMIST) Face, Collection Facial Images, Georgia Tech Face, Caltech Pedestrian Faces 1999, JAFFE, FEI Face and YALE datasets. The results show the superiority of the proposed descriptors over previous texture descriptors: the dynamic FLapCLTP achieved the highest recognition accuracy rates, with values of 100%, 99.96%, 99.75%, 99.69%, 94.86%, 90.33%, 86.86% and 82.43% on the UMIST, Collection Facial Images, JAFFE, ORL, Georgia Tech, YALE, Caltech 1999 and FEI datasets, respectively.
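
    The thesis states only that the dynamic threshold is chosen from the centre value of each texture pattern; the sketch below therefore uses an illustrative per-pixel threshold t = k * centre (the factor k is an assumption) inside a plain LTP-style ternary encoding, and omits the fast local Laplacian filtering and CLTP's additional magnitude components:

```python
import numpy as np

def dynamic_ltp(img, k=0.1):
    """LTP-style upper/lower code images with a per-pixel threshold
    t = k * centre value (illustrative 'dynamic' rule) instead of the
    fixed global threshold of standard LTP/CLTP."""
    H, W = img.shape
    upper = np.zeros((H - 2, W - 2), np.int32)
    lower = np.zeros((H - 2, W - 2), np.int32)
    nbrs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    for y in range(1, H - 1):
        for x in range(1, W - 1):
            c = img[y, x]
            t = k * c                       # threshold follows the centre pixel
            for b, (dy, dx) in enumerate(nbrs):
                d = img[y + dy, x + dx] - c
                if d >= t:
                    upper[y - 1, x - 1] |= 1 << b   # 'upper' pattern bit
                elif d <= -t:
                    lower[y - 1, x - 1] |= 1 << b   # 'lower' pattern bit
    return upper, lower

img = np.random.randint(0, 256, (32, 32)).astype(float)
u, lo = dynamic_ltp(img)
feat = np.concatenate([np.bincount(u.ravel(), minlength=256),
                       np.bincount(lo.ravel(), minlength=256)])  # histogram feature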

    Remote Sensing Image Scene Classification: Benchmark and State of the Art

    Remote sensing image scene classification plays an important role in a wide range of applications and has therefore received remarkable attention. In recent years, significant efforts have been made to develop various datasets and to present a variety of approaches for scene classification from remote sensing images. However, a systematic review of the literature concerning datasets and methods for scene classification is still lacking. In addition, almost all existing datasets have a number of limitations, including the small scale of scene classes and image numbers, the lack of image variation and diversity, and the saturation of accuracy. These limitations severely hamper the development of new approaches, especially deep learning-based methods. This paper first provides a comprehensive review of recent progress. Then, we propose a large-scale dataset, termed "NWPU-RESISC45", a publicly available benchmark for REmote Sensing Image Scene Classification (RESISC) created by Northwestern Polytechnical University (NWPU). This dataset contains 31,500 images covering 45 scene classes, with 700 images in each class. The proposed NWPU-RESISC45 (i) is large-scale in both the number of scene classes and the total number of images, (ii) exhibits large variations in translation, spatial resolution, viewpoint, object pose, illumination, background and occlusion, and (iii) has high within-class diversity and between-class similarity. The creation of this dataset will enable the community to develop and evaluate various data-driven algorithms. Finally, several representative methods are evaluated using the proposed dataset and the results are reported as a useful baseline for future research. Comment: This manuscript is the accepted version for Proceedings of the IEEE.
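
    Since the dataset is commonly distributed as one folder per scene class, a few lines of standard PyTorch suffice to load it for baseline experiments; the path, image size and transform below are assumptions, not part of the paper:

```python
import torch
from torchvision import datasets, transforms

# Assumes NWPU-RESISC45 has been extracted as 45 class subfolders.
tfm = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
])
ds = datasets.ImageFolder("NWPU-RESISC45/", transform=tfm)
assert len(ds.classes) == 45 and len(ds) == 31_500  # 700 images per class
loader = torch.utils.data.DataLoader(ds, batch_size=64, shuffle=True)
images, labels = next(iter(loader))  # (64, 3, 256, 256) batch for a baseline CNN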

    Very High Resolution (VHR) Satellite Imagery: Processing and Applications

    Recently, interest has grown in using remote sensing imagery to provide synoptic maps of water quality parameters in coastal and inland water ecosystems; to monitor complex land ecosystems for biodiversity conservation; and to support precision agriculture for the management of soils, crops and pests, as well as urban planning, disaster monitoring and other applications. However, for these maps to achieve their full potential, it is important to engage in periodic monitoring and analysis of multi-temporal changes. In this context, very high resolution (VHR) satellite-based optical, infrared and radar imaging instruments provide reliable information for implementing spatially based conservation actions. Moreover, they enable observation of environmental parameters at broader spatial and finer temporal scales than field observation alone allows. In this sense, recent VHR satellite technologies and image processing algorithms present the opportunity to develop quantitative techniques with the potential to improve upon traditional techniques in terms of cost, mapping fidelity and objectivity. Typical applications include multi-temporal classification, recognition and tracking of specific patterns, multisensor data fusion, analysis of land/marine ecosystem processes, and environmental monitoring. This book aims to collect new developments, methodologies and applications of very high resolution satellite data for remote sensing. The selected works provide the research community with the most recent advances in all aspects of VHR satellite remote sensing.

    Deep learning for land cover and land use classification

    Recent advances in sensor technologies have led to a vast amount of very fine spatial resolution (VFSR) remotely sensed imagery being collected on a daily basis. These VFSR images present fine spatial detail that is spectrally and spatially complex, posing huge challenges for automatic land cover (LC) and land use (LU) classification. Deep learning has reignited the pursuit of artificial intelligence towards a general-purpose machine able to perform human-level tasks in an automated fashion, driven largely by its ability to model high-level abstractions through hierarchical feature representations without human-designed features or rules, which demonstrates great potential for identifying and characterising LC and LU patterns in VFSR imagery. In this thesis, a set of novel deep learning methods is developed for LC and LU image classification based on deep convolutional neural networks (CNNs). Several difficulties, however, are encountered when applying a standard pixel-wise CNN to LC and LU classification with VFSR images, including geometric distortions, boundary uncertainties and huge computational redundancy. These technical challenges were solved for LC classification either through rule-based decision fusion or through uncertainty modelling with rough set theory. For land use, an object-based CNN method is proposed, in which each segmented object (a group of homogeneous pixels) is sampled and predicted by the CNN using both within-object and between-object information; LU is thus classified with high accuracy and efficiency. LC and LU form a hierarchical ontology over the same geographical space, and these representations are modelled by their joint distribution, in which LC and LU are classified simultaneously through iteration. The developed deep learning techniques achieved by far the highest classification accuracy for both LC and LU, up to around 90%, about 5% higher than existing deep learning methods and 10% greater than traditional pixel-based and object-based approaches. This research makes a significant contribution to LC and LU classification through deep-learning-based innovations and has great potential utility in a wide range of geospatial applications.
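
    A minimal sketch of one reading of the object-based CNN step, assuming per-pixel class probabilities from a CNN and a segmentation map: each object receives the class with the highest mean probability over its pixels; the between-object information the thesis also exploits is omitted here:

```python
import numpy as np

def object_based_labels(pixel_probs, segments):
    """Label each segmented object with the class of highest mean CNN
    probability over its member pixels (within-object aggregation only)."""
    out = np.zeros(segments.shape, np.int64)
    for obj_id in np.unique(segments):
        mask = segments == obj_id
        out[mask] = pixel_probs[mask].mean(axis=0).argmax()
    return out

# Toy scene: an 8x8 image, 3 LU classes, two segmented objects.
probs = np.random.dirichlet(np.ones(3), size=(8, 8))  # per-pixel CNN softmax
segs = np.zeros((8, 8), int)
segs[:, 4:] = 1
labels = object_based_labels(probs, segs)  # one consistent label per object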