
    Remote Sensing Image Scene Classification: Benchmark and State of the Art

    Remote sensing image scene classification plays an important role in a wide range of applications and has therefore received remarkable attention. In recent years, significant efforts have been made to develop datasets and to propose a variety of approaches for scene classification from remote sensing images. However, a systematic review of the literature on datasets and methods for scene classification is still lacking. In addition, almost all existing datasets suffer from a number of limitations, including small numbers of scene classes and images, a lack of image variation and diversity, and saturated accuracy. These limitations severely hinder the development of new approaches, especially deep learning-based methods. This paper first provides a comprehensive review of recent progress. We then propose a large-scale dataset, termed "NWPU-RESISC45", a publicly available benchmark for REmote Sensing Image Scene Classification (RESISC) created by Northwestern Polytechnical University (NWPU). The dataset contains 31,500 images covering 45 scene classes, with 700 images per class. The proposed NWPU-RESISC45 (i) is large-scale in both the number of scene classes and the total number of images, (ii) exhibits large variations in translation, spatial resolution, viewpoint, object pose, illumination, background, and occlusion, and (iii) has high within-class diversity and between-class similarity. The creation of this dataset will enable the community to develop and evaluate various data-driven algorithms. Finally, several representative methods are evaluated on the proposed dataset and the results are reported as a useful baseline for future research. Comment: This manuscript is the accepted version for Proceedings of the IEE

    Digital analysis of paintings


    Modelling the human perception of shape-from-shading

    Shading conveys information about 3-D shape, and the process of recovering this information is called shape-from-shading (SFS). This thesis divides the process of human SFS into two functional sub-units (luminance disambiguation and shape computation) and studies them individually. Based on the results of a series of psychophysical experiments, it is proposed that the interaction between first- and second-order channels plays an important role in disambiguating luminance. Building on this idea, two versions of a biologically plausible model are developed to explain the human performance observed here and elsewhere. An algorithm sharing the same idea is also developed as a solution to the problem of intrinsic image decomposition in the field of image processing. With regard to the shape computation unit, a link between luminance variations and estimated surface normals is identified by testing participants on simple gratings with several different luminance profiles. This methodology is unconventional but can be justified in the light of past studies of human SFS. Finally, a computational algorithm for SFS containing two distinct operating modes is proposed. This algorithm is broadly consistent with the known psychophysics of human SFS.
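As a minimal illustration of the forward problem that SFS inverts, the sketch below renders image intensity from known surface normals under a simple Lambertian model; this is a generic textbook shading model chosen for illustration, not the two-mode algorithm proposed in the thesis.

```python
import numpy as np

def lambertian_shading(normals, light):
    """Forward shading: image intensity from surface normals.

    normals: (H, W, 3) array of unit surface normals
    light:   length-3 light direction (normalized internally)

    SFS is the inverse problem: recover the normals (and hence
    the shape) from the observed intensity image.
    """
    light = np.asarray(light, dtype=float)
    light = light / np.linalg.norm(light)
    # Lambertian law: I = max(0, n . l); the clip models attached shadows.
    return np.clip(np.einsum('ijk,k->ij', normals, light), 0.0, None)
```

Because many different normal fields can produce the same intensity image, the inverse problem is ill-posed, which is why human SFS must disambiguate luminance before computing shape.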

    Understanding Heterogeneous EO Datasets: A Framework for Semantic Representations

    Earth observation (EO) has become a valuable source of comprehensive, reliable, and persistent information for a wide range of applications. However, dealing with the complexity of land cover is sometimes difficult, as the variety of EO sensors is reflected in the multitude of details recorded in several types of image data, whose properties dictate the category and nature of the perceptible land structures. This data heterogeneity hampers proper understanding and prevents the definition of universal procedures for content exploitation. The main shortcomings are due to the differences between human and sensor perception of objects, as well as to the lack of coincidence between visual elements and similarities obtained by computation. In order to bridge these sensory and semantic gaps, the paper presents a compound framework for EO image information extraction. The proposed approach acts as a common ground between the user's understanding, which is limited to the visible domain, and the machine's numerical interpretation of a much wider range of information. A hierarchical data representation is considered: first, basic elements are computed automatically; then, users can apply their judgement to the data processing results until semantic structures are revealed. This procedure completes a user-machine knowledge transfer. The interaction is formalized as a dialogue in which communication is determined by a set of parameters guiding the computational process at each level of representation. The purpose is to keep the data-driven observables connected to the level of semantics and to human awareness. The proposed concept offers flexibility and interoperability, allowing users to generate the results that best fit their application scenario. The experiments performed on different satellite images demonstrate that semantic annotation performance can be improved by adjusting a set of parameters to the particularities of the analyzed data.

    Dynamic fast local Laplacian completed local ternary pattern (dynamic FLapCLTP) for face recognition

    Today, face recognition has become one of the typical biometric authentication systems used where high security is required. Some systems use face recognition to enhance their security and provide a high level of protection. Feature extraction is considered one of the most important steps in face recognition systems: the important and distinctive parts of the image are represented as a compact feature vector. Many features, such as texture, colour and shape, have been proposed in the image processing literature, and these features can be classified as global or local depending on the image area from which they are extracted. Texture descriptors have recently played a crucial role as local descriptors. Different types of texture descriptors, such as the local binary pattern (LBP), local ternary pattern (LTP), completed local binary pattern (CLBP) and completed local ternary pattern (CLTP), have been proposed and used for face recognition tasks, and all have achieved good recognition accuracy. Although LBP has performed well on different tasks, it has two limitations: it is sensitive to noise, and it occasionally fails to distinguish between two different texture patterns that share the same LBP code. Most texture descriptors inherit these limitations from LBP. CLTP was proposed to overcome the limitations of LBP and has performed well on different image processing tasks, such as image classification and face recognition. However, CLTP suffers from two limitations that may affect its performance: the fixed threshold value used during the CLTP extraction process regardless of the dataset or system, and the greater length of the CLTP histogram compared with previous descriptors. This study focuses on the first limitation, threshold selection.
Firstly, a new texture descriptor is proposed by integrating the fast local Laplacian filter with the CLTP descriptor, namely fast local Laplacian CLTP (FLapCLTP). The fast local Laplacian filter can improve the performance of the CLTP through its extensive detail enhancement and tone mapping; this contribution is, however, constrained by the constant threshold value used in CLTP. A dynamic FLapCLTP is then proposed to address this issue. Instead of using a fixed threshold value for all datasets, a dynamic value is selected based on the image pixel values, so each texture pattern has its own threshold for extracting FLapCLTP. This dynamic value is selected automatically according to the centre value of the texture pattern. Finally, the proposed FLapCLTP and dynamic FLapCLTP are evaluated on facial recognition systems using the ORL Faces, Sheffield Face, Collection Facial Images, Georgia Tech Face, Caltech Pedestrian Faces 1999, JAFFE, FEI Face and YALE datasets. The results show the superiority of the proposed descriptors over previous texture descriptors. The dynamic FLapCLTP achieved the highest recognition accuracy rates, with values of 100%, 99.96%, 99.75%, 99.69%, 94.86%, 90.33%, 86.86% and 82.43% on the UMIST, Collection Facial Images, JAFFE, ORL, Georgia Tech, YALE, Caltech 1999 and FEI datasets, respectively.
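The thresholding idea behind these descriptors can be sketched briefly. The snippet below computes a standard 3x3 LBP code and an LTP-style pair of upper/lower binary codes in which, when no threshold is given, the threshold is derived from the centre pixel (here, 10% of its value, a hypothetical rule chosen purely for illustration; the paper's actual dynamic selection rule may differ, and full CLTP also encodes magnitude and centre components not shown here).

```python
import numpy as np

def _neighbours(patch):
    # Eight 3x3 neighbours in clockwise order from the top-left corner.
    return [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
            patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]

def lbp_code(patch):
    """Standard 3x3 LBP: threshold the 8 neighbours at the centre value."""
    c = patch[1, 1]
    return sum((1 << i) for i, n in enumerate(_neighbours(patch)) if n >= c)

def ltp_codes(patch, t=None):
    """Ternary pattern split into upper/lower binary codes.

    If t is None, a per-pattern threshold is picked from the centre
    pixel (10% of its value), mimicking the idea of a dynamic
    threshold instead of one fixed constant for every dataset.
    """
    c = int(patch[1, 1])
    if t is None:
        t = max(1, round(0.1 * c))  # hypothetical dynamic rule
    upper = sum((1 << i) for i, n in enumerate(_neighbours(patch))
                if int(n) >= c + t)
    lower = sum((1 << i) for i, n in enumerate(_neighbours(patch))
                if int(n) <= c - t)
    return upper, lower
```

Because the ternary split ignores neighbours within ±t of the centre, small noise fluctuations no longer flip bits, which is the motivation for LTP-style codes over plain LBP.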

    Advances in Stereo Vision

    Stereopsis is a vision process whose geometrical foundation has been known for a long time, ever since the experiments by Wheatstone in the 19th century. Nevertheless, its inner workings in biological organisms, as well as its emulation by computer systems, have proven elusive, and stereo vision remains a very active and challenging area of research today. In this volume we have attempted to present a limited but relevant sample of the work being carried out in stereo vision, covering significant aspects from both the applied and the theoretical standpoints.