Remote Sensing Image Scene Classification: Benchmark and State of the Art
Remote sensing image scene classification plays an important role in a wide
range of applications and hence has been receiving remarkable attention. During
the past years, significant efforts have been made to develop various datasets
or present a variety of approaches for scene classification from remote sensing
images. However, a systematic review of the literature concerning datasets and
methods for scene classification is still lacking. In addition, almost all
existing datasets have a number of limitations, including the small number of
scene classes and images, the lack of image variation and diversity, and the
saturation of accuracy. These limitations severely restrict the development of
new approaches, especially deep learning-based methods. This
paper first provides a comprehensive review of the recent progress. Then, we
propose a large-scale dataset, termed "NWPU-RESISC45", which is a publicly
available benchmark for REmote Sensing Image Scene Classification (RESISC),
created by Northwestern Polytechnical University (NWPU). This dataset contains
31,500 images, covering 45 scene classes with 700 images in each class. The
proposed NWPU-RESISC45 (i) is large-scale in both the number of scene classes
and the total number of images, (ii) exhibits large variations in translation,
spatial resolution, viewpoint, object pose, illumination, background, and
occlusion, and (iii) has
high within-class diversity and between-class similarity. The creation of this
dataset will enable the community to develop and evaluate various data-driven
algorithms. Finally, several representative methods are evaluated using the
proposed dataset and the results are reported as a useful baseline for future
research.
Comment: This manuscript is the accepted version for Proceedings of the IEEE.
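As a quick illustration of the dataset's scale (45 classes × 700 images = 31,500 in total), a class-balanced train/test split of the kind such benchmarks are typically evaluated with can be sketched as follows; the 80/20 ratio and all function names are illustrative assumptions, not part of the paper:

```python
import random

CLASSES = 45           # scene classes in NWPU-RESISC45
IMAGES_PER_CLASS = 700  # images per class

def balanced_split(images_per_class, train_ratio=0.8, seed=0):
    """Split one class's image indices into train/test at a fixed ratio.

    Applying the same split per class keeps the benchmark class-balanced
    (a common evaluation protocol; the ratio here is an assumption).
    """
    rng = random.Random(seed)
    idx = list(range(images_per_class))
    rng.shuffle(idx)
    cut = int(train_ratio * images_per_class)
    return idx[:cut], idx[cut:]

train, test = balanced_split(IMAGES_PER_CLASS)
total = CLASSES * IMAGES_PER_CLASS  # 31,500 images overall
```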
Modelling the human perception of shape-from-shading
Shading conveys information about 3-D shape, and the process of recovering this information is called shape-from-shading (SFS). This thesis divides human SFS into two functional sub-units (luminance disambiguation and shape computation) and studies each individually. Based on the results of a series of psychophysical experiments, it is proposed that the interaction between first- and second-order channels plays an important role in disambiguating luminance. Building on this idea, two versions of a biologically plausible model are developed to explain the human performance observed here and elsewhere. An algorithm based on the same idea is also developed as a solution to the problem of intrinsic image decomposition in the field of image processing. With regard to the shape computation unit, a link between luminance variations and estimated surface normals is identified by testing participants on simple gratings with several different luminance profiles. This methodology is unconventional but can be justified in the light of past studies of human SFS. Finally, a computational SFS algorithm containing two distinct operating modes is proposed. This algorithm is broadly consistent with the known psychophysics of human SFS.
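The forward model that SFS must invert can be illustrated with the standard Lambertian shading equation, where image intensity is proportional to the cosine between the surface normal and the light direction. This is textbook background rather than the thesis's own model, and all names are illustrative:

```python
import math

def lambertian_shading(normal, light, albedo=1.0):
    """Intensity = albedo * cos(angle between normal and light direction),
    clamped at zero (standard Lambertian assumption)."""
    nx, ny, nz = normal
    lx, ly, lz = light
    n_len = math.sqrt(nx*nx + ny*ny + nz*nz)
    l_len = math.sqrt(lx*lx + ly*ly + lz*lz)
    cos = (nx*lx + ny*ly + nz*lz) / (n_len * l_len)
    return albedo * max(cos, 0.0)  # no negative light

# A surface facing the light is brightest; a tilted surface is dimmer.
bright = lambertian_shading((0, 0, 1), (0, 0, 1))
dim = lambertian_shading((1, 0, 1), (0, 0, 1))
```

SFS is the inverse problem: given the intensities, recover the normals; the luminance-disambiguation step studied in the thesis addresses the fact that this inversion is underdetermined.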
Understanding Heterogeneous EO Datasets: A Framework for Semantic Representations
Earth observation (EO) has become a valuable source of comprehensive, reliable, and persistent
information for a wide range of applications. However, dealing with the complexity of land cover is
sometimes difficult, as the variety of EO sensors is reflected in the multitude of details recorded in
several types of image data. Their properties dictate the category and nature of the perceptible land
structures. The data
heterogeneity hampers proper understanding, preventing the definition of universal procedures for content
exploitation. The main shortcomings are due to the different human and sensor perception on objects, as well
as to the lack of coincidence between visual elements and similarities obtained by computation. In order to
bridge these sensory and semantic gaps, the paper presents a compound framework for EO image information
extraction. The proposed approach acts as common ground between the user's understanding, which is
restricted to the visible domain, and the machine's numerical interpretation of a much wider range of
information. A hierarchical data representation is considered. At first, basic elements are automatically
computed. Then, users can enforce their judgement on the data processing results until semantic structures
are revealed. This procedure completes a user-machine knowledge transfer. The interaction is formalized as
a dialogue, where communication is determined by a set of parameters guiding the computational process
at each level of representation. The purpose is to keep the data-driven observables connected to the
semantic level and to human awareness. The proposed concept offers flexibility and interoperability to users,
allowing them to generate those results that best fit their application scenario. The experiments performed on
different satellite images demonstrate that semantic annotation performance can be improved by adjusting
the set of parameters to the particularities of the analyzed data.
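The hierarchical user-machine dialogue described above can be sketched as a two-level loop: the machine first computes basic elements, then the user attaches semantic judgements and adjusts parameters until meaningful structures emerge. Everything below (the quantisation rule, the label names, the function names) is an illustrative assumption, not the paper's actual implementation:

```python
def compute_basic_elements(pixels, n_bins):
    """Level 1 (machine): quantise raw values into n_bins primitive classes.
    The bin count is one of the parameters the user can adjust in the dialogue."""
    lo, hi = min(pixels), max(pixels)
    span = (hi - lo) or 1
    return [min(int((p - lo) * n_bins / span), n_bins - 1) for p in pixels]

def annotate(labels, names):
    """Level 2 (user): attach semantic names to the machine-made classes."""
    return [names.get(label, "unknown") for label in labels]

pixels = [3, 5, 40, 42, 90, 95]                       # toy 1-D "image"
elements = compute_basic_elements(pixels, n_bins=3)   # machine proposal
semantic = annotate(elements, {0: "water", 1: "field", 2: "urban"})  # user judgement
```

If the grouping does not match the user's intent, the dialogue continues: the user changes `n_bins` (or the naming) and the machine recomputes, which is the parameter-guided interaction the framework formalizes.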
Dynamic fast local Laplacian completed local ternary pattern (dynamic FLapCLTP) for face recognition
Today, face recognition has become one of the typical biometric authentication systems used in high-security applications. Some systems use face recognition to enhance their security and provide a high level of protection. Feature extraction is considered one of the most important steps in face recognition systems: the important and distinctive parts of the image are represented as a compact feature vector. Many features, such as texture, colour and shape, have been proposed in the image processing field. These features can also be classified as global or local depending on the image extraction area. Texture descriptors have recently played a crucial role as local descriptors. Different types of texture descriptors, such as the local binary pattern (LBP), local ternary pattern (LTP), completed local binary pattern (CLBP) and completed local ternary pattern (CLTP), have been proposed and used for face recognition tasks. All these texture features have achieved good performance in terms of recognition accuracy. Although the LBP has performed well in different tasks, it has two limitations: it is sensitive to noise, and it occasionally fails to distinguish between two different texture patterns that share the same LBP code. Most texture descriptors inherited these limitations from LBP. The CLTP was proposed to overcome the limitations of LBP, and it has performed well in different image processing tasks, such as image classification and face recognition. However, CLTP suffers from two limitations that may affect its performance: the fixed threshold value used during the CLTP extraction process, regardless of the type of dataset or system, and the greater length of the CLTP histogram compared with previous descriptors. This study focuses on handling the first limitation, namely threshold selection.
Firstly, a new texture descriptor is proposed by integrating the fast local Laplacian filter and the CLTP descriptor, namely the fast local Laplacian CLTP (FLapCLTP). The fast local Laplacian filter can help increase the performance of the CLTP owing to its extensive detail enhancement and tone mapping; this contribution, however, is limited by the constant threshold value used in CLTP. A dynamic FLapCLTP is therefore proposed to address this issue. Instead of using a fixed threshold value for all datasets, a dynamic value is selected based on the image pixel values, so each texture pattern has its own threshold for extracting FLapCLTP. This dynamic value is automatically selected according to the centre value of the texture pattern. Finally, the proposed FLapCLTP and dynamic FLapCLTP are evaluated for facial recognition using the ORL Faces, Sheffield Face, Collection Facial Images, Georgia Tech Face, Caltech Pedestrian Faces 1999, JAFFE, FEI Face and YALE datasets. The results show the superiority of the proposed texture descriptors over previous ones. The dynamic FLapCLTP achieved the highest recognition accuracy rates, with values of 100%, 99.96%, 99.75%, 99.69%, 94.86%, 90.33%, 86.86% and 82.43% on the UMIST, Collection Facial Images, JAFFE, ORL, Georgia Tech, YALE, Caltech 1999 and FEI datasets, respectively.
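The ternary encoding underlying LTP/CLTP, and the idea of deriving a per-pattern threshold from the centre pixel, can be sketched as follows. The neighbour ordering and the `alpha`-scaled threshold rule are illustrative assumptions, not the paper's actual formula:

```python
import numpy as np

def ltp_codes(patch, threshold):
    """Encode a 3x3 patch into upper/lower local ternary pattern codes.

    Neighbours more than `threshold` above the centre map to +1, more than
    `threshold` below to -1, the rest to 0; the ternary pattern is then split
    into two binary codes, as in the standard LTP scheme.
    """
    c = int(patch[1, 1])
    # 8 neighbours, clockwise from the top-left corner (ordering is a choice)
    nbrs = patch.flatten()[[0, 1, 2, 5, 8, 7, 6, 3]].astype(int)
    t = np.where(nbrs > c + threshold, 1, np.where(nbrs < c - threshold, -1, 0))
    upper = sum(1 << i for i, v in enumerate(t) if v == 1)
    lower = sum(1 << i for i, v in enumerate(t) if v == -1)
    return upper, lower

def dynamic_threshold(patch, alpha=0.1):
    # Hypothetical rule: scale the threshold with the centre pixel value, so
    # each pattern gets its own threshold instead of one global constant.
    return alpha * int(patch[1, 1])

patch = np.array([[120, 130, 110],
                  [125, 100,  90],
                  [105,  95, 100]])

up, lo = ltp_codes(patch, dynamic_threshold(patch))
```

With the fixed-threshold variant, the same code path is used but `threshold` is a constant shared by every patch; the dynamic version simply replaces that constant with a value recomputed per pattern.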
Advances in Stereo Vision
Stereopsis is a vision process whose geometrical foundation has been known for a long time, ever since Wheatstone's experiments in the 19th century. Nevertheless, its inner workings in biological organisms, as well as its emulation by computer systems, have proven elusive, and stereo vision remains a very active and challenging area of research. In this volume we have attempted to present a limited but relevant sample of the work being carried out in stereo vision, covering significant aspects from both the applied and the theoretical standpoints.