51 research outputs found

    Unsupervised spectral sub-feature learning for hyperspectral image classification

    Spectral pixel classification is one of the principal techniques used in hyperspectral image (HSI) analysis. In this article, we propose an unsupervised feature learning method for the classification of hyperspectral images. The proposed method learns a dictionary of sub-feature basis representations from the spectral domain, which allows effective use of the correlated spectral data. The learned dictionary is then used to encode convolutional samples from the hyperspectral input pixels into an expanded but sparse feature space. These expanded hyperspectral feature representations enable linear separation between the object classes present in an image. To evaluate the proposed method, we performed experiments on several commonly used HSI data sets acquired at different locations and by different sensors. Our experimental results show that the proposed method outperforms other pixel-wise classification methods that use unsupervised feature extraction approaches. Additionally, even though our approach uses neither prior knowledge nor labelled training data to learn features, its classification accuracy is better than or comparable to that of recent semi-supervised methods.
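    The pipeline described above can be illustrated with a minimal Python/NumPy sketch built on several assumptions:

    - the "convolutional samples" are fixed-width windows slid along the spectral axis;
    - the dictionary is learned by spherical k-means;
    - encoding applies a soft threshold to the atom responses.

    These are swapped-in choices common in single-layer unsupervised feature learning, not the article's confirmed method. The helper names, the window width, the number of atoms and the threshold `alpha` are all hypothetical.

```python
import numpy as np

def spectral_windows(pixels, width):
    """Slide a window of `width` bands along the spectrum (stride 1).
    pixels: (n_pixels, n_bands) -> (n_pixels, n_windows, width)"""
    n_bands = pixels.shape[1]
    return np.stack([pixels[:, s:s + width] for s in range(n_bands - width + 1)], axis=1)

def normalize(v, eps=1e-8):
    """Remove the mean of each window and scale it to unit length."""
    v = v - v.mean(axis=-1, keepdims=True)
    return v / (np.linalg.norm(v, axis=-1, keepdims=True) + eps)

def learn_dictionary(windows, n_atoms, n_iter=20, seed=0):
    """Spherical k-means on normalized spectral windows; uses no labels."""
    rng = np.random.default_rng(seed)
    X = normalize(windows)
    D = X[rng.choice(len(X), n_atoms, replace=False)]  # initial atoms (a copy)
    for _ in range(n_iter):
        assign = np.argmax(X @ D.T, axis=1)            # nearest atom by cosine
        for k in range(n_atoms):
            members = X[assign == k]
            if len(members):                            # keep empty atoms unchanged
                D[k] = members.sum(axis=0)
        D /= np.linalg.norm(D, axis=1, keepdims=True) + 1e-8
    return D

def encode(pixels, D, width, alpha=0.25):
    """Soft-threshold the atom responses of every window: expanded, sparse features."""
    W = normalize(spectral_windows(pixels, width))     # (n, n_windows, width)
    codes = np.maximum(W @ D.T - alpha, 0.0)            # (n, n_windows, n_atoms)
    return codes.reshape(len(pixels), -1)

# Usage on synthetic data (200 pixels, 64 bands)
rng = np.random.default_rng(0)
pixels = rng.random((200, 64))
width = 8
D = learn_dictionary(spectral_windows(pixels, width).reshape(-1, width), n_atoms=16)
F = encode(pixels, D, width)
print(F.shape)  # (200, 912)
```

    With 64 bands, a width of 8 and 16 atoms, each pixel expands to 57 x 16 = 912 features, many of them zero. A linear classifier, such as a linear SVM, would then be trained on the rows of `F`.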

    Doctor of Philosophy

    Statistical learning theory has garnered attention during the last decade because it provides the theoretical and mathematical framework for solving pattern recognition problems, such as dimensionality reduction, clustering, and shape analysis. In statis

    A Review on Non Linear Dimensionality Reduction Techniques for Face Recognition

    Principal component analysis (PCA) has gained much attention among researchers as a way to address the problem of high-dimensional data sets. During the last decade, a nonlinear variant of PCA has been used to reduce dimensionality onto a nonlinear surface rather than a linear hyperplane. This paper reviews various nonlinear techniques applied to real and artificial data. It is observed that nonlinear PCA outperforms its linear counterpart in most cases, although exceptions are noted.
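    For readers unfamiliar with the nonlinear variant, here is a minimal kernel PCA sketch in Python/NumPy. It assumes an RBF kernel with a hypothetical bandwidth `gamma`. The sketch:

    - centers the Gram matrix in feature space;
    - takes its leading eigenvectors;
    - returns the projections of the training points.

    It is a generic illustration, not any of the specific techniques surveyed in the review.

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    """Gaussian (RBF) kernel matrix exp(-gamma * ||x - y||^2)."""
    d2 = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * np.maximum(d2, 0.0))      # clip tiny negative round-off

def kernel_pca(X, n_components, gamma=1.0):
    """Project the training points onto the leading kernel principal components."""
    n = len(X)
    K = rbf_kernel(X, X, gamma)
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one      # center the data in feature space
    eigvals, eigvecs = np.linalg.eigh(Kc)           # ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:n_components]
    lambdas, alphas = eigvals[top], eigvecs[:, top]
    # Projection of point i on component j is sqrt(lambda_j) * alphas[i, j]
    return alphas * np.sqrt(np.maximum(lambdas, 0.0))

# Two concentric rings, a standard toy case for nonlinear methods
rng = np.random.default_rng(0)
t = rng.uniform(0, 2 * np.pi, 200)
r = np.where(np.arange(200) < 100, 1.0, 3.0)
X = np.c_[r * np.cos(t), r * np.sin(t)]
Z = kernel_pca(X, n_components=2, gamma=0.5)
print(Z.shape)  # (200, 2)
```

    As `gamma` grows, the kernel matrix approaches the identity, so in practice the bandwidth is usually tuned by validation.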

    Feature extraction and classification for hyperspectral remote sensing images

    Recent advances in sensor technology have increased the availability of hyperspectral remote sensing data at very high spectral and spatial resolutions. Many techniques have been developed to exploit the spectral and spatial information in these data. In particular, feature extraction (FE), which reduces the dimensionality of hyperspectral data while retaining as much spectral information as possible, is one way to preserve the spectral information, while morphological profile analysis is one of the most popular methods for exploring the spatial information. Hyperspectral sensors collect information as a set of images represented by hundreds of spectral bands. While offering much richer spectral information than regular RGB and multispectral images, high-dimensional hyperspectral data also pose a challenge for traditional spectral data processing techniques. Conventional classification methods perform poorly on hyperspectral data because of the curse of dimensionality (i.e., the Hughes phenomenon: for a limited number of training samples, classification accuracy decreases as the dimension increases). Classification techniques in pattern recognition typically assume that enough training samples are available to obtain reasonably accurate quantitative class descriptions. This assumption is frequently not satisfied in hyperspectral remote sensing, however, because collecting ground truth for the observed data can be difficult and expensive. Techniques that produce accurate estimates from only a small number of training samples can therefore save considerable time and cost, and the small sample size problem becomes a very important issue for hyperspectral image classification. Very high-resolution remotely sensed images of urban areas have recently become available.
The classification of such images is challenging because urban areas often comprise a large number of different surface materials, and consequently the heterogeneity of urban images is relatively high. Moreover, different information classes can be made up of spectrally similar surface materials. Combining spectral and spatial information is therefore important for improving classification accuracy. In particular, morphological profile analysis is one of the most popular methods for exploring the spatial information in high-resolution remote sensing data. When morphological profiles (MPs) are used to explore the spatial information for the classification of hyperspectral data, three important issues must be considered. First, classical morphological openings and closings degrade object boundaries and deform object shapes, while the morphological profile by reconstruction leads to some unexpected and undesirable results (e.g., over-reconstruction). Second, the generated MPs are high-dimensional, may contain redundant information, and pose a new challenge for conventional classifiers, especially those that are not robust to the Hughes phenomenon. Finally, the linear features used to construct MPs lose too much spectral information when extracted from the original hyperspectral data. To overcome these problems and improve classification results, we develop effective feature extraction algorithms and combine them with morphological features for the classification of hyperspectral remote sensing data. The contributions of this thesis are as follows. As the first contribution, a novel semi-supervised local discriminant analysis (SELD) method is proposed for feature extraction in hyperspectral remote sensing imagery, with improved performance in both ill-posed and poorly posed conditions.
The proposed method combines unsupervised local linear feature extraction (LLFE) methods and supervised linear discriminant analysis (LDA) in a novel framework without any free parameters. The underlying idea is to design an optimal projection matrix that preserves the local neighborhood information inferred from unlabeled samples while simultaneously maximizing the class discrimination inferred from labeled samples. Our second contribution is the application of morphological profiles with partial reconstruction to explore the spatial information in hyperspectral remote sensing data from urban areas. Classical morphological openings and closings degrade object boundaries and deform object shapes. Openings and closings by reconstruction avoid this problem but have undesirable side effects: objects expected to disappear at a certain scale remain present, so object size is often represented incorrectly. Morphological profiles with partial reconstruction improve upon both classical MPs and MPs by reconstruction: object shapes are preserved better than with classical MPs, and size information is preserved better than with MPs by reconstruction. The third contribution of this thesis is a novel semi-supervised feature extraction framework for reducing the dimension of the generated morphological profiles. MPs built with different structuring elements and a range of increasing operator sizes produce high-dimensional data that may contain redundant information and pose a new challenge for conventional classifiers, especially those that are not robust to the Hughes phenomenon.
To the best of our knowledge, semi-supervised feature extraction for generated morphological profiles has not been investigated before. The proposed generalized semi-supervised local discriminant analysis (GSELD) extends SELD with a data-driven parameter. In our fourth contribution, we propose a fast iterative kernel principal component analysis (FIKPCA) to extract features from hyperspectral images. In many applications, linear FE methods, which rely on linear projections, lose the nonlinear properties of the original data when reducing dimensionality, while traditional nonlinear methods place heavy demands on storage and computation. The proposed method is a kernel version of Candid Covariance-Free Incremental Principal Component Analysis, which estimates the eigenvectors iteratively. Because it does not perform an eigendecomposition of the Gram matrix, our approach greatly reduces both space and time complexity. Our last contribution constructs MPs with partial reconstruction on nonlinear features. The linear features on which morphological profiles are usually built lose too much spectral information, whereas nonlinear features are better suited to describing complex, higher-order nonlinear distributions. In particular, kernel principal components are among the nonlinear features we used to build MPs with partial reconstruction, which led to a significant improvement in classification accuracy. The experimental analysis of the novel techniques developed in this thesis demonstrates improved accuracy in different fields of application compared with other state-of-the-art methods.
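    The thesis builds its spatial features from morphological profiles. As an illustration only, the NumPy sketch below computes a classical profile from openings and closings by reconstruction, using flat square structuring elements of odd size and edge padding (both assumptions). It does not implement the partial-reconstruction variant or the FIKPCA features proposed in the thesis, and all function names are hypothetical. On a toy image, the opening by reconstruction removes the object smaller than the structuring element and restores the larger one exactly.

```python
import numpy as np

def _flat_filter(img, size, reducer):
    """Min or max over a flat size x size square window (edge-padded)."""
    r = size // 2
    padded = np.pad(img, r, mode="edge")
    h, w = img.shape
    shifts = [padded[dy:dy + h, dx:dx + w] for dy in range(size) for dx in range(size)]
    return reducer(np.stack(shifts), axis=0)

def erode(img, size):
    return _flat_filter(img, size, np.min)

def dilate(img, size):
    return _flat_filter(img, size, np.max)

def reconstruct_by_dilation(marker, mask):
    """Repeat 3x3 geodesic dilation of marker under mask until stable."""
    rec = np.minimum(marker, mask)
    while True:
        nxt = np.minimum(dilate(rec, 3), mask)
        if np.array_equal(nxt, rec):
            return rec
        rec = nxt

def reconstruct_by_erosion(marker, mask):
    """Repeat 3x3 geodesic erosion of marker above mask until stable."""
    rec = np.maximum(marker, mask)
    while True:
        nxt = np.maximum(erode(rec, 3), mask)
        if np.array_equal(nxt, rec):
            return rec
        rec = nxt

def opening_by_reconstruction(img, size):
    return reconstruct_by_dilation(erode(img, size), img)

def closing_by_reconstruction(img, size):
    return reconstruct_by_erosion(dilate(img, size), img)

def morphological_profile(img, sizes):
    """Stack closings (largest SE first), the image, then openings (smallest first)."""
    closings = [closing_by_reconstruction(img, s) for s in reversed(sizes)]
    openings = [opening_by_reconstruction(img, s) for s in sizes]
    return np.stack(closings + [img] + openings)

img = np.zeros((20, 20))
img[2:12, 2:12] = 1.0    # large object (10 x 10)
img[15:17, 15:17] = 1.0  # small object (2 x 2)
opened = opening_by_reconstruction(img, 5)
profile = morphological_profile(img, sizes=[3, 5])
print(profile.shape)  # (5, 20, 20)
```

    In hyperspectral work, such a profile is typically computed on each retained feature band (for example, each principal or kernel principal component). The layers are then stacked per pixel before classification.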

    Semi-supervised classification in stratified spaces by considering non-interior points using Laplacian behavior

    Manifold-based semi-supervised classifiers have attracted increasing interest in recent years. However, they suffer from over-learning of locality and cannot be applied to point clouds sampled from a stratified space. This paper resolves the problem by using the fact that the smoothness assumption must hold at the interior points of the manifolds but may be violated at non-interior points. Interior and non-interior points are distinguished by the behavior of the graph Laplacian in the ε-neighborhood of the intersection points. This property is first generalized to a kNN graph representing the stratified space, and a new algorithm is then proposed that penalizes smoothness at the non-interior points of the manifolds by modifying the edge weights of the graph. Compared with some recent multi-manifold semi-supervised classifiers, the proposed method requires neither knowledge of the manifold dimensions nor a large number of unlabeled points to estimate the underlying manifolds, and it does not assume that the neighbors of all data points have similar properties. Experiments show that it improves classification accuracy on a number of artificial and real benchmark data sets.
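    For context, the sketch below shows the kind of manifold-based baseline such methods start from. It builds a symmetric kNN graph with Gaussian edge weights, forms its unnormalized Laplacian L = D - W, and applies harmonic-function label propagation. The sketch does not detect non-interior points or reweight their edges as the paper does; a comment marks where such reweighting would go. The values of `k` and `sigma` and the small ridge term are illustrative assumptions.

```python
import numpy as np

def knn_graph(X, k, sigma):
    """Symmetric kNN graph with Gaussian edge weights (assumes no duplicate points)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    n = len(X)
    nbrs = np.argsort(d2, axis=1)[:, 1:k + 1]       # column 0 is the point itself
    rows = np.repeat(np.arange(n), k)
    cols = nbrs.ravel()
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-d2[rows, cols] / (2 * sigma ** 2))
    # Hypothetical hook: a stratified-space method would modify the weights of
    # edges at non-interior points here; this sketch leaves them unchanged.
    return np.maximum(W, W.T)                       # symmetrize

def harmonic_labels(W, labeled_idx, labels, n_classes, ridge=1e-9):
    """Harmonic-function propagation: solve L_uu F_u = W_ul Y_l."""
    n = len(W)
    L = np.diag(W.sum(axis=1)) - W                  # unnormalized Laplacian D - W
    unl = np.setdiff1d(np.arange(n), labeled_idx)
    Y_l = np.eye(n_classes)[labels]                 # one-hot labels
    L_uu = L[np.ix_(unl, unl)] + ridge * np.eye(len(unl))  # ridge for numerical safety
    F_u = np.linalg.solve(L_uu, W[np.ix_(unl, labeled_idx)] @ Y_l)
    pred = np.empty(n, dtype=int)
    pred[labeled_idx] = labels
    pred[unl] = F_u.argmax(axis=1)
    return pred

# Two well-separated blobs, one labeled point per blob
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (30, 2)), rng.normal(6, 0.5, (30, 2))])
truth = np.repeat([0, 1], 30)
W = knn_graph(X, k=15, sigma=1.0)
pred = harmonic_labels(W, np.array([0, 30]), np.array([0, 1]), n_classes=2)
print((pred == truth).mean())
```

    The baseline enforces smoothness across every edge. On a stratified space, that includes edges through intersection points, which is the failure mode the abstract describes.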

    Doctor of Philosophy

    Machine learning is the science of building predictive models from data that automatically improve based on past experience. To learn these models, traditional learning algorithms require labeled data. They also require that the entire dataset fit in the memory of a single machine. Labeled data are available or can be acquired for small and moderately sized datasets, but curating large datasets can be prohibitively expensive. Similarly, massive datasets are usually too large to fit into the memory of a single machine. An alternative is to distribute the dataset over multiple machines. Distributed learning, however, poses new challenges, as most existing machine learning techniques are inherently sequential. Additionally, these distributed approaches must be designed with the resource limitations of real-world settings in mind, chief among them inter-machine communication. With the advent of big datasets, machine learning algorithms are facing new challenges. Their design is no longer limited to minimizing some loss function but must also consider other resources that are critical when learning at scale. In this thesis, we explore different models and measures for learning with limited resources that have a budget. What budgetary constraints are posed by modern datasets? Can we reuse or combine existing machine learning paradigms to address these challenges at scale? How do the cost metrics change when we shift to distributed models for learning? These are some of the questions investigated in this thesis. The answers to these questions hold the key to addressing some of the challenges faced when learning on massive datasets. In the first part of this thesis, we present three different budgeted scenarios that deal with scarcity of labeled data and limited computational resources. The goal is to leverage information transferred from related domains to learn under budgetary constraints.
Our proposed techniques comprise semi-supervised transfer, online transfer, and active transfer. In the second part of this thesis, we study distributed learning with limited communication. We present initial sampling-based results and propose communication protocols for learning distributed linear classifiers.

    Equivariance and Invariance for Robust Unsupervised and Semi-Supervised Learning

    Although deep learning has been applied with great success to a wide variety of tasks, it relies heavily on large amounts of labeled training data, which can be hard to obtain in many real scenarios. To address this problem, unsupervised and semi-supervised learning have emerged to take advantage of plentiful, cheap unlabeled data to improve model generalization. In this dissertation, we claim that equivariance and invariance are two critical criteria for robust unsupervised and semi-supervised learning. The idea is as follows: the features of a robust model ought to be sufficiently informative and equivariant to transformations of the input data, and the classifiers should be resilient and invariant to small perturbations on the data manifold and of the model parameters. Specifically, features are learned by auto-encoding the transformations of the input data, and models are regularized by minimizing the effects of perturbations on features or model parameters. Experiments on several benchmarks show that the proposed methods outperform many state-of-the-art approaches to unsupervised and semi-supervised learning, demonstrating the importance of the equivariance and invariance rules for robust feature representation learning.

    Learning to Propagate Labels on Graphs: An Iterative Multitask Regression Framework for Semi-supervised Hyperspectral Dimensionality Reduction

    Hyperspectral dimensionality reduction (HDR), an important preprocessing step prior to high-level data analysis, has been garnering growing attention in the remote sensing community. Although a variety of unsupervised and supervised models have been proposed for this task, the discriminative ability of the learned feature representations remains limited because there is no powerful tool that effectively exploits both labeled and unlabeled data in the HDR process. To address this need, this paper proposes a semi-supervised HDR approach called iterative multitask regression (IMR). IMR learns a low-dimensional subspace by jointly considering the labeled and unlabeled data, and it bridges the learned subspace with two regression tasks: labels, and pseudo-labels initialized by a given classifier. More significantly, IMR dynamically propagates the labels on a learnable graph and progressively refines the pseudo-labels, yielding a well-conditioned feedback system. Experiments on three widely used hyperspectral image datasets demonstrate that, in terms of classification or recognition accuracy, the dimension-reduced features learned by IMR are superior to those of related state-of-the-art HDR approaches.
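    IMR itself couples subspace learning with regression on labels and pseudo-labels. The propagation step alone can be illustrated with classic iterative label spreading, F <- alpha * S F + (1 - alpha) * Y, on a symmetrically normalized Gaussian graph S = D^{-1/2} W D^{-1/2}. This is a swapped-in textbook technique that uses a fixed, not learnable, graph. The function name, `alpha`, `sigma` and the iteration count are assumptions.

```python
import numpy as np

def label_spreading(X, labeled_idx, labels, n_classes, sigma=1.0, alpha=0.9, n_iter=100):
    """Iterate F <- alpha * S @ F + (1 - alpha) * Y on a normalized Gaussian graph."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)                        # no self-loops
    deg = W.sum(axis=1)
    S = W / np.sqrt(np.outer(deg, deg))             # D^{-1/2} W D^{-1/2}
    Y = np.zeros((len(X), n_classes))
    Y[labeled_idx, labels] = 1.0                    # one-hot seed labels
    F = Y.copy()
    for _ in range(n_iter):
        F = alpha * S @ F + (1 - alpha) * Y         # spread, then pull back toward Y
    return F.argmax(axis=1)

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.5, (30, 2)), rng.normal(6, 0.5, (30, 2))])
truth = np.repeat([0, 1], 30)
pred = label_spreading(X, np.array([0, 30]), np.array([0, 1]), n_classes=2)
print((pred == truth).mean())
```

    A larger `alpha` lets labels travel further along the graph, while a smaller one keeps predictions closer to the seed labels `Y`.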