57 research outputs found
Graph Embedding via High Dimensional Model Representation for Hyperspectral Images
Learning the manifold structure of remote sensing images is of paramount
relevance for modeling and understanding processes, as well as to encapsulate
the high dimensionality in a reduced set of informative features for subsequent
classification, regression, or unmixing. Manifold learning methods have shown
excellent performance to deal with hyperspectral image (HSI) analysis but,
unless specifically designed, they cannot provide an explicit embedding map
readily applicable to out-of-sample data. A common assumption to deal with the
problem is that the transformation between the high-dimensional input space and
the (typically low) latent space is linear. This is a particularly strong
assumption, especially when dealing with hyperspectral images due to the
well-known nonlinear nature of the data. To address this problem, a manifold
learning method based on High Dimensional Model Representation (HDMR) is
proposed, which provides a nonlinear embedding function to project
out-of-sample samples into the latent space. The proposed method is compared to
manifold learning methods along with its linear counterparts and achieves
promising performance in terms of classification accuracy of a representative
set of hyperspectral images.
Comment: This is an accepted version of work to be published in the IEEE
Transactions on Geoscience and Remote Sensing. 11 pages
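For context, the linear out-of-sample assumption that the abstract above argues against can be sketched in a few lines. This is not the HDMR method. It is only the linear baseline: given training samples and their latent coordinates from some manifold learner, fit an affine map by least squares and apply it to new samples. The function names `fit_linear_out_of_sample_map` and `embed` are hypothetical and chosen for illustration.

```python
import numpy as np

def fit_linear_out_of_sample_map(X_train, Y_train):
    """Fit an affine map from inputs (n, d) to latent coordinates (n, k).

    Y_train is assumed to come from any manifold learning method that
    only embeds the training set. Returns a (d + 1, k) coefficient matrix
    whose last row is the bias term.
    """
    # Append a column of ones so the bias is learned with the weights.
    X_aug = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(X_aug, Y_train, rcond=None)
    return coef

def embed(X_new, coef):
    """Project unseen samples (m, d) into the latent space with the fitted map."""
    X_aug = np.hstack([X_new, np.ones((X_new.shape[0], 1))])
    return X_aug @ coef
```

When the true relation between spectra and latent coordinates is nonlinear, the residual of this fit stays large, which is the limitation that motivates a nonlinear embedding function.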
SAGA: Sparse And Geometry-Aware non-negative matrix factorization through non-linear local embedding
This paper presents a new non-negative matrix factorization technique which (1) allows the decomposition of the original data on multiple latent factors accounting for the geometrical structure of the manifold embedding the data; (2) provides an optimal representation with a controllable level of sparsity; (3) has an overall linear complexity, allowing large and high-dimensional datasets to be handled in tractable time. It operates by coding the data with respect to local neighbors with non-linear weights. This locality is obtained as a consequence of the simultaneous sparsity and convexity constraints. Our method is demonstrated over several experiments, including a feature extraction and classification task, where it achieves better performance than state-of-the-art factorization methods, with a shorter computational time.
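As background for the abstract above, plain non-negative matrix factorization can be sketched with the classical multiplicative updates for the Frobenius objective. This is not SAGA: it has no sparsity control, no geometry-aware local coding, and no linear-complexity guarantee. The function name `nmf_multiplicative` and its parameters are assumptions made for illustration.

```python
import numpy as np

def nmf_multiplicative(V, rank, n_iter=200, eps=1e-10, seed=0):
    """Approximate a non-negative matrix V (n, m) as W @ H.

    W is (n, rank) and H is (rank, m), both kept non-negative by the
    multiplicative update rules for the squared Frobenius error.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # Each update rescales entries by a non-negative ratio,
        # so W and H never leave the non-negative orthant.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

The small constant `eps` only guards against division by zero; it is not a regularizer.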
Graph-based Semi-supervised Learning: Algorithms and Applications.
114 p. Graph-based semi-supervised learning has attracted a large number of researchers and is an important part of semi-supervised learning. Graph construction and semi-supervised embedding are the two main steps in graph-based semi-supervised learning algorithms. In this thesis, we propose two graph construction algorithms and two semi-supervised embedding algorithms. The main work of this thesis is summarized as follows:
1. A new graph construction algorithm named Graph construction based on Self-Representativeness and Laplacian Smoothness (SRLS), together with several variants, is proposed. Research shows that the coefficients obtained by data representation algorithms reflect the similarity between data samples and can serve as a similarity measure. This measure can be used as the weights of the edges between data samples in graph construction. Each column of the coefficient matrix obtained by data self-representation algorithms can be regarded as a new representation of the original data. The new representations should share features with the original data samples. Thus, if two data samples are close to each other in the original space, the corresponding representations should be highly similar. This constraint is called Laplacian smoothness. The SRLS graph is based on l2-norm minimized data self-representation and Laplacian smoothness. Since the representation matrix obtained by l2 minimization is dense, a two-phase SRLS method (TPSRLS) is proposed to increase the sparsity of the graph matrix. By extending the linear space to a Hilbert space, two kernelized versions of SRLS are proposed. A direct solution to the kernelized SRLS algorithm is also introduced.
2. A new sparse graph construction algorithm named Sparse Graph with Laplacian Smoothness (SGLS), together with several variants, is proposed. The SGLS graph algorithm is based on sparse representation and uses Laplacian smoothness as a constraint. A kernelized version of the SGLS algorithm and a direct solution to the kernelized SGLS algorithm are also proposed.
3. SPP is a successful unsupervised learning method. To extend SPP to a semi-supervised embedding method, we introduce the idea of in-class constraints from CGE into SPP and propose a new semi-supervised method for data embedding named Constrained Sparsity Preserving Embedding (CSPE).
4. The weakness of CSPE is that it cannot handle newly arriving samples, which means a cascade regression must be performed after the non-linear mapping has been obtained by CSPE over all training samples. Inspired by FME, we add a regression term to the objective function so that an approximate linear projection is obtained at the same time as the non-linear embedding is estimated, and propose Flexible Constrained Sparsity Preserving Embedding (FCSPE).
Extensive experiments on several datasets (including facial images, handwritten digit images and object images) show that the proposed algorithms can improve on state-of-the-art results.
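The idea of using l2-regularized self-representation coefficients as edge weights can be illustrated with a minimal sketch. It assumes the ridge objective min_C ||X - XC||_F^2 + lam ||C||_F^2, which has the closed form C = (X^T X + lam I)^{-1} X^T X. It omits the Laplacian smoothness term, the two-phase sparsification, and the kernelized variants described in the thesis, so it should not be read as the SRLS algorithm itself. The function name and the post hoc zeroing of the diagonal are simplifications made for illustration.

```python
import numpy as np

def self_representation_graph(X, lam=0.1):
    """Build a symmetric affinity matrix from ridge self-representation.

    X has shape (d, n) with one sample per column. Each sample is coded
    as a linear combination of all samples; coefficient magnitudes are
    then used as edge weights.
    """
    n = X.shape[1]
    G = X.T @ X                                   # Gram matrix, (n, n)
    C = np.linalg.solve(G + lam * np.eye(n), G)   # closed-form ridge solution
    np.fill_diagonal(C, 0.0)                      # drop self-loops (simplification)
    W = 0.5 * (np.abs(C) + np.abs(C).T)           # symmetrize, keep weights non-negative
    return W
```

Because the l2 penalty produces a dense coefficient matrix, `W` is generally dense as well, which is the motivation the abstract gives for a second sparsifying phase.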
Interpretable Hyperspectral AI: When Non-Convex Modeling meets Hyperspectral Remote Sensing
Hyperspectral imaging, also known as image spectrometry, is a landmark
technique in geoscience and remote sensing (RS). In the past decade, enormous
efforts have been made to process and analyze these hyperspectral (HS) products
mainly by means of seasoned experts. However, with the ever-growing volume of
data, the bulk of costs in manpower and material resources poses new challenges
in reducing the burden of manual labor and improving efficiency. It is
therefore urgent to develop more intelligent and automatic
approaches for various HS RS applications. Machine learning (ML) tools with
convex optimization have successfully undertaken the tasks of numerous
artificial intelligence (AI)-related applications. However, their ability in
handling complex practical problems remains limited, particularly for HS data,
due to the effects of various spectral variabilities in the process of HS
imaging and the complexity and redundancy of higher dimensional HS signals.
Compared to the convex models, non-convex modeling, which is capable of
characterizing more complex real scenes and providing the model
interpretability technically and theoretically, has been proven to be a
feasible solution to reduce the gap between challenging HS vision tasks and
currently advanced intelligent data processing models.
Nonlinear Dimensionality Reduction by Manifold Unfolding
Every second, an enormous volume of data is being gathered from various sources and stored in huge data banks. Most of the time, monitoring a data source requires several parallel measurements, which form a high-dimensional sample vector. Due to the curse of dimensionality, applying machine learning methods, that is, studying and analyzing high-dimensional data, could be difficult. The essential task of dimensionality reduction is to faithfully represent a given set of high-dimensional data samples with a few variables. The goal of this thesis is to develop and propose new techniques for handling high-dimensional data, in order to address contemporary demand in machine learning applications.
Most prominent nonlinear dimensionality reduction methods do not explicitly provide a way to handle out-of-samples. The starting point of this thesis is a nonlinear technique, called Embedding by Affine Transformations (EAT), which reduces the dimensionality of out-of-sample data as well. In this method, a convex optimization is solved for estimating a transformation between the high-dimensional input space and the low-dimensional embedding space. To the best of our knowledge, EAT is the only distance-preserving method for nonlinear dimensionality reduction capable of handling out-of-samples.
The second method that we propose is TesseraMap. This method is a scalable extension of EAT. Conceptually, TesseraMap partitions the underlying manifold of data into a set of tesserae and then unfolds it by constructing a tessellation in a low-dimensional subspace of the embedding space. Crucially, the desired tessellation is obtained through solving a small semidefinite program; therefore, this method can efficiently handle tens of thousands of data points in a short time.
The final outcome of this thesis is a novel method in dimensionality reduction called Isometric Patch Alignment (IPA). Intuitively speaking, IPA first considers a number of overlapping flat patches, which cover the underlying manifold of the high-dimensional input data. Then, IPA rearranges the patches and stitches the neighbors together on their overlapping parts. We prove that stitching two neighboring patches aligns them together; thereby, IPA unfolds the underlying manifold of data. Although this method and TesseraMap have similar approaches, IPA is more scalable; it embeds one million data points in only a few minutes. More importantly, unlike EAT and TesseraMap, which unfold the underlying manifold by stretching it, IPA constructs the unfolded manifold through patch alignment. We show this novel approach is advantageous in many cases. In addition, compared to the other well-known dimensionality reduction methods, IPA has several important characteristics; for example, it is noise tolerant, it handles non-uniform samples, and it can embed non-convex manifolds properly.
In addition to these three dimensionality reduction methods, we propose a method for subspace clustering called Low-dimensional Localized Clustering (LDLC). In subspace clustering, data is partitioned into clusters, such that the points of each cluster lie close to a low-dimensional subspace. The unique property of LDLC is that it produces localized clusters on the underlying manifold of data. By conducting several experiments, we show this property is an asset in many machine learning tasks. This method can also be used for local dimensionality reduction. Moreover, LDLC is a suitable tool for forming the tesserae in TesseraMap, and also for creating the patches in IPA.