83 research outputs found

    Tensor-based Hyperspectral Image Processing Methodology and its Applications in Impervious Surface and Land Cover Mapping

    The emergence of hyperspectral imaging provides a new perspective for Earth observation, in addition to previously available orthophoto and multispectral imagery. This thesis focused on both new data and new methodology in the field of hyperspectral imaging. First, the application of the future hyperspectral satellite EnMAP to impervious surface area (ISA) mapping was studied. In the search for an appropriate ISA mapping procedure for the new data, subpixel classification based on nonnegative matrix factorization (NMF) performed best. The simulated EnMAP image shows great potential for urban ISA mapping, with over 85% accuracy. However, NMF, being based on linear algebra, considers only the spectral information and neglects the spatial information in the original image. The recent wide interest in applying multilinear algebra in computer vision sheds light on this problem and raised the idea of nonnegative tensor factorization (NTF). This thesis found that NTF has a greater advantage over NMF when working with medium- rather than high-spatial-resolution hyperspectral images. Furthermore, this thesis proposed equipping the NTF-based subpixel classification methods with variations adopted from NMF, which improved the urban ISA mapping results from NTF by ~2%. Lastly, the problem known as the curse of dimensionality is an obstacle in hyperspectral image applications. The majority of current dimension reduction (DR) methods are restricted to using only the spectral information, while the spatial information is neglected. To overcome this defect, two spectral-spatial methods, patch-based and tensor-patch-based, were thoroughly studied and compared in this thesis. To date, the two solutions have been popular mainly in computer vision studies, and their applications in hyperspectral DR are limited. The patch-based and tensor-patch-based variations greatly improved the quality of dimension-reduced hyperspectral images, which in turn improved the land cover mapping results derived from them. In addition, this thesis proposed an improved method for producing an important intermediate result in the patch-based and tensor-patch-based DR process, which further improved the land cover mapping results.
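The NMF-based subpixel classification mentioned in the abstract above can be illustrated with a minimal sketch. It is not the thesis's procedure: it assumes the hyperspectral image has been flattened to a pixels-by-bands matrix, uses the standard Lee-Seung multiplicative updates for squared error, and the names `nmf_unmix` and `to_fractions` are hypothetical. Normalizing abundances into per-pixel fractions without rescaling the endmembers is a simplification.

```python
import numpy as np

def nmf_unmix(pixels, n_endmembers, n_iter=500, seed=0, eps=1e-9):
    """Factor pixels (n_pixels x n_bands) ~= abundances @ endmembers, all nonnegative."""
    rng = np.random.default_rng(seed)
    n_pixels, n_bands = pixels.shape
    abundances = rng.random((n_pixels, n_endmembers)) + eps
    endmembers = rng.random((n_endmembers, n_bands)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep both factors nonnegative at every step.
        abundances *= (pixels @ endmembers.T) / (
            abundances @ endmembers @ endmembers.T + eps
        )
        endmembers *= (abundances.T @ pixels) / (
            abundances.T @ abundances @ endmembers + eps
        )
    return abundances, endmembers

def to_fractions(abundances, eps=1e-9):
    """Rescale each pixel's abundances so they sum to (approximately) one."""
    return abundances / (abundances.sum(axis=1, keepdims=True) + eps)
```

Each row of `to_fractions(abundances)` can then be read as the estimated cover fractions of one pixel, for example the share attributed to an impervious-surface endmember. Note that this factorization sees each pixel's spectrum in isolation, which is the spatial-information limitation the abstract attributes to NMF.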

    A review of technical factors to consider when designing neural networks for semantic segmentation of Earth Observation imagery

    Semantic segmentation (classification) of Earth Observation imagery is a crucial task in remote sensing. This paper presents a comprehensive review of technical factors to consider when designing neural networks for this purpose. The review focuses on Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Generative Adversarial Networks (GANs), and transformer models, discussing prominent design patterns for these ANN families and their implications for semantic segmentation. Common pre-processing techniques for ensuring optimal data preparation are also covered. These include methods for image normalization and chipping, as well as strategies for addressing data imbalance in training samples, and techniques for overcoming limited data, including augmentation techniques, transfer learning, and domain adaptation. By encompassing both the technical aspects of neural network design and the data-related considerations, this review provides researchers and practitioners with a comprehensive and up-to-date understanding of the factors involved in designing effective neural networks for semantic segmentation of Earth Observation imagery. Comment: 145 pages with 32 figures

    Scalable Low-rank Matrix and Tensor Decomposition on Graphs

    In many signal processing, machine learning and computer vision applications, one often has to deal with high-dimensional and big datasets such as images, videos, web content, etc. The data can come in various forms, such as univariate or multivariate time series, matrices or high-dimensional tensors. The goal of the data mining community is to reveal the hidden linear or non-linear structures in the datasets. Over the past couple of decades, matrix factorization, owing to its intrinsic association with dimensionality reduction, has been adopted as one of the key methods in this context. One can either use a single linear subspace to approximate the data (the standard Principal Component Analysis (PCA) approach) or a union of low-dimensional subspaces where each data class belongs to a different subspace. In many cases, however, the low-dimensional data follows some additional structure. Knowledge of such structure is beneficial, as we can use it to enhance the representativity of our models by adding structured priors. A now-standard way to represent pairwise affinity between objects is by using graphs. The introduction of graph-based priors to enhance matrix factorization models has recently brought these models back to the attention of the data mining community. Representation of a signal on a graph is well motivated by the emerging field of signal processing on graphs, based on notions of spectral graph theory. The underlying assumption is that high-dimensional data samples lie on or close to a smooth low-dimensional manifold. Interestingly, the underlying manifold can be represented by its discrete proxy, i.e. a graph. A primary limitation of the state-of-the-art low-rank approximation methods is that they do not generalize to the case of non-linear low-rank structures. Furthermore, the standard low-rank extraction methods for many applications, such as low-rank and sparse decomposition, are computationally cumbersome. We argue that for many machine learning and signal processing applications involving big data, an approximate low-rank recovery suffices. Thus, in this thesis, we address the above two limitations by presenting a new framework for scalable but approximate low-rank extraction which exploits the hidden structure in the data using the notion of graphs. First, we present a novel signal model, called 'Multilinear low-rank tensors on graphs (MLRTG)', which states that a tensor can be encoded as a multilinear combination of the low-frequency graph eigenvectors, where the graphs are constructed along the various modes of the tensor. Since the graph eigenvectors have the interpretation of a non-linear embedding of a dataset on the low-dimensional manifold, we propose a method called 'Graph Multilinear SVD (GMLSVD)' to recover PCA-based linear subspaces from these eigenvectors. Finally, we propose a plethora of highly scalable matrix- and tensor-based problems for low-rank extraction which implicitly or explicitly make use of the GMLSVD framework. The core idea is to replace the expensive iterative SVD operations by updating the linear subspaces from the fixed non-linear ones via low-cost operations. We present applications in low-rank and sparse decomposition and clustering of the low-rank features to evaluate all the proposed methods. Our theoretical analysis shows that the approximation error of the proposed framework depends on the spectral properties of the graph Laplacian.
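The idea of encoding data as a multilinear combination of low-frequency graph eigenvectors can be sketched for the two-mode (matrix) case. This is only an illustration under stated assumptions, not the thesis's MLRTG or GMLSVD algorithm: it assumes symmetrized k-nearest-neighbor graphs with Gaussian weights over the rows and over the columns, a combinatorial Laplacian, and a dense eigendecomposition. All function names are hypothetical.

```python
import numpy as np

def knn_laplacian(points, n_neighbors=5):
    """Combinatorial Laplacian of a symmetrized kNN graph with Gaussian weights."""
    sq = np.sum(points**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * points @ points.T, 0.0)
    np.fill_diagonal(d2, np.inf)  # a point is never its own neighbor
    idx = np.argsort(d2, axis=1)[:, :n_neighbors]
    rows = np.repeat(np.arange(len(points)), n_neighbors)
    cols = idx.ravel()
    sigma2 = np.mean(d2[rows, cols]) + 1e-12  # kernel width from neighbor distances
    w = np.zeros_like(d2)
    w[rows, cols] = np.exp(-d2[rows, cols] / sigma2)
    w = np.maximum(w, w.T)  # symmetrize the adjacency matrix
    return np.diag(w.sum(axis=1)) - w

def low_freq_basis(laplacian, k):
    """First k Laplacian eigenvectors (eigh returns eigenvalues in ascending order)."""
    _, vecs = np.linalg.eigh(laplacian)
    return vecs[:, :k]

def graph_multilinear_approx(x, k_rows, k_cols, n_neighbors=5):
    """Approximate x as P @ core @ Q.T using row-graph and column-graph eigenvectors."""
    p = low_freq_basis(knn_laplacian(x, n_neighbors), k_rows)
    q = low_freq_basis(knn_laplacian(x.T, n_neighbors), k_cols)
    core = p.T @ x @ q  # small core of size k_rows x k_cols
    return p @ core @ q.T, core
```

Because `p` and `q` are fixed once the graphs are built, later processing only touches the small `core`, which mirrors the abstract's point about avoiding repeated SVDs of the full data.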

    Injecting spatial priors in Earth observation with machine vision

    Remote Sensing (RS) imagery with submeter resolution is becoming ubiquitous. Be it from satellites, aerial campaigns or Unmanned Aerial Vehicles, this spatial resolution makes it possible to recognize individual objects and their parts from above. Over the last few years, this has driven strong interest in the RS community in Computer Vision (CV) methods developed for the automated understanding of natural images. A central element of the success of CV is the use of prior information about the image generation process and the objects these images contain: neighboring pixels are likely to belong to the same object; objects of the same nature tend to look similar regardless of their location in the image; certain objects tend to occur in particular geometric configurations; etc. When using RS imagery, additional prior knowledge exists on how the images were formed, since we know roughly the geographical location of the objects (the geospatial prior) and the direction they were observed from (the overhead-view prior). This thesis explores ways of encoding these priors in CV models to improve their performance on RS imagery, with a focus on land-cover and land-use mapping.

    Sparse Coding with Structured Sparsity Priors and Multilayer Architecture for Image Classification

    Applying sparse coding to large datasets for image classification is a long-standing problem in the field of computer vision. It has been found that sparse coding models exhibit disappointing performance on these large datasets, where variability is broad and anomalies are common. Conversely, deep neural networks thrive on bountiful data. Their success has encouraged researchers to try to augment the learning capacity of traditionally shallow sparse coding methods by adding layers. Multilayer sparse coding networks are expected to combine the best of both sparsity regularizations and deep architectures. To date, however, endeavors to marry the two techniques have not achieved significant improvements over their individual counterparts. In this thesis, we first briefly review multiple structured sparsity priors as well as various supervised dictionary learning techniques with applications to hyperspectral image classification. Based on the structured sparsity priors and dictionary learning techniques, we then develop a novel multilayer sparse coding network that contains thirteen sparse coding layers. The proposed sparse coding network learns both the dictionaries and the regularization parameters simultaneously using an end-to-end supervised learning scheme. We show empirical evidence that the regularization parameters can adapt to the given training data. We also propose applying dimension reduction within sparse coding networks to dramatically reduce the output dimensionality of the sparse coding layers and mitigate computational costs. Moreover, our sparse coding network is compatible with other powerful deep learning techniques such as dropout, batch normalization and shortcut connections. Experimental results show that the proposed multilayer sparse coding network produces classification accuracy competitive with deep neural networks while using significantly fewer parameters and layers.

    Algorithms for super-resolution of images based on Sparse Representation and Manifolds

    Image super-resolution is defined as a class of techniques that enhance the spatial resolution of images. Super-resolution methods can be subdivided into single-image and multi-image methods. This thesis focuses on developing algorithms based on mathematical theories for single-image super-resolution problems. To estimate an output image, we adopt a mixed approach: we use both a dictionary of patches with sparsity constraints (typical of learning-based methods) and regularization terms (typical of reconstruction-based methods). Although the existing methods already perform well, they do not take into account the geometry of the data when regularizing the solution, clustering data samples (samples are often clustered using algorithms with the Euclidean distance as a dissimilarity metric), or learning dictionaries (they are often learned using PCA or K-SVD). Thus, state-of-the-art methods still suffer from shortcomings. In this work, we proposed three new methods to overcome these deficiencies. First, we developed SE-ASDS (a structure tensor based regularization term) in order to improve the sharpness of edges. SE-ASDS achieves much better results than many state-of-the-art algorithms. Then, we proposed the AGNN and GOC algorithms for determining a local subset of training samples from which a good local model can be computed for reconstructing a given input test sample, taking into account the underlying geometry of the data. The AGNN and GOC methods outperform spectral clustering, soft clustering, and geodesic distance based subset selection in most settings. Next, we proposed the aSOB strategy, which takes into account the geometry of the data and the dictionary size. The aSOB strategy outperforms both PCA and PGA methods. Finally, we combine all our methods in a single algorithm, named G2SR. Our proposed G2SR algorithm shows better results than state-of-the-art methods in terms of PSNR, SSIM, FSIM and visual quality. Coordenação de Aperfeiçoamento de Pessoal de Nível Superior. Thesis (Doctorate).

    Multi-frame reconstruction using super-resolution, inpainting, segmentation and codecs

    In this thesis, different aspects of video and light field reconstruction are considered, such as super-resolution, inpainting, segmentation and codecs. For this purpose, each of these strategies is analyzed with respect to a specific goal and a specific database. Accordingly, databases relevant to the film industry, sport videos, light fields and hyperspectral videos are used. This thesis is constructed around six related manuscripts, in which several approaches are proposed for multi-frame reconstruction. Initially, a novel multi-frame reconstruction strategy is proposed for light field super-resolution, in which graph-based regularization is applied along with edge-preserving filtering to improve the spatio-angular quality of the light field. Second, a novel video reconstruction method is proposed, built on compressive sensing (CS), Gaussian mixture models (GMM) and sparse 3D transform-domain block matching. The motivation of the proposed technique is to improve the visual quality of the video frames and to decrease the reconstruction error in comparison with former video reconstruction methods. In the next approach, Student-t mixture models and edge-preserving filtering are applied for video super-resolution. The Student-t mixture model has a heavy tail, which makes it robust and suitable as a video frame patch prior and rich in terms of log likelihood for information retrieval. In another approach, a hyperspectral video database is considered, and a Bayesian dictionary learning process is used for hyperspectral video super-resolution. To that end, a Beta process is used in Bayesian dictionary learning and a sparse coding is generated for hyperspectral video super-resolution. The spatial super-resolution is followed by a spectral video restoration strategy, and the whole process leverages two learned dictionaries: the first is trained for spatial super-resolution and the second for spectral restoration. Furthermore, in another approach, a novel framework is proposed for automatically replacing advertisement content in soccer videos using deep learning strategies. For this purpose, a UNET architecture (an image segmentation convolutional neural network technique) is applied for content segmentation and detection. Subsequently, after reconstructing the segmented content in the video frames (considering the apparent loss in detection), the unwanted content is replaced by new content using a homography mapping procedure. In addition, in another research work, a novel video compression framework is presented using autoencoder networks that encode and decode videos using less chroma information than luma information. For this purpose, instead of converting Y'CbCr 4:2:2/4:2:0 videos to and from RGB 4:4:4, the video is kept in Y'CbCr 4:2:2/4:2:0, and the luma and chroma channels are merged after the luma is downsampled to match the chroma size. The inverse operation is performed in the decoder. The performance of these models is evaluated using the CPSNR, MS-SSIM, and VMAF metrics. The experiments reveal that, compared to video compression involving conversion to and from RGB 4:4:4, the proposed method increases the video quality by about 5.5% for Y'CbCr 4:2:2 and 8.3% for Y'CbCr 4:2:0 while reducing the amount of computation by nearly 37% for Y'CbCr 4:2:2 and 40% for Y'CbCr 4:2:0. The thread that ties these approaches together is the reconstruction of video and light field frames in the face of different problems, such as loss of information, blur in the frames, residual noise after reconstruction, unwanted content, excessive data size and high computational overhead. In three of the proposed approaches, we have used the Plug-and-Play ADMM model for the first time for the reconstruction of videos and light fields, in order to address information retrieval in the frames and tackle noise and blur at the same time. In two of the proposed models, we applied sparse dictionary learning to reduce the data dimension and to represent the frames as an efficient linear combination of basis frame patches. Two of the proposed approaches were developed in collaboration with industry; in these, deep learning frameworks are used to handle large sets of features and to learn high-level features from the data.
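The step of keeping video in Y'CbCr and merging luma with chroma once the luma matches the chroma size can be sketched for 4:2:0 frames. The abstract does not say how the luma is brought down to chroma resolution, so this sketch assumes a lossless space-to-depth rearrangement (each 2x2 luma block becomes four channels), chosen because it has an exact inverse for the decoder side. The names `pack_420` and `unpack_420` are hypothetical, and frame height and width are assumed to be even.

```python
import numpy as np

def pack_420(y, cb, cr):
    """Merge a 4:2:0 frame into one (H/2, W/2, 6) array.

    y has shape (H, W); cb and cr have shape (H/2, W/2).
    Assumption: luma is rearranged by space-to-depth, not filtered.
    """
    h, w = y.shape
    # Index (i, a, j, b) addresses y[2i + a, 2j + b]; move a, b into channels.
    y_blocks = (
        y.reshape(h // 2, 2, w // 2, 2)
        .transpose(0, 2, 1, 3)
        .reshape(h // 2, w // 2, 4)
    )
    return np.concatenate([y_blocks, cb[..., None], cr[..., None]], axis=-1)

def unpack_420(packed):
    """Inverse of pack_420: recover the y, cb and cr planes."""
    hh, ww, _ = packed.shape
    y = (
        packed[..., :4]
        .reshape(hh, ww, 2, 2)
        .transpose(0, 2, 1, 3)
        .reshape(hh * 2, ww * 2)
    )
    return y, packed[..., 4], packed[..., 5]
```

A six-channel half-resolution tensor like this holds 1.5 values per pixel instead of the 3 values per pixel of RGB 4:4:4, which is one plausible source of the computational savings the abstract reports, although the exact network input used in the thesis is not stated.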

    Proceedings of the 2019 Joint Workshop of Fraunhofer IOSB and Institute for Anthropomatics, Vision and Fusion Laboratory

    In 2019, the annual workshop of Fraunhofer IOSB and the Lehrstuhl für Interaktive Echtzeitsysteme (Chair for Interactive Real-Time Systems) of the Karlsruhe Institute of Technology took place again. The doctoral students of both institutions presented the progress of their research on the topics of machine learning, machine vision, metrology, network security and usage control. The ideas from this workshop are collected in this book in the form of technical reports.