
    Kernel Feature Extraction Methods for Remote Sensing Data Analysis

    Technological advances in recent decades have improved our ability to collect and store large volumes of data. In fields such as remote sensing, however, the peculiar characteristics of these data create problems for processing: high data volume, high dimensionality, heterogeneity and nonlinearity can turn the analysis and extraction of relevant information from these images into a bottleneck for many real applications. Research applying image processing and machine learning techniques together with feature extraction allows the dimensionality of the data to be reduced while retaining the maximum amount of information; developments and applications of such feature extraction methodologies have therefore increased exponentially in remote sensing, improving data visualization and knowledge discovery. Several feature extraction methods have been addressed in the literature and, depending on data availability, they can be classified as supervised, semisupervised or unsupervised. In particular, feature extraction can be combined with (nonlinear) kernel methods. This combination facilitates obtaining a space that retains a greater information content, and one of its most important properties is that the result can be used directly for general tasks including classification, regression, clustering, ranking, compression or data visualization. In this Thesis, we address different nonlinear feature extraction approaches based on kernel methods for remote sensing data analysis. Several improvements to current feature extraction methods are proposed to transform the data so that high-dimensional tasks, such as classification or biophysical parameter estimation, become easier. The Thesis focuses on three main objectives to achieve these improvements.
    The first objective is to include invariances in supervised kernel feature extraction methods. Through these invariances it is possible to generate virtual samples that help to mitigate the problem of the reduced number of samples in supervised methods. The proposed algorithm is a simple method that essentially generates new (synthetic) training samples from the available labeled samples. Using these samples together with the original ones in feature extraction methods yields features that are more independent of each other than those obtained without virtual samples, and the prior knowledge introduced through the virtual samples can make classification and biophysical parameter estimation more robust.
    The second objective is to use generative kernels, i.e. probabilistic kernels, that learn directly from the original data by means of clustering techniques, finding local-to-global similarities along the manifold. The proposed kernel is useful for general feature extraction purposes and attempts to improve on current methods because it exploits not only the labeled data but also the unlabeled information of the manifold. Moreover, the proposed kernel is parameter free, in contrast with parameterized functions such as the radial basis function (RBF). Probabilistic kernels are used to obtain new unsupervised and semisupervised methods that reduce the number and cost of labeled data in remote sensing.
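    The abstract does not give the exact form of the proposed generative kernel. Purely as a generic illustration, the sketch below builds a clustering-based probabilistic kernel from Gaussian-mixture posterior probabilities, so that two samples are similar when they are likely to come from the same cluster; the mixture model, component count and data are placeholder assumptions, not the kernel proposed in the Thesis.

        import numpy as np
        from sklearn.mixture import GaussianMixture

        def cluster_kernel(X, n_components=10, random_state=0):
            """Generic clustering-based (generative) kernel: K[i, j] is the
            probability that samples i and j belong to the same mixture
            component, computed from the GMM posterior responsibilities."""
            gmm = GaussianMixture(n_components=n_components,
                                  random_state=random_state).fit(X)
            P = gmm.predict_proba(X)   # (n_samples, n_components) posteriors
            return P @ P.T             # positive semi-definite kernel matrix

        X = np.random.rand(500, 40)    # placeholder spectra (samples x bands)
        K = cluster_kernel(X)          # can be fed to any kernel-based method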
    The third objective is to develop new kernel feature extraction methods that improve on the features obtained by current methods; optimizing the functional can yield improvements in the new algorithms. An example is the Optimized Kernel Entropy Component Analysis (OKECA) method, which builds on the Independent Component Analysis (ICA) framework and is more efficient than the standard Kernel Entropy Component Analysis (KECA) method in terms of dimensionality reduction. Although the methods in this Thesis focus on remote sensing data analysis, feature extraction is used to analyze multidimensional data in many research fields. For these reasons, the results are presented as an experimental sequence: the projections are first analyzed by means of toy examples; the algorithms are then tested on standard databases with supervised information; and, as a last step, remote sensing images are analyzed with the proposed methods.
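    As a rough illustration of the KECA baseline mentioned above (not the OKECA algorithm itself, which additionally applies an ICA-style rotation), the sketch below selects the kernel eigenpairs that contribute most to the Renyi quadratic entropy estimate rather than simply the largest eigenvalues; the RBF kernel and all sizes are assumptions for illustration.

        import numpy as np
        from sklearn.metrics.pairwise import rbf_kernel

        def keca_features(X, n_components=5, gamma=None):
            """Kernel Entropy Component Analysis (KECA): rank kernel eigenpairs
            by their contribution to the entropy estimate V = (1/N^2) 1^T K 1,
            i.e. lambda_i * (e_i^T 1)^2, and keep the top ones."""
            K = rbf_kernel(X, gamma=gamma)            # N x N kernel matrix
            eigvals, eigvecs = np.linalg.eigh(K)      # ascending order
            eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
            entropy = eigvals * eigvecs.sum(axis=0) ** 2
            idx = np.argsort(entropy)[::-1][:n_components]
            # projections of the training samples onto the selected axes
            return eigvecs[:, idx] * np.sqrt(np.clip(eigvals[idx], 0.0, None))

        X = np.random.rand(300, 20)                   # placeholder data
        Z = keca_features(X, n_components=3)          # entropy-preserving features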

    Introducing co-clustering for hyperspectral image analysis

    This work introduces the use of co-clustering for hyperspectral image analysis. Co-clustering simultaneously groups samples (rows) and spectral bands (columns), producing blocks that share not only spectral information (as in classical one-way clustering) but also sample information. Here, we propose using a co-clustering algorithm based on Information Theory: the optimal co-clustering is obtained by minimizing the loss of information between the original and the co-clustered images. A hyperspectral image (160000 samples and 40 bands) is used to illustrate the study. This image was clustered into 150 blocks (50 groups of samples and 3 spectral groups). After that, the blocks of each spectral group were independently classified to assess the effectiveness of the co-clustering approach for hyperspectral band selection applications. Furthermore, the results were compared with state-of-the-art methods based on morphological profiles and on the covariance matrix of the original hyperspectral image. Good results were achieved, showing the effectiveness of the co-clustering approach for hyperspectral images in spatial-spectral classification and band selection applications.
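    The information-theoretic co-clustering algorithm used above is not part of scikit-learn; purely to illustrate the idea of simultaneously grouping pixels and bands into a 50 x 3 block structure, the sketch below uses scikit-learn's spectral biclustering as a stand-in, with random placeholder data.

        import numpy as np
        from sklearn.cluster import SpectralBiclustering

        # Placeholder for a hyperspectral image reshaped to (n_pixels, n_bands);
        # the study above uses 160000 samples and 40 bands.
        X = np.random.rand(2000, 40)

        # 50 sample groups x 3 spectral groups = 150 blocks, as in the abstract
        model = SpectralBiclustering(n_clusters=(50, 3), random_state=0)
        model.fit(X)

        sample_groups = model.row_labels_     # cluster id per pixel
        band_groups = model.column_labels_    # cluster id per spectral band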

    Development and analysis of spring plant phenology products: 36 years of 1-km grids over the conterminous US

    Time series of phenological products provide information on the timings of recurrent biological events and on their temporal trends. This information is key to studying the impacts of climate change on our planet as well as to managing natural resources and agricultural production. Here we develop and analyze new long-term phenological products: 1-km grids of the Extended Spring Indices (SI-x) over the conterminous United States from 1980 to 2015. These new products (based on Daymet daily temperature grids and created using cloud computing) allow the analysis of two primary variables (first leaf and first bloom) and two derivative products (Damage Index and Last Freeze Day) at a much finer spatial resolution than previous gridded or interpolated products. Furthermore, our products provide enough temporal depth to reliably analyze trends and changes in the timing of spring arrival at continental scales. Validation results confirm that our products largely agree with lilac and honeysuckle leaf and flowering onset observations. The spatial analysis shows a significantly delayed spring onset in the northern US, whereas spring onset advances in the western US and the Great Lakes region. The mean temporal variabilities of the indices were analyzed for the nine major climatic regions of the US, and the results showed a clear division into three main groups: early, average and late spring onset. Finally, the region belonging to each group was mapped. These examples show the potential of our four phenological products to improve understanding of the responses of ecosystems to a changing climate.
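    As a minimal sketch of the kind of per-pixel trend analysis such gridded products support (not the procedure used in the study), the code below fits a least-squares trend of first-leaf day of year against year for every grid cell; the array shapes and placeholder values are assumptions.

        import numpy as np

        years = np.arange(1980, 2016)                      # 36 years, as above
        # Placeholder stack of first-leaf day-of-year grids: (n_years, rows, cols)
        first_leaf = np.random.normal(110, 10, size=(years.size, 50, 50))

        # Per-pixel ordinary least-squares slope, in days per year
        x = (years - years.mean())[:, None, None]
        anomalies = first_leaf - first_leaf.mean(axis=0)
        trend = (x * anomalies).sum(axis=0) / (x ** 2).sum()
        print(trend.shape)   # (50, 50); negative values indicate earlier springs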

    CGC: a scalable Python package for co- and tri-clustering of geodata cubes

    Clustering Geo-Data Cubes (CGC) is a Python package for performing cluster analysis on multidimensional geospatial data. The included tools allow the user to efficiently run clustering tasks in parallel on local and distributed systems.
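    The description above does not show the package interface. The call pattern below is an assumption sketched from CGC's documented co-clustering workflow; the module path, class name and argument names may differ in the actual release and should be checked against the package documentation.

        import numpy as np
        # Assumed import path; verify against the CGC documentation.
        from cgc.coclustering import Coclustering

        Z = np.random.rand(1000, 40)   # placeholder (samples x bands) matrix

        # Argument and result names below are assumptions, not a verified API.
        cc = Coclustering(Z, nclusters_row=50, nclusters_col=3,
                          max_iterations=100, nruns=5)
        results = cc.run_with_threads(nthreads=4)   # local, thread-parallel run
        print(results.row_clusters, results.col_clusters)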

    An evaluation of Guided Regularized Random Forest for classification and regression tasks in remote sensing

    New Earth observation missions and technologies are delivering large amounts of data. Processing these data requires developing and evaluating novel dimensionality reduction approaches to identify the most informative features for classification and regression tasks. Here we present an exhaustive evaluation of Guided Regularized Random Forest (GRRF), a feature selection method based on Random Forest. GRRF does not require fixing a priori the number of features to be selected or setting a threshold on feature importance. Moreover, the use of regularization ensures that the features selected by GRRF are non-redundant and representative. Our experiments, based on various kinds of remote sensing images, show that GRRF-selected features provide results similar to those obtained when using all the available features. However, the comparison between GRRF and standard Random Forest features shows substantial differences: in classification, the mean overall accuracy increases by almost 6% and, in regression, the decrease in RMSE almost reaches 2%. These results demonstrate the potential of GRRF for remote sensing image classification and regression, especially in the context of increasingly large geodatabases that challenge the application of traditional methods.
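    A faithful GRRF needs a tree inducer whose split gain can be regularized per feature (e.g., the R package RRF); scikit-learn does not expose this. The sketch below therefore only illustrates how the guiding coefficients are formed from a standard Random Forest, following the published GRRF formulation, with placeholder data and an assumed gamma.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        X = np.random.rand(500, 30)                    # placeholder features
        y = np.random.randint(0, 4, size=500)          # placeholder labels

        # Step 1: an ordinary RF provides the "guiding" importance scores
        rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)
        imp = rf.feature_importances_ / rf.feature_importances_.max()

        # Step 2: per-feature regularization coefficients,
        # lambda_i = (1 - gamma) * lambda_0 + gamma * imp_i
        gamma, lambda_0 = 0.5, 1.0
        coef = (1 - gamma) * lambda_0 + gamma * imp

        # In GRRF, coef[i] multiplies the split gain of feature i whenever the
        # feature has not yet been used, so only informative, non-redundant
        # features enter the forest and form the selected subset.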

    Evaluating the performance of a Random Forest Kernel for land cover classification

    The production of land cover maps through satellite image classification is a frequent task in remote sensing. Random Forest (RF) and Support Vector Machine (SVM) are the two most well-known and recurrently used methods for this task. In this paper, we evaluate the pros and cons of using an RF-based kernel (RFK) in an SVM compared to using the conventional Radial Basis Function (RBF) kernel and the standard RF classifier. A time series of seven multispectral WorldView-2 images acquired over Sukumba (Mali) and a single hyperspectral AVIRIS image acquired over Salinas Valley (CA, USA) are used to illustrate the analyses. For each study area, SVM-RFK, RF and SVM-RBF were trained and tested under different conditions over ten subsets. The spectral features for Sukumba were extended with vegetation indices (VIs) and grey-level co-occurrence matrices (GLCMs), while the Salinas dataset was used as a benchmark with its original number of features. In Sukumba, the overall accuracies (OAs) based on the spectral features only are 81.34%, 81.08% and 82.08% for SVM-RFK, RF and SVM-RBF, respectively. Adding the VI and GLCM features results in OAs of 82.%, 80.82% and 77.96%. In Salinas, the OAs are 94.42%, 95.83% and 94.16%. These results show that SVM-RFK yields slightly higher OAs than RF in high-dimensional and noisy experiments, and provides competitive results in the rest of the experiments. They also show that SVM-RFK generates highly competitive results compared to SVM-RBF while substantially reducing the time and computational cost associated with parametrizing the kernel. Moreover, SVM-RFK outperforms SVM-RBF in high-dimensional and noisy problems. RF was also used to select the most important features for the extended Sukumba dataset; the SVM-RFK derived from these features improved the OA of the previous SVM-RFK by 2%. Thus, the proposed SVM-RFK classifier is at least as good as RF and SVM-RBF and can achieve considerable improvements when applied to high-dimensional data and when combined with RF-based feature selection methods.
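    The exact kernel construction used in the paper is not given in the abstract; a common way to derive a kernel from a Random Forest is the leaf co-occurrence frequency sketched below (the similarity of two samples is the fraction of trees in which they reach the same leaf), fed to an SVM as a precomputed kernel. Data sizes here are placeholders.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.svm import SVC

        def rf_kernel(leaves_a, leaves_b):
            """Fraction of trees in which two samples share a terminal leaf."""
            return (leaves_a[:, None, :] == leaves_b[None, :, :]).mean(axis=2)

        X_train = np.random.rand(200, 10)              # placeholder features
        y_train = np.random.randint(0, 5, size=200)    # placeholder labels
        X_test = np.random.rand(50, 10)

        rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
        leaves_train = rf.apply(X_train)               # (n_samples, n_trees) leaf ids
        leaves_test = rf.apply(X_test)

        svm = SVC(kernel="precomputed").fit(rf_kernel(leaves_train, leaves_train), y_train)
        pred = svm.predict(rf_kernel(leaves_test, leaves_train))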