
    Benchmark of machine learning methods for classification of a Sentinel-2 image

    Thanks mainly to ESA and USGS, a large bulk of free images of the Earth is readily available nowadays. One of the main goals of remote sensing is to label images according to a set of semantic categories, i.e. image classification. This is a very challenging issue since land cover of a specific class may present a large spatial and spectral variability and objects may appear at different scales and orientations. In this study, we report the results of benchmarking 9 machine learning algorithms tested for accuracy and speed in training and classification of land-cover classes in a Sentinel-2 dataset. The following machine learning methods (MLM) have been tested: linear discriminant analysis, k-nearest neighbour, random forests, support vector machines, multi-layered perceptron, multi-layered perceptron ensemble, ctree, boosting, logarithmic regression. The validation is carried out using a control dataset which consists of an independent classification in 11 land-cover classes of an area of about 60 km², obtained by manual visual interpretation of high-resolution images (20 cm ground sampling distance) by experts. In this study five out of the eleven classes are used since the others have too few samples (pixels) for testing and validating subsets. The classes used are the following: (i) urban, (ii) sowable areas, (iii) water, (iv) tree plantations, (v) grasslands. Validation is carried out using three different approaches: (i) using pixels from the training dataset (train), (ii) using pixels from the training dataset and applying cross-validation with the k-fold method (kfold) and (iii) using all pixels from the control dataset. Five accuracy indices are calculated for the comparison between the values predicted with each model and control values over three sets of data: the training dataset (train), the whole control dataset (full) and with k-fold cross-validation (kfold) with ten folds.
Results from validation of predictions on the whole dataset (full) show the random forests method with the highest values, with the kappa index ranging from 0.55 with the largest number of training pixels to 0.42 with the smallest. The two neural networks (multi-layered perceptron and its ensemble) and the support vector machines method (with the default radial basis function kernel) follow closely with comparable performanc
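
The kappa index reported above measures agreement between predicted and control labels after discounting the agreement expected by chance. The following is a minimal sketch of how Cohen's kappa can be computed from two label sequences; the function name `cohen_kappa` and the toy labels are illustrative assumptions, not taken from the study.

```python
import math
from collections import Counter


def cohen_kappa(reference, predicted):
    """Cohen's kappa between reference (control) labels and predicted labels."""
    if len(reference) != len(predicted) or len(reference) == 0:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(reference)
    # Observed agreement: fraction of pixels where both labels coincide.
    observed = sum(r == p for r, p in zip(reference, predicted)) / n
    # Chance agreement: product of the marginal class frequencies, summed over classes.
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    expected = sum(ref_counts[c] * pred_counts[c] for c in ref_counts) / (n * n)
    if expected == 1.0:
        return math.nan  # undefined when both sequences contain a single identical class
    return (observed - expected) / (1.0 - expected)


# Toy example with two of the five classes (labels are made up for illustration).
control = ["urban", "urban", "water", "water"]
model_output = ["urban", "water", "water", "water"]
kappa = cohen_kappa(control, model_output)
```

Here the observed agreement is 0.75 and the chance agreement is 0.5, so the kappa index is 0.5.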

    Open source R for applying machine learning to RPAS remote sensing images

    The increase in the number of remote sensing platforms, ranging from satellites to close-range Remotely Piloted Aircraft Systems (RPAS), is leading to a growing demand for new image processing and classification tools. This article presents a comparison of the Random Forest (RF) and Support Vector Machine (SVM) machine-learning algorithms for extracting land-use classes in an RPAS-derived orthomosaic using open source R packages. The camera used in this work captures the reflectance of the Red, Blue, Green and Near Infrared channels of a target. The full dataset is therefore a 4-channel raster image. The classification performance of the two methods is tested at varying sizes of training sets. The SVM and RF are evaluated using Kappa index, classification accuracy and classification error as accuracy metrics. The training sets are randomly obtained as subsets of 2% to 20% of the total number of raster cells, with stratified sampling according to the land-use classes. Ten runs are done for each training set to calculate the variance in results. The control dataset consists of an independent classification obtained by photointerpretation. The validation is carried out (i) using K-fold cross-validation, (ii) using the pixels from the validation test set, and (iii) using the pixels from the full test set. Validation with K-fold and with the validation dataset shows that SVM gives better results, but RF performs better when the training size is larger. Classification error and classification accuracy follow the trend of Kappa index

    Statistical models for cores decomposition of an undirected random graph

    The k-core decomposition is a widely studied summary statistic that describes a graph's global connectivity structure. In this paper, we move beyond using k-core decomposition as a tool to summarize a graph and propose using k-core decomposition as a tool to model random graphs. We propose using the shell distribution vector, a way of summarizing the decomposition, as a sufficient statistic for a family of exponential random graph models. We study the properties and behavior of the model family, implement a Markov chain Monte Carlo algorithm for simulating graphs from the model, implement a direct sampler from the set of graphs with a given shell distribution, and explore the sampling distributions of some of the commonly used complementary statistics as good candidates for heuristic model fitting. These algorithms provide first fundamental steps necessary for solving the following problems: parameter estimation in this ERGM, extending the model to its Bayesian relative, and developing a rigorous methodology for testing goodness of fit of the model and model selection. The methods are applied to a synthetic network as well as the well-known Sampson monks dataset. Comment: Subsection 3.1 is new: `Sample space restriction and degeneracy of real-world networks'. Several clarifying comments have been added. Discussion now mentions 2 additional specific open problems. Bibliography updated. 25 pages (including appendix), ~10 figure
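
To make the summary statistic concrete, the sketch below computes core numbers with the standard peeling procedure (repeatedly removing a vertex of minimum remaining degree) and then counts how many vertices fall in each shell. The function names and the convention that entry `i` of the returned list counts the vertices with core number `i` are assumptions for illustration; the paper's exact definition of the shell distribution vector may differ in indexing or length.

```python
def core_numbers(adj):
    """Core number of every vertex of a simple undirected graph.

    adj maps each vertex to the set of its neighbours.
    """
    degree = {v: len(neighbours) for v, neighbours in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel off a vertex of minimum remaining degree.
        v = min(remaining, key=degree.__getitem__)
        k = max(k, degree[v])  # core numbers never decrease along the peeling order
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core


def shell_distribution(adj):
    """List whose i-th entry is the number of vertices with core number i."""
    core = core_numbers(adj)
    counts = [0] * (max(core.values(), default=-1) + 1)
    for c in core.values():
        counts[c] += 1
    return counts


# Toy graph: a triangle (0, 1, 2), a pendant vertex 3 attached to 2, and an isolated vertex 4.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}, 4: set()}
shells = shell_distribution(toy)
```

For the toy graph, the isolated vertex has core number 0, the pendant vertex has core number 1, and the triangle vertices have core number 2. This peeling loop is quadratic in the number of vertices, which is adequate for small networks but not tuned for large ones.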

    Covariance matrix estimation with heterogeneous samples

    We consider the problem of estimating the covariance matrix Mp of an observation vector, using heterogeneous training samples, i.e., samples whose covariance matrices are not exactly Mp. More precisely, we assume that the training samples can be clustered into K groups, each one containing Lk snapshots sharing the same covariance matrix Mk. Furthermore, a Bayesian approach is proposed in which the matrices Mk are assumed to be random with some prior distribution. We consider two different assumptions for Mp. In a fully Bayesian framework, Mp is assumed to be random with a given prior distribution. Under this assumption, we derive the minimum mean-square error (MMSE) estimator of Mp which is implemented using a Gibbs-sampling strategy. Moreover, a simpler scheme based on a weighted sample covariance matrix (SCM) is also considered. The weights minimizing the mean square error (MSE) of the estimated covariance matrix are derived. Furthermore, we consider estimators based on colored or diagonal loading of the weighted SCM, and we determine theoretically the optimal level of loading. Finally, in order to relax the a priori assumptions about the covariance matrix Mp, the second part of the paper assumes that this matrix is deterministic and derives its maximum-likelihood estimator. Numerical simulations are presented to illustrate the performance of the different estimation schemes

    Model-adapted Fourier sampling for generative compressed sensing

    We study generative compressed sensing when the measurement matrix is randomly subsampled from a unitary matrix (with the DFT as an important special case). It was recently shown that $O(kdn\|\boldsymbol{\alpha}\|_{\infty}^{2})$ uniformly random Fourier measurements are sufficient to recover signals in the range of a neural network $G:\mathbb{R}^k \to \mathbb{R}^n$ of depth $d$, where each component of the so-called local coherence vector $\boldsymbol{\alpha}$ quantifies the alignment of a corresponding Fourier vector with the range of $G$. We construct a model-adapted sampling strategy with an improved sample complexity of $O(kd\|\boldsymbol{\alpha}\|_{2}^{2})$ measurements. This is enabled by: (1) new theoretical recovery guarantees that we develop for nonuniformly random sampling distributions and then (2) optimizing the sampling distribution to minimize the number of measurements needed for these guarantees. This development offers a sample complexity applicable to natural signal classes, which are often almost maximally coherent with low Fourier frequencies. Finally, we consider a surrogate sampling scheme, and validate its performance in recovery experiments using the CelebA dataset. Comment: 12 pages, 4 figures. Submitted to the NeurIPS 2023 Workshop on Deep Learning and Inverse Problems. This revision features additional attribution of work, acknowledgements, and a correction in definition 1.
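
The two sample complexities differ only in the coherence-dependent factor: n times the squared infinity norm of the local coherence vector for uniform sampling, versus its squared 2-norm for the adapted strategy. Since every squared component is at most the squared maximum, the second factor can never exceed the first. The sketch below compares the two factors for a made-up coherence vector; the function names and the numbers are illustrative assumptions, and the constants and the common factor kd hidden in the O-notation are ignored.

```python
def uniform_factor(alpha):
    """Coherence factor n * ||alpha||_inf^2 from the uniform-sampling bound."""
    return len(alpha) * max(abs(a) for a in alpha) ** 2


def adapted_factor(alpha):
    """Coherence factor ||alpha||_2^2 from the model-adapted bound."""
    return sum(a * a for a in alpha)


# Hypothetical local coherences: one highly coherent (low-frequency) vector, the rest small.
alpha = [0.9, 0.3, 0.1, 0.1]
uniform = uniform_factor(alpha)   # 4 * 0.81
adapted = adapted_factor(alpha)   # 0.81 + 0.09 + 0.01 + 0.01
```

In this toy case the adapted factor is roughly 0.92 against 3.24 for uniform sampling. The gap grows as the coherence concentrates on fewer components, which is the situation the abstract describes for natural signal classes.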

    Dataset Splitting Techniques Comparison For Face Classification on CCTV Images

    The performance of classification models in machine learning is influenced by many factors, one of which is the dataset splitting method. To avoid overfitting, it is important to apply a suitable dataset splitting strategy. This study presents a comparison of four dataset splitting techniques, namely Random Sub-sampling Validation (RSV), k-Fold Cross Validation (k-FCV), Bootstrap Validation (BV) and Moralis Lima Martin Validation (MLMV). The comparison is done on face classification of CCTV images using the Convolutional Neural Network (CNN) and Support Vector Machine (SVM) algorithms, and is applied to two image datasets. The results of the comparison are reviewed using model accuracy on the training set, validation set and test set, as well as the bias and variance of the model. The experiment shows that the k-FCV technique has more stable performance and provides high accuracy on the training set as well as good generalization on the validation set and test set. Meanwhile, data splitting using the MLMV technique has lower performance than the other three techniques since it yields lower accuracy. This technique also shows higher bias and variance values and it builds overfitting models, especially when it is applied on validation set
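
As an illustration of the k-fold technique compared above, the sketch below splits sample indices into k disjoint folds and yields each fold once as the held-out set. The function name `k_fold_indices`, the shuffling seed, and the round-robin fold assignment are assumptions for illustration; they are not details taken from the study.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded shuffle so the splits are reproducible
    # Round-robin assignment gives folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(n_samples=10, k=5))
```

Every sample appears in exactly one held-out fold, so each model is evaluated on data it did not see during training.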

    Density Elicitation with applications in Probabilistic Loops

    Probabilistic loops can be employed to implement and to model different processes ranging from software to cyber-physical systems. One main challenge is how to automatically estimate the distribution of the underlying continuous random variables symbolically and without sampling. We develop an approach, which we call K-series estimation, to approximate statically the joint and marginal distributions of a vector of random variables updated in a probabilistic non-nested loop with polynomial and non-polynomial assignments. Our approach is a general estimation method for an unknown probability density function with bounded support. It naturally complements algorithms for automatic derivation of moments in probabilistic loops such as \cite{BartocciKS19,Moosbruggeretal2022}. Its only requirement is a finite number of moments of the unknown density. We show that the Gram-Charlier (GC) series, a widely used estimation method, is a special case of K-series when the normal probability density function is used as the reference distribution. We also provide a formulation suitable for estimating both univariate and multivariate distributions. We demonstrate the feasibility of our approach using multiple examples from the literature. Comment: 34 page
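
For orientation, the Gram-Charlier special case mentioned above can be sketched with the classical type A series truncated after the fourth-order term. It corrects a normal reference density using skewness and excess kurtosis, both of which follow from the first four moments. This is the textbook truncation, not the paper's K-series; the function name and parameter names are illustrative assumptions.

```python
import math


def gram_charlier_pdf(x, mean, std, skewness=0.0, excess_kurtosis=0.0):
    """Gram-Charlier type A density estimate, truncated after the 4th-order term."""
    z = (x - mean) / std
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    he3 = z ** 3 - 3.0 * z                 # probabilists' Hermite polynomial He_3
    he4 = z ** 4 - 6.0 * z ** 2 + 3.0      # probabilists' Hermite polynomial He_4
    correction = 1.0 + skewness / 6.0 * he3 + excess_kurtosis / 24.0 * he4
    # The truncated series is not guaranteed to be non-negative in the tails.
    return phi * correction / std


# With zero skewness and zero excess kurtosis the estimate reduces to the normal density.
baseline = gram_charlier_pdf(0.0, mean=0.0, std=1.0)
skewed = gram_charlier_pdf(1.0, mean=0.0, std=1.0, skewness=0.6)
```

Because the correction term is a polynomial, the truncated estimate can dip below zero for strongly non-normal moments, which is one reason a more general reference distribution can be attractive.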
