Benchmark of machine learning methods for classification of a Sentinel-2 image
Thanks mainly to ESA and USGS, a large bulk of free images of the Earth is readily available nowadays. One of the main goals of
remote sensing is to label images according to a set of semantic categories, i.e. image classification. This is a very challenging issue
since land cover of a specific class may present a large spatial and spectral variability and objects may appear at different scales and
orientations.
In this study, we report the results of benchmarking 9 machine learning algorithms tested for accuracy and speed in training and
classification of land-cover classes in a Sentinel-2 dataset. The following machine learning methods (MLM) have been tested: linear
discriminant analysis, k-nearest neighbour, random forests, support vector machines, multi layered perceptron, multi layered
perceptron ensemble, ctree, boosting, logarithmic regression. The validation is carried out using a control dataset which consists of an
independent classification into 11 land-cover classes of an area of about 60 km², obtained by manual visual interpretation of high-resolution
images (20 cm ground sampling distance) by experts. In this study five out of the eleven classes are used since the others have too few
samples (pixels) for testing and validating subsets. The classes used are the following: (i) urban, (ii) sowable areas, (iii) water, (iv) tree
plantations and (v) grasslands.
Validation is carried out using three different approaches: (i) using pixels from the training dataset (train), (ii) using pixels from the
training dataset and applying cross-validation with the k-fold method (kfold) and (iii) using all pixels from the control dataset. Five
accuracy indices are calculated for the comparison between the values predicted with each model and control values over three sets of
data: the training dataset (train), the whole control dataset (full) and with k-fold cross-validation (kfold) with ten folds. Results from
validation of predictions on the whole dataset (full) show that the random forests method achieves the highest values, with the kappa index ranging from
0.55 to 0.42 for the largest and smallest numbers of training pixels, respectively. The two neural networks (multi layered perceptron and its
ensemble) and the support vector machines method (with the default radial basis function kernel) follow closely with comparable
performance.
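The kappa index used to rank the methods can be computed directly from a pair of label sequences. The following is a minimal sketch of Cohen's kappa, not the authors' code; the function name `cohen_kappa` and the toy labels are assumptions made for illustration only.

```python
from collections import Counter


def cohen_kappa(reference, predicted):
    """Cohen's kappa for two equal-length sequences of class labels."""
    if len(reference) != len(predicted) or len(reference) == 0:
        raise ValueError("expected two non-empty sequences of equal length")
    n = len(reference)
    # Observed agreement: share of pixels where both labels coincide.
    observed = sum(r == p for r, p in zip(reference, predicted)) / n
    # Chance agreement: product of the marginal class counts, summed over classes.
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    expected = sum(ref_counts[c] * pred_counts[c] for c in ref_counts) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both maps hold one shared class")
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy labels, not data from the study.
control = ["urban", "urban", "water", "water", "grassland", "grassland"]
model = ["urban", "water", "water", "water", "grassland", "urban"]
kappa = cohen_kappa(control, model)
```

Here four of six pixels agree (observed agreement 2/3) while chance agreement is 1/3, so kappa is 0.5: the model closes half of the gap between chance and perfect agreement.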
Open source R for applying machine learning to RPAS remote sensing images
The increase in the number of remote sensing platforms, ranging from satellites to close-range Remotely Piloted Aircraft Systems (RPAS), is leading to a growing demand for new image processing and classification tools. This article presents a comparison of the Random Forest (RF) and Support Vector Machine (SVM) machine-learning algorithms for extracting land-use classes in an RPAS-derived orthomosaic using open source R packages.
The camera used in this work captures the reflectance of the Red, Blue, Green and Near Infrared channels of a target. The full dataset is therefore a 4-channel raster image. The classification performance of the two methods is tested at varying sizes of training sets. SVM and RF are evaluated using the Kappa index, classification accuracy and classification error as accuracy metrics. The training sets are randomly drawn as subsets of 2% to 20% of the total number of raster cells, with stratified sampling according to the land-use classes. Ten runs are done for each training set to calculate the variance in results. The control dataset consists of an independent classification obtained by photointerpretation. The validation is carried out (i) using K-fold cross-validation, (ii) using the pixels from the validation test set, and (iii) using the pixels from the full test set.
Validation with K-fold and with the validation dataset shows that SVM gives better results, but RF performs better when the training size is larger. Classification error and classification accuracy follow the trend of the Kappa index.
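Stratified sampling of training cells by land-use class can be sketched as follows. The study itself used R packages, so this Python version is only an illustration under assumptions: the function name `stratified_sample`, the rule of keeping at least one cell per class, and the toy labels are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(labels, fraction, seed=0):
    """Return sorted indices of a sample holding `fraction` of each class."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must lie in (0, 1]")
    rng = random.Random(seed)
    # Group cell indices by their class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    chosen = []
    for label in sorted(by_class):
        members = by_class[label]
        # Assumption: keep at least one cell so no class vanishes from training.
        n_take = max(1, round(fraction * len(members)))
        chosen.extend(rng.sample(members, n_take))
    return sorted(chosen)


# Hypothetical raster cells: 50 of one class, 100 of another.
cell_labels = ["crop"] * 50 + ["forest"] * 100
training_cells = stratified_sample(cell_labels, fraction=0.10, seed=1)
```

Repeating the call with different seeds gives independent training sets of the same size, which is one way to obtain the run-to-run variance described above.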
Statistical models for cores decomposition of an undirected random graph
The k-core decomposition is a widely studied summary statistic that
describes a graph's global connectivity structure. In this paper, we move
beyond using k-core decomposition as a tool to summarize a graph and propose
using k-core decomposition as a tool to model random graphs. We propose using
the shell distribution vector, a way of summarizing the decomposition, as a
sufficient statistic for a family of exponential random graph models. We study
the properties and behavior of the model family, implement a Markov chain Monte
Carlo algorithm for simulating graphs from the model, implement a direct
sampler from the set of graphs with a given shell distribution, and explore the
sampling distributions of some of the commonly used complementary statistics as
good candidates for heuristic model fitting. These algorithms provide first
fundamental steps necessary for solving the following problems: parameter
estimation in this ERGM, extending the model to its Bayesian relative, and
developing a rigorous methodology for testing goodness of fit of the model and
model selection. The methods are applied to a synthetic network as well as the
well-known Sampson monks dataset.
Comment: Subsection 3.1 is new: 'Sample space restriction and degeneracy of
real-world networks'. Several clarifying comments have been added. Discussion
now mentions 2 additional specific open problems. Bibliography updated. 25
pages (including appendix), ~10 figures
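To make the shell distribution concrete, the sketch below computes core numbers by repeatedly peeling a minimum-degree node and then counts how many nodes fall in each shell. It is a simple quadratic-time illustration, not the paper's sampler; the function names and the toy graph are assumptions.

```python
def core_numbers(adjacency):
    """Core number of every node; `adjacency` maps node -> set of neighbours."""
    degree = {node: len(nbrs) for node, nbrs in adjacency.items()}
    remaining = set(adjacency)
    core = {}
    k = 0
    while remaining:
        # Peel a node of minimum remaining degree.
        node = min(remaining, key=degree.__getitem__)
        # The core level never decreases as peeling proceeds.
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for neighbour in adjacency[node]:
            if neighbour in remaining:
                degree[neighbour] -= 1
    return core


def shell_distribution(adjacency):
    """Entry i counts the nodes whose core number is exactly i."""
    core = core_numbers(adjacency)
    counts = [0] * (max(core.values(), default=-1) + 1)
    for value in core.values():
        counts[value] += 1
    return counts


# Hypothetical toy graph: a triangle (0, 1, 2), a pendant node 3, an isolated node 4.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}, 4: set()}
shells = shell_distribution(graph)
```

For this graph the isolated node sits in shell 0, the pendant node in shell 1 and the triangle in shell 2, so the shell distribution is `[1, 1, 3]`.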
Covariance matrix estimation with heterogeneous samples
We consider the problem of estimating the covariance matrix Mp of an observation vector, using heterogeneous training samples, i.e., samples whose covariance matrices are not exactly Mp. More precisely, we assume that the training samples can be clustered into K groups, each one containing Lk snapshots sharing the same covariance matrix Mk. Furthermore, a Bayesian approach is proposed in which the matrices Mk are assumed to be random with some prior distribution. We consider two different assumptions for Mp. In a fully Bayesian framework, Mp is assumed to be random with a given prior distribution. Under this assumption, we derive the minimum mean-square error (MMSE) estimator of Mp, which is implemented using a Gibbs-sampling strategy. Moreover, a simpler scheme based on a weighted sample covariance matrix (SCM) is also considered. The weights minimizing the mean square error (MSE) of the estimated covariance matrix are derived. Furthermore, we consider estimators based on colored or diagonal loading of the weighted SCM, and we determine theoretically the optimal level of loading. Finally, in order to relax the a priori assumptions about the covariance matrix Mp, the second part of the paper assumes that this matrix is deterministic and derives its maximum-likelihood estimator. Numerical simulations are presented to illustrate the performance of the different estimation schemes.
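The weighted SCM with diagonal loading can be illustrated with a short sketch. It assumes real-valued, zero-mean snapshots and takes the group weights and the loading level as inputs; the optimal values derived in the paper are not reproduced here, and all names and numbers below are hypothetical.

```python
def sample_covariance(snapshots):
    """SCM of zero-mean real snapshots: (1/L) * sum of x x^T."""
    count = len(snapshots)
    size = len(snapshots[0])
    return [
        [sum(x[i] * x[j] for x in snapshots) / count for j in range(size)]
        for i in range(size)
    ]


def weighted_loaded_scm(groups, weights, loading):
    """Sum of w_k * SCM_k over the groups, plus `loading` times the identity."""
    if len(groups) != len(weights):
        raise ValueError("need exactly one weight per group")
    size = len(groups[0][0])
    estimate = [[0.0] * size for _ in range(size)]
    for snapshots, weight in zip(groups, weights):
        scm = sample_covariance(snapshots)
        for i in range(size):
            for j in range(size):
                estimate[i][j] += weight * scm[i][j]
    # Diagonal loading: add a scaled identity matrix.
    for i in range(size):
        estimate[i][i] += loading
    return estimate


# Hypothetical data: two groups of 2-dimensional snapshots.
groups = [[[1.0, 0.0], [0.0, 1.0]], [[2.0, 2.0]]]
estimate = weighted_loaded_scm(groups, weights=[0.5, 0.25], loading=0.1)
```

With these inputs the first group's SCM is 0.5 times the identity and the second is a matrix of fours, so the estimate has 1.35 on the diagonal and 1.0 off the diagonal.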
Model-adapted Fourier sampling for generative compressed sensing
We study generative compressed sensing when the measurement matrix is
randomly subsampled from a unitary matrix (with the DFT as an important special
case). It was recently shown that uniformly random Fourier measurements are
sufficient to recover signals in the range of a neural network, with a sample
complexity that depends on the network depth and on the so-called local
coherence vector, each component of which quantifies the alignment of a
corresponding Fourier vector with the range of the network. We construct a
model-adapted sampling strategy with an improved sample complexity. This is enabled
by: (1) new theoretical recovery guarantees that we develop for nonuniformly
random sampling distributions and then (2) optimizing the sampling distribution
to minimize the number of measurements needed for these guarantees. This
development offers a sample complexity applicable to natural signal classes,
which are often almost maximally coherent with low Fourier frequencies.
Finally, we consider a surrogate sampling scheme, and validate its performance
in recovery experiments using the CelebA dataset.
Comment: 12 pages, 4 figures. Submitted to the NeurIPS 2023 Workshop on Deep
Learning and Inverse Problems. This revision features additional attribution
of work, acknowledgements, and a correction in definition 1.
Dataset Splitting Techniques Comparison For Face Classification on CCTV Images
The performance of classification models in machine learning is influenced by many factors, one of which is the dataset splitting method. To avoid overfitting, it is important to apply a suitable dataset splitting strategy. This study presents a comparison of four dataset splitting techniques, namely Random Sub-sampling Validation (RSV), k-Fold Cross Validation (k-FCV), Bootstrap Validation (BV) and Moralis Lima Martin Validation (MLMV). The comparison is carried out for face classification on CCTV images using a Convolutional Neural Network (CNN) and a Support Vector Machine (SVM), on two image datasets. The results are assessed using model accuracy on the training, validation and test sets, as well as the bias and variance of the model. The experiments show that the k-FCV technique has more stable performance and provides high accuracy on the training set as well as good generalization on the validation and test sets. Meanwhile, data splitting with the MLMV technique performs worse than the other three techniques, since it yields lower accuracy. This technique also shows higher bias and variance values and builds overfitted models, especially when it is applied to the validation set.
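Three of the four splitting techniques can be sketched as index generators. This is an illustration under assumptions, not the study's implementation: the function names, the seeds and the out-of-bag test set used for the bootstrap are choices made here, and MLMV is left out because the abstract does not describe how it works.

```python
import random


def random_subsample_split(n_samples, test_fraction, seed=0):
    """RSV: one random hold-out split, returned as (train, test) index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = round(n_samples * test_fraction)
    return indices[cut:], indices[:cut]


def k_fold_splits(n_samples, n_folds, seed=0):
    """k-FCV: each sample is used for testing exactly once across the folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    splits = []
    for i, test in enumerate(folds):
        train = [j for k, fold in enumerate(folds) if k != i for j in fold]
        splits.append((train, test))
    return splits


def bootstrap_split(n_samples, seed=0):
    """BV: train on a resample drawn with replacement, test on the unseen rest."""
    rng = random.Random(seed)
    train = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(train)
    test = [j for j in range(n_samples) if j not in drawn]
    return train, test


# Hypothetical dataset of 10 images.
holdout_train, holdout_test = random_subsample_split(10, test_fraction=0.3)
folds = k_fold_splits(10, n_folds=5)
boot_train, boot_test = bootstrap_split(10)
```

The k-fold variant is the only one of the three that guarantees every sample appears in a test set exactly once, which is one plausible reason for the more stable performance reported above.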
Density Elicitation with applications in Probabilistic Loops
Probabilistic loops can be employed to implement and to model different
processes ranging from software to cyber-physical systems. One main challenge
is how to automatically estimate the distribution of the underlying continuous
random variables symbolically and without sampling. We develop an approach,
which we call K-series estimation, to approximate statically the joint and
marginal distributions of a vector of random variables updated in a
probabilistic non-nested loop with polynomial and non-polynomial assignments.
Our approach is a general estimation method for an unknown probability density
function with bounded support. It naturally complements algorithms for
automatic derivation of moments in probabilistic loops, such as those
in [BartocciKS19, Moosbruggeretal2022]. Its only requirement is a finite
number of moments of the unknown density. We show that Gram-Charlier (GC)
series, a widely used estimation method, is a special case of K-series when the
normal probability density function is used as reference distribution. We
also provide a formulation suitable for estimating both univariate and
multivariate distributions. We demonstrate the feasibility of our approach
using multiple examples from the literature.
Comment: 34 pages
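The Gram-Charlier special case mentioned in the abstract can be illustrated with a short sketch that builds a density estimate from the first four raw moments, using the normal density as reference. It covers only the classical fourth-order Gram-Charlier A series, not the general K-series method; the function name and the example moments are assumptions.

```python
import math


def gram_charlier_pdf(raw_moments):
    """Return a fourth-order Gram-Charlier A density built from E[X]..E[X^4]."""
    m1, m2, m3, m4 = raw_moments
    variance = m2 - m1 ** 2
    if variance <= 0:
        raise ValueError("the moments imply a non-positive variance")
    sigma = math.sqrt(variance)
    # Central moments obtained from the raw moments.
    mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    skewness = mu3 / sigma ** 3
    excess_kurtosis = mu4 / sigma ** 4 - 3.0

    def pdf(x):
        z = (x - m1) / sigma
        # Probabilists' Hermite polynomials He3 and He4.
        he3 = z ** 3 - 3 * z
        he4 = z ** 4 - 6 * z ** 2 + 3
        normal = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
        correction = 1 + skewness / 6 * he3 + excess_kurtosis / 24 * he4
        return normal / sigma * correction

    return pdf


# Raw moments of a standard normal variable: 0, 1, 0, 3.
normal_estimate = gram_charlier_pdf((0.0, 1.0, 0.0, 3.0))
```

For standard normal moments both correction terms vanish and the estimate reduces to the normal density itself. For strongly non-normal moments the truncated series can take negative values, so it is an approximation rather than a guaranteed density.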