Model-based Methods of Classification: Using the mclust Software in Chemometrics
Due to recent advances in methods and software for model-based clustering, and to the interpretability of the results, clustering procedures based on probability models are increasingly preferred over heuristic methods. The clustering process estimates a model for the data that allows for overlapping clusters, producing a probabilistic clustering that quantifies the uncertainty of observations belonging to components of the mixture. The resulting clustering model can also be used for some other important problems in multivariate analysis, including density estimation and discriminant analysis. Examples of the use of model-based clustering and classification techniques in chemometric studies include multivariate image analysis, magnetic resonance imaging, microarray image segmentation, statistical process control, and food authenticity. We review model-based clustering and related methods for density estimation and discriminant analysis, and show how the R package mclust can be applied in each instance.
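The mixture-model idea behind mclust can be illustrated outside of R. Below is a minimal sketch, in Python with numpy, of expectation-maximisation for a two-component one-dimensional Gaussian mixture; it is not the mclust implementation (which fits a family of multivariate covariance parameterisations and selects among them by BIC), only an illustration of how a probabilistic clustering yields membership probabilities rather than hard labels:

```python
import numpy as np

def em_gmm_1d(x, n_iter=100):
    """Fit a two-component 1-D Gaussian mixture by EM and return the
    mixture weights, means, standard deviations and the posterior
    membership probabilities that quantify clustering uncertainty."""
    mu = np.array([x.min(), x.max()])        # spread-out initial means
    sigma = np.array([x.std(), x.std()])
    w = np.array([0.5, 0.5])                 # mixture weights
    for _ in range(n_iter):
        # E-step: responsibilities r[i, k] = P(component k | x_i)
        dens = np.stack(
            [w[k] / (sigma[k] * np.sqrt(2 * np.pi))
             * np.exp(-0.5 * ((x - mu[k]) / sigma[k]) ** 2)
             for k in range(2)], axis=1)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the soft assignments
        nk = r.sum(axis=0)
        w = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    return w, mu, sigma, r

# synthetic data: two well-separated clusters
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(8.0, 1.0, 200)])
w, mu, sigma, r = em_gmm_1d(x)
# each observation receives a membership probability in r, not a hard label
```

The responsibilities `r` are exactly the "uncertainty of observations belonging to components of the mixture" that the abstract refers to; an observation midway between the two means gets responsibilities near 0.5/0.5.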
A study and evaluation of image analysis techniques applied to remotely sensed data
An analysis of phenomena causing nonlinearities in the transformation from Landsat multispectral scanner coordinates to ground coordinates is presented. Experimental results comparing rms errors at ground control points indicated a slight improvement when a nonlinear (8-parameter) transformation was used instead of an affine (6-parameter) transformation. Using a preliminary ground truth map of a test site in Alabama covering the Mobile Bay area and six Landsat images of the same scene, several classification methods were assessed. A methodology was developed for automatic change detection using classification/cluster maps. A coding scheme was employed for generation of change depiction maps indicating specific types of changes. Inter- and intraseasonal data of the Mobile Bay test area were compared to illustrate the method. A beginning was made in the study of data compression by applying a Karhunen-Loeve transform technique to a small section of the test data set. The second part of the report provides formal documentation of the several programs developed for the analyses and assessments presented.
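The Karhunen-Loeve transform mentioned for data compression is equivalent to principal component analysis: project the spectral bands onto the eigenvectors of their covariance matrix and keep only the leading components. A minimal numpy sketch, on synthetic band data (the actual Landsat processing in the report is not reproduced here):

```python
import numpy as np

# Karhunen-Loeve (principal component) transform of multiband pixel data.
# Rows are pixels, columns are spectral bands; values here are synthetic.
rng = np.random.default_rng(0)
bands = rng.normal(size=(500, 4))
bands[:, 1] += 0.9 * bands[:, 0]          # correlated bands compress well

centred = bands - bands.mean(axis=0)
cov = np.cov(centred, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)    # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]         # reorder: largest variance first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# keep the leading components that carry most of the variance
k = 2
compressed = centred @ eigvecs[:, :k]
explained = eigvals[:k].sum() / eigvals.sum()
```

Because adjacent spectral bands are strongly correlated, a few components typically capture most of the scene variance, which is what makes the transform useful for compression.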
Sedimentological characterization of Antarctic moraines using UAVs and Structure-from-Motion photogrammetry
In glacial environments, particle-size analysis of moraines provides insights into clast origin, transport history, depositional mechanism and processes of reworking. Traditional methods for grain-size classification are labour-intensive, physically intrusive and are limited to patch-scale (1 m2) observation. We develop emerging, high-resolution ground- and unmanned aerial vehicle-based 'Structure-from-Motion' (UAV-SfM) photogrammetry to recover grain-size information across a moraine surface in the Heritage Range, Antarctica. SfM data products were benchmarked against equivalent datasets acquired using terrestrial laser scanning, and were found to be accurate to within 1.7 and 50 mm for patch- and site-scale modelling, respectively. Grain-size distributions were obtained through digital grain classification, or 'photo-sieving', of patch-scale SfM orthoimagery. Photo-sieved distributions were accurate to <2 mm compared to control distributions derived from dry sieving. A relationship between patch-scale median grain size and the standard deviation of local surface elevations was applied to a site-scale UAV-SfM model to facilitate upscaling and the production of a spatially continuous map of the median grain size across a 0.3 km2 area of moraine. This highly automated workflow for site-scale sedimentological characterization eliminates much of the subjectivity associated with traditional methods and forms a sound basis for subsequent glaciological process interpretation and analysis.
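The upscaling step pairs a roughness statistic (the standard deviation of local surface elevations) with a calibrated grain-size relation. The sketch below, in Python with numpy, computes a moving-window elevation standard deviation over a synthetic DEM and applies a hypothetical linear calibration; the coefficients `a` and `b`, the window size and the linear form are all illustrative assumptions, not values from the study:

```python
import numpy as np

def window_std(dem, half=2):
    """Moving-window standard deviation of local surface elevations,
    a simple proxy for surface roughness."""
    rows, cols = dem.shape
    out = np.zeros_like(dem)
    for i in range(rows):
        for j in range(cols):
            win = dem[max(i - half, 0):i + half + 1,
                      max(j - half, 0):j + half + 1]
            out[i, j] = win.std()
    return out

# hypothetical linear calibration D50 = a * roughness + b, fitted at
# patch scale; a and b are illustrative values, not from the study
a, b = 40.0, 2.0
rng = np.random.default_rng(0)
dem = rng.normal(scale=0.05, size=(20, 20))   # synthetic elevations (m)
d50_map = a * window_std(dem) + b             # median grain size (mm)
```

In practice the calibration would be fitted from the photo-sieved patch-scale distributions before being applied across the site-scale UAV-SfM model.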
Improved Classification of Alzheimer's Disease Data via Removal of Nuisance Variability
Diagnosis of Alzheimer's disease is based on the results of neuropsychological tests and available supporting biomarkers, such as the results of imaging studies. The results of the tests and the values of biomarkers are dependent on nuisance features, such as age and gender. In order to improve diagnostic power, the effects of the nuisance features have to be removed from the data. In this paper, four types of interactions between classification features and nuisance features were identified. Three methods were tested to remove these interactions from the classification data. In stratified analysis, a homogeneous subgroup was generated from a training set. The data correction method utilized a linear regression model to remove the effects of nuisance features from the data. The third method was a combination of these two. The methods were tested using all the baseline data from the Alzheimer's Disease Neuroimaging Initiative database in two classification studies: separating control subjects from Alzheimer's disease patients, and discriminating stable from progressive mild cognitive impairment subjects. The results show that both stratified analysis and data correction are able to statistically significantly improve the classification accuracy of several neuropsychological tests and imaging biomarkers. The improvements were especially large for the classification of stable and progressive mild cognitive impairment subjects, where the best improvements observed were 6 percentage points. The data correction method gave better results for imaging biomarkers, whereas stratified analysis worked well with the neuropsychological tests. In conclusion, the study shows that the excess variability caused by nuisance features should be removed from the data to improve classification accuracy and, therefore, the reliability of diagnosis.
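Regression-based nuisance correction of this kind amounts to fitting each classification feature on the nuisance covariates and keeping the residuals. A minimal sketch in Python with numpy, on synthetic data (the paper's actual regression model and ADNI preprocessing are not reproduced):

```python
import numpy as np

def remove_nuisance(features, nuisance):
    """Regress each feature column on the nuisance covariates (with an
    intercept) and return the residuals: the feature values with the
    linear effect of age, gender, etc. removed."""
    X = np.column_stack([np.ones(len(nuisance)), nuisance])
    beta, *_ = np.linalg.lstsq(X, features, rcond=None)
    return features - X @ beta

rng = np.random.default_rng(0)
age = rng.uniform(55, 90, size=300)
# synthetic biomarker that drifts with age plus measurement noise
biomarker = 0.03 * age + rng.normal(scale=0.2, size=300)
corrected = remove_nuisance(biomarker[:, None], age[:, None])
# after correction, the biomarker is linearly uncorrelated with age
```

In a diagnostic setting the regression coefficients would typically be estimated on healthy controls only, so that disease-related variance is not removed along with the nuisance variance.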
Benchmark of machine learning methods for classification of a Sentinel-2 image
Thanks mainly to ESA and USGS, a large bulk of free images of the Earth is readily available nowadays. One of the main goals of remote sensing is to label images according to a set of semantic categories, i.e. image classification. This is a very challenging task, since the land cover of a specific class may present large spatial and spectral variability, and objects may appear at different scales and orientations.

In this study, we report the results of benchmarking nine machine learning algorithms, tested for accuracy and for speed in training and classification of land-cover classes in a Sentinel-2 dataset. The following machine learning methods (MLM) have been tested: linear discriminant analysis, k-nearest neighbour, random forests, support vector machines, multi-layer perceptron, multi-layer perceptron ensemble, ctree, boosting, and logistic regression. The validation is carried out using a control dataset, which consists of an independent classification into 11 land-cover classes of an area of about 60 km2, obtained by manual visual interpretation of high-resolution images (20 cm ground sampling distance) by experts. Five of the eleven classes are used in this study, since the others have too few samples (pixels) for the testing and validation subsets. The classes used are: (i) urban, (ii) sowable areas, (iii) water, (iv) tree plantations, and (v) grasslands.

Validation is carried out using three different approaches: (i) using pixels from the training dataset (train), (ii) using pixels from the training dataset with k-fold cross-validation (kfold), and (iii) using all pixels from the control dataset (full). Five accuracy indices are calculated for the comparison between the values predicted with each model and the control values over these three sets of data, with ten folds used for cross-validation. Results from validation of predictions over the whole control dataset (full) show the random forests method with the highest values: a kappa index ranging from 0.55 to 0.42 with the largest and the smallest number of training pixels, respectively. The two neural networks (multi-layer perceptron and its ensemble) and the support vector machines, with the default radial basis function kernel, follow closely with comparable performance.
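The kappa index reported above is Cohen's kappa: observed agreement between predicted and reference labels, corrected for the agreement expected by chance from the marginal class frequencies. A self-contained sketch in Python with numpy, on toy labels (not the Sentinel-2 data):

```python
import numpy as np

def cohens_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between predicted and reference labels,
    corrected for the agreement expected by chance."""
    labels = np.unique(np.concatenate([y_true, y_pred]))
    p_obs = np.mean(y_true == y_pred)
    # chance agreement from the marginal label frequencies
    p_exp = sum(np.mean(y_true == c) * np.mean(y_pred == c) for c in labels)
    return (p_obs - p_exp) / (1.0 - p_exp)

ref  = np.array([0, 0, 1, 1, 2, 2, 2, 0])   # reference (control) labels
pred = np.array([0, 0, 1, 2, 2, 2, 1, 0])   # classifier predictions
print(round(cohens_kappa(ref, pred), 2))    # → 0.62
```

A kappa of 0 means no better than chance and 1 means perfect agreement, so the 0.42-0.55 range reported for random forests indicates moderate agreement with the expert-interpreted control map.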
A decision-theoretic approach for segmental classification
This paper is concerned with statistical methods for the segmental classification of linear sequence data, where the task is to segment and classify the data according to an underlying hidden discrete state sequence. Such analysis is commonplace in the empirical sciences, including genomics, finance and speech processing. In particular, we are interested in answering the following question: given data and a statistical model of the hidden states, what should we report as the prediction under the posterior distribution? That is, how should you make a prediction of the underlying states? We demonstrate that traditional approaches, such as reporting the most probable state sequence or the most probable set of marginal predictions, can give undesirable classification artefacts and offer limited control over the properties of the prediction. We propose a decision-theoretic approach using a novel class of Markov loss functions, and report via the principle of minimum expected loss (maximum expected utility). We demonstrate that the sequence of minimum expected loss under the Markov loss function can be enumerated exactly using dynamic programming methods, and that it offers flexibility and performance improvements over existing techniques. The result is generic and applicable to any probabilistic model on a sequence, such as hidden Markov models, change point or product partition models.

Comment: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics; DOI: http://dx.doi.org/10.1214/13-AOAS657
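The "most probable state sequence" baseline that the abstract criticises is computed by the Viterbi algorithm, and the paper's Markov-loss decoder runs an analogous dynamic-programming recursion over expected losses instead of log-probabilities. As a sketch of that DP machinery (the baseline only, not the paper's decoder), here is a standard Viterbi implementation in Python with numpy:

```python
import numpy as np

def viterbi(log_init, log_trans, log_emit):
    """Most probable hidden-state sequence of an HMM by dynamic programming.
    log_emit[t, k] is the log-likelihood of observation t under state k."""
    T, K = log_emit.shape
    score = log_init + log_emit[0]
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_trans      # cand[j, k]: move from j to k
        back[t] = cand.argmax(axis=0)          # best predecessor per state
        score = cand.max(axis=0) + log_emit[t]
    # trace the best path backwards from the final best state
    path = np.empty(T, dtype=int)
    path[-1] = score.argmax()
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path

# toy example: two sticky states, observations favour state 0 then state 1
log_init = np.log([0.5, 0.5])
log_trans = np.log([[0.9, 0.1], [0.1, 0.9]])
log_emit = np.log(np.array([[0.9, 0.1]] * 4 + [[0.1, 0.9]] * 4))
print(viterbi(log_init, log_trans, log_emit))   # → [0 0 0 0 1 1 1 1]
```

The paper's point is that this single-best path can behave badly as a summary of the posterior (e.g. missing short segments with high marginal probability), which motivates decoding under an explicit Markov loss function instead.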