
    Model-based Methods of Classification: Using the mclust Software in Chemometrics

    Due to recent advances in methods and software for model-based clustering, and to the interpretability of the results, clustering procedures based on probability models are increasingly preferred over heuristic methods. The clustering process estimates a model for the data that allows for overlapping clusters, producing a probabilistic clustering that quantifies the uncertainty of observations belonging to components of the mixture. The resulting clustering model can also be used for some other important problems in multivariate analysis, including density estimation and discriminant analysis. Examples of the use of model-based clustering and classification techniques in chemometric studies include multivariate image analysis, magnetic resonance imaging, microarray image segmentation, statistical process control, and food authenticity. We review model-based clustering and related methods for density estimation and discriminant analysis, and show how the R package mclust can be applied in each instance.
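    As a rough illustration of the model-based clustering workflow described above (mclust itself is an R package), the following Python sketch uses scikit-learn's GaussianMixture as a stand-in: it selects the number of mixture components by BIC and reports per-observation membership probabilities and clustering uncertainty. The data and settings are placeholders, not part of the original study.

```python
# Illustrative Python stand-in for model-based clustering (mclust is an R
# package): a Gaussian mixture fitted with scikit-learn, with the number of
# components chosen by BIC. Data and settings are hypothetical.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Toy "chemometric" data: two overlapping groups in a 5-dimensional feature space.
X = np.vstack([rng.normal(0.0, 1.0, (100, 5)),
               rng.normal(2.5, 0.7, (100, 5))])

# Fit mixtures with 1..5 components and keep the one with the lowest BIC
# (scikit-learn's BIC is minimised; mclust maximises its own BIC variant).
models = {k: GaussianMixture(n_components=k, covariance_type="full",
                             random_state=0).fit(X) for k in range(1, 6)}
best_k = min(models, key=lambda k: models[k].bic(X))
best = models[best_k]

labels = best.predict(X)                      # hard cluster assignments
posterior = best.predict_proba(X)             # membership probabilities per component
uncertainty = 1.0 - posterior.max(axis=1)     # per-observation clustering uncertainty
print(best_k, uncertainty.round(3)[:5])
```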

    A study and evaluation of image analysis techniques applied to remotely sensed data

    An analysis of phenomena causing nonlinearities in the transformation from Landsat multispectral scanner coordinates to ground coordinates is presented. Experimental results comparing RMS errors at ground control points indicated a slight improvement when a nonlinear (8-parameter) transformation was used instead of an affine (6-parameter) transformation. Using a preliminary ground truth map of a test site in Alabama covering the Mobile Bay area and six Landsat images of the same scene, several classification methods were assessed. A methodology was developed for automatic change detection using classification/cluster maps. A coding scheme was employed for generating change depiction maps that indicate specific types of changes. Inter- and intraseasonal data of the Mobile Bay test area were compared to illustrate the method. A beginning was made in the study of data compression by applying a Karhunen-Loeve transform technique to a small section of the test data set. The second part of the report provides formal documentation of the programs developed for the analyses and assessments presented.
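    To make the transformation comparison concrete, the sketch below fits an affine (6-parameter) and a bilinear (8-parameter) mapping from image to ground coordinates at ground control points by least squares and reports the RMS residual of each. The bilinear form and the control-point values are assumptions for illustration; the report's actual 8-parameter model may differ.

```python
# Hedged sketch: compare an affine (6-parameter) fit with a bilinear
# (8-parameter) fit from image coordinates to ground coordinates at ground
# control points, reporting the RMS residual of each. The GCP arrays are
# placeholders, not data from the report.
import numpy as np

def fit_and_rms(design, ground):
    """Least-squares fit of ground coordinates on a design matrix; return RMS error."""
    coef, *_ = np.linalg.lstsq(design, ground, rcond=None)
    resid = ground - design @ coef
    return np.sqrt(np.mean(np.sum(resid**2, axis=1)))

# Hypothetical ground control points: image (col, row) -> ground (E, N).
img = np.array([[10, 12], [200, 30], [50, 180], [220, 210], [120, 100]], float)
gnd = np.array([[1010, 2012], [1205, 2028], [1048, 2181],
                [1218, 2207], [1121, 2099]], float)

x, y = img[:, 0], img[:, 1]
A6 = np.column_stack([np.ones_like(x), x, y])          # affine: 3 coef per axis = 6
A8 = np.column_stack([np.ones_like(x), x, y, x * y])   # bilinear: 4 coef per axis = 8

print("affine RMS:  ", fit_and_rms(A6, gnd))
print("bilinear RMS:", fit_and_rms(A8, gnd))
```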

    Sedimentological characterization of Antarctic moraines using UAVs and Structure-from-Motion photogrammetry

    In glacial environments, particle-size analysis of moraines provides insights into clast origin, transport history, depositional mechanism and processes of reworking. Traditional methods for grain-size classification are labour-intensive, physically intrusive and limited to patch-scale (1 m²) observation. We develop emerging, high-resolution ground- and unmanned aerial vehicle-based 'Structure-from-Motion' (UAV-SfM) photogrammetry to recover grain-size information across a moraine surface in the Heritage Range, Antarctica. SfM data products were benchmarked against equivalent datasets acquired using terrestrial laser scanning, and were found to be accurate to within 1.7 mm and 50 mm for patch- and site-scale modelling, respectively. Grain-size distributions were obtained through digital grain classification, or 'photo-sieving', of patch-scale SfM orthoimagery. Photo-sieved distributions were accurate to within 2 mm of control distributions derived from dry sieving. A relationship between patch-scale median grain size and the standard deviation of local surface elevations was applied to a site-scale UAV-SfM model to facilitate upscaling and the production of a spatially continuous map of median grain size across a 0.3 km² area of moraine. This highly automated workflow for site-scale sedimentological characterization eliminates much of the subjectivity associated with traditional methods and forms a sound basis for subsequent glaciological process interpretation and analysis.
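    The upscaling step described above can be sketched as follows: compute a local surface-roughness map (standard deviation of elevations in a moving window) from a site-scale elevation model, then apply a patch-scale calibration relating roughness to median grain size. The window size and the linear calibration coefficients below are hypothetical placeholders, not the values derived in the study.

```python
# Minimal sketch, not the authors' workflow: derive a surface-roughness map
# (std of elevations in a square moving window) from a DEM grid and apply a
# hypothetical linear calibration mapping roughness to median grain size (D50).
import numpy as np
from scipy.ndimage import uniform_filter

def local_roughness(dem, window=5):
    """Standard deviation of elevations within a square moving window."""
    mean = uniform_filter(dem, size=window)
    mean_sq = uniform_filter(dem**2, size=window)
    return np.sqrt(np.maximum(mean_sq - mean**2, 0.0))

def d50_from_roughness(sigma, a=2.0, b=0.005):
    """Hypothetical patch-scale calibration: D50 (m) = a * sigma + b."""
    return a * sigma + b

dem = np.random.default_rng(1).normal(0.0, 0.02, (200, 200))  # toy DEM, metres
sigma = local_roughness(dem, window=5)
d50_map = d50_from_roughness(sigma)   # spatially continuous median grain-size map
print(d50_map.shape, float(np.median(d50_map)))
```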

    Improved Classification of Alzheimer's Disease Data via Removal of Nuisance Variability

    Diagnosis of Alzheimer's disease is based on the results of neuropsychological tests and available supporting biomarkers, such as the results of imaging studies. The test results and biomarker values depend on nuisance features such as age and gender. To improve diagnostic power, the effects of these nuisance features have to be removed from the data. In this paper, four types of interactions between classification features and nuisance features were identified. Three methods were tested to remove these interactions from the classification data. In stratified analysis, a homogeneous subgroup was generated from the training set. The data correction method used a linear regression model to remove the effects of nuisance features from the data. The third method was a combination of these two. The methods were tested using all the baseline data from the Alzheimer's Disease Neuroimaging Initiative database in two classification studies: separating control subjects from Alzheimer's disease patients, and discriminating stable from progressive mild cognitive impairment subjects. The results show that both stratified analysis and data correction can statistically significantly improve the classification accuracy of several neuropsychological tests and imaging biomarkers. The improvements were especially large for the classification of stable and progressive mild cognitive impairment subjects, where the best improvements observed were 6 percentage points. The data correction method gave better results for imaging biomarkers, whereas stratified analysis worked well with the neuropsychological tests. In conclusion, the study shows that the excess variability caused by nuisance features should be removed from the data to improve classification accuracy and, therefore, the reliability of diagnosis.
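    The general idea behind the data correction method, regressing nuisance variables out of a classification feature and keeping the residuals, can be sketched as follows. The variables and data are synthetic placeholders and this is not the ADNI pipeline itself; in practice the regression would typically be fitted on a reference group such as the controls.

```python
# Hedged sketch of nuisance-variable correction: regress a feature on age and
# gender and keep the residuals, so the classifier sees only variability not
# explained by the nuisance features. Data are synthetic placeholders.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 300
age = rng.uniform(55, 90, n)
sex = rng.integers(0, 2, n).astype(float)
nuisance = np.column_stack([age, sex])

# Hypothetical biomarker that drifts with age regardless of diagnosis.
biomarker = 5.0 - 0.03 * age + 0.2 * sex + rng.normal(0.0, 0.3, n)

model = LinearRegression().fit(nuisance, biomarker)
corrected = biomarker - model.predict(nuisance)   # residuals used for classification

# Correlation with age before and after correction.
print(np.corrcoef(age, biomarker)[0, 1], np.corrcoef(age, corrected)[0, 1])
```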

    Benchmark of machine learning methods for classification of a Sentinel-2 image

    Thanks mainly to ESA and USGS, a large volume of free Earth-observation imagery is now readily available. One of the main goals of remote sensing is to label images according to a set of semantic categories, i.e. image classification. This is a challenging task, since land cover of a specific class may show large spatial and spectral variability and objects may appear at different scales and orientations. In this study, we report the results of benchmarking nine machine learning algorithms, tested for accuracy and speed in the training and classification of land-cover classes in a Sentinel-2 dataset. The following machine learning methods (MLM) were tested: linear discriminant analysis, k-nearest neighbour, random forests, support vector machines, multi-layer perceptron, multi-layer perceptron ensemble, ctree, boosting and logarithmic regression. Validation is carried out using a control dataset consisting of an independent classification into 11 land-cover classes of an area of about 60 km², obtained by manual visual interpretation of high-resolution images (20 cm ground sampling distance) by experts. Five of the eleven classes are used in this study, since the others have too few samples (pixels) for the testing and validation subsets. The classes used are: (i) urban, (ii) sowable areas, (iii) water, (iv) tree plantations and (v) grasslands. Validation is carried out using three different approaches: (i) using pixels from the training dataset (train), (ii) using pixels from the training dataset with k-fold cross-validation (kfold) and (iii) using all pixels from the control dataset (full). Five accuracy indices are calculated for the comparison between the values predicted by each model and the control values over these three sets of data, with ten folds used for the k-fold cross-validation. Results from validation of predictions on the whole dataset (full) show the random forests method with the highest values, with a kappa index ranging from 0.55 to 0.42 for the largest and smallest numbers of training pixels, respectively. The two neural networks (multi-layer perceptron and its ensemble) and the support vector machines, with the default radial basis function kernel, follow closely with comparable performance.
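    A minimal sketch of this kind of benchmarking setup, using synthetic data rather than the Sentinel-2 bands and only a subset of the listed classifiers, might look like the following: each model is scored with 10-fold cross-validation using Cohen's kappa.

```python
# Illustrative benchmarking sketch on synthetic data (not the Sentinel-2 bands
# used in the paper): a subset of the listed classifiers compared with 10-fold
# cross-validation, scored by Cohen's kappa.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score
from sklearn.metrics import cohen_kappa_score, make_scorer

# Synthetic stand-in for a five-class land-cover problem.
X, y = make_classification(n_samples=2000, n_features=10, n_informative=6,
                           n_classes=5, n_clusters_per_class=1, random_state=0)

models = {
    "LDA": LinearDiscriminantAnalysis(),
    "kNN": KNeighborsClassifier(),
    "Random forest": RandomForestClassifier(random_state=0),
    "SVM (RBF)": SVC(),                     # default radial basis function kernel
    "MLP": MLPClassifier(max_iter=1000, random_state=0),
}

kappa = make_scorer(cohen_kappa_score)
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10, scoring=kappa)
    print(f"{name:15s} kappa = {scores.mean():.3f} +/- {scores.std():.3f}")
```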

    A decision-theoretic approach for segmental classification

    This paper is concerned with statistical methods for the segmental classification of linear sequence data, where the task is to segment and classify the data according to an underlying hidden discrete state sequence. Such analysis is commonplace in the empirical sciences, including genomics, finance and speech processing. In particular, we are interested in answering the following question: given data y and a statistical model π(x, y) of the hidden states x, what should we report as the prediction x̂ under the posterior distribution π(x|y)? That is, how should we make a prediction of the underlying states? We demonstrate that traditional approaches, such as reporting the most probable state sequence or the most probable set of marginal predictions, can give undesirable classification artefacts and offer limited control over the properties of the prediction. We propose a decision-theoretic approach using a novel class of Markov loss functions and report x̂ via the principle of minimum expected loss (maximum expected utility). We demonstrate that the sequence of minimum expected loss under the Markov loss function can be enumerated exactly using dynamic programming methods and that it offers flexibility and performance improvements over existing techniques. The result is generic and applicable to any probabilistic model on a sequence, such as hidden Markov models, change point or product partition models. Comment: Published at http://dx.doi.org/10.1214/13-AOAS657 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
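    For context, the toy sketch below implements the two "traditional" decoders the paper critiques, for a small hidden Markov model: the single most probable state sequence (Viterbi) and the sequence of most probable marginal states (posterior decoding). The authors' minimum expected loss decoder under Markov loss functions is not reproduced here, and the model parameters are arbitrary.

```python
# Toy HMM decoders: posterior (marginal) decoding via forward-backward, and
# the most probable state sequence via Viterbi. Parameters are arbitrary.
import numpy as np

pi0 = np.array([0.6, 0.4])                  # initial state distribution
A = np.array([[0.9, 0.1], [0.2, 0.8]])      # transition matrix
B = np.array([[0.7, 0.3], [0.1, 0.9]])      # emission matrix, B[state, symbol]
obs = np.array([0, 0, 1, 1, 0, 1, 1, 1])    # observed symbol sequence

T, K = len(obs), len(pi0)

# Forward-backward recursions give the posterior marginals p(x_t | y).
alpha = np.zeros((T, K)); beta = np.zeros((T, K))
alpha[0] = pi0 * B[:, obs[0]]
for t in range(1, T):
    alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
beta[-1] = 1.0
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
posterior = alpha * beta
posterior /= posterior.sum(axis=1, keepdims=True)
marginal_path = posterior.argmax(axis=1)    # most probable state at each position

# Viterbi dynamic programming for the single most probable state sequence.
delta = np.log(pi0 * B[:, obs[0]])
back = np.zeros((T, K), dtype=int)
logA = np.log(A)
for t in range(1, T):
    scores = delta[:, None] + logA
    back[t] = scores.argmax(axis=0)
    delta = scores.max(axis=0) + np.log(B[:, obs[t]])
viterbi_path = np.zeros(T, dtype=int)
viterbi_path[-1] = delta.argmax()
for t in range(T - 2, -1, -1):
    viterbi_path[t] = back[t + 1][viterbi_path[t + 1]]

print("marginal decoding:", marginal_path)
print("Viterbi decoding: ", viterbi_path)
```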