23 research outputs found

    Improved variable reduction in partial least squares modelling by Global-Minimum Error Uninformative-Variable Elimination

    Contains fulltext: 187506.pdf (publisher's version) (Open Access)

    SpaRef: a clustering algorithm for multispectral images

    Multispectral images, such as multispectral chemical images or multispectral satellite images, provide detailed data with information in both the spatial and spectral domains. Many segmentation methods for multispectral images are based on a per-pixel classification, which uses only spectral information and ignores spatial information. A clustering algorithm based on both spectral and spatial information would produce better results.

    Model-Based Clustering for Image Segmentation and Large Datasets via Sampling

    The rapid increase in the size of data sets makes clustering all the more important to capture and summarize the information, at the same time making clustering more difficult to accomplish. If model-based clustering is applied directly to a large data set, it can be too slow for practical application. A simple and common approach is to first cluster a random sample of moderate size, and then use the clustering model found in this way to classify the remainder of the objects. We show that, in its simplest form, this method may lead to unstable results. Our experiments suggest that a stable method with better performance can be obtained with two straightforward modifications to the simple sampling method: several tentative models are identified from the sample instead of just one, and several EM steps are used rather than just one E step to classify the full data set. We find that there are significant gains from increasing the size of the sample up to about 2,000, but not from further increases. These conclusions are based on the application of several alternative strategies to the segmentation of three different multispectral images, and to several simulated data sets.
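
The sampling idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one-dimensional data and a Gaussian mixture with a fixed number of components, and it shows only the second modification (several EM steps on the full data set instead of a single E step), not the selection among several tentative models. The function name `fit_on_sample_then_refine` and all of its parameters are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def e_step(data, weights, means, variances):
    """Posterior probability of each mixture component for each point."""
    resp = []
    for x in data:
        p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        total = sum(p) or 1e-300  # guard against numerical underflow
        resp.append([pj / total for pj in p])
    return resp


def m_step(data, resp, k):
    """Re-estimate weights, means and variances from the posteriors."""
    weights, means, variances = [], [], []
    for j in range(k):
        nj = sum(r[j] for r in resp) or 1e-12
        mean = sum(r[j] * x for r, x in zip(resp, data)) / nj
        var = sum(r[j] * (x - mean) ** 2 for r, x in zip(resp, data)) / nj
        weights.append(nj / len(data))
        means.append(mean)
        variances.append(max(var, 1e-6))  # keep variances strictly positive
    return weights, means, variances


def fit_on_sample_then_refine(data, k=2, sample_size=200,
                              sample_iters=50, full_iters=3, seed=0):
    """Fit a mixture on a random sample, then run a few EM steps on all data."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(data, min(sample_size, len(data))))

    # Assumed initialisation: means at evenly spaced sample quantiles,
    # a shared variance equal to the sample variance, equal weights.
    means = [sample[(2 * j + 1) * len(sample) // (2 * k)] for j in range(k)]
    mu = sum(sample) / len(sample)
    var = sum((x - mu) ** 2 for x in sample) / len(sample)
    params = ([1.0 / k] * k, means, [max(var, 1e-6)] * k)

    for _ in range(sample_iters):  # EM on the sample only (cheap)
        params = m_step(sample, e_step(sample, *params), k)
    for _ in range(full_iters):    # several EM steps on the full data set
        params = m_step(data, e_step(data, *params), k)

    resp = e_step(data, *params)
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return params, labels
```

Setting `full_iters=0` corresponds to the simple variant the abstract describes as potentially unstable: the model is fitted on the sample alone and a single E step classifies the remaining objects.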

    Evaluation and comparison of unsupervised methods for the extraction of spatial patterns from mass spectrometry imaging data (MSI)

    For the extraction of spatially important regions from mass spectrometry imaging (MSI) data, different clustering methods have been proposed. These clustering methods are based on certain assumptions and use different criteria to assign pixels to different classes. For high-dimensional MSI data, the curse of dimensionality also limits the performance of clustering methods, a limitation that is usually overcome by pre-processing the data with dimension reduction techniques. In summary, spatial patterns can be extracted from MSI data using different unsupervised methods, but a robust evaluation of the clustering results is still missing. In this study, we performed multiple simulations on synthetic and real MSI data to validate the performance of unsupervised methods. The synthetic data were simulated to mimic important spatial and statistical properties of real MSI data. Our simulation results confirmed that K-means clustering with correlation distance and Gaussian Mixture Modeling clustering give optimal performance in most of the scenarios. The clustering methods give efficient results together with dimension reduction techniques. Of all the dimension reduction techniques considered here, the best results were obtained with the minimum noise fraction (MNF) transform. The results were confirmed on both synthetic and real MSI data. However, for successful implementation of the MNF transform, the MSI data must be of limited dimensions.
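
To make "K-means with correlation distance" concrete, the sketch below shows the distance itself (one minus the Pearson correlation) and the assignment step of K-means under that distance. It is an illustration only, not the evaluation code of the study; the function names are hypothetical, and each pixel is assumed to be represented by a spectrum stored as a plain list of intensities.

```python
import math


def correlation_distance(a, b):
    """Return 1 - Pearson correlation between two equal-length spectra."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # assumption: a flat spectrum is treated as uncorrelated
    return 1.0 - cov / (norm_a * norm_b)


def assign_pixels(spectra, centroids):
    """K-means assignment step: index of the closest centroid per spectrum."""
    return [
        min(range(len(centroids)),
            key=lambda j: correlation_distance(s, centroids[j]))
        for s in spectra
    ]
```

Because the distance depends only on the shape of a spectrum, a pixel whose intensities are a scaled copy of a centroid has distance zero to it, which is one reason this metric is often preferred over Euclidean distance when overall intensity varies between pixels.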

    Automatically optimizing dynamic synchronization of individual industrial process variables for statistical modelling

    Statistical modelling of industrial production data can lead to improved understanding of the process to benefit process monitoring and control routines. The production data required for such models, however, need to be synchronized in time, a topic sparsely covered in the literature. We propose a strategy for data-driven automated optimization of dynamic synchronization of industrial production data that optimizes the synchronization per process variable and can be applied for on-line monitoring in real time. The strategy is tested and validated for two relevant production facilities, each of which has multiple production lines or configurations. For all lines and configurations, models predicting the production quality from process variables improved in accuracy using the presented per-variable optimization strategy. Although the prediction accuracy for two models would still be insufficient for real-time monitoring and control, process operators and engineers may still obtain novel process understanding from applying the presented strategy to these models.
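
Per-variable synchronization can be pictured with a simple sketch. The abstract does not state which optimization criterion the authors use, so the version below makes an assumption: each process variable is shifted by the constant lag (in samples) that maximizes its absolute Pearson correlation with the quality measurement. The names `best_lag` and `synchronize` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def best_lag(variable, quality, max_lag):
    """Lag by which `variable` leads `quality`, chosen to maximize |r|."""
    best, best_score = 0, -1.0
    for lag in range(max_lag + 1):
        # Pair variable[t] with quality[t + lag].
        x = variable[:len(variable) - lag]
        y = quality[lag:]
        score = abs(pearson(x, y))
        if score > best_score:
            best, best_score = lag, score
    return best


def synchronize(variables, quality, max_lag):
    """Choose a separate lag for every process variable."""
    return {name: best_lag(series, quality, max_lag)
            for name, series in variables.items()}
```

A single global lag would force all variables onto the same delay; choosing the lag per variable lets sensors at different points along a production line be aligned individually with the quality measurement.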