
    Robust EM algorithm for model-based curve clustering

    Model-based clustering approaches concern the paradigm of exploratory data analysis relying on the finite mixture model to automatically find a latent structure governing observed data. They are among the most popular and successful approaches in cluster analysis. The mixture density estimation is generally performed by maximizing the observed-data log-likelihood using the expectation-maximization (EM) algorithm. However, it is well known that the EM algorithm initialization is crucial. In addition, the standard EM algorithm requires the number of clusters to be known a priori. Some solutions have been provided in [31, 12] for model-based clustering with Gaussian mixture models for multivariate data. In this paper we focus on model-based curve clustering approaches, when the data are curves rather than vectorial data, based on regression mixtures. We propose a new robust EM algorithm for clustering curves. We extend the model-based clustering approach presented in [31] for Gaussian mixture models to the case of curve clustering by regression mixtures, including polynomial regression mixtures as well as spline or B-spline regression mixtures. Our approach handles both the problem of initialization and that of choosing the optimal number of clusters as the EM learning proceeds, rather than in a two-fold scheme. This is achieved by optimizing a penalized log-likelihood criterion. A simulation study confirms the potential benefit of the proposed algorithm in terms of robustness regarding initialization and finding the actual number of clusters. Comment: In Proceedings of the 2013 International Joint Conference on Neural Networks (IJCNN), 2013, Dallas, TX, US

    Evaluating the Differences of Gridding Techniques for Digital Elevation Models Generation and Their Influence on the Modeling of Stony Debris Flows Routing: A Case Study From Rovina di Cancia Basin (North-Eastern Italian Alps)

    Debris flows are among the most hazardous phenomena in mountain areas. To cope with debris flow hazard, it is common to delineate the risk-prone areas through routing models. The most important input to debris flow routing models is the topographic data, usually in the form of Digital Elevation Models (DEMs). The quality of DEMs depends on the accuracy, density, and spatial distribution of the sampled points; on the characteristics of the surface; and on the applied gridding methodology. Therefore, the choice of the interpolation method affects the realistic representation of the channel and fan morphology, and thus potentially the debris flow routing modeling outcomes. In this paper, we first investigate the performance of common interpolation methods (i.e., linear triangulation, natural neighbor, nearest neighbor, Inverse Distance to a Power, ANUDEM, Radial Basis Functions, and ordinary kriging) in building DEMs with the complex topography of a debris flow channel located in the Venetian Dolomites (North-eastern Italian Alps), using small-footprint full-waveform Light Detection And Ranging (LiDAR) data. The investigation is carried out through a combination of statistical analysis of vertical accuracy, algorithm robustness, spatial clustering of vertical errors, and multi-criteria shape reliability assessment. After that, we examine the influence of the tested interpolation algorithms on the performance of a Geographic Information System (GIS)-based cell model for simulating stony debris flow routing. In detail, we investigate both the correlation between the uncertainty in DEM heights resulting from the gridding procedure and that in the corresponding simulated erosion/deposition depths, and the effect of interpolation algorithms on simulated areas, erosion and deposition volumes, solid-liquid discharges, and channel morphology after the event.
The comparison among the tested interpolation methods highlights that the ANUDEM and ordinary kriging algorithms are not suitable for building DEMs with complex topography. Conversely, the linear triangulation, the natural neighbor algorithm, and the thin-plate spline plus tension and completely regularized spline functions ensure the best trade-off between accuracy and shape reliability. Nevertheless, the evaluation of the effects of gridding techniques on debris flow routing modeling reveals that the choice of the interpolation algorithm does not significantly affect the model outcomes.
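To make the idea of gridding concrete, the following is a minimal sketch of one of the listed methods, Inverse Distance to a Power, written in plain Python. It is not the implementation used in the study: the function names `idw_interpolate` and `grid_dem`, the default exponent of 2, the use of all samples rather than a search radius, and the four sample points are all assumptions made for illustration.

```python
import math


def idw_interpolate(samples, x, y, power=2.0):
    """Estimate the elevation at (x, y) from (xi, yi, zi) samples.

    Each sample is weighted by 1 / distance**power, so nearby points
    dominate. All samples are used (no search radius); this is an
    illustrative simplification.
    """
    if not samples:
        raise ValueError("at least one sample point is required")
    weighted_sum = 0.0
    weight_total = 0.0
    for xi, yi, zi in samples:
        d = math.hypot(x - xi, y - yi)
        if d == 0.0:
            # The query coincides with a sample: return its elevation exactly.
            return zi
        w = 1.0 / d ** power
        weighted_sum += w * zi
        weight_total += w
    return weighted_sum / weight_total


def grid_dem(samples, x_coords, y_coords, power=2.0):
    """Build a DEM as a list of rows (one row per y, one column per x)."""
    return [
        [idw_interpolate(samples, x, y, power) for x in x_coords]
        for y in y_coords
    ]


# Hypothetical elevation samples (x, y, z) at the corners of a 10 m square.
points = [
    (0.0, 0.0, 100.0),
    (10.0, 0.0, 120.0),
    (0.0, 10.0, 110.0),
    (10.0, 10.0, 130.0),
]
dem = grid_dem(points, [0.0, 5.0, 10.0], [0.0, 5.0, 10.0])
```

In this sketch the grid nodes that coincide with a sample reproduce its elevation, and the central node, being equidistant from all four corners, receives their mean. Raising `power` makes the surface follow the nearest sample more closely, which is one way the choice of gridding parameters can alter channel and fan morphology in the resulting DEM.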

    Simultaneous inference for misaligned multivariate functional data

    We consider inference for misaligned multivariate functional data that represents the same underlying curve, but where the functional samples have systematic differences in shape. In this paper we introduce a new class of generally applicable models where warping effects are modeled through nonlinear transformation of latent Gaussian variables and systematic shape differences are modeled by Gaussian processes. To model cross-covariance between sample coordinates we introduce a class of low-dimensional cross-covariance structures suitable for modeling multivariate functional data. We present a method for maximum-likelihood estimation in the models and apply the method to three data sets. The first data set is from a motion tracking system where the spatial positions of a large number of body markers are tracked in three dimensions over time. The second data set consists of height and weight measurements for Danish boys. The third data set consists of three-dimensional spatial hand paths from a controlled obstacle-avoidance experiment. We use the developed method to estimate the cross-covariance structure, and use a classification setup to demonstrate that the method outperforms state-of-the-art methods for handling misaligned curve data. Comment: 44 pages in total including tables and figures. Additional 9 pages of supplementary material and reference