Robust EM algorithm for model-based curve clustering
Model-based clustering approaches concern the paradigm of exploratory data
analysis relying on the finite mixture model to automatically find a latent
structure governing observed data. They are one of the most popular and
successful approaches in cluster analysis. The mixture density estimation is
generally performed by maximizing the observed-data log-likelihood by using the
expectation-maximization (EM) algorithm. However, it is well-known that the EM
algorithm initialization is crucial. In addition, the standard EM algorithm
requires the number of clusters to be known a priori. Some solutions have been
provided in [31, 12] for model-based clustering with Gaussian mixture models
for multivariate data. In this paper we focus on model-based curve clustering
approaches, when the data are curves rather than vectorial data, based on
regression mixtures. We propose a new robust EM algorithm for clustering
curves. We extend the model-based clustering approach presented in [31] for
Gaussian mixture models, to the case of curve clustering by regression
mixtures, including polynomial regression mixtures as well as spline or
B-spline regression mixtures. Our approach handles both the problem of
initialization and that of choosing the optimal number of clusters as the EM
learning proceeds, rather than in a two-fold scheme. This is achieved by
optimizing a penalized log-likelihood criterion. A simulation study confirms
the potential benefit of the proposed algorithm in terms of robustness
regarding initialization and finding the actual number of clusters.
Comment: In Proceedings of the 2013 International Joint Conference on Neural
Networks (IJCNN), 2013, Dallas, TX, US
Evaluating the Differences of Gridding Techniques for Digital Elevation Models Generation and Their Influence on the Modeling of Stony Debris Flows Routing: A Case Study From Rovina di Cancia Basin (North-Eastern Italian Alps)
Debris flows are among the most hazardous phenomena in mountain areas. To cope
with debris flow hazard, it is common to delineate the risk-prone areas through
routing models. The most important input to debris flow routing models is the
topographic data, usually in the form of Digital Elevation Models (DEMs). The quality
of DEMs depends on the accuracy, density, and spatial distribution of the sampled
points; on the characteristics of the surface; and on the applied gridding methodology.
Therefore, the choice of the interpolation method affects the realistic representation
of the channel and fan morphology, and thus potentially the debris flow routing
modeling outcomes. In this paper, we initially investigate the performance of common
interpolation methods (i.e., linear triangulation, natural neighbor, nearest neighbor,
Inverse Distance to a Power, ANUDEM, Radial Basis Functions, and ordinary kriging)
in building DEMs with the complex topography of a debris flow channel located
in the Venetian Dolomites (North-eastern Italian Alps), by using small footprint full-
waveform Light Detection And Ranging (LiDAR) data. The investigation is carried
out through a combination of statistical analysis of vertical accuracy, algorithm
robustness, and spatial clustering of vertical errors, and multi-criteria shape reliability
assessment. After that, we examine the influence of the tested interpolation algorithms
on the performance of a Geographic Information System (GIS)-based cell model for
simulating stony debris flow routing. In detail, we investigate both the correlation
between the DEM height uncertainty resulting from the gridding procedure and
that of the corresponding simulated erosion/deposition depths, and the effect of
interpolation algorithms on simulated areas, erosion and deposition volumes, solid-liquid
discharges, and channel morphology after the event. The comparison among the tested
interpolation methods highlights that the ANUDEM and ordinary kriging algorithms
are not suitable for building DEMs with complex topography. Conversely, the linear
triangulation, the natural neighbor algorithm, and the thin-plate spline plus tension and
completely regularized spline functions ensure the best trade-off between accuracy
and shape reliability. However, the evaluation of the effects of gridding techniques on
debris flow routing modeling reveals that the choice of the interpolation algorithm does
not significantly affect the model outcomes.
Simultaneous inference for misaligned multivariate functional data
We consider inference for misaligned multivariate functional data that
represents the same underlying curve, but where the functional samples have
systematic differences in shape. In this paper we introduce a new class of
generally applicable models where warping effects are modeled through nonlinear
transformation of latent Gaussian variables and systematic shape differences
are modeled by Gaussian processes. To model cross-covariance between sample
coordinates we introduce a class of low-dimensional cross-covariance structures
suitable for modeling multivariate functional data. We present a method for
doing maximum-likelihood estimation in the models and apply the method to three
data sets. The first data set is from a motion tracking system where the
spatial positions of a large number of body-markers are tracked in
three-dimensions over time. The second data set consists of height and weight
measurements for Danish boys. The third data set consists of three-dimensional
spatial hand paths from a controlled obstacle-avoidance experiment. We use the
developed method to estimate the cross-covariance structure, and use a
classification setup to demonstrate that the method outperforms
state-of-the-art methods for handling misaligned curve data.
Comment: 44 pages in total including tables and figures. Additional 9 pages of
supplementary material and reference