Constrained Optimization for a Subset of the Gaussian Parsimonious Clustering Models
The expectation-maximization (EM) algorithm is an iterative method for
finding maximum likelihood estimates when data are incomplete or are treated as
being incomplete. The EM algorithm and its variants are commonly used for
parameter estimation in applications of mixture models for clustering and
classification. This is despite the fact that even the Gaussian mixture model
likelihood surface contains many local maxima and is riddled with singularities.
Previous work has focused on circumventing this problem by constraining the
smallest eigenvalue of the component covariance matrices. In this paper, we
consider constraining the smallest eigenvalue, the largest eigenvalue, and both
the smallest and largest within the family setting. Specifically, a subset of
the GPCM family is considered for model-based clustering, where we use a
re-parameterized version of the famous eigenvalue decomposition of the
component covariance matrices. Our approach is illustrated using various
experiments with simulated and real data.
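
To make the idea of bounding covariance eigenvalues concrete, here is a minimal sketch in Python with NumPy. It is not the authors' estimation procedure, which works inside the EM updates for the GPCM family. It only shows the basic operation of projecting a single covariance matrix so that its eigenvalues lie within chosen bounds. The function name `clip_covariance_eigenvalues` and the parameters `lam_min` and `lam_max` are hypothetical and chosen for illustration.

```python
import numpy as np


def clip_covariance_eigenvalues(sigma, lam_min=None, lam_max=None):
    """Return a symmetric matrix with the eigenvectors of `sigma` and with
    eigenvalues bounded below by `lam_min` and/or above by `lam_max`.

    Illustrative helper only; the names and bounds are assumptions.
    """
    # Symmetrize first so that round-off asymmetry does not affect eigh.
    sym = (sigma + sigma.T) / 2.0
    eigvals, eigvecs = np.linalg.eigh(sym)

    # Apply whichever bounds were supplied.
    if lam_min is not None:
        eigvals = np.maximum(eigvals, lam_min)
    if lam_max is not None:
        eigvals = np.minimum(eigvals, lam_max)

    # Rebuild the matrix as V diag(eigvals) V^T.
    return (eigvecs * eigvals) @ eigvecs.T


# A nearly singular covariance with one very large direction of variance.
sigma = np.diag([1e-8, 1.0, 100.0])
bounded = clip_covariance_eigenvalues(sigma, lam_min=0.01, lam_max=10.0)
bounded_eigs = np.linalg.eigvalsh(bounded)
```

Bounding the smallest eigenvalue away from zero keeps the component density finite, which is what removes the singularities mentioned in the abstract. Bounding the largest eigenvalue limits how diffuse a component can become.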
A robust approach to model-based classification based on trimming and constraints
In a standard classification framework a set of trustworthy learning data are
employed to build a decision rule, with the final aim of classifying unlabelled
units belonging to the test set. Therefore, unreliable labelled observations,
namely outliers and data with incorrect labels, can strongly undermine the
classifier performance, especially if the training size is small. The present
work introduces a robust modification to the Model-Based Classification
framework, employing impartial trimming and constraints on the ratio between
the maximum and the minimum eigenvalue of the group scatter matrices. The
proposed method effectively handles noise presence in both response and
exploratory variables, providing reliable classification even when dealing with
contaminated datasets. A robust information criterion is proposed for model
selection. Experiments on real and simulated data, artificially adulterated,
are provided to underline the benefits of the proposed method.
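
The two ingredients named in this abstract, impartial trimming and a bound on the ratio between the largest and smallest eigenvalues of the group scatter matrices, can be sketched in Python with NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' algorithm. It assumes that a per-observation log-density under the current fitted model is already available, and the names `trimming_mask`, `eigenvalue_ratio`, and `alpha` are hypothetical.

```python
import numpy as np


def trimming_mask(log_density, alpha):
    """Boolean mask that discards the floor(alpha * n) observations with the
    lowest log-density and keeps the rest (assumed trimming rule)."""
    log_density = np.asarray(log_density, dtype=float)
    n = log_density.shape[0]
    n_trim = int(np.floor(alpha * n))
    order = np.argsort(log_density)  # ascending: least plausible first
    keep = np.ones(n, dtype=bool)
    keep[order[:n_trim]] = False
    return keep


def eigenvalue_ratio(scatter_matrices):
    """Ratio of the largest to the smallest eigenvalue taken over all
    group scatter matrices."""
    eigs = np.concatenate([np.linalg.eigvalsh(s) for s in scatter_matrices])
    return eigs.max() / eigs.min()


# Four observations, one of which is very implausible under the model.
log_density = np.array([-1.0, -50.0, -2.0, -0.5])
keep = trimming_mask(log_density, alpha=0.25)

# Two groups whose scatter matrices differ in scale.
ratio = eigenvalue_ratio([np.diag([1.0, 4.0]), np.diag([2.0, 8.0])])
```

In a constrained fit, a ratio above a chosen threshold would trigger an adjustment of the eigenvalues. How that adjustment is carried out is specific to the method and is not reproduced here.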
Arriving on time: estimating travel time distributions on large-scale road networks
Most optimal routing problems focus on minimizing travel time or distance
traveled. Oftentimes, a more useful objective is to maximize the probability of
on-time arrival, which requires statistical distributions of travel times,
rather than just mean values. We propose a method to estimate travel time
distributions on large-scale road networks, using probe vehicle data collected
from GPS. We present a framework that handles large volumes of input data and
scales linearly with the size of the network. Leveraging the planar topology of
the graph, the method efficiently computes the time correlations between
neighboring streets. First, raw probe vehicle traces are compressed into pairs
of travel times and number of stops for each traversed road segment using a
"stop-and-go" algorithm developed for this work. The compressed data is then
used as input for training a path travel time model, which couples a Markov
model along with a Gaussian Markov random field. Finally, scalable inference
algorithms are developed for obtaining path travel time distributions from the
composite MM-GMRF model. We illustrate the accuracy and scalability of our
model on a 505,000-link road network spanning the San Francisco Bay Area.
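
The point that on-time arrival depends on the whole travel time distribution rather than its mean can be illustrated with a small Python sketch. It uses an empirical estimate from travel time samples and does not involve the MM-GMRF model described in the abstract. The function name `on_time_probability`, the two routes, and the numbers are made up for illustration.

```python
import numpy as np


def on_time_probability(travel_times, budget):
    """Empirical probability that a trip takes no longer than `budget`."""
    travel_times = np.asarray(travel_times, dtype=float)
    return float(np.mean(travel_times <= budget))


# Hypothetical travel time samples in minutes for two routes.
route_a = [10.0, 10.0, 10.0, 30.0]  # usually fast, occasionally very slow
route_b = [16.0, 16.0, 16.0, 16.0]  # slower on average, but consistent

mean_a = float(np.mean(route_a))
mean_b = float(np.mean(route_b))

# With a 20-minute budget, compare the chance of arriving on time.
p_a = on_time_probability(route_a, budget=20.0)
p_b = on_time_probability(route_b, budget=20.0)
```

Here route A has the lower mean travel time, yet route B gives the higher probability of arriving within the budget, so a router that minimizes the mean and one that maximizes on-time arrival would choose differently.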