Dynamic Mixtures of Factor Analyzers to Characterize Multivariate Air Pollutant Exposures
The assessment of pollution exposure is based on the analysis
of multivariate time series that include the concentrations of several
pollutants as well as the measurements of multiple atmospheric variables.
It typically requires methods of dimensionality reduction that
can identify potentially dangerous combinations of pollutants
and, simultaneously, segment exposure periods according
to air quality conditions. When the data are high-dimensional, however,
developing efficient methods of dimensionality reduction is challenging
because of the intricate structure of cross-correlations that arise
from the dynamic interaction between weather conditions and natural/anthropogenic
pollution sources. In order to assess pollution exposure
in an urban area while taking the above-mentioned difficulties
into account, we develop a class of parsimonious hidden Markov
models. In a multivariate time-series setting, this approach makes it possible to
simultaneously perform temporal segmentation and dimensionality
reduction. We specifically approximate the distribution of multiple
pollutant concentrations by mixtures of factor analysis models, whose
parameters evolve according to a latent Markov chain. Covariates are
included as predictors of the chain transition probabilities. Parameter
constraints on the factorial component of the model are exploited
to tune the flexibility of dimensionality reduction. In order to estimate
the model parameters efficiently, we propose a novel three-step
Alternating Expectation-Conditional Maximization (AECM) algorithm,
which is also assessed in a simulation study. In the case study, the
proposed methods were able (1) to describe the exposure to pollution
in terms of a few latent regimes, (2) to associate these regimes
with specific combinations of pollutant concentration levels as well
as distinct correlation structures between concentrations, and (3) to
capture the influence of weather conditions on transitions between
regimes.
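The abstract describes the model only in words. As a hedged illustration, the sketch below assumes one Gaussian factor analyzer per latent state, with covariance Sigma_k = Lambda_k Lambda_k^T + Psi_k (Psi_k diagonal), and transition probabilities that depend on a covariate vector z_t (for example, weather variables) through a row-wise multinomial logit. It evaluates the observed-data log-likelihood with the forward algorithm in log space, which is the quantity an EM-type algorithm such as AECM would increase; it does not implement the three-step AECM algorithm itself. All function and parameter names (`hmm_mfa_loglik`, `log_transition_matrix`, `beta`, and so on) are hypothetical.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal


def log_transition_matrix(z, beta):
    """Log P(S_t = j | S_{t-1} = i, z_t) as a K x K array (assumed form).

    beta has shape (K, K, d): row i of the chain is a multinomial logit
    over next states j, with linear predictor beta[i, j] @ z.
    """
    logits = beta @ z  # shape (K, K)
    return logits - logsumexp(logits, axis=1, keepdims=True)


def factor_cov(Lam, psi):
    """Factor-analytic covariance Lambda Lambda^T + diag(psi)."""
    return Lam @ Lam.T + np.diag(psi)


def hmm_mfa_loglik(Y, Z, init, mu, Lam, psi, beta):
    """Observed-data log-likelihood via the forward algorithm (log space).

    Y: (T, p) observations; Z: (T, d) covariates; init: (K,) initial probs;
    mu[k]: (p,), Lam[k]: (p, q), psi[k]: (p,) for each latent state k.
    """
    T, K = Y.shape[0], len(mu)
    # log f_k(y_t) for every time point t and state k -> shape (T, K)
    logB = np.column_stack([
        multivariate_normal.logpdf(Y, mean=mu[k], cov=factor_cov(Lam[k], psi[k]))
        for k in range(K)
    ])
    log_alpha = np.log(init) + logB[0]
    for t in range(1, T):
        logP = log_transition_matrix(Z[t], beta)
        # Sum over previous state i of alpha_{t-1}(i) * P_ij, then emit y_t.
        log_alpha = logsumexp(log_alpha[:, None] + logP, axis=0) + logB[t]
    return logsumexp(log_alpha)


# Usage with synthetic inputs (illustrative values only).
rng = np.random.default_rng(0)
T, p, q, K, d = 50, 4, 1, 2, 2
Y = rng.normal(size=(T, p))
Z = np.column_stack([np.ones(T), rng.normal(size=T)])  # intercept + one covariate
mu = [np.zeros(p), np.ones(p)]
Lam = [rng.normal(size=(p, q)) for _ in range(K)]
psi = [np.full(p, 0.5) for _ in range(K)]
beta = rng.normal(size=(K, K, d))
print(hmm_mfa_loglik(Y, Z, np.array([0.5, 0.5]), mu, Lam, psi, beta))
```

Working in log space with `logsumexp` avoids numerical underflow over long series. For simplicity the sketch does not fix a baseline category in `beta`, so its logit coefficients are not identifiable; a fitted model would normally impose such a constraint.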
Robust, fuzzy, and parsimonious clustering based on mixtures of Factor Analyzers
A clustering algorithm that combines the advantages of fuzzy clustering and robust statistical estimators is presented. It is based on mixtures of Factor Analyzers, endowed with the joint use of trimming and the constrained estimation of scatter matrices, in a modified maximum likelihood approach. The algorithm generates a set of membership values that are used to fuzzily partition the data set and that contribute to the robust estimates of the mixture parameters. Modeling the clusters by Gaussian Factor Analysis allows for dimension reduction and for discovering local linear structures in the data. Applications to artificial data show that the new methodology is resistant to different types of contamination. The tuning parameters, such as the trimming level, the fuzzifier parameter, the number of clusters and the value of the scatter matrices constraint, are briefly discussed, together with some heuristic tools for choosing them. Finally, a real data set is analyzed to show how intermediate membership values are estimated for observations lying where clusters overlap, while cluster cores are composed of observations that are assigned to a cluster in a crisp way.
Ministerio de Economía y Competitividad grant MTM2017-86061-C2-1-P, and Consejería de Educación de la Junta de Castilla y León and FEDER grants VA005P17 and VA002G1
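The abstract does not give the membership or trimming formulas. The sketch below is a hedged illustration under our own assumptions: it takes as given the weighted log-densities log(pi_j f_j(x_i)) (which, in the described method, would come from the Gaussian factor-analysis components), assigns an observation crisply when its best weighted log-density is non-negative, and otherwise uses fuzzy c-means-style memberships with fuzzifier m. It then trims the floor(alpha * n) observations that contribute least to the fuzzy objective. This rule was chosen because it reproduces the behavior described above (crisp cluster cores, intermediate memberships where clusters overlap); it may differ in detail from the paper. The function name `fuzzy_trimmed_memberships` is hypothetical.

```python
import numpy as np


def fuzzy_trimmed_memberships(logdens, m=2.0, alpha=0.1):
    """Fuzzy memberships with trimming (illustrative rule, assumed).

    logdens[i, j] = log(pi_j * f_j(x_i)): weighted log-density of
    observation i under cluster j. m > 1 is the fuzzifier and alpha
    is the trimming level.
    """
    n, k = logdens.shape
    u = np.zeros((n, k))
    for i in range(n):
        row = logdens[i]
        best = int(np.argmax(row))
        if row[best] >= 0:
            # High-density observation: crisp assignment (cluster core).
            u[i, best] = 1.0
        else:
            # All weighted log-densities are negative, so every ratio is
            # positive; less negative values receive larger memberships.
            ratios = (row[:, None] / row[None, :]) ** (1.0 / (m - 1.0))
            u[i] = 1.0 / ratios.sum(axis=1)
    # Contribution of each observation to the fuzzy objective.
    contrib = (u ** m * logdens).sum(axis=1)
    # Trim the floor(alpha * n) least-supported observations (all-zero rows).
    n_trim = int(np.floor(alpha * n))
    if n_trim > 0:
        u[np.argsort(contrib)[:n_trim]] = 0.0
    return u


logdens = np.array([[1.0, -5.0], [-1.0, -1.0], [-10.0, -20.0]])
print(fuzzy_trimmed_memberships(logdens, m=2.0, alpha=0.34))
```

In this toy input the first observation is assigned crisply, the second is split evenly between the two clusters, and the third, which has the lowest support, is trimmed. Larger values of m spread the memberships of non-core observations more evenly across clusters.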
Mixtures of Common Skew-t Factor Analyzers
A mixture of common skew-t factor analyzers model is introduced for
model-based clustering of high-dimensional data. By assuming common component
factor loadings, this model allows clustering to be performed in the presence
of a large number of mixture components or when the number of dimensions is too
large to be well-modelled by the mixtures of factor analyzers model or a
variant thereof. Furthermore, assuming that the component densities follow a
skew-t distribution allows robust clustering of skewed data. The alternating
expectation-conditional maximization algorithm is employed for parameter
estimation. We demonstrate excellent clustering performance when our model is
applied to real and simulated data. This paper marks the first time that skewed
common factors have been used …
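To make the parsimony argument concrete, the sketch below counts free parameters in the Gaussian case. It assumes that a mixture of factor analyzers (MFA) has component-specific p x q loadings and diagonal uniquenesses, and that the common-factor variant shares one p x q loading matrix A and one diagonal D, with component covariances A Omega_g A^T + D. The identifiability adjustments (q(q-1)/2 per loading matrix for MFA, q^2 for the shared A) are the counting conventions we assume. The skewness and degrees-of-freedom parameters of the skew-t version are ignored, and the function names are hypothetical.

```python
def n_params_mfa(p, q, g):
    """Free parameters of a Gaussian mixture of factor analyzers with
    component-specific loadings and diagonal uniquenesses (assumed)."""
    weights = g - 1
    means = g * p
    loadings = g * (p * q - q * (q - 1) // 2)  # rotational invariance
    uniquenesses = g * p
    return weights + means + loadings + uniquenesses


def n_params_mcfa(p, q, g):
    """Free parameters when all g components share one p x q loading matrix
    A and differ only through q-dimensional factor means and q x q factor
    covariances, so Sigma_g = A Omega_g A^T + D (assumed counting)."""
    weights = g - 1
    factor_means = g * q
    factor_covs = g * q * (q + 1) // 2
    loadings = p * q - q * q  # A is identified only up to a q x q transform
    uniquenesses = p          # common diagonal D
    return weights + factor_means + factor_covs + loadings + uniquenesses


for p, q, g in [(10, 2, 3), (50, 3, 10)]:
    print(p, q, g, n_params_mfa(p, q, g), n_params_mcfa(p, q, g))
```

Under these conventions the MFA count grows roughly like g * p * q, whereas the common-loadings count grows like p * q + g * q^2. This is why sharing the loadings helps when the number of components or the dimension is large.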