EM Algorithms for Weighted-Data Clustering with Application to Audio-Visual Scene Analysis
Data clustering has received a lot of attention and numerous methods,
algorithms and software packages are available. Among these techniques,
parametric finite-mixture models play a central role due to their interesting
mathematical properties and to the existence of maximum-likelihood estimators
based on expectation-maximization (EM). In this paper we propose a new mixture
model that associates a weight with each observed point. We introduce the
weighted-data Gaussian mixture and we derive two EM algorithms. The first one
considers a fixed weight for each observation. The second one treats each
weight as a random variable following a gamma distribution. We propose a model
selection method based on a minimum message length criterion, provide a weight
initialization strategy, and validate the proposed algorithms by comparing them
with several state-of-the-art parametric and non-parametric clustering
techniques. We also demonstrate the effectiveness and robustness of the
proposed clustering technique in the presence of heterogeneous data, namely
audio-visual scene analysis.
Comment: 14 pages, 4 figures, 4 tables
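The fixed-weight variant described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a weight scales the precision of its observation, so that point i has variance var_k / w_i under component k, and it is restricted to one dimension for brevity. The function name, the quantile-based initialization, and the iteration count are illustrative choices, not taken from the abstract.

```python
import numpy as np

def weighted_gmm_em(x, w, k=2, n_iter=100):
    """EM for a 1-D Gaussian mixture in which point i carries a fixed weight w[i].

    Assumption (not stated in the abstract): the weight scales the precision,
    i.e. point i is modelled as N(mu_k, var_k / w[i]) under component k.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    n = x.size
    # Deterministic initialization: means at evenly spaced quantiles of the data.
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)
    var = np.full(k, x.var() + 1e-6)
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities, computed in log space for stability.
        v = var[None, :] / w[:, None]
        log_p = np.log(pi)[None, :] - 0.5 * (
            np.log(2.0 * np.pi * v) + (x[:, None] - mu[None, :]) ** 2 / v
        )
        log_p -= log_p.max(axis=1, keepdims=True)
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: the weights enter the mean and the variance numerator.
        wr = w[:, None] * r
        mu = (wr * x[:, None]).sum(axis=0) / wr.sum(axis=0)
        var = (wr * (x[:, None] - mu[None, :]) ** 2).sum(axis=0) / r.sum(axis=0) + 1e-9
        pi = r.sum(axis=0) / n
    return pi, mu, var
```

With all weights equal to 1 this reduces to ordinary EM for a Gaussian mixture. Under the stated assumption, larger weights pull the component means toward the corresponding points.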
Tensor Regression
Regression analysis is a key area of interest in the field of data analysis
and machine learning which is devoted to exploring the dependencies between
variables, often using vectors. The emergence of high dimensional data in
technologies such as neuroimaging, computer vision, climatology and social
networks, has brought challenges to traditional data representation methods.
Tensors, as high dimensional extensions of vectors, are considered as natural
representations of high dimensional data. In this book, the authors provide a
systematic study and analysis of tensor-based regression models and their
applications in recent years. It groups and illustrates the existing
tensor-based regression methods and covers the basics, core ideas, and
theoretical characteristics of most tensor-based regression methods. In
addition, readers can learn how to use existing tensor-based regression methods
to solve specific regression tasks with multiway data, what datasets can be
selected, and what software packages are available to start related work as
soon as possible. Tensor Regression is the first thorough overview of the
fundamentals, motivations, popular algorithms, strategies for efficient
implementation, related applications, available datasets, and software
resources for tensor-based regression analysis. It is essential reading for all
students, researchers and practitioners working on high dimensional data.
Comment: 187 pages, 32 figures, 10 tables
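As a rough illustration of what a tensor-based regression model can look like, the sketch below predicts a scalar response from matrix-valued covariates using a low-rank coefficient matrix W = A Bᵀ. This is one common construction in the literature and is an assumption here; the abstract does not single out any particular model. The function name and argument shapes are hypothetical.

```python
import numpy as np

def low_rank_predict(X, A, B, bias=0.0):
    """Predict y_i = <X_i, W> + bias with W = A @ B.T, without forming W.

    X: (n, p, q) stack of matrix covariates
    A: (p, R) and B: (q, R) factor matrices (R is the assumed rank)
    """
    # Sum over p, q and r of X[n, p, q] * A[p, r] * B[q, r].
    return np.einsum("npq,pr,qr->n", X, A, B) + bias
```

The factorized form needs (p + q) R coefficients instead of p q, which is the usual motivation for low-rank structure when the covariates are high dimensional.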
Covariate dimension reduction for survival data via the Gaussian process latent variable model
The analysis of high dimensional survival data is challenging, primarily due
to the problem of overfitting, which occurs when spurious relationships
inferred from the training data fail to hold in test data. Here we
propose a novel method of extracting a low dimensional representation of
covariates in survival data by combining the popular Gaussian Process Latent
Variable Model (GPLVM) with a Weibull Proportional Hazards Model (WPHM). The
combined model offers a flexible non-linear probabilistic method of detecting
and extracting any intrinsic low dimensional structure from high dimensional
data. By reducing the covariate dimension we aim to diminish the risk of
overfitting and increase the robustness and accuracy with which we infer
relationships between covariates and survival outcomes. In addition, we can
simultaneously combine information from multiple data sources by expressing
multiple datasets in terms of the same low dimensional space. We present
results from several simulation studies that illustrate a reduction in
overfitting and an increase in predictive performance, as well as successful
detection of intrinsic dimensionality. We provide evidence that it is
advantageous to combine dimensionality reduction with survival outcomes rather
than performing unsupervised dimensionality reduction on its own. Finally, we
use our model to analyse experimental gene expression data and detect and
extract a low dimensional representation that allows us to distinguish high and
low risk groups with superior accuracy compared to doing regression on the
original high dimensional data.
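The survival component mentioned in this abstract can be sketched on its own. The code below evaluates a right-censored Weibull proportional hazards log-likelihood for given low dimensional coordinates z. It assumes the parameterization h(t | z) = lam * nu * t^(nu - 1) * exp(z · beta), which may differ from the one used by the authors, and it does not include the GPLVM part of the combined model. All names are illustrative.

```python
import numpy as np

def weibull_ph_loglik(t, event, z, beta, lam, nu):
    """Right-censored log-likelihood of a Weibull proportional hazards model.

    Assumed hazard: h(t | z) = lam * nu * t**(nu - 1) * exp(z @ beta).
    t:     (n,) positive observed times
    event: (n,) 1 if the event was observed, 0 if censored
    z:     (n, d) low dimensional covariates (e.g. latent coordinates)
    """
    t = np.asarray(t, dtype=float)
    event = np.asarray(event, dtype=float)
    eta = np.asarray(z, dtype=float) @ np.asarray(beta, dtype=float)
    log_hazard = np.log(lam) + np.log(nu) + (nu - 1.0) * np.log(t) + eta
    cum_hazard = lam * t ** nu * np.exp(eta)
    # Observed events contribute log h(t) - H(t); censored points only -H(t).
    return float(np.sum(event * log_hazard - cum_hazard))
```

With nu = 1 the model reduces to an exponential proportional hazards model, which gives a simple sanity check for the function.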