Unsupervised Learning via Mixtures of Skewed Distributions with Hypercube Contours
Mixture models whose components have skewed hypercube contours are developed
via a generalization of the multivariate shifted asymmetric Laplace density.
Specifically, we develop mixtures of multiple scaled shifted asymmetric Laplace
distributions. The component densities have two unique features: they include a
multivariate weight function, and the marginal distributions are also
asymmetric Laplace. We use these mixtures of multiple scaled shifted asymmetric
Laplace distributions for clustering applications, but they could equally well
be used in the supervised or semi-supervised paradigms. The
expectation-maximization algorithm is used for parameter estimation and the
Bayesian information criterion is used for model selection. Simulated and real
data sets are used to illustrate the approach and, in some cases, to visualize
the skewed hypercube structure of the components.
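The estimation workflow described above (EM for the parameters, BIC for choosing the number of components) can be sketched in miniature. The sketch below uses a univariate Gaussian mixture purely as a schematic stand-in for the paper's multiple scaled shifted asymmetric Laplace components; it is not the authors' algorithm, only an illustration of the EM-plus-BIC pattern.

```python
import numpy as np

def em_mixture(X, k, n_iter=200):
    """Minimal EM for a univariate Gaussian mixture, returning a BIC score.
    Schematic stand-in for the paper's SAL-component EM, not their method."""
    n = len(X)
    mu = np.quantile(X, np.linspace(0.1, 0.9, k))  # spread-out initial means
    sigma = np.full(k, X.std())
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point
        dens = (pi * np.exp(-0.5 * ((X[:, None] - mu) / sigma) ** 2)
                / (sigma * np.sqrt(2 * np.pi)))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted updates of the weights, means, and scales
        nk = resp.sum(axis=0)
        pi = nk / n
        mu = (resp * X[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (X[:, None] - mu) ** 2).sum(axis=0) / nk)
        sigma = np.maximum(sigma, 1e-3)            # guard against collapse
    dens = (pi * np.exp(-0.5 * ((X[:, None] - mu) / sigma) ** 2)
            / (sigma * np.sqrt(2 * np.pi)))
    loglik = np.log(dens.sum(axis=1)).sum()
    bic = (3 * k - 1) * np.log(n) - 2 * loglik     # penalised model fit
    return mu, bic

rng = np.random.default_rng(1)
X = np.concatenate([rng.normal(-4, 1, 300), rng.normal(4, 1, 300)])
bics = {k: em_mixture(X, k)[1] for k in (1, 2, 3)}
best_k = min(bics, key=bics.get)                   # BIC-selected model order
```

Fitting each candidate number of components and keeping the one with the smallest BIC is the model-selection loop the abstract refers to.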
Factor Analysis of Data Matrices: New Theoretical and Computational Aspects With Applications
The classical fitting problem in exploratory factor analysis (EFA) is to find estimates for the factor loadings matrix and the matrix of unique factor variances which give the best fit to the sample covariance or correlation matrix with respect to some goodness-of-fit criterion. Predicted factor scores can be obtained as a function of these estimates and the data. In this thesis, the EFA model is considered as a specific data matrix decomposition with fixed unknown matrix parameters. Fitting the EFA model directly to the data yields simultaneous solutions for both loadings and factor scores. Several new algorithms are introduced for the least squares and weighted least squares estimation of all EFA model unknowns. The numerical procedures are based on the singular value decomposition, facilitate the estimation of both common and unique factor scores, and work equally well when the number of variables exceeds the number of available observations.
Like EFA, noisy independent component analysis (ICA) is a technique for reduction of the data dimensionality in which the interrelationships among the observed variables are explained in terms of a much smaller number of latent factors. The key difference between EFA and noisy ICA is that in the latter model the common factors are assumed to be both independent and non-normal. In contrast to EFA, there is no rotational indeterminacy in noisy ICA. In this thesis, noisy ICA is viewed as a method of factor rotation in EFA. Starting from an initial EFA solution, an orthogonal rotation matrix is sought that minimizes the dependence between the common factors. The idea of rotating the scores towards independence is also employed in three-mode factor analysis to analyze data sets having a three-way structure.
The new theoretical and computational aspects contained in this thesis are illustrated by means of several examples with real and artificial data.
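The idea of fitting the EFA decomposition directly to the data matrix via the singular value decomposition can be sketched as follows. This is a minimal unweighted least-squares sketch under the assumption that factor scores are constrained to be columnwise orthonormal (up to scaling); the thesis's actual algorithms, including the weighted variants and unique-factor scores, are more elaborate.

```python
import numpy as np

def ls_efa(X, k):
    """Least-squares fit of X ~ F @ A.T with orthogonal common factor
    scores F, via the truncated SVD (a sketch, not the thesis algorithms)."""
    Xc = X - X.mean(axis=0)                 # column-centre the data matrix
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    n = len(X)
    F = np.sqrt(n) * U[:, :k]               # factor scores, F.T @ F = n * I
    A = Vt[:k].T * s[:k] / np.sqrt(n)       # loadings absorbing the scale
    return F, A

rng = np.random.default_rng(0)
n, p, k = 20, 50, 2                         # more variables than observations
F_true = rng.normal(size=(n, k))
A_true = rng.normal(size=(p, k))
X = F_true @ A_true.T + 0.1 * rng.normal(size=(n, p))
F, A = ls_efa(X, k)
resid = np.linalg.norm(X - X.mean(0) - F @ A.T)   # low-rank misfit
```

Because the SVD exists for any rectangular matrix, the same fit goes through unchanged when the number of variables exceeds the number of observations, as in the example.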
K-Tensors: Clustering Positive Semi-Definite Matrices
This paper introduces a novel self-consistency clustering algorithm
(K-Tensors) designed for partitioning a distribution of positive
semi-definite matrices based on their eigenstructures. As positive
semi-definite matrices can be represented as ellipsoids, it is critical to
maintain their structural information to perform effective clustering.
However, traditional clustering algorithms applied to matrices often involve
vectorization of the matrices, resulting in a loss of essential structural
information. To address this issue, we propose a distance metric for
clustering that is specifically based on the structural information of
positive semi-definite matrices. This distance metric enables the clustering
algorithm to consider the differences between positive semi-definite matrices
and their projections onto a common space spanned by orthonormal vectors
defined from a set of positive semi-definite
matrices. This innovative approach to clustering positive semi-definite
matrices has broad applications in several domains including financial and
biomedical research, such as analyzing functional connectivity data. By
maintaining the structural information of positive semi-definite matrices, our
proposed algorithm promises to cluster the positive semi-definite matrices in a
more meaningful way, thereby facilitating deeper insights into the underlying
data in various applications.
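A projection-based distance of the kind the abstract describes can be sketched as follows. Everything here is an assumption for illustration: `common_basis` takes the leading eigenvectors of the average of a set of positive semi-definite matrices as one plausible reading of the "common space spanned by orthonormal vectors", and the paper's exact construction may differ.

```python
import numpy as np

def common_basis(mats, k):
    """Orthonormal basis from the top-k eigenvectors of the average of a
    set of PSD matrices (a hypothetical reading of the 'common space')."""
    mean = sum(mats) / len(mats)
    w, V = np.linalg.eigh(mean)             # eigh: symmetric eigendecomposition
    return V[:, np.argsort(w)[::-1][:k]]    # columns = leading eigenvectors

def proj_distance(S, V):
    """Frobenius distance between a PSD matrix and its projection onto the
    subspace spanned by the columns of V."""
    P = V @ V.T                             # orthogonal projector onto span(V)
    return np.linalg.norm(S - P @ S @ P)

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4)); S1 = A @ A.T   # two random PSD matrices
B = rng.normal(size=(4, 4)); S2 = B @ B.T
V = common_basis([S1, S2], k=2)
d1, d2 = proj_distance(S1, V), proj_distance(S2, V)
```

Unlike vectorizing the matrices, this distance depends only on how each matrix sits relative to the shared eigen-directions, which is the structural information the abstract argues should be preserved.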