2 research outputs found

    High-dimensional multimodal distribution embedding

    No full text
    High-dimensional data is emerging in more and more varied domains, but its analysis has proven difficult because of the curse of dimensionality. Dimension reduction has emerged as a powerful tool for overcoming problems related to high dimensionality, yet the curse of dimensionality continues to affect many existing methods. This paper concentrates on low-dimensional, distance-based embeddings for high-dimensional multimodal distributions, i.e. clustered data. Pairwise distances are particularly affected by high dimensionality, and their analysis forms the basis of the embedding method presented here, called HDME. To avoid the problems of high dimensionality, HDME performs a distance transformation based on interpoint relationships. The positive effect of the transformation in preserving and emphasizing clusters is first demonstrated using label information. The distance transformation is driven by an estimate of the neighbourhood information. The transformed distances are then embedded in a low-dimensional space using a classical embedding method. Experiments on real-world data show that distance transformations can be used effectively in conjunction with distance-based embedding methods to obtain representation spaces that discriminate clusters well.
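The abstract's remark that pairwise distances are strongly affected by dimensionality can be illustrated with a small experiment. The sketch below is not taken from the paper: it assumes uniform random data and uses the relative contrast (d_max - d_min) / d_min as an illustrative measure, and the helper names `pairwise_distances` and `relative_contrast` are hypothetical.

```python
import math
import random

def pairwise_distances(points):
    """Euclidean distances between all distinct pairs of points."""
    return [
        math.dist(p, q)
        for i, p in enumerate(points)
        for q in points[i + 1:]
    ]

def relative_contrast(n_points, dim, seed=0):
    """(d_max - d_min) / d_min for points drawn uniformly from the unit cube.

    Assumption: uniform data stands in for an unstructured distribution;
    clustered data would behave differently.
    """
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    d = pairwise_distances(points)
    d_min, d_max = min(d), max(d)
    return (d_max - d_min) / d_min

# The contrast should shrink as the dimension grows: the nearest and the
# farthest pairs become harder to tell apart.
for dim in (2, 10, 100, 1000):
    print(f"dim={dim:5d}  relative contrast={relative_contrast(100, dim):.3f}")
```

On data of this kind the printed contrast drops sharply with the dimension, which is one way to see why an embedding built on raw distances may struggle and why a distance transformation could help.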

    Dimension reduction for clustered high-dimensional data

    No full text
    Recent times have witnessed a transition towards a significantly larger scale, both in the number of samples and in the number of attributes characterising data collections. It is this latter aspect, the dimensionality of the data, that is at the center of the present thesis. We first analyse the evolution of the distance contrast and emphasise its dual character: absolute vs. relative. The second focus is on clustered structures, still in the context of high-dimensional data. Our purpose is to find low-dimensional embeddings with strong discriminative power. To this end, we propose two methods: the High-Dimensional Multimodal Distribution Embedding, a distance-based embedding method that exploits distance distributions in high dimensions, and the Cluster Space, which projects points into the space of the clusters using the probabilities obtained from a Gaussian mixture model.
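The Cluster Space idea, mapping each point to its vector of cluster probabilities under a Gaussian mixture model, can be sketched as follows. This is a minimal illustration and not the thesis's implementation: it assumes the mixture has already been fitted, restricts each component to a spherical covariance for brevity, and uses made-up parameters; `log_gaussian` and `cluster_space` are hypothetical names.

```python
import math

def log_gaussian(x, mean, var):
    """Log-density of a spherical Gaussian N(mean, var * I) at point x."""
    dim = len(x)
    sq_dist = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return -0.5 * (dim * math.log(2 * math.pi * var) + sq_dist / var)

def cluster_space(x, weights, means, variances):
    """Map x to its posterior probabilities over the mixture components."""
    log_joint = [
        math.log(w) + log_gaussian(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    # Subtract the maximum before exponentiating so the normalisation stays
    # stable when the densities are tiny, as is common in high dimensions.
    top = max(log_joint)
    unnormalised = [math.exp(lj - top) for lj in log_joint]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]

# Hypothetical, already-fitted mixture: three components in five dimensions.
weights = [0.5, 0.3, 0.2]
means = [[0.0] * 5, [4.0] * 5, [-4.0] * 5]
variances = [1.0, 1.0, 2.0]

point = [3.8, 4.1, 4.0, 3.9, 4.2]
embedding = cluster_space(point, weights, means, variances)
print([round(p, 4) for p in embedding])
```

The embedding has one coordinate per cluster, so its dimension is set by the number of mixture components and not by the number of original attributes.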