
    Partitioning Relational Matrices of Similarities or Dissimilarities using the Value of Information

    In this paper, we provide an approach to clustering relational matrices whose entries correspond to either similarities or dissimilarities between objects. Our approach is based on the value of information, a parameterized, information-theoretic criterion that measures the change in costs associated with changes in information. Optimizing the value of information yields a deterministic-annealing style of clustering with many benefits. For instance, investigators avoid needing to specify the number of clusters a priori, as the partitions naturally undergo phase changes during the annealing process, whereby the number of clusters changes in a data-driven fashion. The global-best partition can also often be identified. Comment: Submitted to the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
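The value-of-information criterion itself is not spelled out in this abstract, so the sketch below shows only the generic deterministic-annealing clustering loop that the described method resembles: soft (Gibbs) assignments computed at a temperature T, cooled gradually so that clusters split at phase transitions and structure emerges in a data-driven way. All function and parameter names here are illustrative assumptions, not the paper's.

```python
import numpy as np

def annealed_soft_clustering(X, n_centers=8, T0=10.0, T_min=0.05, cooling=0.9, iters=30):
    """Generic deterministic-annealing sketch (NOT the paper's
    value-of-information criterion): soft assignments at temperature T,
    cooled step by step so cluster structure emerges via phase changes."""
    rng = np.random.default_rng(0)
    # all centers start near the global mean, with tiny symmetry-breaking noise
    centers = X.mean(axis=0) + 1e-3 * rng.standard_normal((n_centers, X.shape[1]))
    T = T0
    while T > T_min:
        for _ in range(iters):
            # squared distances from every point to every center
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            # soft (Gibbs) responsibilities at temperature T (min-shifted for stability)
            p = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / T)
            p /= p.sum(axis=1, keepdims=True)
            # re-estimate centers as responsibility-weighted means
            w = p.sum(axis=0)
            centers = (p.T @ X) / np.maximum(w[:, None], 1e-12)
        T *= cooling  # cool: below critical temperatures, centers split
    return centers, p.argmax(axis=1)
```

Note how no effective number of clusters is fixed up front: duplicate centers stay collapsed until the temperature falls below a data-dependent threshold, which is the "phase change" behavior the abstract alludes to.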

    A Self-Supervised Approach for Cluster Assessment of High-Dimensional Data

    Estimating the number of clusters and the underlying cluster structure in a dataset is a crucial task. Real-world data are often unlabeled, complex, and high-dimensional, which makes it difficult for traditional clustering algorithms to perform well. In recent years, a matrix-reordering-based algorithm called "visual assessment of tendency" (VAT) and its variants have attracted many researchers from various domains to estimate the number of clusters and the inherent cluster structure present in the data. However, these algorithms fail when applied to high-dimensional data due to the curse of dimensionality, as they rely heavily on notions of closeness and farness between data points. To address this issue, we propose a deep-learning-based framework for cluster structure assessment in complex image datasets. First, our framework generates representative embeddings for complex data using a self-supervised deep neural network; these low-dimensional embeddings are then fed to the VAT/iVAT algorithms to estimate the underlying cluster structure. In this process, we use no prior knowledge of the number of clusters (i.e., k). We present our results on four real-life image datasets, and our findings indicate that our framework outperforms state-of-the-art VAT/iVAT algorithms in terms of clustering accuracy and normalized mutual information (NMI). Comment: Submitted to IEEE SMC 202
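The VAT stage of this pipeline is a standard published algorithm (Bezdek and Hathaway's matrix reordering), so it can be sketched independently of the paper's self-supervised network: given a pairwise dissimilarity matrix (here assumed to come from the low-dimensional embeddings), VAT reorders it with a Prim-like traversal so that dark diagonal blocks in the reordered image suggest clusters.

```python
import numpy as np

def vat_reorder(D):
    """VAT matrix reordering: Prim-style ordering of a symmetric
    dissimilarity matrix D; dark diagonal blocks in the reordered
    matrix indicate candidate clusters."""
    n = D.shape[0]
    # start from one endpoint of the largest dissimilarity
    i = int(np.unravel_index(np.argmax(D), D.shape)[0])
    order = [i]
    remaining = set(range(n)) - {i}
    while remaining:
        rem = sorted(remaining)
        # dissimilarities between already-ordered objects and the rest
        sub = D[np.ix_(order, rem)]
        # next object: the remaining point closest to the selected set
        j = rem[int(np.argmin(sub.min(axis=0)))]
        order.append(j)
        remaining.remove(j)
    order = np.array(order)
    return D[np.ix_(order, order)], order
```

In the described framework, `D` would be computed from the self-supervised embeddings rather than the raw images, which is what sidesteps the curse of dimensionality the abstract mentions.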

    Deciphering Clusters With a Deterministic Measure of Clustering Tendency

    Clustering, a key aspect of exploratory data analysis, plays a crucial role in various fields such as information retrieval. Yet the sheer volume and variety of available clustering algorithms hinder their application to specific tasks, especially given their propensity to enforce partitions even when no clear clusters exist, often leading to fruitless efforts and erroneous conclusions. This issue highlights the importance of accurately assessing clustering tendency prior to clustering. However, existing methods either rely on subjective visual assessment, which hinders automation of downstream tasks, or on correlations between subsets of target datasets and random distributions, limiting their practical use. Therefore, we introduce the Proximal Homogeneity Index (PHI), a novel and deterministic statistic that reliably assesses the clustering tendencies of datasets by analyzing their internal structures via knowledge graphs. Leveraging PHI and the boundaries between clusters, we establish the Partitioning Sensitivity Index (PSI), a new statistic designed for cluster quality assessment and optimal clustering identification. Comparative studies using twelve synthetic and real-world datasets demonstrate PHI and PSI's superiority over existing metrics for clustering tendency assessment and cluster validation. Furthermore, we demonstrate the scalability of PHI to large and high-dimensional datasets, and PSI's broad effectiveness across diverse cluster analysis tasks.

    An Enhanced Sampling-Based Viewpoints Cosine Visual Model for an Efficient Big Data Clustering

    Clustering computes object similarity features that can be used to partition the data. Object similarity (or dissimilarity) features are taken into account when locating relevant data object clusters. Estimating the number of clusters present in the data is known as the clustering tendency. Leading big-data clustering algorithms, such as single-pass k-means (SPKM), k-means++, and mini-batch k-means (MBKM), build clusters for a given value of k. In practice, k is assigned either by the user or through some external input, so a suitable value can be hard to obtain. A study of related work shows that visual assessment of (cluster) tendency (VAT) and its advanced visual models effectively determine the unknown cluster-tendency value k. The multi-viewpoints-based cosine measure VAT (MVCM-VAT) uses multiple viewpoints to assess clustering tendency more accurately. However, MVCM-VAT suffers from scalability issues regarding computational time and memory allocation. This paper enhances MVCM-VAT with a sampling strategy to overcome the scalability problem for big-data clustering. Experimental analysis is performed on large Gaussian synthetic datasets and large real-world datasets to show the efficiency of the proposed work.
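The MVCM-VAT specifics are not recoverable from this abstract, but the general sampling idea for scaling a VAT-style assessment can be sketched: build the pairwise dissimilarity matrix (plain cosine here, rather than the paper's multi-viewpoint measure) on a random subsample only, reducing memory from O(N²) to O(s²). Function and parameter names below are assumptions for illustration.

```python
import numpy as np

def sampled_cosine_dissimilarity(X, sample_size=500, seed=0):
    """Sampling sketch for scaling VAT-style tendency assessment:
    compute the pairwise cosine dissimilarity matrix on a random
    subsample of s points instead of all N (O(s^2) vs O(N^2) memory)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=min(sample_size, len(X)), replace=False)
    S = X[idx]
    # normalize rows so the dot product becomes cosine similarity
    norms = np.linalg.norm(S, axis=1, keepdims=True)
    U = S / np.maximum(norms, 1e-12)
    D = 1.0 - U @ U.T          # cosine dissimilarity, in [0, 2]
    np.fill_diagonal(D, 0.0)   # clear numerical noise on the diagonal
    return D, idx
```

The resulting sample-sized matrix `D` can then be fed to a VAT reordering for visual tendency assessment, with `idx` recording which original objects the rows represent.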