    On Instance Weighted Clustering Ensembles

    © ESANN, 2023. This is the accepted manuscript version of an article which has been published in final form at: www.esann.org/proceedings/2023. Ensemble clustering is a technique which combines multiple clustering results, and instance weighting is a technique which highlights important instances in a dataset. Both techniques are known to enhance clustering performance and robustness. In this research, ensembles and instance weighting are integrated with the spectral clustering algorithm. We believe this is the first attempt at creating diversity in the generative mechanism using density-based instance weighting for a spectral ensemble. The proposed approach is empirically validated using synthetic datasets, comparing against spectral and a spectral ensemble with random instance weighting. Results show that using the instance-weighted sub-sampling approach as the generative mechanism for an ensemble of spectral clustering leads to improved clustering performance on datasets with imbalanced clusters. Peer reviewed.
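The abstract above does not specify how the density-based weights are computed, so the following is only a rough sketch of what instance-weighted sub-sampling as a generative mechanism could look like. The function names, the k-nearest-neighbour density proxy, and the choice to favour sparse regions are all assumptions made for illustration, not the authors' method. Each sub-sample would then be passed to a spectral clustering routine to produce one ensemble member.

```python
import numpy as np

def knn_density_weights(X, k=5):
    """Sampling probabilities from a k-NN density proxy (assumed scheme).

    Points whose k-th nearest neighbour is far away lie in sparse regions
    and receive a higher probability of being sampled.
    """
    X = np.asarray(X, dtype=float)
    # Full pairwise Euclidean distance matrix (fine for small datasets).
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # After sorting each row, column 0 is the point itself (distance 0).
    kth = np.sort(d, axis=1)[:, k]
    w = kth + 1e-12  # small offset keeps every probability non-zero
    return w / w.sum()

def weighted_subsamples(X, n_members, frac, k=5, seed=0):
    """Draw one index set per ensemble member, without replacement."""
    rng = np.random.default_rng(seed)
    p = knn_density_weights(X, k)
    size = max(1, int(frac * len(X)))
    return [rng.choice(len(X), size=size, replace=False, p=p)
            for _ in range(n_members)]

# Synthetic, imbalanced toy data: 90 points in one blob, 10 in another.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(90, 2)),
               rng.normal(5.0, 1.0, size=(10, 2))])
probs = knn_density_weights(X, k=5)
members = weighted_subsamples(X, n_members=4, frac=0.5)
```

Replacing `p=p` with uniform probabilities would give the random sub-sampling baseline that the abstract mentions as a comparison.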

    Combining Multiple Clusterings via Crowd Agreement Estimation and Multi-Granularity Link Analysis

    The clustering ensemble technique aims to combine multiple clusterings into a probably better and more robust clustering, and has been receiving increasing attention in recent years. Existing clustering ensemble approaches have two main limitations. Firstly, many approaches lack the ability to weight the base clusterings without access to the original data and can be affected significantly by low-quality, or even ill, clusterings. Secondly, they generally focus on either the instance level or the cluster level in the ensemble system and fail to integrate multi-granularity cues into a unified model. To address these two limitations, this paper proposes to solve the clustering ensemble problem via crowd agreement estimation and multi-granularity link analysis. We present the normalized crowd agreement index (NCAI) to evaluate the quality of base clusterings in an unsupervised manner and thus weight the base clusterings in accordance with their clustering validity. To explore the relationship between clusters, the source aware connected triple (SACT) similarity is introduced with regard to their common neighbors and the source reliability. Based on NCAI and multi-granularity information collected among base clusterings, clusters, and data instances, we further propose two novel consensus functions, termed weighted evidence accumulation clustering (WEAC) and graph partitioning with multi-granularity link analysis (GP-MGLA) respectively. Experiments are conducted on eight real-world datasets. The experimental results demonstrate the effectiveness and robustness of the proposed methods. Comment: The MATLAB source code of this work is available at: https://www.researchgate.net/publication/28197031
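To make the idea of weighting base clusterings concrete, here is a minimal sketch of a weighted co-association matrix, the general quantity on which evidence accumulation approaches are built. It is not the paper's WEAC implementation: the function name is hypothetical, and the weights are made-up placeholders standing in for a quality score such as NCAI, which is not computed here.

```python
import numpy as np

def weighted_co_association(labelings, weights):
    """Weighted fraction of base clusterings that co-cluster each pair.

    labelings: sequence of 1-D integer label arrays, one per base clustering.
    weights:   one non-negative weight per base clustering (placeholder for
               an unsupervised quality score).
    """
    labelings = [np.asarray(labels) for labels in labelings]
    weights = np.asarray(weights, dtype=float)
    n = labelings[0].shape[0]
    co = np.zeros((n, n))
    for labels, w in zip(labelings, weights):
        # Boolean matrix: True where two instances share a cluster.
        co += w * (labels[:, None] == labels[None, :])
    return co / weights.sum()

# Three base clusterings of four instances, with illustrative weights.
base = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
quality = [1.0, 0.5, 1.0]
C = weighted_co_association(base, quality)
```

In this toy case instances 0 and 1 are always grouped together, so `C[0, 1]` equals 1, while the pair (0, 2) is grouped only by the low-weight second clustering and scores 0.5 / 2.5 = 0.2. A consensus partition could then be obtained by applying, for example, hierarchical clustering to `1 - C`.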

    Reconstructing the world trade multiplex: the role of intensive and extensive biases

    In economic and financial networks, the strength of each node always has an important economic meaning, such as the size of supply and demand, import and export, or financial exposure. Constructing null models of networks matching the observed strengths of all nodes is crucial in order to either detect interesting deviations of an empirical network from economically meaningful benchmarks or reconstruct the most likely structure of an economic network when the latter is unknown. However, several studies have proved that real economic networks and multiplexes are topologically very different from configurations inferred only from node strengths. Here we provide a detailed analysis of the World Trade Multiplex by comparing it to an enhanced null model that simultaneously reproduces the strength and the degree of each node. We study several temporal snapshots and almost one hundred layers (commodity classes) of the multiplex and find that the observed properties are systematically well reproduced by our model. Our formalism allows us to introduce the (static) concept of extensive and intensive bias, defined as a measurable tendency of the network to prefer either the formation of extra links or the reinforcement of link weights, with respect to a reference case where only strengths are enforced. Our findings complement the existing economic literature on (dynamic) intensive and extensive trade margins. More generally, they show that real-world multiplexes can be strongly shaped by layer-specific local constraints.

    On Thermalization in Classical Scalar Field Theory

    Thermalization of classical fields is investigated in a \phi^4 scalar field theory in 1+1 dimensions, discretized on a lattice. We numerically integrate the classical equations of motion using initial conditions sampled from various nonequilibrium probability distributions. Time-dependent expectation values of observables constructed from the canonical momentum are compared with thermal ones. It is found that a closed system, evolving from one initial condition, thermalizes to high precision in the thermodynamic limit, in a time-averaged sense. For ensembles consisting of many members with the same energy, we find that expectation values become stationary, and equal to the thermal values, in the limit of infinitely many members. Initial ensembles with a nonzero (noncanonical) spread in the energy density or other conserved quantities evolve to noncanonical stationary ensembles. In the case of a narrow spread, asymptotic values of primary observables are only mildly affected. In contrast, fluctuations and connected correlation functions will differ substantially from the canonical values. This raises doubts about the use of a straightforward expansion in terms of 1PI-vertex functions to study thermalization. Comment: 17 pages with 6 eps figures
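As an illustration of what numerically integrating the classical equations of motion on a lattice can involve, the sketch below evolves a 1+1 dimensional \phi^4 field with a velocity-Verlet (leapfrog) scheme. The abstract does not state the integrator, lattice parameters, or coupling conventions, so everything here is an assumption: unit lattice spacing, periodic boundaries, a potential of the form m^2 \phi^2 / 2 + \lambda \phi^4 / 4, and an initial condition with all energy in the momenta.

```python
import numpy as np

def force(phi, m2, lam):
    """Right-hand side of d(pi)/dt for the lattice phi^4 model (assumed form)."""
    # Discrete Laplacian with periodic boundary conditions, lattice spacing 1.
    lap = np.roll(phi, 1) + np.roll(phi, -1) - 2.0 * phi
    return lap - m2 * phi - lam * phi**3

def energy(phi, pi, m2, lam):
    """Total lattice energy: kinetic + gradient + potential terms."""
    grad = np.roll(phi, -1) - phi
    return np.sum(0.5 * pi**2 + 0.5 * grad**2
                  + 0.5 * m2 * phi**2 + 0.25 * lam * phi**4)

def evolve(phi, pi, m2, lam, dt, steps):
    """Velocity-Verlet time stepping; returns the final field and momentum."""
    phi, pi = phi.copy(), pi.copy()
    f = force(phi, m2, lam)
    for _ in range(steps):
        pi += 0.5 * dt * f      # half kick
        phi += dt * pi          # full drift
        f = force(phi, m2, lam)
        pi += 0.5 * dt * f      # half kick
    return phi, pi

# Illustrative nonequilibrium start: zero field, random momenta.
rng = np.random.default_rng(0)
phi0 = np.zeros(64)
pi0 = rng.normal(size=64)
m2, lam, dt = 1.0, 1.0, 0.01

e0 = energy(phi0, pi0, m2, lam)
phi1, pi1 = evolve(phi0, pi0, m2, lam, dt, steps=2000)
e1 = energy(phi1, pi1, m2, lam)
mean_pi2 = np.mean(pi1**2)  # a simple momentum-based observable
```

Because the scheme is symplectic, the total energy should stay close to its initial value, which gives a quick sanity check before time-averaging observables such as `mean_pi2`.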

    LinkCluE: A MATLAB Package for Link-Based Cluster Ensembles

    Cluster ensembles have emerged as a powerful meta-learning paradigm that provides improved accuracy and robustness by aggregating several input data clusterings. In particular, link-based similarity methods have recently been introduced with superior performance to the conventional co-association approach. This paper presents a MATLAB package, LinkCluE, that implements the link-based cluster ensemble framework. A variety of functional methods for evaluating clustering results, based on both internal and external criteria, are also provided. Additionally, the underlying algorithms together with the sample uses of the package with interesting real and synthetic datasets are demonstrated herein.

    A CLUE for CLUster Ensembles

    Cluster ensembles are collections of individual solutions to a given clustering problem which are useful or necessary to consider in a wide range of applications. The R package clue provides an extensible computational environment for creating and analyzing cluster ensembles, with basic data structures for representing partitions and hierarchies, and facilities for computing on these, including methods for measuring proximity and obtaining consensus and "secondary" clusterings.