On Instance Weighted Clustering Ensembles
© ESANN, 2023. This is the accepted manuscript version of an article which has been published in final form at: www.esann.org/proceedings/2023
Ensemble clustering is a technique which combines multiple clustering results, and instance weighting is a technique which highlights important instances in a dataset. Both techniques are known to enhance clustering performance and robustness. In this research, ensembles and instance weighting are integrated with the spectral clustering algorithm. We believe this is the first attempt at creating diversity in the generative mechanism using density based instance weighting for a spectral ensemble. The proposed approach is empirically validated on synthetic datasets, compared against spectral clustering and a spectral ensemble with random instance weighting. Results show that using the instance weighted sub-sampling approach as the generative mechanism for an ensemble of spectral clustering leads to improved clustering performance on datasets with imbalanced clusters.
Peer reviewed
Combining Multiple Clusterings via Crowd Agreement Estimation and Multi-Granularity Link Analysis
The clustering ensemble technique aims to combine multiple clusterings into a
single, likely better and more robust clustering, and has received increasing
attention in recent years. Existing clustering ensemble approaches have two
main limitations. Firstly, many approaches lack the ability to weight the base
clusterings without access to the original data and can be significantly
affected by low-quality or even ill-formed clusterings.
Secondly, they generally focus on the instance level or cluster level in the
ensemble system and fail to integrate multi-granularity cues into a unified
model. To address these two limitations, this paper proposes to solve the
clustering ensemble problem via crowd agreement estimation and
multi-granularity link analysis. We present the normalized crowd agreement
index (NCAI) to evaluate the quality of base clusterings in an unsupervised
manner and thus weight the base clusterings in accordance with their clustering
validity. To explore the relationship between clusters, the source aware
connected triple (SACT) similarity is introduced with regard to their common
neighbors and the source reliability. Based on NCAI and multi-granularity
information collected among base clusterings, clusters, and data instances, we
further propose two novel consensus functions, termed weighted evidence
accumulation clustering (WEAC) and graph partitioning with multi-granularity
link analysis (GP-MGLA) respectively. The experiments are conducted on eight
real-world datasets. The experimental results demonstrate the effectiveness and
robustness of the proposed methods.
Comment: The MATLAB source code of this work is available at:
https://www.researchgate.net/publication/28197031
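As a rough illustration of the weighting idea in the abstract above, the sketch below builds a weighted co-association matrix in plain Python: each base clustering "votes" that two instances belong together, and each vote is scaled by a per-clustering weight. The function name `weighted_co_association` and the example weights are hypothetical. The weights are supplied by hand here rather than computed by NCAI, and this is not the authors' WEAC implementation.

```python
def weighted_co_association(labelings, weights):
    """Return an n x n matrix whose entry (i, j) is the normalized total
    weight of the base clusterings that place instances i and j together.

    labelings: list of label lists, one per base clustering (equal lengths).
    weights:   one non-negative weight per base clustering (assumed given).
    """
    total = float(sum(weights))
    n = len(labelings[0])
    co = [[0.0] * n for _ in range(n)]
    for labels, w in zip(labelings, weights):
        share = w / total  # normalize so a unanimous pair scores 1.0
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    co[i][j] += share
    return co


# Three base clusterings of four instances; the third is trusted twice as much.
base_clusterings = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
example_weights = [1.0, 1.0, 2.0]  # hypothetical quality scores
co_matrix = weighted_co_association(base_clusterings, example_weights)
```

A consensus clustering could then be obtained by treating one minus this matrix as a distance and applying, for example, hierarchical agglomerative clustering. That step is omitted here.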
Reconstructing the world trade multiplex: the role of intensive and extensive biases
In economic and financial networks, the strength of each node always has an
important economic meaning, such as the size of supply and demand, import and
export, or financial exposure. Constructing null models of networks matching
the observed strengths of all nodes is crucial in order to either detect
interesting deviations of an empirical network from economically meaningful
benchmarks or reconstruct the most likely structure of an economic network when
the latter is unknown. However, several studies have proved that real economic
networks and multiplexes are topologically very different from configurations
inferred only from node strengths. Here we provide a detailed analysis of the
World Trade Multiplex by comparing it to an enhanced null model that
simultaneously reproduces the strength and the degree of each node. We study
several temporal snapshots and almost one hundred layers (commodity classes) of
the multiplex and find that the observed properties are systematically well
reproduced by our model. Our formalism allows us to introduce the (static)
concept of extensive and intensive bias, defined as a measurable tendency of
the network to prefer either the formation of extra links or the reinforcement
of link weights, with respect to a reference case where only strengths are
enforced. Our findings complement the existing economic literature on (dynamic)
intensive and extensive trade margins. More generally, they show that
real-world multiplexes can be strongly shaped by layer-specific local
constraints.
On Thermalization in Classical Scalar Field Theory
Thermalization of classical fields is investigated in a \phi^4 scalar field
theory in 1+1 dimensions, discretized on a lattice. We numerically integrate
the classical equations of motion using initial conditions sampled from various
nonequilibrium probability distributions. Time-dependent expectation values of
observables constructed from the canonical momentum are compared with thermal
ones. It is found that a closed system, evolving from one initial condition,
thermalizes to high precision in the thermodynamic limit, in a time-averaged
sense. For ensembles consisting of many members with the same energy, we find
that expectation values become stationary - and equal to the thermal values -
in the limit of infinitely many members. Initial ensembles with a nonzero
(noncanonical) spread in the energy density or other conserved quantities
evolve to noncanonical stationary ensembles. In the case of a narrow spread,
asymptotic values of primary observables are only mildly affected. In contrast,
fluctuations and connected correlation functions will differ substantially from
the canonical values. This raises doubts about the use of a straightforward
expansion in terms of 1PI-vertex functions to study thermalization.
Comment: 17 pages with 6 eps figures
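The numerical integration described in the abstract above can be sketched as follows. This is a minimal illustration only: it assumes a lattice Hamiltonian H = sum_x [ pi_x^2/2 + (phi_{x+1} - phi_x)^2/2 + m2 phi_x^2/2 + lam phi_x^4/4 ] with unit lattice spacing and periodic boundaries, and it uses a leapfrog scheme. The paper's actual conventions, integrator, and parameter values are not given in the abstract, so the names `force`, `energy`, `leapfrog`, `m2`, and `lam` are all hypothetical.

```python
import math


def force(phi, m2, lam):
    """Right-hand side of d(pi_x)/dt for the assumed lattice Hamiltonian."""
    n = len(phi)
    return [
        phi[(x + 1) % n] + phi[x - 1] - 2.0 * phi[x]  # lattice Laplacian
        - m2 * phi[x] - lam * phi[x] ** 3             # mass and phi^4 terms
        for x in range(n)
    ]


def energy(phi, pi, m2, lam):
    """Total energy; conserved by the exact dynamics."""
    n = len(phi)
    return sum(
        0.5 * pi[x] ** 2
        + 0.5 * (phi[(x + 1) % n] - phi[x]) ** 2
        + 0.5 * m2 * phi[x] ** 2
        + 0.25 * lam * phi[x] ** 4
        for x in range(n)
    )


def leapfrog(phi, pi, m2, lam, dt, steps):
    """Advance (phi, pi) by `steps` leapfrog steps of size dt."""
    phi, pi = list(phi), list(pi)
    f = force(phi, m2, lam)
    for _ in range(steps):
        pi = [p + 0.5 * dt * fx for p, fx in zip(pi, f)]   # half kick
        phi = [q + dt * p for q, p in zip(phi, pi)]        # full drift
        f = force(phi, m2, lam)
        pi = [p + 0.5 * dt * fx for p, fx in zip(pi, f)]   # half kick
    return phi, pi


# One nonequilibrium initial condition: a single sine mode, zero momentum.
n_sites = 16
phi0 = [0.5 * math.sin(2.0 * math.pi * x / n_sites) for x in range(n_sites)]
pi0 = [0.0] * n_sites
phi1, pi1 = leapfrog(phi0, pi0, m2=1.0, lam=1.0, dt=0.01, steps=1000)
```

Time-averaged observables built from `pi1` (for instance the mean of pi_x^2) could then be compared with thermal values, and monitoring `energy` before and after gives a quick check that the step size is small enough.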
LinkCluE: A MATLAB Package for Link-Based Cluster Ensembles
Cluster ensembles have emerged as a powerful meta-learning paradigm that provides improved accuracy and robustness by aggregating several input data clusterings. In particular, link-based similarity methods have recently been introduced with superior performance to the conventional co-association approach. This paper presents a MATLAB package, LinkCluE, that implements the link-based cluster ensemble framework. A variety of functional methods for evaluating clustering results, based on both internal and external criteria, are also provided. Additionally, the underlying algorithms together with the sample uses of the package with interesting real and synthetic datasets are demonstrated herein.
Low-temperature behaviour of social and economic networks
Real-world social and economic networks typically display a number of
particular topological properties, such as a giant connected component, a broad
degree distribution, the small-world property and the presence of communities
of densely interconnected nodes. Several models, including ensembles of
networks also known in social science as Exponential Random Graphs, have been
proposed with the aim of reproducing each of these properties in isolation.
Here we define a generalized ensemble of graphs by introducing the concept of
graph temperature, controlling the degree of topological optimization of a
network. We consider the temperature-dependent version of both existing and
novel models and show that all the aforementioned topological properties can be
simultaneously understood as the natural outcomes of an optimized,
low-temperature topology. We also show that seemingly different graph models,
as well as techniques used to extract information from real networks, are all
found to be particular low-temperature cases of the same generalized formalism.
One such technique allows us to extend our approach to real weighted networks.
Our results suggest that a low graph temperature might be a ubiquitous
property of real socio-economic networks, placing conditions on the diffusion
of information across these systems.