Optimal mathematical programming and variable neighborhood search for k-modes categorical data clustering
The conventional k-modes algorithm and its variants have been extensively used for categorical data clustering. However, these algorithms have some drawbacks, e.g., they can become trapped in local optima and are sensitive to the initial clusters/modes. Our numerical experiments even showed that the k-modes algorithm could not identify the optimal clustering results for some special datasets regardless of the selection of the initial centers. In this paper, we developed an integer linear programming (ILP) approach for k-modes clustering, which is independent of the initial solution and can directly obtain the optimal results for small-sized datasets. We also developed a heuristic algorithm, known as IPO-ILP-VNS, that implements iterative partial optimization in the ILP approach based on a variable neighborhood search framework, to search for near-optimal results for medium- and large-sized datasets with controlled computing time. Experiments on 38 datasets, including 27 synthesized small datasets and 11 known benchmark datasets from the UCI site, were carried out to test the proposed ILP approach and the IPO-ILP-VNS algorithm. The proposed methods outperformed the conventional and other existing enhanced k-modes algorithms in the literature and produced new and improved results for 9 of the UCI benchmark datasets.
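To make the dependence on initial modes concrete, the conventional k-modes loop that the abstract refers to can be sketched as follows. This is a minimal illustration, not the paper's ILP or IPO-ILP-VNS method; the function names `hamming` and `k_modes`, the tie-breaking rules, and the toy data are assumptions made for this example.

```python
from collections import Counter


def hamming(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))


def k_modes(data, init_modes, max_iter=100):
    """Conventional k-modes: alternate assignment and mode update.

    The result depends on init_modes, which is why the procedure can
    stop at a local optimum of the total dissimilarity (cost).
    """
    modes = [tuple(m) for m in init_modes]
    labels = None
    for _ in range(max_iter):
        # Assign every record to its nearest mode (ties go to the lowest index).
        new_labels = [
            min(range(len(modes)), key=lambda j: hamming(row, modes[j]))
            for row in data
        ]
        if new_labels == labels:
            break  # assignments are stable, so the modes will not change either
        labels = new_labels
        # Update each mode to the most frequent value of every attribute.
        for j in range(len(modes)):
            members = [row for row, lab in zip(data, labels) if lab == j]
            if members:  # keep the old mode if a cluster becomes empty
                modes[j] = tuple(
                    Counter(col).most_common(1)[0][0] for col in zip(*members)
                )
    cost = sum(hamming(row, modes[lab]) for row, lab in zip(data, labels))
    return labels, modes, cost


toy = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")]
labels, modes, cost = k_modes(toy, init_modes=[("a", "x"), ("b", "y")])
```

Running `k_modes` from several different `init_modes` and comparing the returned `cost` is a simple way to observe the initialization sensitivity described above.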
A smart local moving algorithm for large-scale modularity-based community detection
We introduce a new algorithm for modularity-based community detection in
large networks. The algorithm, which we refer to as a smart local moving
algorithm, takes advantage of a well-known local moving heuristic that is also
used by other algorithms. Compared with these other algorithms, our proposed
algorithm uses the local moving heuristic in a more sophisticated way. Based on
an analysis of a diverse set of networks, we show that our smart local moving
algorithm identifies community structures with higher modularity values than
other algorithms for large-scale modularity optimization, including the
popular 'Louvain algorithm' introduced by Blondel et al. (2008). The
computational efficiency of our algorithm makes it possible to perform
community detection in networks with tens of millions of nodes and hundreds of
millions of edges. Our smart local moving algorithm also performs well in small
and medium-sized networks. In short computing times, it identifies community
structures with modularity values as high as, or almost as high as, the
highest values reported in the literature, and sometimes even higher than the
highest values found in the literature.
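The basic local moving heuristic mentioned in this abstract can be sketched as follows for an unweighted, undirected graph. This is only the plain heuristic (move a node to the neighbouring community that most increases modularity, repeat until no move helps), not the smart local moving algorithm itself; the function names, the brute-force recomputation of modularity, and the toy graph are assumptions chosen for brevity rather than speed.

```python
def modularity(edges, community):
    """Modularity Q = sum over communities of L_c/m - (d_c/(2m))^2."""
    m = len(edges)
    internal = {}  # number of edges inside each community
    degree = {}    # total degree of each community
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items()
    )


def local_moving(edges, community):
    """Greedy local moving: repeat passes over the nodes until nothing moves."""
    community = dict(community)
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    improved = True
    while improved:
        improved = False
        for node in sorted(neighbours):
            best_c = community[node]
            best_q = modularity(edges, community)
            # Only communities of adjacent nodes are candidate targets.
            for c in sorted({community[n] for n in neighbours[node]}):
                trial = dict(community)
                trial[node] = c
                q = modularity(edges, trial)
                if q > best_q + 1e-12:  # require a strict improvement
                    best_c, best_q = c, q
            if best_c != community[node]:
                community[node] = best_c
                improved = True
    return community


# Two triangles joined by a single bridge edge; start from singleton communities.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
result = local_moving(edges, {n: n for n in range(6)})
q_final = modularity(edges, result)
```

On this toy graph the heuristic groups each triangle into one community. Practical implementations update modularity incrementally instead of recomputing it for every trial move.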
Towards realistic artificial benchmark for community detection algorithms evaluation
Assessing the partitioning performance of community detection algorithms is
one of the most important issues in complex network analysis. Artificially
generated networks are often used as benchmarks for this purpose. However,
previous studies showed that their level of realism has a significant effect on
algorithm performance. In this study, we adopt a thorough experimental
approach to tackle this problem and investigate this effect. To assess the
level of realism, we use consensual network topological properties. Based on
the LFR method, the most realistic generative method to date, we propose two
alternative random models to replace the Configuration Model originally used in
this algorithm, in order to increase its realism. Experimental results show
that both modifications allow generating collections of community-structured
artificial networks whose topological properties are closer to those
encountered in real-world networks. Moreover, the results obtained with eleven
popular community identification algorithms on these benchmarks show that their
performance decreases on more realistic networks.
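For readers unfamiliar with the Configuration Model named in this abstract, the standard stub-matching construction can be sketched as follows. This illustrates only the generic model (random wiring that preserves a given degree sequence), not the LFR method or the two alternative models proposed in the paper; the function name and the example degree sequence are assumptions.

```python
import random


def configuration_model(degrees, seed=None):
    """Randomly pair up 'stubs' so that node i ends up with degree degrees[i].

    Self-loops and repeated edges can occur; this sketch keeps them as they are.
    """
    if sum(degrees) % 2:
        raise ValueError("the degree sum must be even")
    rng = random.Random(seed)
    # One stub (half-edge) per unit of degree.
    stubs = [node for node, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(stubs)
    # Consecutive stubs in the shuffled list form the edges.
    return list(zip(stubs[::2], stubs[1::2]))


target = [3, 2, 2, 1]
cm_edges = configuration_model(target, seed=0)

# Recount degrees from the generated edges (a self-loop counts twice).
realized = [0] * len(target)
for u, v in cm_edges:
    realized[u] += 1
    realized[v] += 1
```

Because only the degree sequence is constrained, other topological properties of the generated network are left to chance, which is the kind of limitation that motivates replacing this step with a more realistic random model.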