Search CORE


Partitioning Complex Networks via Size-constrained Clustering

Author: B. Hendrickson
C. Chevalier
C. Walshaw
G. Karypis
I. Safro
L.F. Costa
P. Sanders
R. Diekmann
T.N. Bui
Publication date: 01/01/2014

The most commonly used method to tackle the graph partitioning problem in practice is the multilevel approach. During a coarsening phase, a multilevel graph partitioning algorithm reduces the graph size by iteratively contracting nodes and edges until the graph is small enough to be partitioned by some other algorithm. A partition of the input graph is then constructed by successively transferring the solution to the next finer graph and applying a local search algorithm to improve the current solution. In this paper, we describe a novel approach to partition graphs effectively, especially if the networks have a highly irregular structure. More precisely, our algorithm coarsens the graph by iteratively contracting size-constrained clusterings that are computed using a label propagation algorithm. The same algorithm that computes the size-constrained clusterings can also be used during uncoarsening as a fast and simple local search algorithm. Depending on the algorithm's configuration, we are able to compute partitions of very high quality that outperform all competitors, or partitions whose quality is comparable to that of the best competitor, hMetis, while being nearly an order of magnitude faster on average. The fastest configuration partitions the largest graph available to us, which has 3.3 billion edges, on a single machine in about ten minutes while cutting less than half as many edges as the fastest competitor, kMetis.
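The coarsening step described in this abstract can be illustrated with a small sketch. It assumes an undirected graph stored as a Python dict of `{neighbour: weight}` dicts, unit node weights and a hypothetical cluster-size cap `max_size`. The function names `size_constrained_lp` and `contract` are illustrative only. Tie-breaking and node ordering are simplified, so this is not the authors' implementation.

```python
import random
from collections import defaultdict


def size_constrained_lp(adj, max_size, rounds=10, seed=0):
    """Label propagation clustering in which no cluster may exceed max_size nodes.

    adj maps each node to a dict {neighbour: edge weight}; the graph is undirected,
    so every edge appears in both endpoint dicts. Node weights are assumed to be 1.
    """
    rng = random.Random(seed)
    label = {v: v for v in adj}          # every node starts in its own cluster
    size = {v: 1 for v in adj}           # current size of each cluster
    order = list(adj)
    for _ in range(rounds):
        rng.shuffle(order)               # visit nodes in random order each round
        moved = False
        for v in order:
            conn = defaultdict(float)    # edge weight from v into each neighbouring cluster
            for u, w in adj[v].items():
                conn[label[u]] += w
            cur = label[v]
            best, best_w = cur, conn.get(cur, 0.0)
            for c, w in conn.items():
                # a cluster is eligible only if it can still absorb v
                if c != cur and size[c] + 1 <= max_size and w > best_w:
                    best, best_w = c, w
            if best != cur:
                size[cur] -= 1
                size[best] += 1
                label[v] = best
                moved = True
        if not moved:                    # converged: no node changed its cluster
            break
    return label


def contract(adj, label):
    """Coarse graph with one node per cluster; edges between two clusters are summed."""
    coarse = {}
    for v, nbrs in adj.items():
        row = coarse.setdefault(label[v], {})
        for u, w in nbrs.items():
            if label[u] != label[v]:     # intra-cluster edges disappear
                row[label[u]] = row.get(label[u], 0.0) + w
    return coarse


# Usage: two triangles joined by a single edge, clusters capped at 3 nodes.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
adj = {}
for u, v in edges:
    adj.setdefault(u, {})[v] = 1.0
    adj.setdefault(v, {})[u] = 1.0
labels = size_constrained_lp(adj, max_size=3)
coarse = contract(adj, labels)
```

The size cap is what keeps the coarse graph usable for partitioning. Without it, label propagation on a complex network tends to produce a few giant clusters that a partitioner could not balance.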

arXiv.org e-Print Archive

CiteSeerX

Crossref

Parallel Graph Partitioning for Complex Networks

Author: Meyerhenke Henning
Sanders Peter
Schulz Christian
Publication date: 01/01/2015

Processing large complex networks like social networks or web graphs has recently attracted considerable interest. To do so in parallel, we need to partition them into pieces of about equal size. Unfortunately, previous parallel graph partitioners, originally developed for more regular mesh-like networks, do not work well for these networks. This paper addresses this problem by parallelizing and adapting the label propagation technique originally developed for graph clustering. By introducing size constraints, label propagation becomes applicable to both the coarsening and the refinement phase of multilevel graph partitioning. We obtain very high quality by applying a highly parallel evolutionary algorithm to the coarsened graph. The resulting system is both more scalable and of higher quality than state-of-the-art systems like ParMetis or PT-Scotch. For large complex networks, the performance differences are very large. For example, our algorithm can partition a web graph with 3.3 billion edges in less than sixteen seconds using 512 cores of a high-performance cluster while producing a high-quality partition; none of the competing systems can handle this graph on our system.

Comment: Review article. Parallelization of our previous approach arXiv:1402.328
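Size-constrained label propagation can also serve as a refinement step. This abstract describes that use, in parallel form. The sketch below is sequential and assumes the same dict-of-dicts graph representation with unit node weights. It also assumes a per-block bound of $(1+\epsilon)\lceil n/k \rceil$ nodes, a common balance constraint assumed here rather than taken from the abstract. The names `lp_refine`, `edge_cut` and `eps` are hypothetical.

```python
import math
from collections import defaultdict


def edge_cut(adj, block):
    """Total weight of edges whose endpoints lie in different blocks."""
    # adj stores every undirected edge twice, so halve the sum
    return sum(w for v in adj for u, w in adj[v].items() if block[u] != block[v]) / 2


def lp_refine(adj, block, k, eps=0.03, rounds=5):
    """Move each node to the neighbouring block it is most strongly connected to,
    provided the target block stays within the balance bound."""
    l_max = math.floor((1 + eps) * math.ceil(len(adj) / k))
    block = dict(block)                  # leave the caller's partition untouched
    size = defaultdict(int)
    for v in adj:
        size[block[v]] += 1
    for _ in range(rounds):
        moved = False
        for v in adj:
            conn = defaultdict(float)    # edge weight from v into each block
            for u, w in adj[v].items():
                conn[block[u]] += w
            cur = block[v]
            best, best_w = cur, conn.get(cur, 0.0)
            for b, w in conn.items():
                if b != cur and size[b] + 1 <= l_max and w > best_w:
                    best, best_w = b, w
            if best != cur:              # strict gain: the cut shrinks by best_w - conn[cur]
                size[cur] -= 1
                size[best] += 1
                block[v] = best
                moved = True
        if not moved:
            break
    return block


# Usage: two triangles joined by one edge, starting from a poor 2-way partition.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
adj = {}
for u, v in edges:
    adj.setdefault(u, {})[v] = 1.0
    adj.setdefault(v, {})[u] = 1.0
start = {0: 0, 1: 0, 2: 1, 3: 0, 4: 1, 5: 1}
refined = lp_refine(adj, start, k=2, eps=0.34)
```

Only strictly improving moves are accepted, so the cut can never increase. The slack `eps` matters: with `eps=0` and perfectly balanced blocks, no single-node move is allowed at all.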

arXiv.org e-Print Archive

CiteSeerX

On morphological hierarchical representations for image processing and spatial data clustering

Author: A. Baraldi
A. Rosenfeld
C. Jardine
C. Mattiussi
C. Ronse
C. Zahn
D. Wishart
E. Breen
F. Dias
F. Meyer
G. Bertrand
G. Estabrook
G. Matheron
G. Ouzounis
J. Cousty
J. Gower
J. Kruskal
J. Serra
J. Shi
J.P. Barthélemy
J.P. Benzécri
K. Florek
K. Spärck Jones
L. Gueguen
L. Guigues
L. Hubert
L. Najman
L. Vincent
M. Nagao
N. Ahuja
N. Jardine
O. Morris
P. Arbeláez
P. Felzenszwalb
P. Nacken
P. Salembier
P. Sneath
P. Soille
R. Adams
R. Cormack
R. Graham
R. Jones
R. Levillain
R. Marfil
R. Sokal
S. Beucher
S. Horowitz
S. Johnson
S. Zucker
T. Kong
T. Sørensen
W.G. Kropatsch
Z. Wu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

Hierarchical data representations in the context of classification and data clustering were put forward during the 1950s. Recently, hierarchical image representations have gained renewed interest for segmentation purposes. In this paper, we briefly survey fundamental results on hierarchical clustering and then detail recent paradigms developed for the hierarchical representation of images in the framework of mathematical morphology: constrained connectivity and ultrametric watersheds. Constrained connectivity can be viewed as a way to constrain an initial hierarchy so that a set of desired constraints is satisfied. The framework of ultrametric watersheds provides a generic scheme for computing any hierarchical connected clustering, in particular when such a hierarchy is constrained. The suitability of this framework for solving practical problems is illustrated with applications in remote sensing.
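A hierarchy of connected clusterings on an image can be illustrated with α-connectivity. Here two 4-adjacent pixels are linked when their values differ by at most α. Raising α only merges components, so the results are nested and form a hierarchy. The sketch below computes α-connected components with union-find. It assumes a grey-level image given as a list of equal-length rows, and `alpha_components` is a hypothetical name. Constrained connectivity adds further constraints on top of these components, for example a bound on each component's grey-level range; those constraints are not implemented here.

```python
def alpha_components(img, alpha):
    """Label the alpha-connected components of a 2-D grey-level image (4-adjacency)."""
    h, w = len(img), len(img[0])
    parent = list(range(h * w))          # union-find over flattened pixel indices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]    # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for r in range(h):
        for c in range(w):
            for dr, dc in ((0, 1), (1, 0)):  # right and down neighbours cover every edge once
                r2, c2 = r + dr, c + dc
                if r2 < h and c2 < w and abs(img[r][c] - img[r2][c2]) <= alpha:
                    union(r * w + c, r2 * w + c2)

    ids = {}                             # relabel roots as 0, 1, 2, ... in raster order
    return [[ids.setdefault(find(r * w + c), len(ids)) for c in range(w)] for r in range(h)]


img = [[1, 1, 5],
       [1, 2, 5],
       [9, 9, 5]]
levels = {a: alpha_components(img, a) for a in (0, 1, 3, 4)}
```

On this image the number of components falls from 4 at α = 0 to 1 at α = 4. Each level is a coarsening of the previous one.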

arXiv.org e-Print Archive

JRC Publications Repository

Crossref

Co-Clustering Network-Constrained Trajectory Data

Author: D Guo
Gook-Pil Roh
Marc Benkert
P Hansen
Panos Kalnis
T Brinkhoff
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

Clustering moving-object trajectories has recently attracted growing interest from both the data mining and machine learning communities. However, this problem has mainly been studied in the setting where objects move freely in Euclidean space. In this paper, we study the problem of clustering trajectories of vehicles whose movement is restricted by the underlying road network. We model the relations between these trajectories and road segments as a bipartite graph and seek to cluster its vertices. We demonstrate our approaches on synthetic data and show how they could be useful for inferring knowledge about the flow dynamics and the behavior of the drivers using the road network.
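The abstract does not say which co-clustering method the authors apply to the trajectory/segment bipartite graph. As a stand-in, the sketch below uses bipartite spectral graph partitioning in the style of Dhillon's spectral co-clustering. It scales the biadjacency matrix $A$ to $D_1^{-1/2} A D_2^{-1/2}$ and splits rows and columns by the sign of the second singular vectors. It assumes NumPy, trajectories given as lists of integer segment IDs, and every trajectory and segment having at least one incidence. The names `biadjacency` and `spectral_cocluster` are hypothetical.

```python
import numpy as np


def biadjacency(trajectories, n_segments):
    """Trajectory x segment matrix; entry (t, s) counts how often trajectory t uses segment s."""
    A = np.zeros((len(trajectories), n_segments))
    for t, segs in enumerate(trajectories):
        for s in segs:
            A[t, s] += 1.0
    return A


def spectral_cocluster(A):
    """Split trajectories and segments jointly into two co-clusters (spectral relaxation)."""
    d1 = A.sum(axis=1)                   # trajectory degrees (must be > 0)
    d2 = A.sum(axis=0)                   # segment degrees (must be > 0)
    An = A / np.sqrt(d1)[:, None] / np.sqrt(d2)[None, :]
    U, s, Vt = np.linalg.svd(An)
    # The second singular pair, rescaled by the degrees, approximates the
    # normalized-cut indicator on the bipartite graph.
    z_rows = U[:, 1] / np.sqrt(d1)
    z_cols = Vt[1, :] / np.sqrt(d2)
    return (z_rows > 0).astype(int), (z_cols > 0).astype(int)


# Two groups of routes that share only the link between segments 2 and 3.
trajectories = [[0, 1, 2], [0, 1], [1, 2, 3], [3, 4, 5], [4, 5], [3, 5]]
A = biadjacency(trajectories, n_segments=6)
traj_labels, seg_labels = spectral_cocluster(A)
```

Because each trajectory and the segments it mostly travels on receive the same sign, the output is a joint clustering of both vertex sets. This is the bipartite view described in the abstract.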

arXiv.org e-Print Archive

Crossref

HAL-Paris1

ClustGeo: an R package for hierarchical clustering with spatial constraints

Author: Chavent Marie
Kuentz-Simonet Vanessa
Labenne Amaury
Saracco Jérôme
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 13/12/2017

In this paper, we propose a Ward-like hierarchical clustering algorithm including spatial/geographical constraints. Two dissimilarity matrices $D_0$ and $D_1$ are given as input, along with a mixing parameter $\alpha \in [0,1]$. The dissimilarities can be non-Euclidean and the weights of the observations can be non-uniform. The first matrix gives the dissimilarities in the "feature space" and the second matrix gives the dissimilarities in the "constraint space". The criterion minimized at each stage is a convex combination of the homogeneity criterion calculated with $D_0$ and the homogeneity criterion calculated with $D_1$. The idea is then to determine a value of $\alpha$ that increases the spatial contiguity without deteriorating the quality of the solution too much with respect to the variables of interest, i.e., those of the feature space. This procedure is illustrated on a real dataset using the R package ClustGeo.

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL Descartes

Oskar Bordeaux

Semi-supervised cross-entropy clustering with information bottleneck constraint

Author: Geiger Bernhard C.
Śmieja Marek
Publication venue: 'Elsevier BV'
Publication date: 01/01/2017

In this paper, we propose a semi-supervised clustering method, CEC-IB, that models data with a set of Gaussian distributions and retrieves clusters based on a partial labeling provided by the user (partition-level side information). By combining ideas from cross-entropy clustering (CEC) with those from the information bottleneck method (IB), our method trades off three conflicting goals: the accuracy with which the data set is modeled, the simplicity of the model, and the consistency of the clustering with the side information. Experiments demonstrate that CEC-IB performs comparably to Gaussian mixture models (GMM) in a classical semi-supervised scenario, but is faster, is more robust to noisy labels, automatically determines the optimal number of clusters, and performs well when not all classes are present in the side information. Moreover, in contrast to other semi-supervised models, it can be successfully applied to discovering natural subgroups if the partition-level side information is derived from the top levels of a hierarchical clustering.
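The CEC part of the method can be illustrated by its cost for a hard clustering of 1-D data. Each cluster is modeled by a maximum-likelihood Gaussian and contributes $p_k\left(-\ln p_k + \tfrac{1}{2}\ln(2\pi e\,\sigma_k^2)\right)$, where $p_k$ is the cluster's share of the points. The $-\ln p_k$ term charges for every cluster kept, which is one way small clusters can become not worth keeping. This sketch assumes that standard CEC cost and omits the information-bottleneck term for side information; `cec_energy` is a hypothetical name.

```python
import math
from collections import defaultdict


def cec_energy(xs, labels):
    """Cross-entropy clustering cost of a hard 1-D clustering with Gaussian clusters."""
    groups = defaultdict(list)
    for x, l in zip(xs, labels):
        groups[l].append(x)
    n = len(xs)
    energy = 0.0
    for pts in groups.values():
        p = len(pts) / n                                   # cluster weight
        mean = sum(pts) / len(pts)
        var = sum((x - mean) ** 2 for x in pts) / len(pts) # ML variance; zero variance is not handled
        # -ln p: cost of coding the cluster index; 0.5*ln(2*pi*e*var): Gaussian differential entropy
        energy += p * (-math.log(p) + 0.5 * math.log(2 * math.pi * math.e * var))
    return energy


xs = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
good = [0, 0, 0, 1, 1, 1]    # the two natural groups
one = [0] * 6                # a single cluster
bad = [0, 1, 0, 1, 0, 1]     # an interleaved split
```

Here the natural two-group split has a lower cost than either a single cluster or an interleaved split. Lower cost is what CEC rewards.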

arXiv.org e-Print Archive

Jagiellonian University Repository