Multilevel compression of random walks on networks reveals hierarchical organization in large integrated systems
To comprehend the hierarchical organization of large integrated systems, we
introduce the hierarchical map equation, which reveals multilevel structures in
networks. In this information-theoretic approach, we exploit the duality
between compression and pattern detection; by compressing a description of a
random walker as a proxy for real flow on a network, we find regularities in
the network that induce this system-wide flow. Finding the shortest multilevel
description of the random walker therefore gives us the best hierarchical
clustering of the network, the optimal number of levels and modular partition
at each level, with respect to the dynamics on the network. With a novel search
algorithm, we extract and illustrate the rich multilevel organization of
several large social and biological networks. For example, from the global air
traffic network we uncover countries and continents, and from the pattern of
scientific communication we reveal more than 100 scientific fields organized in
four major disciplines: life sciences, physical sciences, ecology and earth
sciences, and social sciences. In general, we find shallow hierarchical
structures in globally interconnected systems, such as neural networks, and
rich multilevel organizations in systems with highly separated regions, such as
road networks. Comment: 11 pages, 5 figures. For associated code, see
http://www.tp.umu.se/~rosvall/code.htm
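The compression idea in this abstract can be illustrated with a minimal sketch of the simpler two-level map equation, not the hierarchical version the paper introduces. It assumes an undirected, unweighted network, so a node's visit rate is its degree divided by twice the number of links. The function names and the toy network are hypothetical and are not taken from the authors' code.

```python
import math
from collections import defaultdict


def plogp(x):
    """x * log2(x), with the usual convention 0 * log2(0) = 0."""
    return x * math.log2(x) if x > 0 else 0.0


def map_equation(edges, module_of):
    """Two-level description length (bits per step) of a random walk.

    edges     : list of (u, v) pairs of an undirected, unweighted network
    module_of : dict mapping every node to a module label
    """
    degree = defaultdict(int)
    exit_links = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if module_of[u] != module_of[v]:
            # A link between modules is an exit for both of them.
            exit_links[module_of[u]] += 1
            exit_links[module_of[v]] += 1

    total = 2 * len(edges)
    modules = set(module_of.values())
    visit = {n: d / total for n, d in degree.items()}    # node visit rates
    leave = {m: exit_links[m] / total for m in modules}  # module exit rates
    inside = defaultdict(float)
    for n, rate in visit.items():
        inside[module_of[n]] += rate

    # Expanded form of the two-level map equation.
    return (plogp(sum(leave.values()))
            - 2 * sum(plogp(q) for q in leave.values())
            - sum(plogp(p) for p in visit.values())
            + sum(plogp(leave[m] + inside[m]) for m in modules))


# Toy network (hypothetical): two triangles joined by a single link.
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
one_module = {n: "all" for n in range(6)}
two_modules = {0: "left", 1: "left", 2: "left",
               3: "right", 4: "right", 5: "right"}

L_one = map_equation(EDGES, one_module)
L_two = map_equation(EDGES, two_modules)
print(f"one module: {L_one:.3f} bits, two modules: {L_two:.3f} bits")
```

On this toy network the two-triangle partition gives a shorter description than a single module, which is the sense in which a shorter code length signals a better clustering. A hierarchical variant would apply the same bookkeeping recursively inside each module.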
A Divide-and-Conquer Solver for Kernel Support Vector Machines
The kernel support vector machine (SVM) is one of the most widely used
classification methods; however, the amount of computation required becomes the
bottleneck when facing millions of samples. In this paper, we propose and
analyze a novel divide-and-conquer solver for kernel SVMs (DC-SVM). In the
division step, we partition the kernel SVM problem into smaller subproblems by
clustering the data, so that each subproblem can be solved independently and
efficiently. We show theoretically that the support vectors identified by the
subproblem solution are likely to be support vectors of the entire kernel SVM
problem, provided that the problem is partitioned appropriately by kernel
clustering. In the conquer step, the local solutions from the subproblems are
used to initialize a global coordinate descent solver, which converges quickly
as suggested by our analysis. By extending this idea, we develop a multilevel
Divide-and-Conquer SVM algorithm with adaptive clustering and early prediction
strategy, which outperforms state-of-the-art methods in terms of training
speed, testing accuracy, and memory usage. As an example, on the covtype
dataset with half-a-million samples, DC-SVM is 7 times faster than LIBSVM in
obtaining the exact SVM solution (to within relative error) which
achieves 96.15% prediction accuracy. Moreover, with our proposed early
prediction strategy, DC-SVM achieves about 96% accuracy in only 12 minutes,
which is more than 100 times faster than LIBSVM.
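The divide and conquer steps described in this abstract can be sketched in a few lines of pure Python. This is an illustration under stated assumptions, not the authors' implementation: it assumes a bias-free dual with an RBF kernel, takes the clusters as given instead of computing them by kernel clustering, and uses plain cyclic coordinate descent. All names (`dual_cd`, `dc_svm`, the toy data) are hypothetical.

```python
import math


def rbf(a, b, gamma=1.0):
    """RBF kernel exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))


def dual_cd(X, y, idx, alpha, C=1.0, tol=1e-6, max_sweeps=500):
    """Cyclic coordinate descent on the bias-free SVM dual, restricted to idx.

    Minimizes 0.5 * a^T Q a - sum(a) subject to 0 <= a_i <= C over the
    points in idx, where Q_ij = y_i * y_j * K(x_i, x_j). Updates alpha in
    place and returns the number of sweeps performed.
    """
    K = {(i, j): rbf(X[i], X[j]) for i in idx for j in idx}
    for sweep in range(1, max_sweeps + 1):
        biggest = 0.0
        for i in idx:
            grad = y[i] * sum(alpha[j] * y[j] * K[i, j] for j in idx) - 1.0
            new = min(C, max(0.0, alpha[i] - grad / K[i, i]))
            biggest = max(biggest, abs(new - alpha[i]))
            alpha[i] = new
        if biggest < tol:
            return sweep
    return max_sweeps


def dc_svm(X, y, clusters, C=1.0):
    """Divide: solve each cluster alone. Conquer: warm-start the full solve."""
    alpha = [0.0] * len(X)
    for members in clusters:
        dual_cd(X, y, members, alpha, C)
    sweeps = dual_cd(X, y, list(range(len(X))), alpha, C)
    return alpha, sweeps


def predict(X, y, alpha, point):
    """Sign of the bias-free decision function at `point`."""
    score = sum(a * t * rbf(x, point) for a, t, x in zip(alpha, y, X))
    return 1 if score >= 0 else -1


# Toy data (hypothetical): two small groups, used directly as the clusters.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (3, 3), (3, 4), (4, 3), (4, 4)]
y = [1, 1, 1, 1, -1, -1, -1, -1]
clusters = [[0, 1, 2, 3], [4, 5, 6, 7]]

alpha_dc, warm_sweeps = dc_svm(X, y, clusters)

# Baseline: the same solver started from zero on the whole problem.
alpha_cold = [0.0] * len(X)
cold_sweeps = dual_cd(X, y, list(range(len(X))), alpha_cold)
print("sweeps with warm start:", warm_sweeps, "from zero:", cold_sweeps)
```

Because the subproblem solutions are already close to the global solution when the clusters interact weakly, the warm-started global pass has less work left to do, which is the effect the abstract attributes to the conquer step.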
A Multilevel Approach to Topology-Aware Collective Operations in Computational Grids
The efficient implementation of collective communication operations has
received much attention. Initial efforts produced "optimal" trees based on
network communication models that assumed equal point-to-point latencies
between any two processes. This assumption is violated in most practical
settings, however, particularly in heterogeneous systems such as clusters of
SMPs and wide-area "computational Grids," with the result that collective
operations perform suboptimally. In response, more recent work has focused on
creating topology-aware trees for collective operations that minimize
communication across slower channels (e.g., a wide-area network). While these
efforts have significant communication benefits, they all limit their view of
the network to only two layers. We present a strategy based upon a multilayer
view of the network. By creating multilevel topology-aware trees we take
advantage of communication cost differences at every level in the network. We
used this strategy to implement topology-aware versions of several MPI
collective operations in MPICH-G2, the Globus Toolkit[tm]-enabled version of
the popular MPICH implementation of the MPI standard. Using information about
topology provided by MPICH-G2, we construct these multilevel topology-aware
trees automatically during execution. We present results demonstrating the
advantages of our multilevel approach by comparing it to the default
(topology-unaware) implementation provided by MPICH and a topology-aware
two-layer implementation. Comment: 16 pages, 8 figures
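A rough sketch of how a multilevel topology-aware broadcast tree might be built is given below. It assumes each process is described by a tuple of topology labels, outermost level first (here site, then machine). The labels, the function name and the choice of the lowest rank as group leader are assumptions made for illustration and do not reflect the MPICH-G2 API. Within the innermost group the sketch uses a flat fan-out for brevity.

```python
from collections import defaultdict


def multilevel_broadcast(procs, root, depth=0):
    """Return broadcast edges as (sender, receiver, level) triples.

    procs : dict mapping rank -> tuple of topology labels, outermost first
    root  : rank that holds the message within this group
    level : index of the first label on which sender and receiver differ
            (0 = slowest channel); equal to the tuple length when both
            processes share every label.
    """
    ranks = sorted(procs)
    if len(ranks) <= 1:
        return []
    if depth == len(procs[root]):
        # Same innermost group: flat fan-out, kept simple on purpose.
        return [(root, r, depth) for r in ranks if r != root]

    groups = defaultdict(dict)
    for r in ranks:
        groups[procs[r][depth]][r] = procs[r]

    edges = []
    for label in sorted(groups):
        members = groups[label]
        if root in members:
            leader = root
        else:
            # Exactly one message crosses this level per remote group.
            leader = min(members)
            edges.append((root, leader, depth))
        edges += multilevel_broadcast(members, leader, depth + 1)
    return edges


# Hypothetical layout: 2 sites, 2 machines per site, 2 processes per machine.
PROCS = {
    0: ("siteA", "a1"), 1: ("siteA", "a1"),
    2: ("siteA", "a2"), 3: ("siteA", "a2"),
    4: ("siteB", "b1"), 5: ("siteB", "b1"),
    6: ("siteB", "b2"), 7: ("siteB", "b2"),
}

edges = multilevel_broadcast(PROCS, root=0)
per_level = defaultdict(int)
for sender, receiver, level in edges:
    per_level[level] += 1
print(dict(per_level))
```

In this layout only one message crosses the wide-area link between the two sites, whereas a flat fan-out from rank 0 that ignores topology would send four. Each additional label in the tuple adds another level at which the same saving applies.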
Recent Advances in Graph Partitioning
We survey recent trends in practical algorithms for balanced graph
partitioning together with applications and future research directions.