A Survey on Graph Kernels
Graph kernels have become an established and widely-used technique for
solving classification tasks on graphs. This survey gives a comprehensive
overview of techniques for kernel-based graph classification developed in the
past 15 years. We describe and categorize graph kernels based on properties
inherent to their design, such as the nature of their extracted graph features,
their method of computation and their applicability to problems in practice. In
an extensive experimental evaluation, we study the classification accuracy of a
large suite of graph kernels on established benchmarks as well as new datasets.
We compare the performance of popular kernels with several baseline methods and
study the effect of applying a Gaussian RBF kernel to the metric induced by a
graph kernel. In doing so, we find that simple baselines become competitive
after this transformation on some datasets. Moreover, we study the extent to
which existing graph kernels agree in their predictions (and prediction errors)
and obtain a data-driven categorization of kernels as a result. Finally, based on
our experimental results, we derive a practitioner's guide to kernel-based
graph classification.
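
The transformation mentioned in the abstract, a Gaussian RBF kernel applied to the metric induced by a graph kernel, can be illustrated with a minimal sketch. It relies on the standard identity d(x, y)^2 = k(x, x) + k(y, y) - 2 k(x, y) for the distance induced by a kernel k. The vertex-label histogram kernel, the toy graphs, and the value of `gamma` are assumptions chosen for illustration and are not taken from the survey.

```python
import math
from collections import Counter


def label_histogram_kernel(labels_a, labels_b):
    """Hypothetical toy baseline: dot product of vertex-label counts."""
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    return float(sum(count_a[label] * count_b[label] for label in count_a))


def gram_matrix(graphs, kernel):
    """Kernel values for every pair of graphs."""
    return [[kernel(g, h) for h in graphs] for g in graphs]


def rbf_from_kernel(K, gamma=1.0):
    """Gaussian RBF kernel on the metric induced by the kernel matrix K."""
    n = len(K)
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Squared kernel-induced distance; clamp tiny negative round-off.
            dist_sq = max(K[i][i] + K[j][j] - 2.0 * K[i][j], 0.0)
            R[i][j] = math.exp(-gamma * dist_sq)
    return R


# Each "graph" is reduced to its list of vertex labels (illustrative only).
graphs = [["C", "C", "O"], ["C", "C", "O"], ["C", "N"]]
K = gram_matrix(graphs, label_histogram_kernel)
R = rbf_from_kernel(K, gamma=0.5)
```

Here `R` could be passed to any classifier that accepts a precomputed kernel matrix. In practice `gamma` would be selected by cross-validation rather than fixed.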
The Physics of Communicability in Complex Networks
A fundamental problem in the study of complex networks is to provide
quantitative measures of correlation and information flow between different
parts of a system. To this end, several notions of communicability have been
introduced and applied to a wide variety of real-world networks in recent
years. Several such communicability functions are reviewed in this paper. It is
emphasized that communication and correlation in networks can take place
through many more routes than the shortest paths, a fact that may not have been
sufficiently appreciated in previously proposed correlation measures. In
contrast to these, the communicability measures reviewed in this paper are
defined by taking into account all possible routes between two nodes, assigning
smaller weights to longer ones. This point of view naturally leads to the
definition of communicability in terms of matrix functions, such as the
exponential, resolvent, and hyperbolic functions, in which the matrix argument
is either the adjacency matrix or the graph Laplacian associated with the
network. Considerable insight on communicability can be gained by modeling a
network as a system of oscillators and deriving physical interpretations, both
classical and quantum-mechanical, of various communicability functions.
Applications of communicability measures to the analysis of complex systems are
illustrated on a variety of biological, physical and social networks. The last
part of the paper is devoted to a review of the notion of locality in complex
networks and to computational aspects that by exploiting sparsity can greatly
reduce the computational efforts for the calculation of communicability
functions for large networks.
Comment: Review Article. 90 pages, 14 figures. Contents: Introduction;
Communicability in Networks; Physical Analogies; Comparing Communicability
Functions; Communicability and the Analysis of Networks; Communicability and
Localization in Complex Networks; Computability of Communicability Functions;
Conclusions and Perspective
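
As a rough illustration of communicability defined through the matrix exponential of the adjacency matrix, the sketch below sums the contributions of all walks between two nodes, weighting walks of length k by 1/k!, so longer routes count less. The truncation length `terms` and the three-node path graph are assumptions made for the example. A dense power series like this one is only sensible for very small networks, and it does not exploit the sparsity mentioned in the abstract.

```python
def mat_mul(X, Y):
    """Multiply two square matrices stored as lists of lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def communicability(A, terms=30):
    """Approximate exp(A) by the truncated series sum_k A^k / k!."""
    n = len(A)
    # k = 0 term: the identity matrix.
    G = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    P = [row[:] for row in G]  # holds A^k / k! for the current k
    for k in range(1, terms):
        P = [[x / k for x in row] for row in mat_mul(P, A)]
        G = [[G[i][j] + P[i][j] for j in range(n)] for i in range(n)]
    return G


# Path graph 0 - 1 - 2 (illustrative only).
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
G = communicability(A)
# G[0][2] is nonzero although nodes 0 and 2 are not adjacent, and it is
# smaller than G[0][1], the value for the directly connected pair.
```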
Dynamic load balancing for the distributed mining of molecular structures
In molecular biology, it is often desirable to find common properties in large numbers of drug candidates. One family of
methods stems from the data mining community, where algorithms to find frequent graphs have received increasing attention over the
past years. However, the computational complexity of the underlying problem and the large amount of data to be explored essentially
render sequential algorithms useless. In this paper, we present a distributed approach to the frequent subgraph mining problem to
discover interesting patterns in molecular compounds. This problem is characterized by a highly irregular search tree, whereby no
reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely, a dynamic
partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiver-initiated
load balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer
Institute’s HIV-screening data set, where we were able to show close-to-linear speedup in a network of workstations. The proposed
approach also allows for dynamic resource aggregation in a non-dedicated computational environment. These features make it suitable
for large-scale, multi-domain, heterogeneous environments, such as computational grids.