A Survey on Graph Kernels
Graph kernels have become an established and widely-used technique for
solving classification tasks on graphs. This survey gives a comprehensive
overview of techniques for kernel-based graph classification developed in the
past 15 years. We describe and categorize graph kernels based on properties
inherent to their design, such as the nature of their extracted graph features,
their method of computation and their applicability to problems in practice. In
an extensive experimental evaluation, we study the classification accuracy of a
large suite of graph kernels on established benchmarks as well as new datasets.
We compare the performance of popular kernels with several baseline methods and
study the effect of applying a Gaussian RBF kernel to the metric induced by a
graph kernel. In doing so, we find that simple baselines become competitive
after this transformation on some datasets. Moreover, we study the extent to
which existing graph kernels agree in their predictions (and prediction errors)
and obtain a data-driven categorization of kernels as a result. Finally, based
on our experimental results, we derive a practitioner's guide to kernel-based
graph classification.
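The transformation studied in the experiments above can be sketched directly: a positive semi-definite graph kernel k induces a metric via d(x, y)^2 = k(x,x) + k(y,y) - 2 k(x,y), and a Gaussian RBF kernel is then applied to that metric. A minimal sketch follows; the `toy_kernel` feature-vector stand-in is a hypothetical placeholder for a real graph kernel, not a method from the survey.

```python
import math

def rbf_from_kernel(k, gamma=1.0):
    """Turn a (positive semi-definite) graph kernel k into a Gaussian RBF
    kernel on the metric it induces: d(x, y)^2 = k(x,x) + k(y,y) - 2 k(x,y)."""
    def k_rbf(x, y):
        d2 = k(x, x) + k(y, y) - 2.0 * k(x, y)
        return math.exp(-gamma * max(d2, 0.0))  # clamp tiny negative round-off
    return k_rbf

# Hypothetical stand-in kernel: dot product of simple graph feature vectors
# (e.g. substructure counts), in place of a real graph kernel.
def toy_kernel(g1, g2):
    return sum(a * b for a, b in zip(g1, g2))

k_rbf = rbf_from_kernel(toy_kernel, gamma=0.5)
print(k_rbf((1, 2, 0), (1, 2, 0)))  # identical feature vectors -> 1.0
```

Any kernel matrix built this way can be passed to a standard kernelized classifier, which is how the survey's evaluation compares kernels before and after the transformation.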
Dynamic Discovery of Type Classes and Relations in Semantic Web Data
The continuing development of Semantic Web technologies and increasing user
adoption in recent years have accelerated the incorporation of explicit
semantics into data on the Web. With the rapid growth of RDF (Resource
Description Framework) data on the Semantic Web, processing large semantic
graph data has become more challenging. Constructing a summary graph structure
from the raw RDF can help obtain semantic type relations and reduce the
computational complexity of graph processing. In this paper, we address the
problem of graph summarization in RDF graphs and propose an approach for
building summary graph structures automatically from RDF graph data. Moreover,
we introduce a measure to help discover optimal class dissimilarity thresholds
and an effective method to discover the type classes automatically. In future
work, we plan to investigate further improvements to the scalability of the
proposed method.
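The core idea of a summary graph can be illustrated with a simplified variant: collapse instance nodes into their type classes and keep one edge per predicate observed between classes. This sketch assumes classes are given by explicit `rdf:type` triples; the paper's contribution is to discover such classes and the dissimilarity thresholds automatically, which this toy version does not do.

```python
RDF_TYPE = "rdf:type"

def summarize(triples):
    """Collapse an RDF graph (a set of (subject, predicate, object) triples)
    into a summary graph whose nodes are type classes and whose edges are the
    predicates observed between instances of those classes."""
    # First pass: record each instance's declared type class.
    type_of = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            type_of[s] = o
    # Second pass: lift instance-level edges to class-level edges.
    summary = set()
    for s, p, o in triples:
        if p != RDF_TYPE and s in type_of and o in type_of:
            summary.add((type_of[s], p, type_of[o]))
    return summary

triples = [
    ("alice", "rdf:type", "Person"),
    ("bob", "rdf:type", "Person"),
    ("acme", "rdf:type", "Company"),
    ("alice", "worksFor", "acme"),
    ("bob", "worksFor", "acme"),
]
print(summarize(triples))  # {('Person', 'worksFor', 'Company')}
```

The summary has one edge where the raw graph has two, which is the complexity reduction the abstract refers to.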
Log-based Evaluation of Label Splits for Process Models
Process mining techniques aim to extract insights in processes from event
logs. One of the challenges in process mining is identifying interesting and
meaningful event labels that contribute to a better understanding of the
process. Our application area is mining data from smart homes for elderly,
where the ultimate goal is to signal deviations from usual behavior and provide
timely recommendations in order to extend the period of independent living.
Extracting individual process models showing user behavior is an important
instrument in achieving this goal. However, the interpretation of sensor data
at an appropriate abstraction level is not straightforward. For example, a
motion sensor in a bedroom can be triggered by tossing and turning in bed or by
getting up. We try to derive the actual activity depending on the context
(time, previous events, etc.). In this paper we introduce the notion of label
refinements, which links more abstract event descriptions with their more
refined counterparts. We present a statistical evaluation method to determine
the usefulness of a label refinement for a given event log from a process
perspective. Based on data from smart homes, we show how our statistical
evaluation method for label refinements can be used in practice. Our method was
able to select two label refinements out of a set of candidate label
refinements that both had a positive effect on model precision.
Comment: Paper accepted at the 20th International Conference on
Knowledge-Based and Intelligent Information & Engineering Systems, to appear
in Procedia Computer Science.
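A label refinement as described above maps an abstract event label to a more refined one using context such as time of day. The rule below (bedroom motion in the morning means getting up, otherwise tossing and turning) is an illustrative assumption, not a rule from the paper; the paper's contribution is the statistical evaluation that decides whether such a candidate refinement actually helps the mined model.

```python
def refine_label(event):
    """Refine an abstract sensor label using its time context.
    event is a (label, hour) pair; the refinement rule is hypothetical."""
    label, hour = event
    if label != "motion_bedroom":
        return label  # only bedroom motion is refined in this sketch
    # Morning motion is interpreted as getting up, otherwise as restlessness.
    if 6 <= hour < 11:
        return "motion_bedroom/getting_up"
    return "motion_bedroom/tossing"

log = [("motion_bedroom", 2), ("motion_bedroom", 7), ("motion_kitchen", 8)]
print([refine_label(e) for e in log])
# ['motion_bedroom/tossing', 'motion_bedroom/getting_up', 'motion_kitchen']
```

Applying such a mapping to every event in the log yields the refined log whose mined model can then be compared, e.g. on precision, against the model mined from the original log.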
Neighborhood Structure Configuration Models
We develop a new method to efficiently sample synthetic networks that
preserve the d-hop neighborhood structure of a given network for any given d.
The proposed algorithm trades off the diversity in network samples against the
depth of the neighborhood structure that is preserved. Our key innovation is to
employ a colored Configuration Model with colors derived from iterations of the
so-called Color Refinement algorithm. We prove that with increasing iterations
the preserved structural information increases: the generated synthetic
networks and the original network become more and more similar, and are
eventually indistinguishable in terms of centrality measures such as PageRank,
HITS, Katz centrality, and eigenvector centrality. Our work makes it possible
to efficiently generate samples with precisely controlled similarity to the
original network, even for large networks.
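The Color Refinement algorithm that supplies the colors for the colored Configuration Model can be sketched in a few lines: each iteration replaces a node's color with its old color plus the multiset of its neighbors' colors, so after d iterations the colors encode d-hop neighborhood structure. A minimal sketch on an adjacency-list graph, assuming a uniform initial coloring:

```python
def color_refinement(adj, iterations):
    """Run `iterations` rounds of Color Refinement (1-WL) on a graph given as
    an adjacency list {node: [neighbors]}. Returns a node -> color dict whose
    colors capture the d-hop neighborhood structure."""
    colors = {v: 0 for v in adj}  # uniform initial coloring
    for _ in range(iterations):
        # New signature: own color plus sorted multiset of neighbor colors.
        signatures = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            for v in adj
        }
        # Compress distinct signatures back to small integer colors.
        palette = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
        colors = {v: palette[signatures[v]] for v in adj}
    return colors

# Path graph 0-1-2-3: the two endpoints share one color, the two interior
# nodes share another.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(color_refinement(adj, 2))
```

In the proposed sampling method these colors would then constrain a colored Configuration Model, so that generated networks match the original's colored degree structure; that sampling step is beyond this sketch.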