413 research outputs found
Faster Algorithms for the Maximum Common Subtree Isomorphism Problem
The maximum common subtree isomorphism problem asks for the largest possible
isomorphism between subtrees of two given input trees. This problem is a
natural restriction of the maximum common subgraph problem, which is NP-hard in general graphs. Restricting the input to trees makes polynomial-time
algorithms possible and is of fundamental importance for approaches on more
general graph classes. Various variants of this problem in trees have been
intensively studied. We consider the general case, where trees are neither
rooted nor ordered and the isomorphism is maximum w.r.t. a weight function on
the mapped vertices and edges. For trees of order $n$ and maximum degree $\Delta$
our algorithm achieves a running time of $\mathcal{O}(n^2 \Delta)$ by
exploiting the structure of the matching instances arising as subproblems. Thus
our algorithm outperforms the best previously known approaches. No faster
algorithm is possible for trees of bounded degree and for trees of unbounded
degree we show that a further reduction of the running time would directly
improve the best known approach to the assignment problem. Combining a
polynomial-delay algorithm for the enumeration of all maximum common subtree
isomorphisms with central ideas of our new algorithm leads to an improvement of
its running time from $\mathcal{O}(n^6 + T n^2)$ to $\mathcal{O}(n^3 + T n \Delta)$,
where $n$ is the order of the larger tree, $T$ is the number of different
solutions, and $\Delta$ is the minimum of the maximum degrees of the input
trees. Our theoretical results are supplemented by an experimental evaluation
on synthetic and real-world instances
A Survey on Graph Kernels
Graph kernels have become an established and widely-used technique for
solving classification tasks on graphs. This survey gives a comprehensive
overview of techniques for kernel-based graph classification developed in the
past 15 years. We describe and categorize graph kernels based on properties
inherent to their design, such as the nature of their extracted graph features,
their method of computation and their applicability to problems in practice. In
an extensive experimental evaluation, we study the classification accuracy of a
large suite of graph kernels on established benchmarks as well as new datasets.
We compare the performance of popular kernels with several baseline methods and
study the effect of applying a Gaussian RBF kernel to the metric induced by a
graph kernel. In doing so, we find that simple baselines become competitive
after this transformation on some datasets. Moreover, we study the extent to
which existing graph kernels agree in their predictions (and prediction errors)
and obtain a data-driven categorization of kernels as a result. Finally, based on
our experimental results, we derive a practitioner's guide to kernel-based
graph classification
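
The transformation mentioned in this abstract, a Gaussian RBF kernel applied to the metric induced by a graph kernel, can be illustrated with a minimal sketch. It assumes a precomputed, positive semidefinite kernel matrix given as nested lists and uses the standard kernel-induced squared distance d(G,H)^2 = k(G,G) + k(H,H) - 2k(G,H). The function name `rbf_from_kernel` and the parameter `gamma` are illustrative and not taken from the survey.

```python
import math


def rbf_from_kernel(K, gamma=1.0):
    """Apply a Gaussian RBF kernel to the metric induced by a base kernel.

    K is a symmetric kernel (Gram) matrix given as a list of lists.
    Entry (i, j) of the result is exp(-gamma * d(i, j)^2), where
    d(i, j)^2 = K[i][i] + K[j][j] - 2 * K[i][j].
    """
    n = len(K)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = K[i][i] + K[j][j] - 2.0 * K[i][j]
            # Clamp tiny negative values caused by floating-point rounding.
            out[i][j] = math.exp(-gamma * max(d2, 0.0))
    return out
```

In practice `gamma` would be chosen by cross-validation together with the other hyperparameters of the classifier.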
Gradual Weisfeiler-Leman: Slow and Steady Wins the Race
The classical Weisfeiler-Leman algorithm aka color refinement is fundamental
for graph learning and central for successful graph kernels and graph neural
networks. Originally developed for graph isomorphism testing, the algorithm
iteratively refines vertex colors. On many datasets, the stable coloring is
reached after a few iterations and the optimal number of iterations for machine
learning tasks is typically even lower. This suggests that the colors diverge
too fast, defining a similarity that is too coarse. We generalize the concept
of color refinement and propose a framework for gradual neighborhood
refinement, which allows a slower convergence to the stable coloring and thus
provides a more fine-grained refinement hierarchy and vertex similarity. We
assign new colors by clustering vertex neighborhoods, replacing the original
injective color assignment function. Our approach is used to derive new
variants of existing graph kernels and to approximate the graph edit distance
via optimal assignments regarding vertex similarity. We show that in both
tasks, our method outperforms the original color refinement with only a moderate
increase in running time, advancing the state of the art
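
For reference, the classical color refinement that this abstract takes as its starting point can be sketched as follows. This is the standard injective refinement, not the gradual clustering-based variant the paper proposes. The adjacency-dictionary input format and the function name `color_refinement` are assumptions made for illustration.

```python
def color_refinement(adj, max_iters=None):
    """Classical 1-dimensional Weisfeiler-Leman color refinement.

    adj maps each vertex to an iterable of its neighbors (undirected graph).
    Returns the list of colorings, starting with the uniform initial
    coloring and ending with the stable coloring (or the coloring reached
    after max_iters rounds).
    """
    colors = {v: 0 for v in adj}
    history = [dict(colors)]
    rounds = len(adj) if max_iters is None else max_iters
    for _ in range(rounds):
        # Signature: own color plus the sorted multiset of neighbor colors.
        sigs = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            for v in adj
        }
        # Injective relabeling: each distinct signature gets its own color.
        palette = {s: i for i, s in enumerate(sorted(set(sigs.values())))}
        new_colors = {v: palette[sigs[v]] for v in adj}
        # Refinement only splits classes, so an unchanged class count
        # means the partition is stable.
        if len(set(new_colors.values())) == len(set(colors.values())):
            break
        colors = new_colors
        history.append(dict(colors))
    return history
```

On a path with five vertices, for example, the coloring stabilizes after two refinement rounds, separating the end vertices, their neighbors, and the center.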
Largest Weight Common Subtree Embeddings with Distance Penalties
The largest common embeddable subtree problem asks for the largest possible tree embeddable into two input trees and generalizes the classical maximum common subtree problem. Several variants of the problem in labeled and unlabeled rooted trees have been studied, e.g., for the comparison of evolutionary trees. We consider a generalization, where the sought embedding is maximal with regard to a weight function on pairs of labels. We support rooted and unrooted trees with vertex and edge labels as well as distance penalties for skipping vertices. This variant is important for many applications such as the comparison of chemical structures and evolutionary trees. Our algorithm computes the solution from a series of bipartite matching instances, which are solved efficiently by exploiting their structural relation and imbalance. Our analysis shows that our approach improves or matches the running time of the formally best algorithms for several problem variants. Specifically, we obtain a running time of $O(|T|\,|T'|\,\Delta)$ for two rooted or unrooted trees $T$ and $T'$, where $\Delta = \min\{\Delta(T), \Delta(T')\}$ with $\Delta(X)$ the maximum degree of $X$. If the weights are integral and at most $C$, we obtain a running time of $O(|T|\,|T'|\,\sqrt{\Delta}\,\log(C \min\{|T|, |T'|\}))$ for rooted trees
EmbAssi: Embedding Assignment Costs for Similarity Search in Large Graph Databases
The graph edit distance is an intuitive measure to quantify the dissimilarity
of graphs, but its computation is NP-hard and challenging in practice. We
introduce methods for answering nearest neighbor and range queries regarding
this distance efficiently for large databases with up to millions of graphs. We
build on the filter-verification paradigm, where lower and upper bounds are
used to reduce the number of exact computations of the graph edit distance.
Highly effective bounds for this involve solving a linear assignment problem
for each graph in the database, which is prohibitive in massive datasets.
Index-based approaches typically provide only weak bounds leading to high
computational costs for verification. In this work, we derive novel lower bounds
for efficient filtering from restricted assignment problems, where the cost
function is a tree metric. This special case allows embedding the costs of
optimal assignments isometrically into $\ell_1$ space, rendering efficient
indexing possible. We propose several lower bounds of the graph edit distance
obtained from tree metrics reflecting the edit costs, which are combined for
effective filtering. Our method termed EmbAssi can be integrated into existing
filter-verification pipelines as a fast and effective pre-filtering step.
Empirically we show that for many real-world graphs our lower bounds are
already close to the exact graph edit distance, while our index construction
and search scale to very large databases
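
The embedding idea behind this abstract rests on a general property of tree metrics: when two equal-sized multisets of elements sit at the nodes of a weighted tree, the cost of an optimal assignment equals a sum over tree edges of the edge weight times the difference in the number of elements below that edge. The sketch below illustrates only this general property, not the specific bounds or index structure of EmbAssi. The node numbering (root 0, each parent index smaller than its children) and the names `embed` and `l1` are assumptions made for illustration.

```python
def embed(parent, weight, elements):
    """Map a multiset of tree nodes to a vector with one entry per edge.

    parent[v] is the parent of node v; node 0 is the root and parent[v] < v
    for all other nodes. weight[v] is the weight of the edge (v, parent[v]).
    elements lists the nodes at which the multiset's elements are located.
    Entry v-1 of the result is weight[v] times the number of elements in
    the subtree rooted at v.
    """
    counts = [0] * len(parent)
    for x in elements:
        counts[x] += 1
    # Children have larger indices, so one backward pass accumulates subtrees.
    for v in range(len(parent) - 1, 0, -1):
        counts[parent[v]] += counts[v]
    return [weight[v] * counts[v] for v in range(1, len(parent))]


def l1(a, b):
    """Manhattan distance between two equally long vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Tree: 0 is the root, 1 and 2 are its children, 3 is a child of 1.
parent = [-1, 0, 0, 1]
weight = [0, 1, 2, 3]
# Optimal assignment cost between {3, 2} and {1, 0} under the tree metric.
cost = l1(embed(parent, weight, [3, 2]), embed(parent, weight, [1, 0]))
```

Because the cost is a plain Manhattan distance between fixed vectors, the vectors can be stored in a standard metric index, which is what makes this kind of bound attractive for filtering.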
Breast Cancer Screening in Women with a Familial or Genetic Predisposition: the role of MRI
Women with a strong family history of breast and/or ovarian cancer combined with young
ages at diagnosis of affected family members have an increased risk of these types of cancer.
In 1994 and 1995 respectively, the BRCA1 and BRCA2 genes were identified. A germline
mutation in one of these genes is associated with very high risks of early onset breast and
ovarian cancer.
Current options for BRCA1/2 mutation carriers to reduce their risk of breast cancer or death
by breast cancer include prophylactic mastectomy, prophylactic salpingo-oophorectomy,
chemoprevention and screening. Screening for breast cancer is also offered to women with a
familial predisposition, but without a proven BRCA1/2 mutation. Several studies have
investigated the efficacy of mammographic screening, sometimes in combination with clinical
breast examination (CBE), in high-risk groups of women. However, the efficacy of
mammography screening has never been clearly demonstrated. Sensitivity of mammography
was low in this group of women in comparison with post-menopausal women screened in
population-based studies, most likely because of the young screening age and the consequently
frequent high density of the breast tissue. MRI appeared to be a sensitive method for
detection of breast cancer in a diagnostic setting. For this reason, in the late nineties several
breast cancer screening studies comparing the value of MRI and mammography were set up
in women with a genetic susceptibility. Results of pilot and preliminary studies showed in all
of them a very high sensitivity of MRI, while sensitivity of mammography was never higher
than 50%. Recently, the first results of four large prospective studies were published, among
which the Dutch national MRISC study. In this thesis, the short-term results of the MRISC
study are described. Two of the main objectives of the MRISC study are addressed in this
thesis:
1. Assessment of the efficacy of screening in diagnosing early-stage breast cancer in women
with a familial or genetic predisposition.
2. Assessment of the value of MRI in this screening scheme compared to mammography
- …