1,079 research outputs found
Faster Algorithms for the Maximum Common Subtree Isomorphism Problem
The maximum common subtree isomorphism problem asks for the largest possible
isomorphism between subtrees of two given input trees. This problem is a
natural restriction of the maximum common subgraph problem, which is NP-hard in general graphs. Confining to trees renders polynomial time
algorithms possible and is of fundamental importance for approaches on more
general graph classes. Various variants of this problem in trees have been
intensively studied. We consider the general case, where trees are neither
rooted nor ordered and the isomorphism is maximum w.r.t. a weight function on
the mapped vertices and edges. For trees of order $n$ and maximum degree $\Delta$
our algorithm achieves a running time of $O(n^2\Delta)$ by
exploiting the structure of the matching instances arising as subproblems. Thus
our algorithm outperforms the best previously known approaches. No faster
algorithm is possible for trees of bounded degree and for trees of unbounded
degree we show that a further reduction of the running time would directly
improve the best known approach to the assignment problem. Combining a
polynomial-delay algorithm for the enumeration of all maximum common subtree
isomorphisms with central ideas of our new algorithm leads to an improvement of
its running time from $O(n^6 + Tn^2)$ to $O(n^3 + Tn\Delta)$,
where $n$ is the order of the larger tree, $T$ is the number of different
solutions, and $\Delta$ is the minimum of the maximum degrees of the input
trees. Our theoretical results are supplemented by an experimental evaluation
on synthetic and real-world instances.
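The abstract above mentions matching instances that arise as subproblems. The following is a minimal sketch of that idea for the simpler rooted, unweighted case only, not the authors' algorithm: the size of a largest common subtree mapping root to root is 1 plus a maximum weight bipartite matching between the children, with weights obtained recursively. The function names and the child-dictionary representation are assumptions made for illustration, and the matching is brute-forced, so this is only suitable for tiny trees.

```python
from functools import lru_cache
from itertools import permutations

def max_weight_matching(w):
    """Brute-force maximum weight bipartite matching on a small matrix
    of non-negative weights (illustrative only, exponential time)."""
    if not w or not w[0]:
        return 0
    rows, cols = len(w), len(w[0])
    if rows > cols:  # transpose so that rows <= cols
        w = [list(col) for col in zip(*w)]
        rows, cols = cols, rows
    best = 0
    for perm in permutations(range(cols), rows):
        best = max(best, sum(w[i][perm[i]] for i in range(rows)))
    return best

def rooted_common_subtree_size(children_a, root_a, children_b, root_b):
    """Number of vertices in a largest common subtree that maps
    root_a to root_b (rooted, unordered, unweighted trees)."""
    @lru_cache(maxsize=None)
    def solve(u, v):
        kids_a = children_a.get(u, ())
        kids_b = children_b.get(v, ())
        # One matching instance per pair (u, v): children of u vs children of v.
        w = [[solve(x, y) for y in kids_b] for x in kids_a]
        return 1 + max_weight_matching(w)
    return solve(root_a, root_b)

# Hypothetical example trees given as {vertex: tuple of children}.
tree_a = {0: (1, 2), 1: (3,)}
tree_b = {0: (1,), 1: (2, 3)}
size = rooted_common_subtree_size(tree_a, 0, tree_b, 0)
```

Replacing the brute-force matching by a polynomial-time assignment solver is what makes approaches of this kind practical; the unrooted, weighted case treated in the paper needs additional machinery not shown here.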
A Breezing Proof of the KMW Bound
In their seminal paper from 2004, Kuhn, Moscibroda, and Wattenhofer (KMW)
proved a hardness result for several fundamental graph problems in the LOCAL
model: For any (randomized) algorithm, there are input graphs with $n$ nodes
and maximum degree $\Delta$ on which $\Omega(\min\{\sqrt{\log n/\log\log n}, \log\Delta/\log\log\Delta\})$ (expected) communication rounds are
required to obtain polylogarithmic approximations to a minimum vertex cover,
minimum dominating set, or maximum matching. Via reduction, this hardness
extends to symmetry breaking tasks like finding maximal independent sets or
maximal matchings. Today, more than $15$ years later, there is still no proof
of this result that is easy on the reader. Setting out to change this, in this
work, we provide a fully self-contained and simple proof of the KMW
lower bound. The key argument is algorithmic, and it relies on an invariant
that can be readily verified from the generation rules of the lower bound
graphs. Comment: 21 pages, 6 figures
A sharp threshold for random graphs with a monochromatic triangle in every edge coloring
Let $\mathcal{R}$ be the set of all finite graphs $G$ with the Ramsey property that
every coloring of the edges of $G$ by two colors yields a monochromatic
triangle. In this paper we establish a sharp threshold for random graphs with
this property. Let $G(n,p)$ be the random graph on $n$ vertices with edge
probability $p$. We prove that there exists a function $\hat c = \hat c(n)$ with
$0 < c < \hat c < C$ such that for any $\varepsilon > 0$, as $n$ tends to infinity,
$\Pr[G(n,(1-\varepsilon)\hat c/\sqrt{n}) \in \mathcal{R}] \to 0$ and
$\Pr[G(n,(1+\varepsilon)\hat c/\sqrt{n}) \in \mathcal{R}] \to 1$. A crucial tool that is used in the proof and is
of independent interest is a generalization of Szemerédi's Regularity Lemma
to a certain hypergraph setting. Comment: 101 pages, Final version - to appear in Memoirs of the A.M.S.
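To make the Ramsey property in the abstract above concrete, here is a small brute-force check, offered as an illustrative sketch rather than anything from the paper. It tries every 2-coloring of the edges of a graph and reports whether each one contains a monochromatic triangle. The function names are assumptions, and the approach is exponential in the number of edges, so it only works for very small graphs.

```python
from itertools import combinations, product

def complete_graph(n):
    """Edge list of the complete graph on vertices 0..n-1."""
    return list(combinations(range(n), 2))

def has_ramsey_property(n, edges):
    """True if every 2-coloring of `edges` yields a monochromatic triangle."""
    edges = [tuple(sorted(e)) for e in edges]
    index = {e: i for i, e in enumerate(edges)}
    # Each triangle is stored as the indices of its three edges.
    triangles = [
        (index[(a, b)], index[(a, c)], index[(b, c)])
        for a, b, c in combinations(range(n), 3)
        if (a, b) in index and (a, c) in index and (b, c) in index
    ]
    for coloring in product((0, 1), repeat=len(edges)):
        if not any(coloring[i] == coloring[j] == coloring[k]
                   for i, j, k in triangles):
            return False  # found a coloring with no monochromatic triangle
    return True

k5_is_ramsey = has_ramsey_property(5, complete_graph(5))
k6_is_ramsey = has_ramsey_property(6, complete_graph(6))
```

The two example calls reflect the classical fact that the complete graph on six vertices has the property while the one on five vertices does not.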
Tree comparison: enumeration and application to cheminformatics
Graphs are a well-known data structure used in many application domains that rely on relationships between individual entities. Examples are social networks, where the users may be in friendship with each other, road networks, where one-way or bidirectional roads connect crossings, and work package assignments, where workers are assigned to tasks. In chem- and bioinformatics, molecules are often represented as molecular graphs, where vertices represent atoms, and bonds between them are represented by edges connecting the vertices. Since there is an ever-increasing amount of data that can be treated as graphs, fast algorithms are needed to compare such graphs. A well-researched concept to compare two graphs is the maximum common subgraph. On the one hand, this allows finding substructures that are common to both input graphs. On the other hand, we can derive a similarity score from the maximum common subgraph. A practical application is rational drug design which involves molecular similarity searches.
In this thesis, we study the maximum common subgraph problem, which entails finding a largest graph, which is isomorphic to subgraphs of two input graphs. We focus on restrictions that allow polynomial-time algorithms with a low exponent. An example is the maximum common subtree of two input trees. We succeed in improving the previously best-known time bound. Additionally, we provide a lower time bound under certain assumptions. We study a generalization of the maximum common subtree problem, the block-and-bridge preserving maximum common induced subgraph problem between outerplanar graphs. This problem is motivated by the application to cheminformatics. First, the vast majority of drugs modeled as molecular graphs is outerplanar, and second, the blocks correspond to the ring structures and the bridges to atom chains or linkers. If we allow disconnected common subgraphs, the problem becomes NP-hard even for trees as input. We propose a second generalization of the maximum common subtree problem, which allows skipping vertices in the input trees while maintaining polynomial running time.
Since a maximum common subgraph is not unique in general, we investigate the problem of enumerating all maximum solutions. We do this for both the maximum common subtree problem and the block-and-bridge preserving maximum common induced subgraph problem between outerplanar graphs. An arising subproblem which we analyze is the enumeration of maximum weight matchings in bipartite graphs. We support a weight function between the vertices and edges for all proposed common subgraph methods in this thesis. Thus the objective is to compute a common subgraph of maximum weight. The weights may be integral or real-valued, including negative values. A special case of using such a weight function is computing common subgraph isomorphisms between labeled graphs, where labels between mapped vertices and edges must be equal. An experimental study evaluates the practical running times and the usefulness of our block-and-bridge preserving maximum common induced subgraph algorithm against state-of-the-art algorithms.
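The enumeration subproblem mentioned above can be illustrated with a naive sketch, which is not the method of the thesis: list every (possibly partial) matching of a small weighted bipartite graph and keep those of maximum total weight. Because weights may be negative, leaving a vertex unmatched can be better than matching it. The function name and the weight-matrix representation are assumptions, and the running time is exponential.

```python
def max_weight_matchings(w):
    """Return (best_weight, matchings) for a small weight matrix w, where
    w[i][j] is the weight of pairing left vertex i with right vertex j.
    Each matching is a tuple of (i, j) pairs; vertices may stay unmatched."""
    rows = len(w)
    cols = len(w[0]) if rows else 0
    results = []

    def extend(i, used, pairs, total):
        if i == rows:
            results.append((total, tuple(pairs)))
            return
        extend(i + 1, used, pairs, total)  # leave left vertex i unmatched
        for j in range(cols):
            if j not in used:
                extend(i + 1, used | {j}, pairs + [(i, j)], total + w[i][j])

    extend(0, frozenset(), [], 0)
    best = max(total for total, _ in results)
    return best, sorted(m for total, m in results if total == best)

# Hypothetical weights: two different matchings reach the maximum weight 3.
best, matchings = max_weight_matchings([[3, -1], [3, -5]])
```

In this example both maximum matchings use a single edge, since adding any second edge would contribute a negative weight.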
kLog: A Language for Logical and Relational Learning with Kernels
We introduce kLog, a novel approach to statistical relational learning.
Unlike standard approaches, kLog does not represent a probability distribution
directly. It is rather a language to perform kernel-based learning on
expressive logical and relational representations. kLog allows users to specify
learning problems declaratively. It builds on simple but powerful concepts:
learning from interpretations, entity/relationship data modeling, logic
programming, and deductive databases. Access by the kernel to the rich
representation is mediated by a technique we call graphicalization: the
relational representation is first transformed into a graph --- in particular,
a grounded entity/relationship diagram. Subsequently, a choice of graph kernel
defines the feature space. kLog supports mixed numerical and symbolic data, as
well as background knowledge in the form of Prolog or Datalog programs as in
inductive logic programming systems. The kLog framework can be applied to
tackle the same range of tasks that has made statistical relational learning so
popular, including classification, regression, multitask learning, and
collective classification. We also report on empirical comparisons, showing
that kLog can be either more accurate, or much faster at the same level of
accuracy, than Tilde and Alchemy. kLog is GPLv3 licensed and is available at
http://klog.dinfo.unifi.it along with tutorials.
Malware Classification based on Call Graph Clustering
Each day, anti-virus companies receive tens of thousands of samples of
potentially harmful executables. Many of the malicious samples are variations
of previously encountered malware, created by their authors to evade
pattern-based detection. Dealing with these large amounts of data requires
robust, automatic detection approaches. This paper studies malware
classification based on call graph clustering. By representing malware samples
as call graphs, it is possible to abstract certain variations away, and enable
the detection of structural similarities between samples. The ability to
cluster similar samples together will make more generic detection techniques
possible, thereby targeting the commonalities of the samples within a cluster.
To compare call graphs mutually, we compute pairwise graph similarity scores
via graph matchings which approximately minimize the graph edit distance. Next,
to facilitate the discovery of similar malware samples, we employ several
clustering algorithms, including k-medoids and DBSCAN. Clustering experiments
are conducted on a collection of real malware samples, and the results are
evaluated against manual classifications provided by human malware analysts.
Experiments show that it is indeed possible to accurately detect malware
families via call graph clustering. We anticipate that in the future, call
graphs can be used to analyse the emergence of new malware families, and
ultimately to automate implementation of generic detection schemes. Comment: This research has been supported by TEKES - the Finnish Funding
Agency for Technology and Innovation as part of its ICT SHOK Future Internet
research programme, grant 40212/0
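The clustering step described in the abstract above operates on pairwise distances between samples. As a hedged sketch, the following implements a simple alternating k-medoids procedure on a precomputed distance matrix. It is not the paper's implementation: the toy distances stand in for graph edit distance approximations, and the function name and the choice of initial medoids are assumptions.

```python
def k_medoids(dist, initial_medoids, max_iter=100):
    """Alternating k-medoids on a precomputed, symmetric distance matrix.
    Returns (medoid indices, cluster label for every point)."""
    n = len(dist)
    medoids = list(initial_medoids)
    labels = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest medoid.
        labels = [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
                  for i in range(n)]
        # Update step: the new medoid minimizes total distance in its cluster.
        new_medoids = []
        for c, old in enumerate(medoids):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid for an empty cluster
                new_medoids.append(old)
                continue
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][i] for i in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, labels

# Toy data: six hypothetical samples placed on a line, two obvious groups.
positions = [0, 1, 2, 10, 11, 12]
dist = [[abs(p - q) for q in positions] for p in positions]
medoids, labels = k_medoids(dist, initial_medoids=[0, 1])
```

Working on a distance matrix rather than on coordinates is what makes medoid-based clustering applicable to graphs, where no meaningful centroid exists.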
Combinatorial species and graph enumeration
In enumerative combinatorics, it is often a goal to enumerate both labeled
and unlabeled structures of a given type. The theory of combinatorial species
is a novel toolset which provides a rigorous foundation for dealing with the
distinction between labeled and unlabeled structures. The cycle index series of
a species encodes the labeled and unlabeled enumerative data of that species.
Moreover, by using species operations, we are able to solve for the cycle index
series of one species in terms of the known cycle indices of other species.
Section 3 is an exposition of species theory and Section 4 is an enumeration of
point-determining bipartite graphs using this toolset. In Section 5, we extend
a result about point-determining graphs to a similar result for
point-determining {\Phi}-graphs, where {\Phi} is a class of graphs with certain
properties. Finally, Appendix A is an exposition of species computation using
the software Sage [9] and Appendix B uses Sage to calculate the cycle index
series of point-determining bipartite graphs. Comment: 39 pages, 16 figures, senior comprehensive project at Carleton
College
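The distinction between labeled and unlabeled enumeration in the abstract above can be seen on a tiny example. The sketch below does not use species theory or Sage; it simply counts simple graphs on n labeled vertices and then counts isomorphism classes by brute-force canonical forms. The function name is an assumption and the method is only feasible for very small n.

```python
from itertools import combinations, permutations

def count_graphs(n):
    """Return (labeled, unlabeled) counts of simple graphs on n vertices."""
    pairs = list(combinations(range(n), 2))
    perms = list(permutations(range(n)))
    canonical = set()
    for mask in range(1 << len(pairs)):
        edges = [pairs[i] for i in range(len(pairs)) if mask >> i & 1]
        # Canonical form: the smallest relabeled edge list over all permutations.
        forms = (
            tuple(sorted(tuple(sorted((p[a], p[b]))) for a, b in edges))
            for p in perms
        )
        canonical.add(min(forms))
    return 1 << len(pairs), len(canonical)

labeled_3, unlabeled_3 = count_graphs(3)
labeled_4, unlabeled_4 = count_graphs(4)
```

On four vertices there are 64 labeled graphs but only 11 isomorphism classes, which is the kind of gap that cycle index series are designed to track systematically.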