
1,079 research outputs found

Faster Algorithms for the Maximum Common Subtree Isomorphism Problem

Authors: Droschinsky, Andre; Kriege, Nils M.; Mutzel, Petra
Publication date: 01/01/2016

The maximum common subtree isomorphism problem asks for the largest possible isomorphism between subtrees of two given input trees. This problem is a natural restriction of the maximum common subgraph problem, which is $\mathsf{NP}$-hard in general graphs. Confining to trees renders polynomial time algorithms possible and is of fundamental importance for approaches on more general graph classes. Various variants of this problem in trees have been intensively studied. We consider the general case, where trees are neither rooted nor ordered and the isomorphism is maximum w.r.t. a weight function on the mapped vertices and edges. For trees of order $n$ and maximum degree $\Delta$ our algorithm achieves a running time of $\mathcal{O}(n^2\Delta)$ by exploiting the structure of the matching instances arising as subproblems. Thus our algorithm outperforms the best previously known approaches. No faster algorithm is possible for trees of bounded degree and for trees of unbounded degree we show that a further reduction of the running time would directly improve the best known approach to the assignment problem. Combining a polynomial-delay algorithm for the enumeration of all maximum common subtree isomorphisms with central ideas of our new algorithm leads to an improvement of its running time from $\mathcal{O}(n^6+Tn^2)$ to $\mathcal{O}(n^3+Tn\Delta)$, where $n$ is the order of the larger tree, $T$ is the number of different solutions, and $\Delta$ is the minimum of the maximum degrees of the input trees. Our theoretical results are supplemented by an experimental evaluation on synthetic and real-world instances.
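The abstract refers to matching instances arising as subproblems. The following is a minimal sketch of that reduction for a much simpler setting than the one in the paper: rooted, unweighted trees in which the roots must be mapped to each other. It is not the authors' $\mathcal{O}(n^2\Delta)$ algorithm. The function names `max_weight_matching` and `rooted_mcs`, the dictionary-of-children tree encoding, and the exponential subset-based matching routine are all assumptions made for illustration.

```python
from functools import lru_cache


def max_weight_matching(weights):
    """Maximum-weight bipartite matching for a small matrix of non-negative
    weights, via dynamic programming over subsets of columns (exponential in
    the number of columns, so only suitable for small degrees)."""
    if not weights or not weights[0]:
        return 0
    cols = len(weights[0])
    best = {0: 0}  # bitmask of used columns -> best total weight so far
    for row in weights:
        nxt = dict(best)  # option: leave this row unmatched
        for mask, val in best.items():
            for j in range(cols):
                if not mask & (1 << j):
                    m = mask | (1 << j)
                    cand = val + row[j]
                    if cand > nxt.get(m, -1):
                        nxt[m] = cand
        best = nxt
    return max(best.values())


def rooted_mcs(children1, root1, children2, root2):
    """Size of a largest common subtree that maps root1 to root2.
    children1 / children2 map each vertex to the list of its children."""

    @lru_cache(maxsize=None)
    def d(u, v):
        # Weight of pairing child a of u with child b of v is d(a, b);
        # the best combination of pairings is a maximum-weight matching.
        w = [[d(a, b) for b in children2[v]] for a in children1[u]]
        return 1 + max_weight_matching(w)

    return d(root1, root2)


# Hypothetical toy input: a path on four vertices versus a small star.
path = {0: [1, 2], 1: [3], 2: [], 3: []}
star = {0: [1], 1: [2, 3], 2: [], 3: []}
print(rooted_mcs(path, 0, star, 0))
```

Each pair of vertices triggers one matching instance over their children, which is the kind of subproblem whose structure the abstract says the faster algorithm exploits.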

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

A Breezing Proof of the KMW Bound

Authors: Coupette, Corinna; Lenzen, Christoph
Publication date: 16/09/2020

In their seminal paper from 2004, Kuhn, Moscibroda, and Wattenhofer (KMW) proved a hardness result for several fundamental graph problems in the LOCAL model: For any (randomized) algorithm, there are input graphs with $n$ nodes and maximum degree $\Delta$ on which $\Omega(\min\{\sqrt{\log n/\log \log n},\log \Delta/\log \log \Delta\})$ (expected) communication rounds are required to obtain polylogarithmic approximations to a minimum vertex cover, minimum dominating set, or maximum matching. Via reduction, this hardness extends to symmetry breaking tasks like finding maximal independent sets or maximal matchings. Today, more than 15 years later, there is still no proof of this result that is easy on the reader. Setting out to change this, in this work, we provide a fully self-contained and simple proof of the KMW lower bound. The key argument is algorithmic, and it relies on an invariant that can be readily verified from the generation rules of the lower bound graphs.

Comment: 21 pages, 6 figures
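To get a feel for how slowly the expression inside the $\Omega(\cdot)$ grows, it can be evaluated numerically. The sketch below is only illustrative: the function name `kmw_bound` is hypothetical, natural logarithms are an arbitrary choice, and the constant hidden by the $\Omega$ notation is ignored, so the returned values indicate growth rather than actual round counts.

```python
import math


def kmw_bound(n, delta):
    """Evaluate min{ sqrt(log n / log log n), log(delta) / log log(delta) }.
    Natural logarithms are an assumption; the Omega-constant is omitted."""
    if n < 3 or delta < 3:
        raise ValueError("need n >= 3 and delta >= 3 so that log log is positive")
    term_n = math.sqrt(math.log(n) / math.log(math.log(n)))
    term_delta = math.log(delta) / math.log(math.log(delta))
    return min(term_n, term_delta)


for n, delta in [(10**6, 10**3), (10**12, 10**6)]:
    print(n, delta, round(kmw_bound(n, delta), 3))
```

Squaring the number of nodes and the maximum degree barely moves the value, which reflects how mild the growth of the bound is.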

arXiv.org e-Print Archive

Crossref

A sharp threshold for random graphs with a monochromatic triangle in every edge coloring

Authors: Friedgut, Ehud; Rodl, Vojtech; Rucinski, Andrzej; Tetali, Prasad
Publication date: 01/01/2003

Let $\mathcal{R}$ be the set of all finite graphs $G$ with the Ramsey property that every coloring of the edges of $G$ by two colors yields a monochromatic triangle. In this paper we establish a sharp threshold for random graphs with this property. Let $G(n,p)$ be the random graph on $n$ vertices with edge probability $p$. We prove that there exists a function $\hat c=\hat c(n)$ with $0<c<\hat c<C$ such that for any $\varepsilon>0$, as $n$ tends to infinity, $\Pr[G(n,(1-\varepsilon)\hat c/\sqrt{n}) \in \mathcal{R}] \to 0$ and $\Pr[G(n,(1+\varepsilon)\hat c/\sqrt{n}) \in \mathcal{R}] \to 1$. A crucial tool that is used in the proof and is of independent interest is a generalization of Szemerédi's Regularity Lemma to a certain hypergraph setting.

Comment: 101 pages, Final version - to appear in Memoirs of the A.M.
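The Ramsey property in this abstract can be checked directly on very small graphs. The sketch below is a brute-force test over all two-colorings of the edges, so it is exponential in the number of edges and unrelated to the probabilistic techniques of the paper. The function names `has_ramsey_property` and `complete_graph` are hypothetical.

```python
from itertools import combinations, product


def complete_graph(n):
    """Edge list of the complete graph on vertices 0..n-1."""
    return list(combinations(range(n), 2))


def has_ramsey_property(n, edges):
    """True if every 2-coloring of the edges contains a monochromatic triangle."""
    edges = [tuple(sorted(e)) for e in edges]
    index = {e: i for i, e in enumerate(edges)}
    # Each triangle is stored as the indices of its three edges.
    triangles = [
        (index[(a, b)], index[(a, c)], index[(b, c)])
        for a, b, c in combinations(range(n), 3)
        if (a, b) in index and (a, c) in index and (b, c) in index
    ]
    for coloring in product((0, 1), repeat=len(edges)):
        if not any(
            coloring[i] == coloring[j] == coloring[k] for i, j, k in triangles
        ):
            return False  # found a coloring without a monochromatic triangle
    return True


print(has_ramsey_property(5, complete_graph(5)))
print(has_ramsey_property(6, complete_graph(6)))
```

The two calls reproduce the classical fact that the complete graph on six vertices has the property while the one on five vertices does not.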

arXiv.org e-Print Archive

CiteSeerX

CERN Document Server

Tree comparison: enumeration and application to cheminformatics

Author: Droschinsky, Andre
Publication date: 01/01/2021

Graphs are a well-known data structure used in many application domains that rely on relationships between individual entities. Examples are social networks, where the users may be in friendship with each other, road networks, where one-way or bidirectional roads connect crossings, and work package assignments, where workers are assigned to tasks. In chem- and bioinformatics, molecules are often represented as molecular graphs, where vertices represent atoms, and bonds between them are represented by edges connecting the vertices. Since there is an ever-increasing amount of data that can be treated as graphs, fast algorithms are needed to compare such graphs. A well-researched concept to compare two graphs is the maximum common subgraph. On the one hand, this allows finding substructures that are common to both input graphs. On the other hand, we can derive a similarity score from the maximum common subgraph. A practical application is rational drug design, which involves molecular similarity searches. In this thesis, we study the maximum common subgraph problem, which entails finding a largest graph that is isomorphic to subgraphs of two input graphs. We focus on restrictions that allow polynomial-time algorithms with a low exponent. An example is the maximum common subtree of two input trees. We succeed in improving the previously best-known time bound. Additionally, we provide a lower time bound under certain assumptions. We study a generalization of the maximum common subtree problem, the block-and-bridge preserving maximum common induced subgraph problem between outerplanar graphs. This problem is motivated by the application to cheminformatics. First, the vast majority of drugs modeled as molecular graphs is outerplanar, and second, the blocks correspond to the ring structures and the bridges to atom chains or linkers. If we allow disconnected common subgraphs, the problem becomes NP-hard even for trees as input.
We propose a second generalization of the maximum common subtree problem, which allows skipping vertices in the input trees while maintaining polynomial running time. Since a maximum common subgraph is not unique in general, we investigate the problem of enumerating all maximum solutions. We do this for both the maximum common subtree problem and the block-and-bridge preserving maximum common induced subgraph problem between outerplanar graphs. An arising subproblem which we analyze is the enumeration of maximum weight matchings in bipartite graphs. We support a weight function between the vertices and edges for all proposed common subgraph methods in this thesis. Thus the objective is to compute a common subgraph of maximum weight. The weights may be integral or real-valued, including negative values. A special case of using such a weight function is computing common subgraph isomorphisms between labeled graphs, where labels between mapped vertices and edges must be equal. An experimental study evaluates the practical running times and the usefulness of our block-and-bridge preserving maximum common induced subgraph algorithm against state-of-the-art algorithms.
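The enumeration subproblem mentioned in the abstract can be illustrated on tiny inputs. The sketch below lists every maximum-weight perfect matching of a square weight matrix by trying all permutations. It is a brute-force illustration under simplifying assumptions (square matrix, perfect matchings only, integer weights so that equality tests are exact), not the enumeration technique analyzed in the thesis. The name `max_weight_assignments` is hypothetical.

```python
from itertools import permutations


def max_weight_assignments(weights):
    """Return (best_value, list_of_assignments) for a square weight matrix.
    An assignment is a tuple perm where row i is matched to column perm[i]."""
    n = len(weights)
    best_value, best = None, []
    for perm in permutations(range(n)):
        value = sum(weights[i][perm[i]] for i in range(n))
        if best_value is None or value > best_value:
            best_value, best = value, [perm]  # strictly better: restart the list
        elif value == best_value:
            best.append(perm)  # tie: another maximum solution
    return best_value, best


print(max_weight_assignments([[1, 1], [1, 1]]))
print(max_weight_assignments([[-1, 3], [3, -1]]))
```

The first call shows why the optimum is not unique in general; the second shows that negative weights, which the abstract allows, need no special handling in this formulation.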

Eldorado - Ressourcen aus und für Lehre, Studium und Forschung

kLog: A Language for Logical and Relational Learning with Kernels

Authors: Costa, Fabrizio; De Grave, Kurt; De Raedt, Luc; Frasconi, Paolo
Publication venue: 'Elsevier BV'
Publication date: 28/07/2014

We introduce kLog, a novel approach to statistical relational learning. Unlike standard approaches, kLog does not represent a probability distribution directly. It is rather a language to perform kernel-based learning on expressive logical and relational representations. kLog allows users to specify learning problems declaratively. It builds on simple but powerful concepts: learning from interpretations, entity/relationship data modeling, logic programming, and deductive databases. Access by the kernel to the rich representation is mediated by a technique we call graphicalization: the relational representation is first transformed into a graph, in particular a grounded entity/relationship diagram. Subsequently, a choice of graph kernel defines the feature space. kLog supports mixed numerical and symbolic data, as well as background knowledge in the form of Prolog or Datalog programs as in inductive logic programming systems. The kLog framework can be applied to tackle the same range of tasks that has made statistical relational learning so popular, including classification, regression, multitask learning, and collective classification. We also report empirical comparisons, showing that kLog can be either more accurate, or much faster at the same level of accuracy, than Tilde and Alchemy. kLog is GPLv3 licensed and is available at http://klog.dinfo.unifi.it along with tutorials.

arXiv.org e-Print Archive

Lirias

Crossref

Malware Classification based on Call Graph Clustering

Authors: Kinable, Joris; Kostakis, Orestis
Publication date: 25/08/2010

Each day, anti-virus companies receive tens of thousands of samples of potentially harmful executables. Many of the malicious samples are variations of previously encountered malware, created by their authors to evade pattern-based detection. Dealing with these large amounts of data requires robust, automatic detection approaches. This paper studies malware classification based on call graph clustering. By representing malware samples as call graphs, it is possible to abstract certain variations away, and enable the detection of structural similarities between samples. The ability to cluster similar samples together will make more generic detection techniques possible, thereby targeting the commonalities of the samples within a cluster. To compare call graphs with one another, we compute pairwise graph similarity scores via graph matchings which approximately minimize the graph edit distance. Next, to facilitate the discovery of similar malware samples, we employ several clustering algorithms, including k-medoids and DBSCAN. Clustering experiments are conducted on a collection of real malware samples, and the results are evaluated against manual classifications provided by human malware analysts. Experiments show that it is indeed possible to accurately detect malware families via call graph clustering. We anticipate that in the future, call graphs can be used to analyse the emergence of new malware families, and ultimately to automate implementation of generic detection schemes.

Comment: This research has been supported by TEKES - the Finnish Funding Agency for Technology and Innovation as part of its ICT SHOK Future Internet research programme, grant 40212/0
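Clustering on pairwise scores, as described above, only needs a distance matrix, not coordinates. The sketch below shows a basic alternating k-medoids loop on a precomputed matrix. It is a simplified illustration, not the authors' implementation: the function name `k_medoids`, the naive choice of the first `k` items as initial medoids, and the one-dimensional toy data standing in for graph-edit-distance scores are all assumptions.

```python
def k_medoids(dist, k, max_iter=100):
    """Cluster items given a symmetric distance matrix `dist`.
    Returns (medoids, labels), where labels[i] indexes into medoids."""
    n = len(dist)
    medoids = list(range(k))  # assumption: naive deterministic initialization

    def assign(meds):
        # Each item joins the cluster of its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][meds[c]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster ran empty
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to the rest.
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][j] for j in members))
            )
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(medoids)


# Toy stand-in for pairwise call-graph distances: points on a line.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
print(k_medoids(dist, 2))
```

Because medoids are always actual items, the method works with any dissimilarity measure, which is why it suits scores derived from graph matchings.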

arXiv.org e-Print Archive

CiteSeerX

Repository TU/e

Combinatorial species and graph enumeration

Authors: Hardt, Andy; McNeely, Pete; Phan, Tung; Troyka, Justin M.
Publication date: 01/01/2013

In enumerative combinatorics, it is often a goal to enumerate both labeled and unlabeled structures of a given type. The theory of combinatorial species is a novel toolset which provides a rigorous foundation for dealing with the distinction between labeled and unlabeled structures. The cycle index series of a species encodes the labeled and unlabeled enumerative data of that species. Moreover, by using species operations, we are able to solve for the cycle index series of one species in terms of the known cycle index series of other species. Section 3 is an exposition of species theory and Section 4 is an enumeration of point-determining bipartite graphs using this toolset. In Section 5, we extend a result about point-determining graphs to a similar result for point-determining $\Phi$-graphs, where $\Phi$ is a class of graphs with certain properties. Finally, Appendix A is an exposition of species computation using the software Sage [9] and Appendix B uses Sage to calculate the cycle index series of point-determining bipartite graphs.

Comment: 39 pages, 16 figures, senior comprehensive project at Carleton College
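The distinction between labeled and unlabeled counts can be made concrete by brute force on very small vertex sets. The sketch below counts graphs both ways and applies the same counting to point-determining graphs, taken here to mean graphs in which no two vertices have the same neighborhood. It does not use species theory, cycle index series, or Sage. The function names `count_graphs` and `is_point_determining` are hypothetical, and the approach is only feasible for a handful of vertices.

```python
from itertools import combinations, permutations


def count_graphs(n, keep=lambda n, edges: True):
    """Return (labeled, unlabeled) counts of graphs on n vertices passing `keep`.
    `keep` must not depend on how the vertices are labeled."""
    pairs = list(combinations(range(n), 2))
    perms = list(permutations(range(n)))
    labeled = 0
    classes = set()
    for bits in range(1 << len(pairs)):
        edges = [p for i, p in enumerate(pairs) if (bits >> i) & 1]
        if not keep(n, edges):
            continue
        labeled += 1
        # Canonical form: the smallest edge list over all relabelings.
        canon = min(
            tuple(sorted(tuple(sorted((p[a], p[b]))) for a, b in edges))
            for p in perms
        )
        classes.add(canon)
    return labeled, len(classes)


def is_point_determining(n, edges):
    """True if all vertices have pairwise distinct neighborhoods."""
    nbrs = [set() for _ in range(n)]
    for a, b in edges:
        nbrs[a].add(b)
        nbrs[b].add(a)
    return len({frozenset(s) for s in nbrs}) == n


print(count_graphs(4))
print(count_graphs(3, is_point_determining))
```

Two labeled graphs fall into the same unlabeled class exactly when some relabeling of the vertices turns one into the other, which is what the canonical form detects.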

arXiv.org e-Print Archive

CiteSeerX

Carleton College: Digital Commons