
    Between Subgraph Isomorphism and Maximum Common Subgraph

    When a small pattern graph does not occur inside a larger target graph, we can ask how to find "as much of the pattern as possible" inside the target graph. In general, this is known as the maximum common subgraph problem, which is much more computationally challenging in practice than subgraph isomorphism. We introduce a restricted alternative, where we ask if all but k vertices from the pattern can be found in the target graph. This allows for the development of slightly weakened forms of certain invariants from subgraph isomorphism which are based upon degree and number of paths. We show that when k is small, weakening the invariants still retains much of their effectiveness. We are then able to solve this problem on the standard problem instances used to benchmark subgraph isomorphism algorithms, despite these instances being too large for current maximum common subgraph algorithms to handle. Finally, by iteratively increasing k, we obtain an algorithm which is also competitive for the maximum common subgraph
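
    The "all but k vertices" question in the abstract above can be illustrated with a brute-force sketch. This is not the authors' algorithm: it uses none of the degree or path invariants the abstract mentions, it assumes non-induced subgraph isomorphism, and the function names (`embeds`, `k_less_embeds`, `smallest_k`) are hypothetical. It only shows the problem statement and the idea of iteratively increasing k, and is practical for tiny graphs only.

```python
from itertools import combinations, permutations

def embeds(p_vertices, p_edges, t_vertices, t_edges):
    """True if the pattern maps injectively into the target so that every
    pattern edge lands on a target edge (non-induced, by assumption)."""
    t_edge_set = {frozenset(e) for e in t_edges}
    p_vertices = list(p_vertices)
    # Try every injective assignment of pattern vertices to target vertices.
    for image in permutations(t_vertices, len(p_vertices)):
        mapping = dict(zip(p_vertices, image))
        if all(frozenset((mapping[u], mapping[v])) in t_edge_set
               for u, v in p_edges):
            return True
    return False

def k_less_embeds(p_vertices, p_edges, t_vertices, t_edges, k):
    """True if dropping some k pattern vertices lets the rest embed."""
    for dropped in combinations(p_vertices, k):
        kept = [v for v in p_vertices if v not in dropped]
        kept_edges = [(u, v) for u, v in p_edges
                      if u not in dropped and v not in dropped]
        if embeds(kept, kept_edges, t_vertices, t_edges):
            return True
    return False

def smallest_k(p_vertices, p_edges, t_vertices, t_edges):
    """Increase k from 0 until the reduced pattern fits in the target."""
    for k in range(len(p_vertices) + 1):
        if k_less_embeds(p_vertices, p_edges, t_vertices, t_edges, k):
            return k

# A triangle does not occur in a three-vertex path, but any single edge does.
triangle_vertices = [0, 1, 2]
triangle_edges = [(0, 1), (1, 2), (0, 2)]
path_vertices = ["a", "b", "c"]
path_edges = [("a", "b"), ("b", "c")]
print(smallest_k(triangle_vertices, triangle_edges, path_vertices, path_edges))
```

    The first k for which the reduced pattern fits leaves a common subgraph on the remaining pattern vertices, which is the link to maximum common subgraph that the abstract describes.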

    Pattern matching and pattern discovery algorithms for protein topologies

    We describe algorithms for pattern matching and pattern learning in TOPS diagrams (formal descriptions of protein topologies). These problems can be reduced to checking for subgraph isomorphism and finding maximal common subgraphs in a restricted class of ordered graphs. We have developed a subgraph isomorphism algorithm for ordered graphs, which performs well on the given set of data. The maximal common subgraph problem is then solved by repeated subgraph extension and checking for isomorphisms. Despite its apparent inefficiency, this approach gives an algorithm with time complexity proportional to the number of graphs in the input set and is still practical on the given set of data. As a result, we obtain fast methods which can be used for building a database of protein topological motifs, and for the comparison of a given protein of known secondary structure against a motif database.

    Graph theoretic methods for the analysis of structural relationships in biological macromolecules

    Subgraph isomorphism and maximum common subgraph isomorphism algorithms from graph theory provide an effective and efficient way of identifying structural relationships between biological macromolecules. They thus provide a natural complement to the pattern matching algorithms that are used in bioinformatics to identify sequence relationships. Examples are provided of the use of graph theory to analyze proteins for which three-dimensional crystallographic or NMR structures are available, focusing on the use of the Bron-Kerbosch clique detection algorithm to identify common folding motifs and of the Ullmann subgraph isomorphism algorithm to identify patterns of amino acid residues. Our methods are also applicable to other types of biological macromolecule, such as carbohydrate and nucleic acid structures.
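
    For readers unfamiliar with the Bron-Kerbosch algorithm named above, the following is a minimal sketch of its basic form (without pivoting), which enumerates all maximal cliques of an undirected graph. The adjacency-dictionary representation and the function names are assumptions made for illustration. How the paper builds the graph on which cliques are sought is not stated in the abstract and is not shown here.

```python
def bron_kerbosch(r, p, x, adj, out):
    """Basic Bron-Kerbosch recursion.
    r: current clique, p: candidates that can extend it,
    x: vertices already explored (used to avoid reporting non-maximal cliques)."""
    if not p and not x:
        out.append(set(r))  # nothing can extend r, so it is maximal
        return
    for v in list(p):
        bron_kerbosch(r | {v}, p & adj[v], x & adj[v], adj, out)
        p = p - {v}  # v has been fully explored
        x = x | {v}

def maximal_cliques(adj):
    """Return all maximal cliques of the graph given as {vertex: set_of_neighbours}."""
    out = []
    bron_kerbosch(set(), set(adj), set(), adj, out)
    return out

# A triangle 1-2-3 with a pendant vertex 4 attached to 3.
example = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(sorted(sorted(c) for c in maximal_cliques(example)))
```

    Pivoting variants reduce the number of recursive calls, but the basic form is enough to show the idea.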

    Quantum Query Complexity of Subgraph Isomorphism and Homomorphism

    Let $H$ be a fixed graph on $n$ vertices. Let $f_H(G) = 1$ iff the input graph $G$ on $n$ vertices contains $H$ as a (not necessarily induced) subgraph. Let $\alpha_H$ denote the cardinality of a maximum independent set of $H$. In this paper we show: $Q(f_H) = \Omega\left(\sqrt{\alpha_H \cdot n}\right)$, where $Q(f_H)$ denotes the quantum query complexity of $f_H$. As a consequence we obtain lower bounds for $Q(f_H)$ in terms of several other parameters of $H$ such as the average degree, minimum vertex cover, chromatic number, and the critical probability. We also use the above bound to show that $Q(f_H) = \Omega(n^{3/4})$ for any $H$, improving on the previously best known bound of $\Omega(n^{2/3})$. Until very recently, it was believed that the quantum query complexity is at least the square root of the randomized one. Our $\Omega(n^{3/4})$ bound for $Q(f_H)$ matches the square root of the current best known bound for the randomized query complexity of $f_H$, which is $\Omega(n^{3/2})$ due to Gröger. Interestingly, the randomized bound of $\Omega(\alpha_H \cdot n)$ for $f_H$ still remains open. We also study the Subgraph Homomorphism Problem, denoted by $f_{[H]}$, and show that $Q(f_{[H]}) = \Omega(n)$. Finally we extend our results to $3$-uniform hypergraphs. In particular, we show an $\Omega(n^{4/5})$ bound for the quantum query complexity of Subgraph Isomorphism, improving on the previously known $\Omega(n^{3/4})$ bound. For Subgraph Homomorphism, we obtain an $\Omega(n^{3/2})$ bound for the same. Comment: 16 pages, 2 figures

    Efficient Subgraph Similarity Search on Large Probabilistic Graph Databases

    Many studies have sought efficient solutions for subgraph similarity search over certain (deterministic) graphs because of its wide application in many fields, including bioinformatics, social network analysis, and Resource Description Framework (RDF) data management. All these works assume that the underlying data are certain. In reality, however, graphs are often noisy and uncertain due to various factors, such as errors in data extraction, inconsistencies in data integration, and privacy-preserving purposes. Therefore, in this paper, we study subgraph similarity search on large probabilistic graph databases. Unlike previous works, which assume that edges in an uncertain graph are independent of each other, we study uncertain graphs where edges' occurrences are correlated. We formally prove that subgraph similarity search over probabilistic graphs is #P-complete; thus, we employ a filter-and-verify framework to speed up the search. In the filtering phase, we develop tight lower and upper bounds of subgraph similarity probability based on a probabilistic matrix index, PMI. PMI is composed of discriminative subgraph features associated with tight lower and upper bounds of subgraph isomorphism probability. Based on PMI, we can filter out a large number of probabilistic graphs and maximize the pruning capability. During the verification phase, we develop an efficient sampling algorithm to validate the remaining candidates. The efficiency of our proposed solutions has been verified through extensive experiments. Comment: VLDB201
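
    The sampling-based verification step mentioned above can be illustrated with a generic Monte Carlo sketch. This is not the paper's sampling algorithm: for simplicity it assumes independent edges, whereas the abstract explicitly studies correlated edges, and the function name, the `edge_probs` dictionary, and the `contains_query` callback are all hypothetical. It estimates the probability that a randomly drawn possible world of an uncertain graph satisfies a query predicate.

```python
import random

def estimate_match_probability(edge_probs, contains_query, samples=1000, seed=0):
    """Estimate P(possible world satisfies contains_query) by sampling.
    edge_probs: {edge: existence probability}; edges are sampled
    independently, which is a simplifying assumption of this sketch."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        # Draw one possible world: keep each edge with its own probability.
        world = {e for e, p in edge_probs.items() if rng.random() < p}
        if contains_query(world):
            hits += 1
    return hits / samples

# Hypothetical query: does the world contain the two-edge path a-b-c?
def has_path_abc(world):
    return ("a", "b") in world and ("b", "c") in world

uncertain_graph = {("a", "b"): 0.9, ("b", "c"): 0.5, ("c", "d"): 0.2}
estimate = estimate_match_probability(uncertain_graph, has_path_abc)
print(estimate)  # under the independence assumption the exact value is 0.9 * 0.5
```

    In a filter-and-verify setting, an estimate like this would be computed only for candidates that survive the index-based pruning.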