56 research outputs found

    Fast Graph Isomorphism Algorithms Using Pairwise Color Refinement and Efficient Backtracking

    Doctoral thesis, Seoul National University Graduate School, College of Engineering, Department of Computer Science and Engineering, August 2021. 구건모. Graph isomorphism is a core problem in graph analysis across many domains, including social networks, bioinformatics, and chemistry. As real-world graphs grow larger, applications demand practically fast algorithms that can run on large-scale graphs. Existing approaches, however, show limited performance on large-scale real-world graphs in either time or space. Many applications also require graph isomorphism query processing, a natural generalization of graph isomorphism to multiple graphs: finding every graph in a collection that is isomorphic to a given query graph. In this thesis we present fast algorithms for graph isomorphism and graph isomorphism query processing. First, we present a new approach to graph isomorphism, a framework of pairwise color refinement and efficient backtracking. Within the framework, we introduce three efficient techniques, which together lead to a much faster and more scalable algorithm for graph isomorphism. Experiments on real-world datasets show that our algorithm outperforms state-of-the-art solutions by up to several orders of magnitude in running time. Second, we develop an efficient algorithm for graph isomorphism query processing. We use a two-level index built from degree sequences and color-label distributions. Experimental results on real datasets show that our algorithm is orders of magnitude faster than state-of-the-art algorithms in index construction time, and that it runs faster than existing algorithms in query processing time as graph sizes increase. The Korean-language abstract quantifies these gains as thousands of times faster on average for graph isomorphism and for index construction, and tens of times faster on average in query processing time on medium and large graphs.
    Contents: 1. Introduction; 1.1. Background; 1.2. Organization; 2. Preliminaries; 2.1. Notation; 2.2. Problem Definitions; 2.3. Related Work; 3. Graph Isomorphism; 3.1. Algorithm Overview; 3.2. Pairwise Color Refinement and Binary Cell Mapping; 3.3. Compressed Candidate Space; 3.4. Backtracking and Partial Failing Sets; 3.5. Performance Evaluation; 3.5.1. Comparing with Existing Solutions; 3.5.2. Effectiveness of Individual Techniques; 3.5.3. Analysis with Varying Degrees of Similarity; 3.5.4. Sensitivity Analysis; 4. Graph Isomorphism Query Processing; 4.1. Canonical Coloring; 4.2. Index Construction; 4.3. Query Processing; 4.4. Performance Evaluation; 4.4.1. Varying Number of Hops; 4.4.2. Varying Number of Data Graphs; 5. Conclusion; 5.1. Summary; 5.2. Future Directions; Abstract (in Korean).
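
The abstract above names color refinement as one half of its framework but does not spell out the pairwise variant. As a rough illustration only, the sketch below runs classical color refinement (1-dimensional Weisfeiler-Leman) on two graphs with a shared palette and compares the resulting color histograms. The function names and the adjacency-dict input format are assumptions made for this example, and this is not the thesis's algorithm.

```python
from collections import Counter


def color_histograms(adj_g, adj_h):
    """Run classical color refinement on two graphs with a shared palette.

    adj_g, adj_h: dict mapping each vertex to an iterable of its neighbours.
    Returns the final color histograms of the two graphs.
    """
    # Tag vertices so both graphs can live in one dictionary.
    adj = {("g", v): [("g", u) for u in nbrs] for v, nbrs in adj_g.items()}
    adj.update({("h", v): [("h", u) for u in nbrs] for v, nbrs in adj_h.items()})

    color = {v: 0 for v in adj}  # every vertex starts with the same color
    num_colors = 1
    while True:
        # Signature = own color plus the sorted multiset of neighbour colors.
        sig = {v: (color[v], tuple(sorted(color[u] for u in adj[v]))) for v in adj}
        palette = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        color = {v: palette[sig[v]] for v in adj}
        if len(palette) == num_colors:  # the partition stopped splitting
            break
        num_colors = len(palette)

    hist_g = Counter(c for (tag, _), c in color.items() if tag == "g")
    hist_h = Counter(c for (tag, _), c in color.items() if tag == "h")
    return hist_g, hist_h


def may_be_isomorphic(adj_g, adj_h):
    """False means certainly non-isomorphic; True means still undecided."""
    hist_g, hist_h = color_histograms(adj_g, adj_h)
    return hist_g == hist_h
```

Differing histograms rule out isomorphism, but equal histograms do not prove it: a 6-cycle and two disjoint triangles receive identical colors. This is why refinement is normally paired with a search step such as the backtracking the abstract mentions.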

    Learning kinematic structure correspondences using multi-order similarities

    We present a novel framework for finding the kinematic structure correspondences between two articulated objects in videos via hypergraph matching. In contrast to appearance and graph alignment based matching methods, which have been applied between two similar static images, the proposed method finds correspondences between two dynamic kinematic structures of heterogeneous objects in videos. Thus our method allows matching the structure of objects which have similar topologies or motions, or a combination of the two. Our main contributions are summarised as follows: (i) casting the kinematic structure correspondence problem into a hypergraph matching problem by incorporating multi-order similarities with normalising weights, (ii) introducing a structural topology similarity measure by aggregating topology constrained subgraph isomorphisms, (iii) measuring kinematic correlations between pairwise nodes, and (iv) proposing a combinatorial local motion similarity measure using geodesic distance on the Riemannian manifold. We demonstrate the robustness and accuracy of our method through a number of experiments on synthetic and real data, showing that it outperforms various other recent and state-of-the-art methods. Our method is not limited to a specific application or sensor, and can be used as a building block in applications such as action recognition, human motion retargeting to robots, and articulated object manipulation.

    A survey of frequent subgraph mining algorithms


    The Graph Pattern Matching Problem through Parameterized Matching

    We propose a new approach to solve graph isomorphism using parameterized matching. Parameterized matching is a string matching problem where two strings parameterized-match if there exists a bijective function, on the symbols of the alphabet, that maps one of the strings into the other. Given that parameterized matching is defined for linear structures, we define the concept of graph linearization to represent the topology of a graph as a walk on it. Then, our approach to determine whether two graphs are isomorphic consists of determining whether there exists a walk in one of the graphs that parameterized-matches a linearization of the other graph. Our solution has two main steps: linearization and matching. We develop an efficient linearization algorithm, that generates short linearizations with an approximation guarantee, and develop a graph matching algorithm. We show that this solution also works for subgraph isomorphism, which is the problem of determining whether an input graph H is isomorphic to a subgraph of another input graph G. We evaluate our approach experimentally on graphs of different types and sizes, and compare to the performance of VF2, which is a prominent algorithm for graph isomorphism. Our empirical measurements show that graph linearization finds a matching graph faster than VF2 in many cases, especially in Miyazaki-constructed graphs which are known to be one of the hardest cases for graph isomorphism algorithms. We extend this approach to query attributed graphs. An attributed graph is a graph data structure, in which nodes and edges may have identifiers, types and other attributes. Attributed graphs are used in many application domains, for example to model social networks in which nodes represent people, photos, and postings and edges represent friendship, person-tagged-in-photo and mentioned-in-post relationships. Queries are used to extract information from such graphs. 
Several graph queries are expressed as graph pattern matching, which is the problem of finding all instances of pattern match query P in a larger attributed graph G. A pattern match query may specify both a graph structure and predicates on the attributes of the graph elements. Clearly, this problem is related to subgraph isomorphism. Furthermore, we define a more general class of graph queries called generalized pattern queries on attributed multigraphs. The goal of this class is to find paths and subgraphs that satisfy query reachability and predicates. The query language is expressive: it allows (i) using regular expression operators (e.g., Kleene star and union); (ii) specifying structural predicates on graph nodes and edges; and (iii) using attribute predicates on nodes and edges. Pattern match queries, reachability queries, their combination, and even more queries can be expressed through generalized pattern queries. We use our approach to solve this new type of query. The proposed technique has two phases. First, the query is linearized, i.e., represented as a graph walk that covers all nodes and edges. There are several linearizations for a given query; we derive heuristics to produce a good linearization that is short and places selective predicates early in the linearization. Second, we search for a bijective function that maps each element of the query to an element of the attributed multigraph that satisfies the reachability requirements and the predicates. Specifically, we develop an algorithm that matches the linearization by traversing the attributed graph in a manner similar to a breadth first traversal constrained by the linearization. We evaluate our solution experimentally using a real graph (the DBLP citation network) to assess its practicality and efficiency. 
Our results show that our techniques and optimizations are effective in querying attributed graphs, offering several factors of reduction in query response time when graph statistics are utilized. Doctoral thesis.
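
The definition of parameterized matching given in this abstract (a bijection on alphabet symbols that maps one string onto the other) can be made concrete with a small sketch. The function name `parameterized_match` is hypothetical, and the code only checks whole-string matching; it is not the thesis's linearization or graph-matching algorithm.

```python
def parameterized_match(s, t):
    """Return True if a one-to-one renaming of symbols turns s into t."""
    if len(s) != len(t):
        return False
    forward, backward = {}, {}
    for a, b in zip(s, t):
        # setdefault stores the first pairing seen and returns the stored
        # partner afterwards, so any later conflict shows up as a mismatch.
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return False
    return True
```

Keeping both a forward and a backward map is what enforces bijectivity: "ab" must not match "cc", because two distinct symbols would share one image.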

    Analysis of Generative Chemistries

    For the modelling of chemistry we use undirected, labelled graphs as explicit models of molecules and graph transformation rules for modelling generalised chemical reactions. This is used to define artificial chemistries on the level of individual bonds and atoms, where formal graph grammars implicitly represent large spaces of chemical compounds. We use a graph rewriting formalism, rooted in category theory, called the Double Pushout approach, which directly expresses the transition state of chemical reactions. Using concurrency theory for transformation rules, we define algorithms for the composition of rewrite rules in a chemically intuitive manner that enable automatic abstraction of the level of detail in chemical pathways. Based on this rule composition we define an algorithmic framework for generation of vast reaction networks for specific spaces of a given chemistry, while still maintaining the level of detail of the model down to the atomic level. The framework also allows for computation with graphs and graph grammars, which is utilised to model non-trivial chemical systems. The graph generation relies on graph isomorphism testing, and we review the general individualisation-refinement paradigm used in the state-of-the-art algorithms for graph canonicalisation, isomorphism testing, and automorphism discovery. We present a model for chemical pathways based on a generalisation of network flows from ordinary directed graphs to directed hypergraphs. The model allows for reasoning about the flow of individual molecules in general pathways, and the introduction of chemically motivated routing constraints. It further provides the foundation for defining specialised pathway motifs, which is illustrated by defining necessary topological constraints for both catalytic and autocatalytic pathways. We also prove that central types of pathway questions are NP-complete, even for restricted classes of reaction networks. 
The complete pathway model, including constraints for catalytic and autocatalytic pathways, is implemented using integer linear programming. This implementation is used in a tree search method to enumerate both optimal and near-optimal pathway solutions. The formal methods are applied to multiple chemical systems: the enzyme catalysed beta-lactamase reaction, variations of the glycolysis pathway, and the formose process. In each of these systems we use rule composition to abstract pathways and calculate traces for isotope labelled carbon atoms. The pathway model is used to automatically enumerate alternative non-oxidative glycolysis pathways, and enumerate thousands of candidates for autocatalytic pathways in the formose process

    Neural function approximation on graphs: shape modelling, graph discrimination & compression

    Graphs serve as a versatile mathematical abstraction of real-world phenomena in numerous scientific disciplines. This thesis is part of the Geometric Deep Learning subject area, a family of learning paradigms that capitalise on the increasing volume of non-Euclidean data so as to solve real-world tasks in a data-driven manner. In particular, we focus on the topic of graph function approximation using neural networks, which lies at the heart of many relevant methods. In the first part of the thesis, we contribute to the understanding and design of Graph Neural Networks (GNNs). Initially, we investigate the problem of learning on signals supported on a fixed graph. We show that treating graph signals as general graph spaces is restrictive and conventional GNNs have limited expressivity. Instead, we expose a more enlightening perspective by drawing parallels between graph signals and signals on Euclidean grids, such as images and audio. Accordingly, we propose a permutation-sensitive GNN based on an operator analogous to shifts in grids and instantiate it on 3D meshes for shape modelling (Spiral Convolutions). Next, we focus on learning on general graph spaces and in particular on functions that are invariant to graph isomorphism. We identify a fundamental trade-off between invariance, expressivity and computational complexity, which we address with a symmetry-breaking mechanism based on substructure encodings (Graph Substructure Networks). Substructures are shown to be a powerful tool that provably improves expressivity while controlling computational complexity, and a useful inductive bias in network science and chemistry. In the second part of the thesis, we discuss the problem of graph compression, where we analyse the information-theoretic principles and the connections with graph generative models. We show that another inevitable trade-off surfaces, now between computational complexity and compression quality, due to graph isomorphism. 
We propose a substructure-based dictionary coder, Partition and Code (PnC), with theoretical guarantees that can be adapted to different graph distributions by estimating its parameters from observations. Additionally, contrary to the majority of neural compressors, PnC is parameter and sample efficient and is therefore of wide practical relevance. Finally, within this framework, substructures are further illustrated as a decisive archetype for learning problems on graph spaces.

    Solving hard subgraph problems in parallel

    This thesis improves the state of the art in exact, practical algorithms for finding subgraphs. We study maximum clique, subgraph isomorphism, and maximum common subgraph problems. These are widely applicable: within computing science, subgraph problems arise in document clustering, computer vision, the design of communication protocols, model checking, compiler code generation, malware detection, cryptography, and robotics; beyond, applications occur in biochemistry, electrical engineering, mathematics, law enforcement, fraud detection, fault diagnosis, manufacturing, and sociology. We therefore consider both the "pure" forms of these problems, and variants with labels and other domain-specific constraints. Although subgraph-finding should theoretically be hard, the constraint-based search algorithms we discuss can easily solve real-world instances involving graphs with thousands of vertices, and millions of edges. We therefore ask: is it possible to generate "really hard" instances for these problems, and if so, what can we learn? By extending research into combinatorial phase transition phenomena, we develop a better understanding of branching heuristics, as well as highlighting a serious flaw in the design of graph database systems. This thesis also demonstrates how to exploit two of the kinds of parallelism offered by current computer hardware. Bit parallelism allows us to carry out operations on whole sets of vertices in a single instruction; this is largely routine. Thread parallelism, to make use of the multiple cores offered by all modern processors, is more complex. We suggest three desirable performance characteristics that we would like when introducing thread parallelism: lack of risk (parallel cannot be exponentially slower than sequential), scalability (adding more processing cores cannot make runtimes worse), and reproducibility (the same instance on the same hardware will take roughly the same time every time it is run). 
We then detail the difficulties in guaranteeing these characteristics when using modern algorithmic techniques. Besides ensuring that parallelism cannot make things worse, we also increase the likelihood of it making things better. We compare randomised work stealing to new tailored strategies, and perform experiments to identify the factors contributing to good speedups. We show that whilst load balancing is difficult, the primary factor influencing the results is the interaction between branching heuristics and parallelism. By using parallelism to explicitly offset the commitment made to weak early branching choices, we obtain parallel subgraph solvers which are substantially and consistently better than the best sequential algorithms
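
The bit parallelism described in this abstract, operating on whole vertex sets at once, can be sketched by storing each neighbourhood as an integer bitset. The helper names below are hypothetical and this is not the thesis's solver. Note also that Python integers are arbitrary-precision, so one `&` spans many machine words here rather than a single instruction.

```python
def adjacency_bitsets(n, edges):
    """Return a list where bit u of entry v is set iff {u, v} is an edge."""
    rows = [0] * n
    for u, v in edges:
        rows[u] |= 1 << v
        rows[v] |= 1 << u
    return rows


def common_neighbours(rows, u, v):
    """Intersect two neighbourhoods with one bitwise AND."""
    return rows[u] & rows[v]


def members(bits):
    """List the vertex indices whose bits are set."""
    return [i for i in range(bits.bit_length()) if bits >> i & 1]
```

In a clique search, for instance, the candidate set after adding vertex `v` is the old candidate bitset ANDed with `rows[v]`, which replaces a per-vertex loop with word-sized operations.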

    Use of Automorphisms in Conauto-2.0

    This work studies the mathematical foundations of graph theory, a field with several open problems. One of them is the subgraph isomorphism problem, a particular case of which is the graph isomorphism problem. The graph isomorphism problem is of interest from both a theoretical and a practical point of view. Theoretically, it is interesting because a variety of problems are reducible to graph isomorphism, so an algorithm that solved it in polynomial time would indirectly solve those other problems in polynomial time as well. Practically, it is useful in many fields, from pattern recognition to mathematical chemistry, where it is essential for cataloguing chemical compounds properly. Starting from the conauto-1.02 algorithm, new functionality has been added to improve its performance while preserving the original approach. The most recent and fastest algorithms for the graph isomorphism problem are based on canonical labelling. However, it is usually much harder to find a canonical labelling of a graph than to compute its automorphism group. An algorithm that computes the automorphism groups of the graphs being tested for isomorphism, and tries to match them using this information, could therefore be faster than those based on canonical labelling. With this in mind, we have developed an algorithm, conauto-2.0, that uses this alternative approach. A feature common to all of these algorithms (conauto included) is that they are based on the individualization-refinement technique. The key point in individualization is the criterion for selecting the cell from which a vertex will be individualized. In conauto-2.0 the selection of the cell to individualize is partially dynamic, and tries to reduce the number of backtrack points as much as possible. This is a partial contribution of this work, but one that helps improve performance. The most important contributions of this work are the theorems based on the concept of sub-partition. The first theorem allows the early detection of automorphisms without generating the complete corresponding sequence of partitions, which speeds up the algorithm in many cases. The second theorem helps prune entire sections of the search tree, both in the search for automorphisms and in the search for compatible sequences of partitions, using backjumping. These features make conauto-2.0 outperform the other algorithms on component-based graph families, and its field of application is not limited to those families. Graph isomorphism and automorphism group computation tests were run for our algorithm against some of the current best-known algorithms for these problems. The results showed that conauto-2.0 has the most regular behaviour, as well as being the fastest algorithm for several graph families.