12 research outputs found

    Experimental Evaluation of Subgraph Isomorphism Solvers

    Subgraph Isomorphism (SI) is an NP-complete problem at the heart of many structural pattern recognition tasks, as it involves finding a copy of a pattern graph in a target graph. In the pattern recognition community, the best-known SI solvers are VF2, VF3, and RI. SI is also widely studied in the constraint programming community, and many constraint-based SI solvers have been proposed since Ullmann's algorithm, such as LAD and Glasgow. All of these solvers can quickly solve some large SI instances involving graphs with thousands of nodes. However, McCreesh et al. have recently shown how to randomly generate SI instances whose hardness can be controlled and predicted, and they have built small instances that are computationally challenging for all solvers. They have also shown that some small instances, which are predicted to be easy and are easily solved by constraint-based solvers, turn out to be challenging for VF2 and VF3. In this paper, we widen this study by considering a large test suite drawn from eight benchmarks. We show that, as expected for an NP-complete problem, the solving time of an instance does not depend on its size, and that some small instances coming from real applications are not solved by any of the considered solvers. We also show that, while RI and VF3 can very quickly solve a large number of easy instances for which Glasgow or LAD need more time, they fail to solve some other instances that are quickly solved by Glasgow or LAD, and they are clearly outperformed by Glasgow on hard instances. Finally, we show that solvers can easily be combined to take advantage of their complementarity.
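
    One easy way to exploit this complementarity is a portfolio that simply races the solvers and keeps the first answer. The sketch below is only an illustration of that idea, not the combination evaluated in the paper; the solver command lines and the timeout are hypothetical placeholders.

        # Minimal sketch of a solver portfolio; the command lines below are
        # hypothetical placeholders, not the paper's experimental setup.
        import subprocess
        from concurrent.futures import ThreadPoolExecutor, as_completed

        SOLVERS = {
            "glasgow": ["glasgow_subgraph_solver", "pattern.gr", "target.gr"],
            "ri":      ["ri", "pattern.gr", "target.gr"],
            "vf3":     ["vf3", "pattern.gr", "target.gr"],
        }

        def run(name, cmd, timeout):
            try:
                out = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
                return name, out.stdout
            except subprocess.TimeoutExpired:
                return name, None              # this solver gave up on the instance

        def portfolio(timeout=600):
            # Race all solvers and keep the first one that actually answers;
            # solvers still running are simply left to hit their own timeout.
            with ThreadPoolExecutor(max_workers=len(SOLVERS)) as pool:
                futures = [pool.submit(run, n, c, timeout) for n, c in SOLVERS.items()]
                for fut in as_completed(futures):
                    name, answer = fut.result()
                    if answer is not None:
                        return name, answer
            return None, None                  # nobody answered within the budget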

    A FRAMEWORK FOR THE REPRESENTATION OF TWO VERSIONS OF A 3D CITY MODEL IN 4D SPACE

    3D city models are being increasingly adopted by organisations to serve application needs related to urban areas. In order to fulfil the different requirements of various applications, the concept of Level of Detail (LoD) has been incorporated into 3D city model specifications such as CityGML. As a result, datasets of different LoDs are being created for the same areas by several organisations for their own use cases. Meanwhile, as time progresses, newer versions of existing 3D city models are being created by vendors. Nevertheless, the existing mechanisms for representing multi-LoD data have not been adopted by users, and there has been little effort on implementing a mechanism to store multiple revisions of a city model. This results in redundant information and in multiple datasets that are inconsistent with each other. Alternatively, representing time or scale as an additional dimension alongside the three spatial ones has been proposed as a better way to store multiple versions of datasets while retaining the information that links corresponding features across datasets. In this paper, we propose a conceptual framework with initial considerations for the implementation of a 4D representation of two states of a 3D city model. This framework defines both the data structure of such an approach and the methodology by which two existing 3D city models can be compared, associated, and stored with their correspondences in 4D. The methodology is defined as six individual steps, each with its own requirements and goals that have to be addressed. We also provide some examples and considerations for how these steps can be implemented.
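
    As a rough illustration of what the stored correspondences might look like once two versions have been compared and associated, the sketch below links features of an older and a newer model and records how each one changed; all class names, field names, and identifiers are illustrative assumptions, not the framework's actual schema.

        # Illustrative sketch of per-feature correspondences between two versions
        # of a 3D city model; names and identifiers are assumptions for the example.
        from dataclasses import dataclass
        from enum import Enum
        from typing import Optional

        class Change(Enum):
            UNCHANGED = "unchanged"   # same geometry in both versions
            MODIFIED = "modified"     # same real-world object, different geometry
            ADDED = "added"           # present only in the newer version
            REMOVED = "removed"       # present only in the older version

        @dataclass
        class Correspondence:
            old_id: Optional[str]     # feature id in the older version (None if ADDED)
            new_id: Optional[str]     # feature id in the newer version (None if REMOVED)
            change: Change

        # Matched features can then be stored once, and the 4D (space + time)
        # representation generated from the shared geometry plus the change type.
        links = [
            Correspondence("bldg-12", "bldg-12", Change.UNCHANGED),
            Correspondence("bldg-07", "bldg-07b", Change.MODIFIED),
            Correspondence(None, "bldg-99", Change.ADDED),
        ]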

    Sequential and parallel solution-biased search for subgraph algorithms

    Funding: This work was supported by the Engineering and Physical Sciences Research Council (grant numbers EP/P026842/1, EP/M508056/1, and EP/N007565). The current state of the art in subgraph isomorphism solving involves using degree as a value-ordering heuristic to direct backtracking search. Such a search makes a heavy commitment to the first branching choice, which is often incorrect. To mitigate this, we introduce and evaluate a new approach, which we call “solution-biased search”. By combining a slightly-random value-ordering heuristic, rapid restarts, and nogood recording, we design an algorithm which instead uses degree to direct the proportion of search effort spent in different subproblems. This increases performance by two orders of magnitude on satisfiable instances, whilst not affecting performance on unsatisfiable instances. This algorithm can also be parallelised in a very simple but effective way: across both satisfiable and unsatisfiable instances, we get a further speedup of over thirty from thirty-six cores, and over one hundred from ten distributed-memory hosts. Finally, we show that solution-biased search is also suitable for optimisation problems, by using it to improve two maximum common induced subgraph algorithms.
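
    The sketch below illustrates two of the ingredients named above, a degree-biased random value ordering and a restart schedule. The exact bias function and restart policy used in the paper may differ; both are assumptions here.

        # Illustrative sketch of degree-biased random value ordering plus a Luby
        # restart schedule; the paper's exact bias and schedule may differ.
        import random

        def biased_value_order(candidates, degree):
            """Return candidate target vertices in a random order that favours
            high-degree vertices, so that repeated restarts spend search effort
            roughly in proportion to how promising each subproblem looks."""
            pool = list(candidates)
            weights = [2 ** degree[v] for v in pool]   # assumed bias: 2^degree
            order = []
            while pool:
                v = random.choices(pool, weights=weights)[0]
                i = pool.index(v)
                pool.pop(i)
                weights.pop(i)
                order.append(v)
            return order

        def luby(i):
            """Luby sequence (1, 1, 2, 1, 1, 2, 4, ...), a common choice for the
            'rapid restarts' component; using Luby here is an assumption."""
            k = 1
            while (1 << k) - 1 < i:
                k += 1
            if (1 << k) - 1 == i:
                return 1 << (k - 1)
            return luby(i - (1 << (k - 1)) + 1)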

    Exact and heuristic algorithms for network alignment using graph edit distance models

    In this thesis we study theoretical and practical questions of applying the graph edit distance (GED) model to the protein-protein interaction (PPI) network alignment problem, using topological information of the graphs only. In Part II we explore some theoretical aspects of the model, formulated as three different problems; Part III presents three heuristics for the PPI network alignment problem based on a GED model that counts the number of deleted and inserted edges.
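
    The edge-counting GED objective described above can be illustrated with a short sketch: given an injective (possibly partial) node mapping between two networks, count the edges of the first graph whose image is missing in the second (deletions) and the edges of the second with no preimage in the first (insertions). The function below is an illustration only, not code from the thesis; networkx is used just for convenience.

        # Sketch of the edge-deletion/insertion cost of a node mapping under the
        # GED model described above; only edges between mapped nodes are counted.
        import networkx as nx

        def edge_edit_cost(g1, g2, mapping):
            """mapping: dict sending nodes of g1 to distinct nodes of g2."""
            inverse = {w: v for v, w in mapping.items()}
            deleted = sum(1 for u, v in g1.edges
                          if u in mapping and v in mapping
                          and not g2.has_edge(mapping[u], mapping[v]))
            inserted = sum(1 for x, y in g2.edges
                           if x in inverse and y in inverse
                           and not g1.has_edge(inverse[x], inverse[y]))
            return deleted + inserted

        g1 = nx.path_graph(4)    # 0-1-2-3
        g2 = nx.cycle_graph(4)   # 0-1-2-3-0
        print(edge_edit_cost(g1, g2, {i: i for i in range(4)}))   # 1 (edge 3-0 inserted)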

    Computational Strategies for Object Recognition

    This article reviews the available methods for automated identification of objects in digital images. The techniques are classified into groups according to the nature of the computational strategy used. Four classes are proposed: (1) the simplest strategies, which work on data appropriate for feature vector classification; (2) methods that match models to symbolic data structures, for situations involving reliable data and complex models; (3) approaches that fit models to the photometry, appropriate for noisy data and simple models; and (4) combinations of these strategies, which must be adopted in complex situations. Representative examples of the various methods are summarized, and the classes of strategies are evaluated with respect to their appropriateness for particular applications.

    Polynomial Algorithms for Subisomorphism of nD Open Combinatorial Maps

    Combinatorial maps describe the subdivision of objects into cells, together with the incidence and adjacency relations between cells, and they are widely used to model 2D and 3D images. However, there is no algorithm for comparing combinatorial maps, which is an important issue for image processing and analysis. In this paper, we address two basic comparison problems: map isomorphism, which involves deciding whether two maps are equivalent, and submap isomorphism, which involves deciding whether a copy of a pattern map can be found in a target map. We formally define these two problems for nD open combinatorial maps, give polynomial-time algorithms for solving them, and illustrate their interest and feasibility for searching for patterns in 2D and 3D images, much as a child searches for Wally in Martin Handford's books.
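
    As a rough sketch of why polynomial-time comparison is possible, the code below represents a 2D combinatorial map as a set of darts with two functions (here called beta1 and beta2) and checks whether fixing a single dart correspondence forces an isomorphism; trying every seed dart of the target map keeps the whole test polynomial. This only illustrates the general idea for connected 2D maps and is not the paper's nD algorithm.

        # Illustrative sketch: a map is {'beta1': {...}, 'beta2': {...}} over dart
        # ids; beta1 is the next dart around a face, beta2 the opposite dart on the
        # shared edge. Assumes connected maps.
        def extends_to_isomorphism(m1, m2, d1, d2):
            """True if mapping dart d1 of m1 to dart d2 of m2 extends to a
            bijection commuting with beta1 and beta2."""
            corr = {d1: d2}
            stack = [d1]
            while stack:
                a = stack.pop()
                for beta in ("beta1", "beta2"):
                    na, nb = m1[beta][a], m2[beta][corr[a]]
                    if na in corr:
                        if corr[na] != nb:
                            return False       # the two maps disagree here
                    else:
                        corr[na] = nb
                        stack.append(na)
            return (len(corr) == len(m1["beta1"])
                    and len(set(corr.values())) == len(corr))

        def are_isomorphic(m1, m2):
            if len(m1["beta1"]) != len(m2["beta1"]):
                return False
            d1 = next(iter(m1["beta1"]))       # fix one dart of m1, try all of m2
            return any(extends_to_isomorphism(m1, m2, d1, d2) for d2 in m2["beta1"])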

    Solving hard subgraph problems in parallel

    This thesis improves the state of the art in exact, practical algorithms for finding subgraphs. We study maximum clique, subgraph isomorphism, and maximum common subgraph problems. These are widely applicable: within computing science, subgraph problems arise in document clustering, computer vision, the design of communication protocols, model checking, compiler code generation, malware detection, cryptography, and robotics; beyond, applications occur in biochemistry, electrical engineering, mathematics, law enforcement, fraud detection, fault diagnosis, manufacturing, and sociology. We therefore consider both the "pure" forms of these problems, and variants with labels and other domain-specific constraints. Although subgraph-finding should theoretically be hard, the constraint-based search algorithms we discuss can easily solve real-world instances involving graphs with thousands of vertices and millions of edges. We therefore ask: is it possible to generate "really hard" instances for these problems, and if so, what can we learn? By extending research into combinatorial phase transition phenomena, we develop a better understanding of branching heuristics, as well as highlighting a serious flaw in the design of graph database systems. This thesis also demonstrates how to exploit two of the kinds of parallelism offered by current computer hardware. Bit parallelism allows us to carry out operations on whole sets of vertices in a single instruction; this is largely routine. Thread parallelism, to make use of the multiple cores offered by all modern processors, is more complex. We suggest three desirable performance characteristics that we would like when introducing thread parallelism: lack of risk (parallel cannot be exponentially slower than sequential), scalability (adding more processing cores cannot make runtimes worse), and reproducibility (the same instance on the same hardware will take roughly the same time every time it is run). We then detail the difficulties in guaranteeing these characteristics when using modern algorithmic techniques. Besides ensuring that parallelism cannot make things worse, we also increase the likelihood of it making things better. We compare randomised work stealing to new tailored strategies, and perform experiments to identify the factors contributing to good speedups. We show that whilst load balancing is difficult, the primary factor influencing the results is the interaction between branching heuristics and parallelism. By using parallelism to explicitly offset the commitment made to weak early branching choices, we obtain parallel subgraph solvers which are substantially and consistently better than the best sequential algorithms.
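
    The bit parallelism mentioned above can be illustrated in a few lines: if each target vertex's neighbourhood is stored as a bitset (a Python integer in the sketch below), filtering a whole domain of candidate vertices against an adjacency constraint is a single bitwise AND. This is only a sketch of the idea, not the thesis's implementation.

        # Sketch of bit-parallel domain filtering: neighbourhoods and domains are
        # bitsets, so one AND does the work of a loop over candidate vertices.
        def build_bitset_adjacency(n, edges):
            adj = [0] * n
            for u, v in edges:
                adj[u] |= 1 << v
                adj[v] |= 1 << u
            return adj

        def filter_domain(domain_bits, assigned_target, adj):
            """Keep only candidates adjacent to the already-assigned target vertex."""
            return domain_bits & adj[assigned_target]

        adj = build_bitset_adjacency(5, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)])
        domain = (1 << 5) - 1                          # all five vertices as candidates
        print(bin(filter_domain(domain, 3, adj)))      # 0b10100 -> vertices 2 and 4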

    Partitioning algorithms for induced subgraph problems

    This dissertation introduces the MCSPLIT family of algorithms for two closely related NP-hard problems that involve finding a large induced subgraph contained in each of two input graphs: the induced subgraph isomorphism problem and the maximum common induced subgraph problem. The MCSPLIT algorithms resemble forward-checking constraint programming algorithms, but use problem-specific data structures that allow multiple identical domains to be stored without duplication. These data structures enable fast, simple constraint propagation algorithms and very fast calculation of upper bounds. Versions of these algorithms for both sparse and dense graphs are described and implemented. The resulting algorithms are over an order of magnitude faster than the best existing algorithm for maximum common induced subgraph on unlabelled graphs, and outperform the state of the art on several classes of induced subgraph isomorphism instances. A further advantage of the MCSPLIT data structures is that variables and values are treated identically; this allows us to choose to branch on variables representing vertices of either input graph with no overhead. An extensive set of experiments shows that such two-sided branching can be particularly beneficial if the two input graphs have very different orders or densities. Finally, we turn from subgraphs to supergraphs, tackling the problem of finding a small graph that contains every member of a given family of graphs as an induced subgraph. Exact and heuristic techniques are developed for this problem, in each case using a MCSPLIT algorithm as a subroutine. These algorithms allow us to add new terms to two entries of the On-Line Encyclopedia of Integer Sequences.
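
    As a sketch of why the shared partition makes the upper bound so cheap: once the unmatched vertices of both graphs are grouped into common label classes, at most min(|class in G|, |class in H|) further vertices can be matched per class, so the bound is a single pass over the classes. The exact bookkeeping in the MCSPLIT implementations differs; the snippet below only illustrates the idea.

        # Illustrative McSplit-style bound: sum over label classes of the smaller
        # side, added to the size of the mapping built so far.
        def upper_bound(current_mapping_size, label_classes):
            """label_classes: iterable of (vertices_of_G, vertices_of_H) pairs."""
            return current_mapping_size + sum(
                min(len(g_side), len(h_side)) for g_side, h_side in label_classes
            )

        # Two classes with 3/2 and 1/4 unmatched vertices, 5 vertices already matched:
        print(upper_bound(5, [(list("abc"), list("xy")), (list("d"), list("pqrs"))]))   # 8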