    Greedy Graph Colouring is a Misleading Heuristic

    State-of-the-art maximum clique algorithms use a greedy graph colouring as a bound. We show that greedy graph colouring can be misleading, which has implications for parallel branch and bound
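
    As a rough illustration of the bound this abstract refers to, the Python sketch below colours a set of candidate vertices greedily and uses the number of colour classes as an upper bound on the clique size. The function names, the adjacency-dictionary representation, and the two toy graphs are assumptions made for this example and are not taken from the paper.

```python
from itertools import combinations


def greedy_colour_bound(adj, order):
    """Colour vertices in the given order; return the number of colours used.

    Each colour class is an independent set, so a clique can take at most
    one vertex per class. The count is therefore an upper bound on the
    size of any clique among the coloured vertices.
    """
    colour_classes = []
    for v in order:
        for cls in colour_classes:
            # v may join a class only if it has no neighbour in it.
            if all(u not in adj[v] for u in cls):
                cls.append(v)
                break
        else:
            colour_classes.append([v])
    return len(colour_classes)


def clique_number(adj):
    """Brute-force maximum clique size; only suitable for tiny graphs."""
    vertices = list(adj)
    for k in range(len(vertices), 0, -1):
        for subset in combinations(vertices, k):
            if all(v in adj[u] for u, v in combinations(subset, 2)):
                return k
    return 0


# Toy graph 1 (hypothetical): a 5-cycle. The largest clique is an edge.
c5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}

# Toy graph 2 (hypothetical): a path 0-1-2-3.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}

print(greedy_colour_bound(c5, [0, 1, 2, 3, 4]), clique_number(c5))  # 3 2
print(greedy_colour_bound(path, [0, 1, 2, 3]))  # 2
print(greedy_colour_bound(path, [0, 3, 1, 2]))  # 3
```

    On the 5-cycle the bound is 3 although the largest clique has 2 vertices, and on the path the bound changes from 2 to 3 with the vertex order alone. This only shows that the bound can be loose and order-dependent; it does not reproduce the paper's specific argument.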

    Multi-threading a state-of-the-art maximum clique algorithm

    We present a threaded parallel adaptation of a state-of-the-art maximum clique algorithm for dense, computationally challenging graphs. We show that near-linear speedups are achievable in practice and that superlinear speedups are common. We include results for several previously unsolved benchmark problems

    Asynchronous parallel branch and bound and anomalies

    The parallel execution of branch and bound algorithms can result in seemingly unreasonable speedups or slowdowns. The speedup is almost never equal to the increase in computing power. For synchronous parallel branch and bound, these effects have been studied extensively. For asynchronous parallelizations, little is known. In this paper, we derive sufficient conditions to guarantee that an asynchronous parallel branch and bound algorithm (with elimination by lower bound tests and dominance) will be at least as fast as its sequential counterpart. The technique used for obtaining the results seems to be more generally applicable. The essential observations are that, under certain conditions, the parallel algorithm will always work on at least one node that is branched from by the sequential algorithm, and that the parallel algorithm, after elimination of all such nodes, is able to conclude that the optimal solution has been found. Finally, some of the theoretical results are related to a few practical experiments

    On parallel Branch and Bound frameworks for Global Optimization

    Branch and Bound (B&B) algorithms are known to exhibit an irregular search tree. Therefore, developing a parallel approach for this kind of algorithm is a challenge. The efficiency of a B&B algorithm depends on the chosen Branching, Bounding, Selection, Rejection, and Termination rules. The question we investigate is how the chosen platform, consisting of programming language, libraries, or skeletons, influences programming effort and algorithm performance. In frameworks with a high level of abstraction, the selection rule and data management structures are usually hidden from programmers, as is the load balancing strategy when the algorithm is run in parallel. We investigate the question by implementing a multidimensional Global Optimization B&B algorithm with the help of three frameworks with different levels of abstraction (from highest to lowest): Bobpp, Threading Building Blocks (TBB), and a customized Pthread implementation. We find the following. The Bobpp implementation is easy to code, but exhibits the poorest scalability. In contrast, the TBB and Pthread implementations scale almost linearly on the platform used. The TBB approach shows slightly better productivity
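
    The five rule types named in this abstract can be made concrete with a small sequential sketch. The Python code below is an assumption-laden illustration, not the paper's Global Optimization algorithm and not the API of Bobpp, TBB, or Pthreads: it uses best-first selection, and it is applied to a toy 0/1 knapsack problem with a fractional-relaxation bound. All function names and the item data are hypothetical.

```python
import heapq
from itertools import count


def branch_and_bound(root, branch, bound, value, is_complete):
    """Generic best-first B&B for maximisation (illustrative sketch)."""
    best_value, best_node = float("-inf"), None
    tie = count()  # tie-breaker so the heap never compares nodes
    frontier = [(-bound(root), next(tie), root)]
    while frontier:  # Termination rule: stop when no open nodes remain
        # Selection rule: take the node with the largest upper bound
        neg_bound, _, node = heapq.heappop(frontier)
        # Rejection rule: discard nodes that cannot beat the incumbent
        if -neg_bound <= best_value:
            continue
        if is_complete(node):
            if value(node) > best_value:
                best_value, best_node = value(node), node
            continue
        for child in branch(node):  # Branching rule
            child_bound = bound(child)  # Bounding rule
            if child_bound > best_value:
                heapq.heappush(frontier, (-child_bound, next(tie), child))
    return best_value, best_node


def solve_knapsack(items, capacity):
    """items: list of (value, weight) pairs with positive weights."""
    # Sort by value density so the fractional bound below is valid.
    items = sorted(items, key=lambda it: it[0] / it[1], reverse=True)
    n = len(items)

    def bound(node):
        # Upper bound: fill remaining room greedily, allowing one fraction.
        index, total_value, total_weight = node
        room = capacity - total_weight
        result = float(total_value)
        for v, w in items[index:]:
            if w <= room:
                room -= w
                result += v
            else:
                result += v * room / w
                break
        return result

    def branch(node):
        # Two children: skip the next item, or take it if it fits.
        index, total_value, total_weight = node
        v, w = items[index]
        children = [(index + 1, total_value, total_weight)]
        if total_weight + w <= capacity:
            children.append((index + 1, total_value + v, total_weight + w))
        return children

    best, _ = branch_and_bound(
        (0, 0, 0), branch, bound,
        value=lambda node: node[1],
        is_complete=lambda node: node[0] == n,
    )
    return best


# Hypothetical instance: three items, capacity 50.
print(solve_knapsack([(60, 10), (100, 20), (120, 30)], 50))  # 220
```

    Swapping the priority queue for a stack would give depth-first selection. In a parallel setting, this selection structure and the way open nodes are shared between workers are the parts that high-level frameworks tend to hide.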

    Solving large-scale traveling salesman problems with parallel Branch-and-Cut

    We introduce the implementation of a parallel Branch-and-Cut algorithm to solve large-scale traveling salesman problems. Rather than using the well-known models of homogeneous distribution and simple Master/Slave communication, we present a more sophisticated distribution that takes advantage of several independent features of a Branch-and-Cut code. Computational results are reported for several instances of the TSPLIB

    Replicable parallel branch and bound search

    Combinatorial branch and bound searches are a common technique for solving global optimisation and decision problems. Their performance often depends on good search order heuristics, refined over decades of algorithms research. Parallel search necessarily deviates from the sequential search order, sometimes dramatically and unpredictably, e.g. by distributing work at random. This can disrupt effective search order heuristics and lead to unexpected and highly variable parallel performance. The variability makes it hard to reason about the parallel performance of combinatorial searches. This paper presents a generic parallel branch and bound skeleton, implemented in Haskell, with replicable parallel performance. The skeleton aims to preserve the search order heuristic by distributing work in an ordered fashion, closely following the sequential search order. We demonstrate the generality of the approach by applying the skeleton to 40 instances of three combinatorial problems: Maximum Clique, 0/1 Knapsack and Travelling Salesperson. The overheads of our Haskell skeleton are reasonable: giving slowdown factors of between 1.9 and 6.2 compared with a class-leading, dedicated, and highly optimised C++ Maximum Clique solver. We demonstrate scaling up to 200 cores of a Beowulf cluster, achieving speedups of 100x for several Maximum Clique instances. We demonstrate low variance of parallel performance across all instances of the three combinatorial problems and at all scales up to 200 cores, with median Relative Standard Deviation (RSD) below 2%. Parallel solvers that do not follow the sequential search order exhibit far higher variance, with median RSD exceeding 85% for Knapsack
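
    For readers unfamiliar with the variability measure quoted in this abstract, the Python sketch below computes a Relative Standard Deviation per instance and then the median across instances. It assumes RSD means the sample standard deviation divided by the mean, expressed as a percentage; the abstract does not state which estimator the paper uses. The function names and the runtimes are made up for illustration and are not results from the paper.

```python
import statistics


def relative_standard_deviation(samples):
    """RSD in percent: sample standard deviation divided by the mean.

    Assumes at least two samples and a non-zero mean.
    """
    return 100.0 * statistics.stdev(samples) / statistics.mean(samples)


def median_rsd(runtimes_by_instance):
    """Median of the per-instance RSDs over repeated runs."""
    return statistics.median(
        relative_standard_deviation(runs)
        for runs in runtimes_by_instance.values()
    )


# Made-up runtimes in seconds for three hypothetical instances.
runtimes = {
    "instance_a": [9.0, 10.0, 11.0],   # RSD 10%
    "instance_b": [10.0, 10.0, 10.0],  # RSD 0%
    "instance_c": [8.0, 10.0, 12.0],   # RSD 20%
}
print(median_rsd(runtimes))
```

    A low median RSD means that repeated runs of the same instance take nearly the same time, which is the sense of "replicable" performance described above.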

    Partially ordered distributed computations on asynchronous point-to-point networks

    Asynchronous executions of a distributed algorithm differ from each other due to the nondeterminism in the order in which the messages exchanged are handled. In many situations of interest, the asynchronous executions induced by restricting nondeterminism are more efficient, in an application-specific sense, than the others. In this work, we define partially ordered executions of a distributed algorithm as the executions satisfying some restricted orders of their actions in two different frameworks, those of the so-called event- and pulse-driven computations. The aim of these restrictions is to characterize asynchronous executions that are likely to be more efficient for some important classes of applications. Also, an asynchronous algorithm that ensures the occurrence of partially ordered executions is given for each case. Two of the applications that we believe may benefit from the restricted nondeterminism are backtrack search, in the event-driven case, and iterative algorithms for systems of linear equations, in the pulse-driven case

    Towards an abstract parallel branch and bound machine

    Many (parallel) branch and bound algorithms look very different from each other at first glance. They exploit, however, the same underlying computational model. This phenomenon can be used to define branch and bound algorithms in terms of a set of basic rules that are applied in a specific (predefined) order. In the sequential case, the specification of Mitten's rules turns out to be sufficient for the development of branch and bound algorithms. In the parallel case, the situation is a bit more complicated. We have to consider extra parameters such as work distribution and knowledge sharing. Here, the implementation of parallel branch and bound algorithms can be seen as a tuning of the parameters combined with the specification of Mitten's rules. These observations lead to generic systems, where the user provides the specifications of the problem to be solved, and the system generates a branch and bound algorithm running on a specific architecture. We will discuss some proposals that appeared in the literature. Next, we raise the question whether the proposed models are flexible enough. We analyze the design decisions to be taken when implementing a parallel branch and bound algorithm. It results in a classification model, which is validated by checking whether it captures existing branch and bound implementations. Finally, we return to the issue of flexibility of existing systems, and propose to add an abstract machine model to the generic framework. The model defines a virtual parallel branch and bound machine, within which the design decisions can be expressed in terms of the abstract machine. We will outline some ideas on which the machine may be based, and present directions of future work

    The Maximum Common Subgraph Problem: A Parallel and Multi-Engine Approach

    The maximum common subgraph of two graphs is the largest possible common subgraph, i.e., the common subgraph with as many vertices as possible. Although this problem is very challenging, having long been proven NP-hard, its countless practical applications still motivate the search for exact solutions. This work discusses the possibility of extending an existing, very effective branch-and-bound procedure to parallel multi-core and many-core architectures. We analyze a parallel multi-core implementation that exploits a divide-and-conquer approach based on a thread pool, which does not degrade the original algorithmic efficiency and minimizes data structure repetition. We also extend the original algorithm to parallel many-core GPU architectures adopting the CUDA programming framework, and we show how to handle the heavy workload imbalance and the massive data dependency. Then, we suggest new heuristics to reorder the adjacency matrix, to deal with “dead-ends”, and to randomize the search with automatic restarts. These heuristics can achieve significant speed-ups on specific instances, even if they may not be competitive with the original strategy on average. Finally, we propose a portfolio approach, which integrates all the different local search algorithms as component tools; such a portfolio, rather than choosing the best tool for a given instance up-front, takes the decision on-line. The proposed approach drastically limits memory bandwidth constraints and avoids other typical portfolio fragility, as CPU and GPU versions often show complementary efficiency and run on separate platforms. Experimental results support the claims and motivate further research to better exploit GPUs in embedded task-intensive and multi-engine parallel applications