2,682 research outputs found
Greedy Graph Colouring is a Misleading Heuristic
State-of-the-art maximum clique algorithms use a greedy graph colouring as a
bound. We show that greedy graph colouring can be misleading, which has
implications for parallel branch and bound
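The colouring bound this abstract refers to can be illustrated with a short sketch (illustrative Python, not the authors' code; the function name and adjacency representation are assumptions):

```python
def greedy_colour_bound(vertices, adj):
    # Greedily partition `vertices` into colour classes (independent sets).
    # A clique needs a distinct colour per vertex, so the number of classes
    # used is an upper bound on the largest clique inside `vertices`.
    colour_classes = []
    for v in vertices:
        for cls in colour_classes:
            if all(u not in adj[v] for u in cls):  # v independent of this class
                cls.append(v)
                break
        else:
            colour_classes.append([v])
    return len(colour_classes)

# A triangle {0, 1, 2} with a pendant vertex 3 attached to 0:
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(greedy_colour_bound([0, 1, 2, 3], adj))  # 3
```

The bound is tight on this toy graph, but it depends on the vertex order and can be loose in general, which is exactly the kind of behaviour the paper argues can mislead a branch and bound search.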
Multi-threading a state-of-the-art maximum clique algorithm
We present a threaded parallel adaptation of a state-of-the-art maximum clique
algorithm for dense, computationally challenging graphs. We show that near-linear speedups
are achievable in practice and that superlinear speedups are common. We include results for
several previously unsolved benchmark problems
Asynchronous parallel branch and bound and anomalies
The parallel execution of branch and bound algorithms can result in seemingly unreasonable speedups or slowdowns. The speedup is almost never equal to the increase in computing power. For synchronous parallel branch and bound, these effects have been studied extensively. For asynchronous parallelizations, little is known.
In this paper, we derive sufficient conditions to guarantee that an asynchronous parallel
branch and bound algorithm (with elimination by lower bound tests and dominance) will be
at least as fast as its sequential counterpart. The technique used for obtaining the results seems to be more generally applicable.
The essential observations are that, under certain conditions, the parallel algorithm
always works on at least one node that the sequential algorithm branches from, and
that the parallel algorithm, after eliminating all such nodes, is able to conclude that
the optimal solution has been found.
Finally, some of the theoretical results are brought into connection with a few practical
experiments
On parallel Branch and Bound frameworks for Global Optimization
Branch and Bound (B&B) algorithms are known to exhibit an irregular search tree. Developing a parallel approach for this kind of algorithm is therefore a challenge. The efficiency of a B&B algorithm depends on the chosen Branching, Bounding, Selection, Rejection, and Termination rules. The question we investigate is how the chosen platform, consisting of the programming language and the libraries or skeletons used, influences programming effort and algorithm performance. In frameworks with a high level of abstraction, the selection rule and data management structures are usually hidden from the programmer, as is the load balancing strategy when the algorithm is run in parallel. We investigate the question by implementing a multidimensional Global Optimization B&B algorithm with the help of three frameworks with different levels of abstraction (from more to less): Bobpp, Threading Building Blocks (TBB), and a customized Pthread implementation. The following has been found. The Bobpp implementation is easy to code, but exhibits the poorest scalability. In contrast, the TBB and Pthread implementations scale almost linearly on the platform used. The TBB approach shows slightly better productivity
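The five rules named in this abstract can be seen in a minimal sequential B&B loop (a hedged sketch, not any of the three implementations discussed; all function names are illustrative):

```python
import heapq

def branch_and_bound(root, branch, bound, value, priority):
    # Generic best-first maximisation B&B. The five rules map to:
    # Branching -> branch(), Bounding -> bound(), Selection -> priority()
    # plus the heap pop, Rejection -> the incumbent test, and
    # Termination -> the empty-heap condition.
    best = float("-inf")
    heap = [(priority(root), 0, root)]
    tie = 1                                  # unique tie-breaker for the heap
    while heap:                              # Termination rule
        _, _, node = heapq.heappop(heap)     # Selection rule
        if bound(node) <= best:              # Rejection rule
            continue
        best = max(best, value(node))
        for child in branch(node):           # Branching rule
            if bound(child) > best:          # Bounding rule prunes early
                heapq.heappush(heap, (priority(child), tie, child))
                tie += 1
    return best

# Tiny 0/1 knapsack as the plug-in problem: node = (next_item, value, capacity)
items = [(60, 10), (100, 20), (120, 30)]     # (value, weight), capacity 50

def branch(n):
    i, v, c = n
    if i == len(items):
        return []
    val, wt = items[i]
    kids = [(i + 1, v, c)]                   # exclude item i
    if wt <= c:
        kids.append((i + 1, v + val, c - wt))  # include item i
    return kids

def bound(n):                                # loose bound: ignore capacity
    i, v, _ = n
    return v + sum(val for val, _ in items[i:])

print(branch_and_bound((0, 0, 50), branch, bound,
                       lambda n: n[1], lambda n: -bound(n)))  # 220
```

Swapping in different `branch`, `bound`, and `priority` functions is precisely the kind of rule specification that the frameworks compared here abstract to different degrees.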
Solving large-scale traveling salesman problems with parallel Branch-and-Cut
We introduce the implementation of a parallel Branch-and-Cut algorithm to solve large-scale traveling salesman problems. Rather than using the well-known models of homogeneous distribution and simple Master/Slave communication, we present a more sophisticated distribution that takes advantage of several independent features of a Branch-and-Cut code. Computational results are reported for several instances of the TSPLIB
Replicable parallel branch and bound search
Combinatorial branch and bound searches are a common technique for solving global optimisation and decision problems. Their performance often depends on good search order heuristics, refined over decades of algorithms research. Parallel search necessarily deviates from the sequential search order, sometimes dramatically and unpredictably, e.g. by distributing work at random. This can disrupt effective search order heuristics and lead to unexpected and highly variable parallel performance. The variability makes it hard to reason about the parallel performance of combinatorial searches.
This paper presents a generic parallel branch and bound skeleton, implemented in Haskell, with replicable parallel performance. The skeleton aims to preserve the search order heuristic by distributing work in an ordered fashion, closely following the sequential search order. We demonstrate the generality of the approach by applying the skeleton to 40 instances of three combinatorial problems: Maximum Clique, 0/1 Knapsack and Travelling Salesperson. The overheads of our Haskell skeleton are reasonable: giving slowdown factors of between 1.9 and 6.2 compared with a class-leading, dedicated, and highly optimised C++ Maximum Clique solver. We demonstrate scaling up to 200 cores of a Beowulf cluster, achieving speedups of 100x for several Maximum Clique instances. We demonstrate low variance of parallel performance across all instances of the three combinatorial problems and at all scales up to 200 cores, with median Relative Standard Deviation (RSD) below 2%. Parallel solvers that do not follow the sequential search order exhibit far higher variance, with median RSD exceeding 85% for Knapsack
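The ordered work distribution described here can be sketched as a shared pool that hands out subtrees keyed by their position in the sequential search order (a sketch under our own assumptions, not the Haskell skeleton's actual mechanism; `OrderedWorkPool` and `seq_index` are illustrative names):

```python
import heapq
import threading

class OrderedWorkPool:
    # Workers draw subtrees in the order the sequential search would visit
    # them, so the search-order heuristic is preserved as far as possible.
    # `seq_index` is assumed to encode a subtree's sequential position.
    def __init__(self):
        self._heap = []
        self._lock = threading.Lock()

    def put(self, seq_index, subtree):
        with self._lock:
            heapq.heappush(self._heap, (seq_index, subtree))

    def get(self):
        with self._lock:
            if not self._heap:
                return None
            return heapq.heappop(self._heap)[1]

pool = OrderedWorkPool()
for idx, task in [(2, "right"), (0, "root"), (1, "left")]:
    pool.put(idx, task)
print([pool.get() for _ in range(3)])  # ['root', 'left', 'right']
```

A random work-stealing pool would instead hand tasks out in an unpredictable order, which is the source of the performance variance the paper measures.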
Partially ordered distributed computations on asynchronous point-to-point networks
Asynchronous executions of a distributed algorithm differ from each other due
to the nondeterminism in the order in which the messages exchanged are handled.
In many situations of interest, the asynchronous executions induced by
restricting nondeterminism are more efficient, in an application-specific
sense, than the others. In this work, we define partially ordered executions of
a distributed algorithm as the executions satisfying some restricted orders of
their actions in two different frameworks, those of the so-called event- and
pulse-driven computations. The aim of these restrictions is to characterize
asynchronous executions that are likely to be more efficient for some important
classes of applications. Also, an asynchronous algorithm that ensures the
occurrence of partially ordered executions is given for each case. Two of the
applications that we believe may benefit from the restricted nondeterminism are
backtrack search, in the event-driven case, and iterative algorithms for
systems of linear equations, in the pulse-driven case
Towards an abstract parallel branch and bound machine
Many (parallel) branch and bound algorithms look very different from each other at first
glance. They exploit, however, the same underlying computational model. This phenomenon
can be used to define branch and bound algorithms in terms of a set of basic rules that are applied in a specific (predefined) order.
In the sequential case, the specification of Mitten's rules turns out to be sufficient for
the development of branch and bound algorithms. In the parallel case, the situation is a
bit more complicated. We have to consider extra parameters such as work distribution and
knowledge sharing. Here, the implementation of parallel branch and bound algorithms can be
seen as a tuning of the parameters combined with the specification of Mitten's rules.
These observations lead to generic systems, where the user provides the specifications of
the problem to be solved, and the system generates a branch and bound algorithm running on
a specific architecture. We will discuss some proposals that appeared in the literature.
Next, we raise the question of whether the proposed models are flexible enough. We analyze
the design decisions to be taken when implementing a parallel branch and bound algorithm.
This results in a classification model, which is validated by checking whether it captures
existing branch and bound implementations.
Finally, we return to the issue of flexibility of existing systems, and propose to add an
abstract machine model to the generic framework. The model defines a virtual parallel
branch and bound machine, within which the design decisions can be expressed in terms of
the abstract machine. We will outline some ideas on which the machine may be based, and
present directions of future work
The Maximum Common Subgraph Problem: A Parallel and Multi-Engine Approach
The maximum common subgraph of two graphs is the largest possible common subgraph,
i.e., the common subgraph with as many vertices as possible. Although this problem is very challenging,
having long been proven NP-hard, its countless practical applications still motivate the search
for exact solutions. This work discusses the possibility of extending an existing, very effective
branch-and-bound procedure to parallel multi-core and many-core architectures. We analyze
a parallel multi-core implementation that exploits a divide-and-conquer approach based on a
thread pool, which does not deteriorate the original algorithmic efficiency and minimizes
the duplication of data structures. We also extend the original algorithm to parallel many-core GPU
architectures using the CUDA programming framework, and we show how to handle the heavy
workload imbalance and the massive data dependencies. We then suggest new heuristics to reorder
the adjacency matrix, to deal with “dead-ends”, and to randomize the search with automatic restarts.
These heuristics can achieve significant speed-ups on specific instances, even if they may not be competitive with the original strategy on average. Finally, we propose a portfolio approach that integrates all the different local search algorithms as component tools; rather than choosing the best tool for a given instance up-front, the portfolio takes the decision on-line. The proposed approach drastically limits memory bandwidth constraints and avoids other typical portfolio fragilities, as the CPU and GPU versions often show complementary efficiency and run on separate platforms. Experimental results support the claims and motivate further research to better exploit GPUs in embedded task-intensive and multi-engine parallel applications
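The on-line portfolio decision can be sketched as racing the component solvers and taking the first answer produced (illustrative Python; the paper's engine integrates CPU and GPU solvers, which this sketch does not model, and all names here are assumptions):

```python
import time
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

def portfolio(solvers, instance):
    # Launch every component solver on the same instance and return the
    # first result produced; the "choice" of tool is thus made on-line
    # rather than up-front.
    with ThreadPoolExecutor(max_workers=len(solvers)) as pool:
        futures = [pool.submit(s, instance) for s in solvers]
        done, not_done = wait(futures, return_when=FIRST_COMPLETED)
        for f in not_done:
            f.cancel()  # best effort: already-running solvers finish anyway
        return done.pop().result()

def fast_solver(x):
    return x * 2

def slow_solver(x):
    time.sleep(0.2)
    return x * 2          # same answer, just slower

print(portfolio([slow_solver, fast_solver], 21))  # 42
```

With exact solvers the components all agree on the answer, so whichever engine finishes first determines only the running time, not the result.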