Parallel branch and bound on an MIMD system
In this paper we give a classification of parallel branch and bound algorithms and
develop a class of asynchronous branch and bound algorithms for execution on an MIMD system.
We develop sufficient conditions to prevent the anomalies that can occur due to parallelism, asynchronicity, or nondeterminism from degrading the performance of the algorithm. Such conditions were already known for the synchronous case; it turns out that they are sufficient for asynchronous algorithms as well. We also investigate the consequences of nonhomogeneous processing elements in a parallel computer system.
We introduce the notions of perfect parallel time and achieved efficiency to empirically measure the effects of parallelism, because the traditional notions of speedup and efficiency cannot fully characterize the actual execution of an asynchronous parallel algorithm. Finally, we present some computational results obtained for the symmetric traveling salesman problem.
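For reference, the traditional notions questioned here are usually defined as follows (a standard formulation, not taken from the paper itself): with T_1 the sequential running time and T_p the wall-clock time on p processors, the speedup is S_p = T_1 / T_p and the efficiency is E_p = S_p / p.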
A simulation tool for the performance evaluation of parallel branch and bound algorithms
Parallel computation offers a challenging opportunity to speed up the time consuming
enumerative procedures that are necessary to solve hard combinatorial problems.
Theoretical analysis of such a parallel branch and bound algorithm is very hard and
empirical analysis is not straightforward because the performance of a parallel algorithm
cannot be evaluated simply by executing the algorithm on a few parallel systems. Among the
difficulties encountered are the noise produced by other users on the system, the limited
variation in parallelism (the number of processors in the system is strictly bounded) and
the waste of resources involved: most of the time, the outcomes of all computations are
already known and the only issue of interest is when these outcomes are produced.
We describe a way to simulate the execution of parallel branch and bound algorithms on arbitrary parallel systems such that the memory and CPU requirements remain modest. The use of simulation has only minor consequences for the formulation of the algorithm.
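The abstract does not give the tool itself, but the core idea can be sketched as a small discrete-event simulation: node evaluations are charged to a fixed number of virtual processors and a simulated clock is advanced, so the degree of parallelism can be varied freely on a single machine. The sketch below is only an illustration under these assumptions; all names (simulate_parallel_bnb, branch, bound, cost) are hypothetical and the node selection rule is a modelling choice.

import heapq
import itertools

def simulate_parallel_bnb(root, branch, bound, cost, num_procs):
    """Simulate a parallel branch and bound run (minimisation) on `num_procs`
    virtual processors. branch(node) returns child subproblems (empty for a
    leaf), bound(node) a lower bound on any solution below the node, and
    cost(node) the simulated time needed to evaluate the node. Returns the
    best value found and the simulated completion time."""
    tick = itertools.count()          # tie-breaker so the event heap never compares nodes
    pending = [root]                  # global pool of open subproblems
    events = []                       # (finish_time, tick, node) for nodes being evaluated
    clock, busy, incumbent = 0.0, 0, float("inf")

    def dispatch():
        nonlocal busy
        while pending and busy < num_procs:
            node = pending.pop()      # LIFO selection; any selection rule could be modelled
            heapq.heappush(events, (clock + cost(node), next(tick), node))
            busy += 1

    dispatch()
    while events:
        clock, _, node = heapq.heappop(events)   # next evaluation to finish
        busy -= 1
        if bound(node) < incumbent:              # otherwise the node is pruned
            children = branch(node)
            if children:
                pending.extend(children)
            else:                                # leaf: its bound equals its value
                incumbent = bound(node)
        dispatch()
    return incumbent, clock

Only the open pool and the at-most num_procs in-flight evaluations are stored, and each node is processed once, so the memory and CPU demands of such a simulation stay close to those of a sequential run.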
Parallel branch and bound and anomalies
In this paper we present a classification of parallel branch and bound algorithms and
investigate the anomalies that can occur during the execution of such algorithms. We develop sufficient conditions to prevent deceleration anomalies from degrading the performance. Such conditions were already known for some synchronous cases; it turns out that they can be generalized to arbitrary cases. Finally, we develop necessary conditions for acceleration anomalies to improve the performance.
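A common way to make these notions precise (the exact definitions used in the paper may differ) is via the speedup S(p) = T(1) / T(p) achieved with p processors: an acceleration anomaly occurs when the parallel search prunes so much additional work that S(p) > p, i.e. the speedup is superlinear, while a deceleration anomaly occurs when adding processors causes more nodes to be explored, so that T(p2) > T(p1) for p2 > p1.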
Experiments with parallel algorithms for combinatorial problems
In the last decade many models for parallel computation have been proposed and many
parallel algorithms have been developed. However, few of these models have been realized
and most of these algorithms are supposed to run on idealized, unrealistic parallel machines.
The parallel machines constructed so far all use a simple model of parallel computation.
Therefore, not every existing parallel machine is equally well suited for each type of
algorithm. The adaptation of a certain algorithm to a specific parallel architecture may
severely increase the complexity of the algorithm or severely obscure its essence.
Little is known about the performance of some standard combinatorial algorithms on
existing parallel machines. In this paper we present computational results concerning the
solution of knapsack, shortest paths and change-making problems by branch and bound,
dynamic programming, and divide and conquer algorithms on the ICL-DAP (a SIMD computer), the Manchester dataflow machine, and the CDC-CYBER-205 (a pipeline computer).
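As a concrete reminder of the kind of sequential kernel being mapped onto these machines, here is the classic dynamic program for the change-making problem (an illustration only, not the implementation used in the paper):

def min_coins(denominations, amount):
    """Fewest coins from `denominations` summing to `amount` (None if impossible).
    best[a] holds the optimal coin count for amount a; the inner loop over
    denominations is the part that maps naturally onto a SIMD or pipeline machine."""
    INF = float("inf")
    best = [0] + [INF] * amount
    for a in range(1, amount + 1):
        for d in denominations:
            if d <= a and best[a - d] + 1 < best[a]:
                best[a] = best[a - d] + 1
    return None if best[amount] == INF else best[amount]

# Example: with denominations {1, 3, 4}, amount 6 needs two coins (3 + 3).
assert min_coins([1, 3, 4], 6) == 2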
Replicable parallel branch and bound search
Combinatorial branch and bound searches are a common technique for solving global optimisation and decision problems. Their performance often depends on good search order heuristics, refined over decades of algorithms research. Parallel search necessarily deviates from the sequential search order, sometimes dramatically and unpredictably, e.g. by distributing work at random. This can disrupt effective search order heuristics and lead to unexpected and highly variable parallel performance. The variability makes it hard to reason about the parallel performance of combinatorial searches.
This paper presents a generic parallel branch and bound skeleton, implemented in Haskell, with replicable parallel performance. The skeleton aims to preserve the search order heuristic by distributing work in an ordered fashion, closely following the sequential search order. We demonstrate the generality of the approach by applying the skeleton to 40 instances of three combinatorial problems: Maximum Clique, 0/1 Knapsack and Travelling Salesperson. The overheads of our Haskell skeleton are reasonable, giving slowdown factors between 1.9 and 6.2 compared with a class-leading, dedicated, and highly optimised C++ Maximum Clique solver. We demonstrate scaling up to 200 cores of a Beowulf cluster, achieving speedups of 100x for several Maximum Clique instances. We demonstrate low variance of parallel performance across all instances of the three combinatorial problems and at all scales up to 200 cores, with median Relative Standard Deviation (RSD) below 2%. Parallel solvers that do not follow the sequential search order exhibit far higher variance, with median RSD exceeding 85% for Knapsack.
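The ordered-distribution idea can be sketched as follows (a minimal Python illustration for a minimisation problem with callbacks branch and bound; the skeleton in the paper is a distributed Haskell implementation and differs in many details): every open node is keyed by its path of child indices from the root, and comparing these keys lexicographically reproduces the sequential, heuristic-ordered depth-first order, so idle workers always receive the node the sequential search would expand next.

import heapq

def ordered_bnb(root, branch, bound, num_workers):
    """Expand open nodes in sequential (heuristic) order, `num_workers` at a time.
    branch(node) returns children in heuristic order; bound(node) is a lower
    bound, equal to the node's value at a leaf."""
    incumbent = float("inf")
    pool = [((), root)]                              # (path from root, node)
    while pool:
        # Hand the `num_workers` most "sequential" open nodes to the workers.
        batch = [heapq.heappop(pool) for _ in range(min(num_workers, len(pool)))]
        for path, node in batch:
            if bound(node) >= incumbent:
                continue                             # pruned by the incumbent
            children = branch(node)
            if not children:                         # leaf
                incumbent = min(incumbent, bound(node))
            for i, child in enumerate(children):
                heapq.heappush(pool, (path + (i,), child))
    return incumbent

A pool that instead handed out nodes in arbitrary or random order would correspond to the unordered solvers whose performance is reported above as far more variable.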
A GPU-accelerated Branch-and-Bound Algorithm for the Flow-Shop Scheduling Problem
Branch-and-Bound (B&B) algorithms are time-intensive tree-based exploration methods for solving combinatorial optimization problems to optimality. In this
paper, we investigate the use of GPU computing as a major complementary way to
speed up those methods. The focus is put on the bounding mechanism of B&B
algorithms, which is the most time-consuming part of their exploration process.
We propose a parallel B&B algorithm based on a GPU-accelerated bounding model.
The proposed approach concentrates on optimizing data access management to further improve the performance of the bounding mechanism, which uses large intermediate data sets that do not fit entirely in GPU memory. Extensive experiments have been carried out on well-known FSP benchmarks using an Nvidia Tesla C2050 GPU card. We compared the obtained performance to single-threaded and multithreaded CPU-based executions. Speedups of up to 100x are achieved for large problem instances.
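The offloading pattern described here can be illustrated, independently of CUDA and of the flow-shop bound itself, as follows: accumulate a pool of open nodes and evaluate all their lower bounds in one batched call, which is where a GPU kernel would be invoked. The sketch below is a hypothetical illustration (NumPy stands in for the device kernel; names such as batched_bound are not from the paper).

import numpy as np

def bnb_with_batched_bounds(root, branch, batched_bound, batch_size=1024):
    """Minimisation B&B in which bounding is done in large batches.
    branch(node) returns child nodes (empty for a leaf) and
    batched_bound(nodes) returns one lower bound per node, equal to the
    exact value for leaves."""
    incumbent = float("inf")
    pending = [root]
    while pending:
        batch, pending = pending[:batch_size], pending[batch_size:]
        bounds = np.asarray(batched_bound(batch))    # one large, device-friendly call
        for node, lb in zip(batch, bounds):
            if lb >= incumbent:
                continue                             # pruned
            children = branch(node)
            if children:
                pending.extend(children)
            else:
                incumbent = min(incumbent, float(lb))   # leaf
    return incumbent

Evaluating many bounds per call is what amortises data transfer and kernel launch overheads when the batched call is executed on a GPU.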
On parallel Branch and Bound frameworks for Global Optimization
Branch and Bound (B&B) algorithms are known to exhibit an irregular search tree. Therefore, developing a parallel approach for this kind of algorithm is a challenge. The efficiency of a B&B algorithm depends on the chosen Branching, Bounding, Selection, Rejection, and Termination rules. The question we investigate is how the chosen platform, consisting of the programming language, the libraries used, or skeletons, influences programming effort and algorithm performance. In frameworks with a high level of abstraction, the selection rule and data management structures are usually hidden from the programmer, as is the load balancing strategy when the algorithm is run in parallel. We investigate the question by implementing a multidimensional Global Optimization B&B algorithm with the help of three frameworks with different levels of abstraction (from more to less): Bobpp, Threading Building Blocks (TBB), and a customized Pthread implementation. The following has been found. The Bobpp implementation is easy to code, but exhibits the poorest scalability. In contrast, the TBB and Pthread implementations scale almost linearly on the platform used. The TBB approach shows slightly better productivity.
Efficient Computation of Expected Hypervolume Improvement Using Box Decomposition Algorithms
In the field of multi-objective optimization algorithms, multi-objective
Bayesian Global Optimization (MOBGO) is an important branch, in addition to
evolutionary multi-objective optimization algorithms (EMOAs). MOBGO utilizes
Gaussian Process models learned from previous objective function evaluations to
decide the next evaluation site by maximizing or minimizing an infill
criterion. A common criterion in MOBGO is the Expected Hypervolume Improvement
(EHVI), which shows good performance on a wide range of problems with respect to exploration and exploitation. However, so far it has been a
challenge to calculate exact EHVI values efficiently. In this paper, an
efficient algorithm for the computation of the exact EHVI for a generic case is
proposed. This efficient algorithm is based on partitioning the integration
volume into a set of axis-parallel slices. Theoretically, the upper bound time
complexities for the two- and three-objective cases are improved from the previously best known bounds to Θ(n log n), which is asymptotically optimal. This article generalizes the scheme to the higher-dimensional case by utilizing a new hyperbox decomposition technique proposed by Dächert et al., EJOR, 2017. It also utilizes a generalization of
the multilayered integration scheme that scales linearly in the number of
hyperboxes of the decomposition. The speed comparison shows that the proposed
algorithm in this paper significantly reduces computation time. Finally, this
decomposition technique is applied in the calculation of the Probability of
Improvement (PoI).
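For context, the usual definition of the criterion is as follows (standard notation; the paper's notation may differ slightly). Given a finite Pareto-front approximation P, a reference point r, and an independent Gaussian predictive distribution over the m objectives with mean vector \mu and standard deviation vector \sigma,

  EHVI(\mu, \sigma, P, r) = \int_{\mathbb{R}^m} HVI(P, y, r) \, \xi_{\mu,\sigma}(y) \, dy,
  with HVI(P, y, r) = HV(P \cup \{y\}, r) - HV(P, r),

where HV(\cdot, r) is the hypervolume indicator with respect to r and \xi_{\mu,\sigma} is the product Gaussian density. The box (slice) decomposition makes this integral tractable: on each axis-parallel box of the partition the integrand has a closed form in terms of Gaussian densities and distribution functions, so the total cost scales with the number of boxes in the decomposition.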