53 research outputs found
Embarrassingly Parallel Search
We propose Embarrassingly Parallel Search (EPS), a simple and efficient method for solving constraint programming problems in parallel. We split the initial problem into a huge number of independent subproblems and solve them with the available workers (i.e., cores of machines). The decomposition into subproblems is computed by selecting a subset of variables and enumerating the combinations of values of these variables that are not detected inconsistent by the propagation mechanism of a CP solver. Experiments on satisfaction problems and on optimization problems suggest that generating between thirty and one hundred subproblems per worker leads to good scalability. We show that our method is quite competitive with the work-stealing approach and able to solve some classical problems at the maximum capacity of multi-core machines. Thanks to it, users can parallelize the resolution of their problems without modifying the solver or writing any parallel source code, and can easily replay the resolution of a problem.
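The decomposition described above can be sketched in a few lines. This is a minimal illustration of the EPS idea, not the authors' implementation: all names are hypothetical, and a toy distinctness check stands in for a real CP solver's propagation.

```python
from itertools import product
from multiprocessing import Pool

def consistent(assignment):
    # Toy stand-in for propagation: require all chosen values to be distinct.
    return len(set(assignment.values())) == len(assignment)

def decompose(domain, split_vars):
    # Enumerate value combinations over the split variables, keeping only
    # those not detected inconsistent.
    return [dict(zip(split_vars, values))
            for values in product(domain, repeat=len(split_vars))
            if consistent(dict(zip(split_vars, values)))]

def solve_subproblem(assignment):
    # Stand-in for a full CP search seeded with the fixed assignment.
    return assignment

def eps(domain, split_vars):
    # Subproblems are independent, so a plain worker pool suffices.
    with Pool() as workers:
        return workers.map(solve_subproblem, decompose(domain, split_vars))

if __name__ == "__main__":
    # 3 x 3 = 9 enumerated pairs; 6 survive the distinctness check.
    print(len(eps(range(3), ["x", "y"])))
```

In the paper's setting, `decompose` would generate 30-100 subproblems per worker, amortizing load imbalance without any communication between workers.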
Improvement of the Embarrassingly Parallel Search for Data Centers
We propose an adaptation of the Embarrassingly Parallel Search (EPS) method for data centers. EPS is a simple but efficient method for solving CSPs in parallel. EPS decomposes the problem into many distinct subproblems which are then solved independently by workers. EPS performed well on multi-core machines (40 cores), but some issues arise when using more cores in a data center. Here, we identify the decomposition as the cause of the degradation and propose a parallel decomposition to address this issue. Thanks to it, EPS gives an almost linear speedup and outperforms work stealing by orders of magnitude using the Gecode solver.
Learning Multiple Defaults for Machine Learning Algorithms
The performance of modern machine learning methods highly depends on their
hyperparameter configurations. One simple way of selecting a configuration is
to use default settings, often proposed along with the publication and
implementation of a new algorithm. Those default values are usually chosen in
an ad hoc manner to work well enough on a wide variety of datasets. To address
this problem, different automatic hyperparameter configuration algorithms have
been proposed, which select an optimal configuration per dataset. This
principled approach usually improves performance but adds algorithmic
complexity and computational cost to the training procedure. As an
alternative to this, we propose learning a set of complementary default values
from a large database of prior empirical results. Selecting an appropriate
configuration on a new dataset then requires only a simple, efficient and
embarrassingly parallel search over this set. We demonstrate the effectiveness
and efficiency of our proposed approach in comparison to random search and
Bayesian optimization.
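The selection step described above can be sketched as follows. This is a hypothetical illustration: the portfolio entries and the scoring function are made up, and the real pipeline learns the portfolio from a large database of prior results.

```python
from concurrent.futures import ThreadPoolExecutor

# Invented portfolio of complementary default configurations.
PORTFOLIO = [
    {"learning_rate": 0.1, "depth": 3},
    {"learning_rate": 0.01, "depth": 6},
    {"learning_rate": 0.3, "depth": 2},
]

def validation_score(config):
    # Stand-in for training a model with `config` and scoring it on a
    # held-out split; real workloads would dominate the runtime here.
    return -abs(config["learning_rate"] - 0.05) - 0.01 * config["depth"]

def select_default(portfolio):
    # Evaluations are independent, so the search over the portfolio is
    # embarrassingly parallel; a process pool would suit CPU-bound training.
    with ThreadPoolExecutor() as pool:
        scores = list(pool.map(validation_score, portfolio))
    return portfolio[max(range(len(portfolio)), key=scores.__getitem__)]

print(select_default(PORTFOLIO))  # {'learning_rate': 0.1, 'depth': 3}
```

Because the portfolio is fixed ahead of time, no sequential model-based reasoning is needed at selection time, which is what makes this cheaper than per-dataset Bayesian optimization.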
RAxML-Cell: Parallel Phylogenetic Tree Inference on the Cell Broadband Engine
Phylogenetic tree reconstruction is one of the grand challenge
problems in Bioinformatics. The search for a best-scoring tree with 50
organisms, under a reasonable optimality criterion, creates a
topological search space which is as large as the number of atoms in
the universe. Computational phylogeny is challenging even for the most
powerful supercomputers. It is also an ideal candidate for
benchmarking emerging multiprocessor architectures, because it
exhibits various levels of fine and coarse-grain parallelism. In this
paper, we present the porting, optimization, and evaluation of RAxML
on the Cell Broadband Engine. RAxML is a provably efficient hill-climbing
algorithm for computing phylogenetic trees based on the Maximum Likelihood
(ML) method. The algorithm uses an embarrassingly parallel search method,
which also exhibits data-level and control parallelism in the computation
of the likelihood functions.
We present the optimization of one of the currently fastest tree
search algorithms on a real Cell blade prototype. We also
investigate problems and present solutions pertaining to the
optimization of floating-point code, control flow, communication,
scheduling, and multi-level parallelization on the Cell.
An adaptive CP method for TSP solving
M. Sellmann showed that CP-based Lagrangian relaxation gave good results, but the interactions between the two techniques were quite difficult to understand. There are two main reasons for this: the best multipliers do not lead to the best filtering, and each filtering disrupts the Lagrangian multiplier problem (LMP) to be solved. As the resolution of the TSP in CP is mainly based on a Lagrangian relaxation, we propose to study these interactions in detail for this particular problem. This article experimentally confirms the above statements and shows that it is very difficult to establish any relationship between filtering and the method used to solve the LMP in practice. Thus, it seems very difficult to select a priori the best method suited for a given instance. We propose to use a multi-armed bandit algorithm to find the best possible method to use. The experimental results show the advantages of our approach.
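A multi-armed bandit for method selection can be sketched with the classic UCB1 rule. The method names and the reward model below are hypothetical stand-ins, not taken from the article.

```python
import math
import random

# Invented candidate methods for solving the Lagrangian multiplier problem.
METHODS = ["subgradient", "bundle", "hybrid"]

def ucb1_pick(rewards, counts, t):
    # Play any untried method first, then maximise mean reward plus an
    # exploration bonus (the UCB1 rule).
    for m in METHODS:
        if counts[m] == 0:
            return m
    return max(METHODS, key=lambda m: rewards[m] / counts[m]
               + math.sqrt(2 * math.log(t) / counts[m]))

def run(trials=200, seed=0):
    rng = random.Random(seed)
    rewards = {m: 0.0 for m in METHODS}
    counts = {m: 0 for m in METHODS}
    # Hidden success rates standing in for per-instance filtering quality.
    true_rate = {"subgradient": 0.4, "bundle": 0.7, "hybrid": 0.55}
    for t in range(1, trials + 1):
        m = ucb1_pick(rewards, counts, t)
        rewards[m] += 1.0 if rng.random() < true_rate[m] else 0.0
        counts[m] += 1
    return counts

counts = run()
print(max(counts, key=counts.get))  # the most-played method
```

As trials accumulate, the bandit concentrates plays on the method with the best observed reward while still probing the others occasionally.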
Implementing YewPar: a framework for parallel tree search
Combinatorial search is central to many applications yet
hard to parallelise. We argue for improving the reuse of parallel searches,
and present the design and implementation of a new parallel search
framework. YewPar generalises search by abstracting search tree generation, and by providing algorithmic skeletons that support three search
types, together with a set of search coordination strategies. The evaluation shows that the cost of YewPar's generality is low (6.1%); global
knowledge is inexpensively shared between workers; irregular tasks are
effectively distributed; and YewPar delivers good runtimes, speedups, and
efficiency with up to 255 workers on 17 localities.
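The skeleton idea can be illustrated in miniature: the framework abstracts search-tree generation behind a user-supplied successor function, and the search type decides what to do with each node. The names below are ours, not YewPar's API.

```python
from collections import deque

def tree_search(root, successors, accept):
    # Generic skeleton: enumerate the tree, collecting nodes the predicate
    # accepts; enumeration order is a coordination concern the framework
    # can vary (or distribute across workers) without touching user code.
    found = []
    frontier = deque([root])
    while frontier:
        node = frontier.popleft()
        if accept(node):
            found.append(node)
        frontier.extend(successors(node))
    return found

# Example: enumerate bit strings of length 3 and keep those with two 1s.
succ = lambda s: [s + "0", s + "1"] if len(s) < 3 else []
hits = tree_search("", succ, lambda s: len(s) == 3 and s.count("1") == 2)
print(hits)  # ['011', '101', '110']
```

Separating tree generation from coordination is what lets one user program run under different search types and parallel strategies.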
Solving Graph Coloring Problems with Abstraction and Symmetry
This paper introduces a general methodology, based on abstraction and
symmetry, that applies to solve hard graph edge-coloring problems, and
demonstrates its use to provide further evidence that the Ramsey number
R(4,3,3) = 30. The number R(4,3,3) is often presented as the unknown Ramsey
number with the best chances of being found "soon". Yet, its precise value has
remained unknown for more than 50 years. We illustrate our approach by showing
that: (1) there are precisely 78,892 (3,3,4;13) Ramsey colorings; and (2)
if there exists a (3,3,4;30) Ramsey coloring then it is (13,8,8) regular.
Specifically, each node has 13 edges in the first color, 8 in the second, and 8
in the third. We conjecture that these two results will help provide a proof
that no (3,3,4;30) Ramsey coloring exists, implying that R(4,3,3) = 30.
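The regularity property stated above is easy to check mechanically. The sketch below assumes an edge coloring of the complete graph K_n given as a dict from ordered pairs (i, j), i &lt; j, to a color in {1, 2, 3}; the helper names are ours.

```python
def degree_profile(coloring, node, n):
    # Count how many edges at `node` carry each of the three colors.
    profile = [0, 0, 0]
    for other in range(n):
        if other != node:
            edge = (min(node, other), max(node, other))
            profile[coloring[edge] - 1] += 1
    return tuple(profile)

def is_13_8_8_regular(coloring, n=30):
    # In K30 every node has degree 29 = 13 + 8 + 8, so the profile can be
    # exactly (13, 8, 8) at each node.
    return all(degree_profile(coloring, v, n) == (13, 8, 8) for v in range(n))

# Tiny sanity check on K5 with every edge in the first color.
k5 = {(i, j): 1 for i in range(5) for j in range(i + 1, 5)}
print(degree_profile(k5, 0, 5))  # (4, 0, 0)
```

Such a per-node invariant prunes the search space drastically: any candidate coloring violating it can be discarded without further exploration.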
Load balancing for constraint solving with GPUs
Solving a complex Constraint Satisfaction Problem (CSP) is a computationally hard task that may require a considerable amount of time. Parallelism has been applied successfully to the job, and there are already many applications capable of harnessing the parallel power of modern CPUs to speed up the solving process.
Current Graphics Processing Units (GPUs), containing from a few hundred to a few thousand cores, possess a level of parallelism that surpasses that of CPUs, yet far fewer applications are capable of solving CSPs on GPUs, leaving room for further improvement.
This paper describes work in progress on solving CSPs on GPUs, CPUs, and other devices, such as Intel Many Integrated Cores (MICs), in parallel. It presents the gains obtained when applying more devices to solve some problems, and the main challenges that must be faced when using devices with architectures as different as CPUs and GPUs, with a greater focus on how to effectively achieve good load balancing between such heterogeneous devices.
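One common way to balance load across devices of very different speeds is dynamic work-sharing from a central queue. The sketch below is hypothetical: the device names and relative speeds are invented, and threads with simulated delays stand in for real CPU/GPU/MIC workers.

```python
import queue
import threading
import time

def worker(name, speed, tasks, done, lock):
    # Each device pulls the next block of the search space from a shared
    # queue, so faster devices naturally claim more blocks.
    while True:
        try:
            block = tasks.get_nowait()
        except queue.Empty:
            return
        time.sleep(0.001 / speed)  # pretend to explore this block
        with lock:
            done[name] = done.get(name, 0) + 1

def balance(num_blocks=100):
    tasks = queue.Queue()
    for b in range(num_blocks):
        tasks.put(b)
    done, lock = {}, threading.Lock()
    devices = [("cpu", 1.0), ("gpu", 4.0)]  # invented relative speeds
    threads = [threading.Thread(target=worker, args=(n, s, tasks, done, lock))
               for n, s in devices]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done

counts = balance()
print(counts)  # all 100 blocks processed, skewed toward the faster device
```

The queue granularity is the key tuning knob: blocks small enough that no device idles at the end, but large enough that queue contention and per-block overhead stay negligible.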