53 research outputs found

    Embarrassingly Parallel Search

    Get PDF
    International audienceWe propose the Embarrassingly Parallel Search, a simple and efficient method for solving constraint programming problems in parallel. We split the initial problem into a huge number of independent subproblems and solve them with available workers (i.e., cores of machines). The decomposition into subproblems is computed by selecting a subset of variables and by enumerating the combinations of values of these variables that are not detected inconsistent by the propagation mechanism of a CP Solver. The experiments on satisfaction problems and on optimization problems suggest that generating between thirty and one hundred subproblems per worker leads to a good scalability. We show that our method is quite competitive with the work stealing approach and able to solve some classical problems at the maximum capacity of the multi-core machines. Thanks to it, a user can parallelize the resolution of its problem without modifying the solver or writing any parallel source code and can easily replay the resolution of a problem

    Improvement of the Embarrassingly Parallel Search for Data Centers

    Get PDF
    International audienceWe propose an adaptation of the Embarrassingly Parallel Search (EPS) method for data centers. EPS is a simple but efficient method for parallel solving of CSPs. EPS decomposes the problem in many distinct subproblems which are then solved independently by workers. EPS performed well on multi-cores machines (40), but some issues arise when using more cores in a datacenter. Here, we identify the decomposition as the cause of the degradation and propose a parallel decomposition to address this issue. Thanks to it, EPS gives almost linear speedup and outperforms work stealing by orders of magnitude using the Gecode solver

    Learning Multiple Defaults for Machine Learning Algorithms

    Get PDF
    The performance of modern machine learning methods highly depends on their hyperparameter configurations. One simple way of selecting a configuration is to use default settings, often proposed along with the publication and implementation of a new algorithm. Those default values are usually chosen in an ad-hoc manner to work good enough on a wide variety of datasets. To address this problem, different automatic hyperparameter configuration algorithms have been proposed, which select an optimal configuration per dataset. This principled approach usually improves performance, but adds additional algorithmic complexity and computational costs to the training procedure. As an alternative to this, we propose learning a set of complementary default values from a large database of prior empirical results. Selecting an appropriate configuration on a new dataset then requires only a simple, efficient and embarrassingly parallel search over this set. We demonstrate the effectiveness and efficiency of the approach we propose in comparison to random search and Bayesian Optimization

    RAxML-Cell: Parallel Phylogenetic Tree Inference on the Cell Broadband Engine

    Get PDF
    Phylogenetic tree reconstruction is one of the grand challenge problems in Bioinformatics. The search for a best-scoring tree with 50 organisms, under a reasonable optimality criterion, creates a topological search space which is as large as the number of atoms in the universe. Computational phylogeny is challenging even for the most powerful supercomputers. It is also an ideal candidate for benchmarking emerging multiprocessor architectures, because it exhibits various levels of fine and coarse-grain parallelism. In this paper, we present the porting, optimization, and evaluation of RAxML on the Cell Broadband Engine. RAxML is a provably efficient, hill climbing algorithm for computing phylogenetic trees based on the Maximum Likelihood (ML) method. The algorithm uses an embarrassingly parallel search method, which also exhibits data-level parallelism and control parallelism in the computation of the likelihood functions. We present the optimization of one of the currently fastest tree search algorithms, on a real Cell blade prototype. We also investigate problems and present solutions pertaining to the optimization of floating point code, control flow, communication, scheduling, and multi-level parallelization on the Cell

    An adaptive CP method for TSP solving

    Get PDF
    M. Sellmann showed that CP-based Lagrangian relaxation gave good results but the interactions between the two techniques were quite dicult to understand. There are two main reasons for this: the best multipliers do not lead to the best ltering and each ltering disrupts the Lagrangian multiplier problem (LMP) to be solved. As the resolution of the TSP in CP is mainly based on a Lagrangian relaxation, we propose to study in detail these interactions for this particular problem. This article experimentally conrms the above statements and shows that it is very dicult to establish any relationship between ltering and the method used to solve the LMP in practice. Thus, it seems very dicult to select a priori the best method suited for a given instance. We propose to use a multi-armed bandit algorithm to nd the best possible method to use. The experimental results show the advantages of our approach

    Implementing YewPar: a framework for parallel tree search

    Get PDF
    Combinatorial search is central to many applications yet hard to parallelise. We argue for improving the reuse of parallel searches, and present the design and implementation of a new parallel search framework. YewPar generalises search by abstracting search tree generation, and by providing algorithmic skeletons that support three search types, together with a set of search coordination strategies. The evaluation shows that the cost of YewPar generality is low (6.1%); global knowledge is inexpensively shared between workers; irregular tasks are effectively distributed; and YewPar delivers good runtimes, speedups and efficiency with up to 255 workers on 17 localitie

    Solving Graph Coloring Problems with Abstraction and Symmetry

    Get PDF
    This paper introduces a general methodology, based on abstraction and symmetry, that applies to solve hard graph edge-coloring problems and demonstrates its use to provide further evidence that the Ramsey number R(4,3,3)=30R(4,3,3)=30. The number R(4,3,3)R(4,3,3) is often presented as the unknown Ramsey number with the best chances of being found "soon". Yet, its precise value has remained unknown for more than 50 years. We illustrate our approach by showing that: (1) there are precisely 78{,}892 (3,3,3;13)(3,3,3;13) Ramsey colorings; and (2) if there exists a (4,3,3;30)(4,3,3;30) Ramsey coloring then it is (13,8,8) regular. Specifically each node has 13 edges in the first color, 8 in the second, and 8 in the third. We conjecture that these two results will help provide a proof that no (4,3,3;30)(4,3,3;30) Ramsey coloring exists implying that R(4,3,3)=30R(4,3,3)=30

    Load balancing for constraint solving with GPUs

    Get PDF
    Solving a complex Constraint Satisfaction Problem (CSP) is a computationally hard task which may require a considerable amount of time. Parallelism has been applied successfully to the job and there are already many applications capable of harnessing the parallel power of modern CPUs to speed up the solving process. Current Graphics Processing Units (GPUs), containing from a few hundred to a few thousand cores, possess a level of parallelism that surpasses that of CPUs and there are much less applications capable of solving CSPs on GPUs, leaving space for further improvement. This paper describes work in progress in the solving of CSPs on GPUs, CPUs and other devices, such as Intel Many Integrated Cores (MICs), in parallel. It presents the gains obtained when applying more devices to solve some problems and the main challenges that must be faced when using devices with as different architectures as CPUs and GPUs, with a greater focus on how to effectively achieve good load balancing between such heterogeneous devices
    corecore