    A Multicore Tool for Constraint Solving

    *** To appear in IJCAI 2015 proceedings *** In Constraint Programming (CP), a portfolio solver uses a variety of different solvers to solve a given Constraint Satisfaction / Optimization Problem. In this paper we introduce sunny-cp2: the first parallel CP portfolio solver that enables a dynamic, cooperative, and simultaneous execution of its solvers in a multicore setting. It incorporates state-of-the-art solvers and provides a usable and configurable framework. Empirical results are very promising: sunny-cp2 can even outperform the oracle solver, which always selects the best solver of the portfolio for a given problem.

    Variable Annealing Length and Parallelism in Simulated Annealing

    In this paper, we propose: (a) a restart schedule for an adaptive simulated annealer, and (b) parallel simulated annealing, with an adaptive and parameter-free annealing schedule. The foundation of our approach is the Modified Lam annealing schedule, which adaptively controls the temperature parameter to track a theoretically ideal rate of acceptance of neighboring states. A sequential implementation of Modified Lam simulated annealing is almost parameter-free; however, it requires prior knowledge of the annealing length. We eliminate this parameter using restarts, with an exponentially increasing schedule of annealing lengths. We then extend this restart schedule to a parallel implementation, executing several Modified Lam simulated annealers in parallel, with varying initial annealing lengths and our proposed parallel annealing length schedule. To validate our approach, we conduct experiments on an NP-hard scheduling problem with sequence-dependent setup constraints. We compare our approach to fixed-length restarts, both sequentially and in parallel. Our results show that our approach can achieve substantial performance gains throughout the course of the run, demonstrating that it is an effective anytime algorithm. Comment: Tenth International Symposium on Combinatorial Search, pages 2-10. June 201
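
The restart idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Modified Lam schedule is replaced here by plain geometric cooling as a stand-in, the doubling factor is an assumption, and all names (`anneal`, `restart_lengths`, `anneal_with_restarts`) are hypothetical. The only point shown is that each restart runs for an exponentially longer annealing length, so no length has to be chosen in advance.

```python
import math
import random


def anneal(cost, neighbor, start, run_length, rng, t0=1.0, t_end=1e-3):
    """One annealing run of exactly run_length steps.

    Geometric cooling is used as a stand-in; it is NOT the Modified Lam schedule.
    """
    current, current_cost = start, cost(start)
    best, best_cost = current, current_cost
    # Cooling factor chosen so the temperature falls from t0 to t_end over the run.
    alpha = (t_end / t0) ** (1.0 / max(1, run_length - 1))
    temp = t0
    for _ in range(run_length):
        cand = neighbor(current, rng)
        cand_cost = cost(cand)
        delta = cand_cost - current_cost
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = cand, cand_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        temp *= alpha
    return best, best_cost


def restart_lengths(initial_length, factor=2):
    """Yield annealing lengths that grow exponentially (factor is an assumption)."""
    length = initial_length
    while True:
        yield length
        length *= factor


def anneal_with_restarts(cost, neighbor, random_start, budget,
                         initial_length=100, seed=0):
    """Restart the annealer with growing run lengths until the step budget is spent."""
    rng = random.Random(seed)
    best, best_cost = None, math.inf
    used = 0
    for length in restart_lengths(initial_length):
        length = min(length, budget - used)  # truncate the final run to the budget
        if length <= 0:
            break
        sol, sol_cost = anneal(cost, neighbor, random_start(rng), length, rng)
        used += length
        if sol_cost < best_cost:
            best, best_cost = sol, sol_cost
    return best, best_cost
```

Because the best solution over all completed runs is always available, the loop behaves as an anytime algorithm: it can be stopped after any restart and still return the best solution found so far.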

    Universal performance bounds of restart

    As has long been known to computer scientists, the performance of probabilistic algorithms characterized by relatively large runtime fluctuations can be improved by applying a restart, i.e., episodic interruption of a randomized computational procedure followed by initialization of a new, statistically independent realization. A similar effect of restart-induced process acceleration could potentially occur in the context of enzymatic reactions, where dissociation of the enzyme-substrate intermediate corresponds to restarting the catalytic step of the reaction. To date, a significant number of analytical results have been obtained in physics and computer science regarding the effect of restart on the completion time statistics in various model problems; however, the fundamental limits of restart efficiency remain unknown. Here we derive a range of universal statistical inequalities that constrain the effect that restart can have on the completion time of a generic stochastic process. The corresponding bounds are expressed via simple statistical metrics of the original process, such as the harmonic mean h, the median m, and the mode M, and are thus remarkably practical. We test our analytical predictions with multiple numerical examples, and discuss their implications and important avenues for future work. Comment: 12 pages, 2 figures
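
To make the effect of restart concrete, here is a small sketch that is not taken from the paper. It uses the standard renewal argument for restarting at a fixed cutoff: the mean completion time equals the expected time spent per attempt, E[min(T, cutoff)], divided by the probability that an attempt finishes before the cutoff. The function name and the two-outcome example distribution are hypothetical.

```python
import math


def mean_time_with_restart(outcomes, cutoff):
    """Mean completion time when every attempt is restarted at `cutoff`.

    outcomes: list of (completion_time, probability) pairs for a single attempt.
    """
    p_success = sum(p for t, p in outcomes if t <= cutoff)
    if p_success == 0:
        return math.inf  # no attempt can ever finish before the cutoff
    # Expected time consumed by one attempt, successful or not.
    expected_attempt = sum(min(t, cutoff) * p for t, p in outcomes)
    return expected_attempt / p_success


# A process with large fluctuations: fast half the time, very slow otherwise.
outcomes = [(1, 0.5), (100, 0.5)]
print(mean_time_with_restart(outcomes, cutoff=100))  # never restarting in effect: 50.5
print(mean_time_with_restart(outcomes, cutoff=1))    # restarting after one time unit: 2.0
```

In this toy case restart cuts the mean completion time from 50.5 to 2.0, which is the kind of gain the abstract's bounds are meant to limit in general.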

    Sequential and parallel solution-biased search for subgraph algorithms

    Funding: This work was supported by the Engineering and Physical Sciences Research Council (grant numbers EP/P026842/1, EP/M508056/1, and EP/N007565). The current state of the art in subgraph isomorphism solving involves using degree as a value-ordering heuristic to direct backtracking search. Such a search makes a heavy commitment to the first branching choice, which is often incorrect. To mitigate this, we introduce and evaluate a new approach, which we call “solution-biased search”. By combining a slightly-random value-ordering heuristic, rapid restarts, and nogood recording, we design an algorithm which instead uses degree to direct the proportion of search effort spent in different subproblems. This increases performance by two orders of magnitude on satisfiable instances, whilst not affecting performance on unsatisfiable instances. This algorithm can also be parallelised in a very simple but effective way: across both satisfiable and unsatisfiable instances, we get a further speedup of over thirty from thirty-six cores, and over one hundred from ten distributed-memory hosts. Finally, we show that solution-biased search is also suitable for optimisation problems, by using it to improve two maximum common induced subgraph algorithms.
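
One way to picture a "slightly-random" value ordering driven by degree is the sketch below. It is an illustration under stated assumptions, not the paper's algorithm: the weighting of each candidate by 2 raised to its degree is an assumption made for this example, and `biased_order` and the toy degrees are hypothetical. High-degree candidates tend to be tried first, but every candidate has some chance of coming first, so repeated restarts spread effort over different subproblems.

```python
import random


def biased_order(candidates, degree, rng):
    """Order candidates by sampling without replacement, weight 2**degree (assumed)."""
    remaining = list(candidates)
    order = []
    while remaining:
        # Subtract the current maximum so the weights stay in a safe numeric range.
        top = max(degree[v] for v in remaining)
        weights = [2.0 ** (degree[v] - top) for v in remaining]
        pick = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        order.append(remaining.pop(pick))
    return order


# Toy example: three target vertices with different degrees.
degree = {"a": 5, "b": 3, "c": 1}
rng = random.Random(0)
print(biased_order(["a", "b", "c"], degree, rng))
```

A deterministic degree ordering would always return `a, b, c`; the biased version returns that order most often while still occasionally starting from `b` or `c`.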

    A review of literature on parallel constraint solving

    As multicore computing is now standard, it seems irresponsible for constraints researchers to ignore its implications. Researchers need to address a number of issues to exploit parallelism, such as: investigating which constraint algorithms are amenable to parallelisation; whether to use shared memory or distributed computation; whether to use static or dynamic decomposition; and how to best exploit portfolios and cooperating search. We review the literature, and see that we can sometimes do quite well, some of the time, on some instances, but we are far from a general solution. Yet there seems to be little overall guidance that can be given on how best to exploit multicore computers to speed up constraint solving. We hope at least that this survey will provide useful pointers to future researchers wishing to correct this situation.

    Minimisation des perturbations et parallélisation pour la planification et l'ordonnancement

    In this thesis we study two approaches that reduce the processing time needed to solve planning and scheduling problems in a constraint programming context. We experimented with several thousand processors to solve the planning and scheduling problem of wood-finish operations. These problems are of great importance to businesses, because solving them lets businesses better manage their production and reduce the costs of their operations. The first approach is to parallelize the algorithm that solves the problem. We propose a new parallelization technique for search strategies (named PDS) that achieves four goals: preservation of the node visit order in the search tree as defined by the sequential algorithm, balancing of the workload among the processors, robustness against hardware failures, and absence of communication between processors during processing. We apply this technique to parallelize the Limited Discrepancy-based Search (LDS) strategy and obtain Parallel Limited Discrepancy-Based Search (PLDS). We then show that this technique can be generalized by applying it to two other search strategies: Depth-Bounded discrepancy Search (DDS) and Depth-First Search (DFS). We obtain, respectively, Parallel Discrepancy-based Search (PDDS) and Parallel Depth-First Search (PDFS). The parallel algorithms obtained this way create an intrinsic workload balance: the difference in workload among the processors is bounded when a branch of the search tree is pruned. Using datasets from industrial partners, we were able to improve the best known solutions.
With the second approach, we developed a method to minimize the changes made to an existing production plan when new information, such as additional orders, is taken into account. Completely replanning the production activities can lead to a very different production plan, which creates additional costs and loss of time for businesses. We study the perturbations caused by replanning using three distance metrics between two production plans: Hamming distance, edit distance, and Damerau-Levenshtein distance. We propose three mathematical models that minimize these perturbations by including each of these metrics as the objective function when replanning. We apply this approach to the planning and scheduling problem of wood-finish operations and we demonstrate that it is faster than replanning with the original model.
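
The three perturbation metrics named in this abstract can be compared on a toy example. The sketch below is not the thesis's mathematical models: it only computes the distances between two plans represented as sequences of job labels, which is a hypothetical encoding chosen for illustration. The transposition variant implemented here is the restricted form of Damerau-Levenshtein (optimal string alignment), which is an assumption about which variant is meant.

```python
def hamming(a, b):
    """Number of positions at which two equal-length plans differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs plans of equal length")
    return sum(x != y for x, y in zip(a, b))


def edit_distance(a, b, transpositions=False):
    """Levenshtein distance; with transpositions=True, the restricted
    Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]


old_plan = ["A", "B", "C", "D"]
swapped = ["A", "C", "B", "D"]   # two adjacent jobs exchanged
shifted = ["X", "A", "B", "C"]   # a new job inserted at the front

print(hamming(old_plan, swapped), edit_distance(old_plan, swapped),
      edit_distance(old_plan, swapped, transpositions=True))   # 2 2 1
print(hamming(old_plan, shifted), edit_distance(old_plan, shifted))  # 4 2
```

The example shows why the choice of metric matters: inserting one job at the front changes every position under Hamming distance, while edit distance sees only one insertion and one deletion, and a swap of two adjacent jobs counts as a single change only when transpositions are allowed.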

    Algorithmic skeletons for exact combinatorial search at scale

    Exact combinatorial search is essential to a wide range of application areas including constraint optimisation, graph matching, and computer algebra. Solutions to combinatorial problems are found by systematically exploring a search space, either to enumerate solutions, determine if a specific solution exists, or to find an optimal solution. Combinatorial searches are computationally hard both in theory and practice, and efficiently exploring the huge number of combinations is a real challenge, often addressed using approximate search algorithms. Alternatively, exact search can be parallelised to reduce execution time. However, parallel search is challenging due to both highly irregular search trees and sensitivity to search order, leading to anomalies that can cause unexpected speedups and slowdowns. As core counts continue to grow, parallel search becomes increasingly useful for improving the performance of existing searches, and allowing larger instances to be solved. A high-level approach to parallel search allows non-expert users to benefit from increasing core counts. Algorithmic Skeletons provide reusable implementations of common parallelism patterns that are parameterised with user code which determines the specific computation, e.g. a particular search. We define a set of skeletons for exact search, requiring the user to provide in the minimal case a single class that specifies how the search tree is generated and a parameter that specifies the type of search required. The five are: Sequential search; three general-purpose parallel search methods: Depth-Bounded, Stack-Stealing, and Budget; and a specific parallel search method, Ordered, that guarantees replicable performance. We implement and evaluate the skeletons in a new C++ parallel search framework, YewPar. YewPar provides both high-level skeletons and low-level search specific schedulers and utilities to deal with the irregularity of search and knowledge exchange between workers. 
YewPar is based on the HPX library for distributed task-parallelism, potentially allowing search to execute on multi-cores, clusters, cloud, and high performance computing systems. Underpinning the skeleton design is a novel formal model, MT^3, a parallel operational semantics that describes multi-threaded tree traversals, allowing reasoning about parallel search, e.g. describing common parallel search phenomena such as performance anomalies. YewPar is evaluated using seven different search applications (and over 25 specific instances): Maximum Clique, k-Clique, Subgraph Isomorphism, Travelling Salesperson, Binary Knapsack, Enumerating Numerical Semigroups, and the Unbalanced Tree Search Benchmark. The search instances are evaluated at multiple scales from 1 to 255 workers, on a 17 host, 272 core Beowulf cluster. The overheads of the skeletons are low, with a mean 6.1% slowdown compared to hand-coded sequential implementations. Crucially, for all search applications YewPar reduces search times by an order of magnitude, i.e. from hours/minutes to minutes/seconds, and we commonly see greater than 60% (average) parallel efficiency for up to 255 workers. Comparing skeleton performance reveals that no one skeleton is best for all searches, highlighting a benefit of a skeleton approach that allows multiple parallelisations to be explored with minimal refactoring. The Ordered skeleton avoids slowdown anomalies where, due to search knowledge being order dependent, a parallel search takes longer than a sequential search. Analysis of Ordered shows that, while being 41% slower on average (73% in the worst case) than Depth-Bounded, in nearly all cases it maintains the following replicable performance properties: 1) parallel executions are no slower than one-worker sequential executions; 2) runtimes do not increase as workers are added; and 3) variance between repeated runs is low.
In particular, where Ordered maintains a relative standard deviation (RSD) of less than 15%, Depth-Bounded suffers from an RSD greater than 50%, showing the importance of carefully controlling search orders for repeatability.
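
The skeleton idea, where the user supplies only a function that generates the search tree and the skeleton decides how to traverse it, can be sketched as follows. This is a Python illustration and not YewPar's C++/HPX API: `sequential_count`, `depth_bounded_count`, and the toy `children` generator are hypothetical names, and a thread pool stands in for the real task scheduler (in CPython it shows the structure rather than a genuine speedup).

```python
from concurrent.futures import ThreadPoolExecutor


def sequential_count(root, children):
    """Sequential skeleton: depth-first enumeration counting every tree node."""
    count = 0
    stack = [root]
    while stack:
        node = stack.pop()
        count += 1
        stack.extend(children(node))
    return count


def depth_bounded_count(root, children, spawn_depth, workers=4):
    """Depth-bounded skeleton: nodes above spawn_depth are expanded by the caller;
    each subtree rooted at spawn_depth becomes one task for the pool."""
    count = 0
    frontier = [(root, 0)]
    tasks = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier:
            node, depth = frontier.pop()
            if depth == spawn_depth:
                tasks.append(pool.submit(sequential_count, node, children))
            else:
                count += 1
                frontier.extend((child, depth + 1) for child in children(node))
        return count + sum(task.result() for task in tasks)


def children(depth):
    """User code: a complete binary tree of depth 4, with nodes named by their depth."""
    return [depth + 1, depth + 1] if depth < 4 else []


print(sequential_count(0, children))                     # 31
print(depth_bounded_count(0, children, spawn_depth=2))   # 31
```

Both calls use the same user-supplied `children` function, which is the point of the approach described above: switching between parallelisations changes only which skeleton is called, not the search code.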

    Parallel Restarted Search

    We consider the problem of parallelizing restarted backtrack search. With few notable exceptions, most commercial and academic constraint programming solvers do not learn no-goods during search. Depending on the branching heuristics used, this means that there are few to no side effects between restarts, making them an excellent target for parallelization. We develop a simple technique for parallelizing restarted search deterministically and demonstrate experimentally that we can achieve near-linear speed-ups in practice.
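
Since restarts without no-good learning are independent of one another, they can be treated as separate jobs. The sketch below shows one way to make that deterministic; it is an assumption-laden illustration, not the paper's technique. Each restart gets a seed and a step limit that depend only on its index (the Luby sequence is used for the limits as a common choice, not one stated in the abstract), and the answer reported is always the one from the lowest-numbered successful restart, whatever the number of workers. `parallel_restarts` and `toy_attempt` are hypothetical names.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def luby(i):
    """i-th term (1-based) of the Luby sequence: 1, 1, 2, 1, 1, 2, 4, ..."""
    k = 1
    while (1 << k) - 1 < i:
        k += 1
    if (1 << k) - 1 == i:
        return 1 << (k - 1)
    return luby(i - (1 << (k - 1)) + 1)


def parallel_restarts(attempt, workers=4, base_limit=10, max_restarts=1000):
    """Run restarts in batches; return (index, result) of the lowest-index success."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for first in range(1, max_restarts + 1, workers):
            indices = range(first, min(first + workers, max_restarts + 1))
            futures = [pool.submit(attempt, i, base_limit * luby(i)) for i in indices]
            # Inspect results in index order so the outcome does not depend on timing.
            for i, future in zip(indices, futures):
                result = future.result()
                if result is not None:
                    return i, result
    return None


def toy_attempt(index, limit):
    """Stand-in for one restart: a randomized search seeded only by its index."""
    rng = random.Random(index)
    for _ in range(limit):
        guess = rng.randrange(50)
        if guess == 7:
            return guess
    return None


print(parallel_restarts(toy_attempt, workers=2) == parallel_restarts(toy_attempt, workers=8))
```

The final line prints `True`: the same restart index and result are returned with 2 or 8 workers, because every restart is a pure function of its index and results are inspected in index order.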