
    Replicable parallel branch and bound search

    Combinatorial branch and bound searches are a common technique for solving global optimisation and decision problems. Their performance often depends on good search order heuristics, refined over decades of algorithms research. Parallel search necessarily deviates from the sequential search order, sometimes dramatically and unpredictably, e.g. by distributing work at random. This can disrupt effective search order heuristics and lead to unexpected and highly variable parallel performance. The variability makes it hard to reason about the parallel performance of combinatorial searches. This paper presents a generic parallel branch and bound skeleton, implemented in Haskell, with replicable parallel performance. The skeleton aims to preserve the search order heuristic by distributing work in an ordered fashion, closely following the sequential search order. We demonstrate the generality of the approach by applying the skeleton to 40 instances of three combinatorial problems: Maximum Clique, 0/1 Knapsack and Travelling Salesperson. The overheads of our Haskell skeleton are reasonable: giving slowdown factors of between 1.9 and 6.2 compared with a class-leading, dedicated, and highly optimised C++ Maximum Clique solver. We demonstrate scaling up to 200 cores of a Beowulf cluster, achieving speedups of 100x for several Maximum Clique instances. We demonstrate low variance of parallel performance across all instances of the three combinatorial problems and at all scales up to 200 cores, with median Relative Standard Deviation (RSD) below 2%. Parallel solvers that do not follow the sequential search order exhibit far higher variance, with median RSD exceeding 85% for Knapsack
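To make the role of the search order heuristic concrete, here is a minimal sequential branch and bound sketch for 0/1 Knapsack in Python. It is not the paper's Haskell skeleton: the function name `knapsack_branch_and_bound`, the value-density ordering, and the fractional-relaxation bound are illustrative assumptions, and item values and weights are assumed to be positive integers.

```python
from typing import List, Tuple


def knapsack_branch_and_bound(items: List[Tuple[int, int]], capacity: int) -> int:
    """Return the best total value for (value, weight) items within capacity.

    Hypothetical illustration only; assumes positive integer values and weights.
    """
    # Search order heuristic (an assumption): try items by decreasing value density.
    order = sorted(items, key=lambda vw: vw[0] / vw[1], reverse=True)

    def upper_bound(index: int, value: int, room: int) -> float:
        # Fractional relaxation: fill the remaining room greedily,
        # allowing a fraction of the first item that does not fit.
        bound = float(value)
        for v, w in order[index:]:
            if w <= room:
                room -= w
                bound += v
            else:
                bound += v * room / w
                break
        return bound

    best = 0  # incumbent: best value found so far

    def search(index: int, value: int, room: int) -> None:
        nonlocal best
        if value > best:
            best = value
        if index == len(order):
            return
        if upper_bound(index, value, room) <= best:
            return  # prune: this subtree cannot beat the incumbent
        v, w = order[index]
        if w <= room:
            search(index + 1, value + v, room - w)  # heuristic: "take" branch first
        search(index + 1, value, room)  # then the "skip" branch

    search(0, 0, capacity)
    return best
```

Because pruning depends on the incumbent, the order in which subtrees are visited changes how much work is done; a parallel search that visits them in a different order can therefore behave very differently from the sequential one.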

    Algorithmic skeletons for exact combinatorial search at scale

    Exact combinatorial search is essential to a wide range of application areas including constraint optimisation, graph matching, and computer algebra. Solutions to combinatorial problems are found by systematically exploring a search space, either to enumerate solutions, determine if a specific solution exists, or to find an optimal solution. Combinatorial searches are computationally hard both in theory and practice, and efficiently exploring the huge number of combinations is a real challenge, often addressed using approximate search algorithms. Alternatively, exact search can be parallelised to reduce execution time. However, parallel search is challenging due to both highly irregular search trees and sensitivity to search order, leading to anomalies that can cause unexpected speedups and slowdowns. As core counts continue to grow, parallel search becomes increasingly useful for improving the performance of existing searches, and allowing larger instances to be solved. A high-level approach to parallel search allows non-expert users to benefit from increasing core counts. Algorithmic Skeletons provide reusable implementations of common parallelism patterns that are parameterised with user code which determines the specific computation, e.g. a particular search. We define a set of skeletons for exact search, requiring the user to provide in the minimal case a single class that specifies how the search tree is generated and a parameter that specifies the type of search required. The five are: Sequential search; three general-purpose parallel search methods: Depth-Bounded, Stack-Stealing, and Budget; and a specific parallel search method, Ordered, that guarantees replicable performance. We implement and evaluate the skeletons in a new C++ parallel search framework, YewPar. YewPar provides both high-level skeletons and low-level search specific schedulers and utilities to deal with the irregularity of search and knowledge exchange between workers. 
YewPar is based on the HPX library for distributed task-parallelism, potentially allowing search to execute on multi-cores, clusters, cloud, and high performance computing systems. Underpinning the skeleton design is a novel formal model, MT^3, a parallel operational semantics that describes multi-threaded tree traversals, allowing reasoning about parallel search, e.g. describing common parallel search phenomena such as performance anomalies. YewPar is evaluated using seven different search applications (and over 25 specific instances): Maximum Clique, k-Clique, Subgraph Isomorphism, Travelling Salesperson, Binary Knapsack, Enumerating Numerical Semigroups, and the Unbalanced Tree Search Benchmark. The search instances are evaluated at multiple scales from 1 to 255 workers, on a 17-host, 272-core Beowulf cluster. The overheads of the skeletons are low, with a mean 6.1% slowdown compared to a hand-coded sequential implementation. Crucially, for all search applications YewPar reduces search times by an order of magnitude, i.e. hours/minutes to minutes/seconds, and we commonly see average parallel efficiency greater than 60% for up to 255 workers. Comparing skeleton performance reveals that no one skeleton is best for all searches, highlighting a benefit of a skeleton approach that allows multiple parallelisations to be explored with minimal refactoring. The Ordered skeleton avoids slowdown anomalies where, due to search knowledge being order dependent, a parallel search takes longer than a sequential search. Analysis of Ordered shows that, while being 41% slower on average (73% in the worst case) than Depth-Bounded, in nearly all cases it maintains the following replicable performance properties: 1) parallel executions are no slower than one-worker sequential executions, 2) runtimes do not increase as workers are added, and 3) variance between repeated runs is low. 
In particular, where Ordered maintains a relative standard deviation (RSD) of less than 15%, Depth-Bounded suffers from an RSD greater than 50%, showing the importance of carefully controlling search orders for repeatability
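The metrics quoted above (relative standard deviation, speedup, and parallel efficiency) can be computed as in the following Python sketch. The function names are hypothetical, and the use of the sample standard deviation is an assumption, since the abstract does not say which estimator was used.

```python
from statistics import mean, stdev
from typing import Sequence


def relative_standard_deviation(runtimes: Sequence[float]) -> float:
    """RSD in percent: sample standard deviation divided by the mean.

    Assumes at least two runtimes and a non-zero mean.
    """
    return 100.0 * stdev(runtimes) / mean(runtimes)


def speedup(sequential_time: float, parallel_time: float) -> float:
    """How many times faster the parallel run is than the sequential run."""
    return sequential_time / parallel_time


def parallel_efficiency(sequential_time: float, parallel_time: float, workers: int) -> float:
    """Speedup per worker, as a fraction between 0 and 1 for sub-linear scaling."""
    return speedup(sequential_time, parallel_time) / workers
```

For example, under these definitions a search that takes 1000 s sequentially and 10 s on 200 workers has a speedup of 100 and a parallel efficiency of 0.5.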

    Towards Generic Scalable Parallel Combinatorial Search

    Combinatorial search problems in mathematics, e.g. in finite geometry, are notoriously hard; a state-of-the-art backtracking search algorithm can easily take months to solve a single problem. There is clearly demand for parallel combinatorial search algorithms scaling to hundreds of cores and beyond. However, backtracking combinatorial searches are challenging to parallelise due to their sensitivity to search order and due to their irregularly shaped search trees. Moreover, scaling parallel search to hundreds of cores generally requires highly specialist parallel programming expertise. This paper proposes a generic scalable framework for solving hard combinatorial problems. Key elements are distributed memory task parallelism (to achieve scale), work stealing (to cope with irregularity), and generic algorithmic skeletons for combinatorial search (to reduce the parallelism expertise required). We outline two implementations: a mature Haskell Tree Search Library (HTSL) based around algorithmic skeletons and a prototype C++ Tree Search Library (CTSL) that uses hand coded applications. Experiments on maximum clique problems and on a problem in finite geometry, the search for spreads in H(4,2^2), show that (1) CTSL consistently outperforms HTSL on sequential runs, and (2) both libraries scale to 200 cores, e.g. speeding up spreads search by a factor of 81 (HTSL) and 60 (CTSL), respectively. This demonstrates the potential of our generic framework for scaling parallel combinatorial search to large distributed memory platforms

    Batch Reinforcement Learning on the Industrial Benchmark: First Experiences

    The Particle Swarm Optimization Policy (PSO-P) has been recently introduced and proven to produce remarkable results on interacting with academic reinforcement learning benchmarks in an off-policy, batch-based setting. To further investigate the properties and feasibility on real-world applications, this paper investigates PSO-P on the so-called Industrial Benchmark (IB), a novel reinforcement learning (RL) benchmark that aims at being realistic by including a variety of aspects found in industrial applications, like continuous state and action spaces, a high dimensional, partially observable state space, delayed effects, and complex stochasticity. The experimental results of PSO-P on IB are compared to results of closed-form control policies derived from the model-based Recurrent Control Neural Network (RCNN) and the model-free Neural Fitted Q-Iteration (NFQ). Experiments show that PSO-P is not only of interest for academic benchmarks, but also for real-world industrial applications, since it also yielded the best performing policy in our IB setting. Compared to other well established RL techniques, PSO-P produced outstanding results in performance and robustness, requiring only a relatively low amount of effort in finding adequate parameters or making complex design decisions

    Energy and Route Optimization of Moving Devices

    This thesis highlights our efforts in energy and route optimization of moving devices. We have focused on three categories of such devices: industrial robots in a multi-robot environment, generic vehicles in a vehicle routing problem (VRP) context, and automated guided vehicles (AGVs) in a large-scale flexible manufacturing system (FMS). In the first category, the aim is to develop a non-intrusive energy optimization technique, based on a given set of paths and sequences of operations, such that the original cycle time is not exceeded. We develop an optimization procedure based on a mathematical programming model that aims to minimize the energy consumption and peak power. Our technique has several advantages. It is non-intrusive, i.e. it requires limited changes in the robot program and can be implemented easily. Moreover, it is model-free, in the sense that no particular, and perhaps secret, parameter or dynamic model is required. Furthermore, the optimization can be done offline, within seconds using a generic solver. Through careful experiments, we have shown that it is possible to reduce energy consumption and peak power by up to about 30% and 50%, respectively. The second category of moving devices comprises generic vehicles in a VRP context. We have developed a hybrid optimization approach that integrates a distributed algorithm based on a gossip protocol with a column generation (CG) algorithm, which manages to solve the tested problems faster than the CG algorithm alone. The algorithm is developed for a VRP variation including time windows (VRPTW), which is meant to model the task of scheduling and routing of caregivers in the context of home healthcare routing and scheduling problems (HHRSPs). Moreover, the developed algorithm can easily be parallelized to further increase its efficiency. The last category deals with AGVs. 
The choice of AGVs was not arbitrary; by design, we decided to transfer our knowledge of energy optimization and routing algorithms to a class of moving devices in which both techniques are of interest. Initially, we improve an existing method of conflict-free AGV scheduling and routing, such that the new algorithm can manage larger problems. A heuristic version of the algorithm manages to solve the problem instances in a reasonable amount of time. Later, we develop strategies to reduce the energy consumption. The study is carried out using an AGV system installed at Volvo Cars. The results are promising: (1) the algorithm reduces performance measures such as makespan by up to 50%, while reducing the total travelled distance of the vehicles by about 14%, leading to an energy saving of roughly 14%, compared to the results obtained from the original traffic controller; (2) it is possible to reduce the cruise velocities such that more energy is saved, up to 20%, while the new makespan remains better than the original one

    Supervised regionalization methods, a survey.

    This paper reviews almost four decades of contributions on the subject of supervised regionalization methods. These methods aggregate a set of areas into a predefined number of spatially contiguous regions while optimizing certain aggregation criteria. The authors present a taxonomic scheme that classifies a wide range of regionalization methods into eight groups, based on the strategy applied for satisfying the spatial contiguity constraint. The paper concludes by providing a qualitative comparison of these groups in terms of a set of certain characteristics, and by suggesting future lines of research for extending and improving these methods. Keywords: regionalization, constrained clustering, analytical regions.

    Statistical user model supported by R-Tree structure

    This paper is about developing a group user model able to predict unknown features (attributes, preferences, or behaviors) of any interlocutor. Specifically, it targets systems where there are features that cannot be modeled by a domain expert within the human computer interaction. In such cases, statistical models are applied instead of stereotype user models. The time consumption of these models is high, and when a requisite of bounded response time is added, the most common solution involves summarizing knowledge. Summarization involves deleting knowledge from the knowledge base and probably losing accuracy in the medium-term. This proposal provides all the advantages of statistical user models and avoids knowledge loss by using an R-Tree structure and various search spaces (universes of users) of diverse granularity for solving inferences with enhanced success rates. Along with the formalization and evaluation of the approach, the main advantages are discussed, and a perspective for its future evolution is provided. In addition, this paper provides a framework to evaluate statistical user models and to enable performance comparison among different statistical user models. This proposal development belongs to the research projects Thuban (TIN2008-02711), MA2VICMR (S2009/TIC-1542) and Cadooh (TSI-020302-2011-21), supported respectively by the Spanish Ministry of Education and the Spanish Ministry of Industry, Tourism and Commerce.

    Discrete Particle Swarm Optimization for Flexible Flow Line Scheduling

    Previous research on scheduling flexible flow lines (FFL) to minimize makespan has utilized approaches such as branch and bound, integer programming, or heuristics. Metaheuristic methods have attracted increasing interest for solving scheduling problems in the past few years. Particle swarm optimization (PSO) is a population-based metaheuristic method which finds a solution based on the analogy of sharing useful information among individuals. In the previous literature different PSO algorithms have been introduced for various applications. In this research we study some of the PSO algorithms, continuous and discrete, to identify a strong PSO algorithm for scheduling flexible flow lines to minimize the makespan. Then the effectiveness of this PSO algorithm in FFL scheduling is compared to genetic algorithms. Experimental results suggest that discrete particle swarm performs better than continuous particle swarm in scheduling flexible flow lines under the makespan criterion. Moreover, combining discrete particle swarm with a local search improves the performance of the algorithm significantly and makes it competitive with the genetic algorithm (GA)