17,183 research outputs found

    Seeing Shapes in Clouds: On the Performance-Cost trade-off for Heterogeneous Infrastructure-as-a-Service

    Full text link
    In the near future, FPGAs will be available by the hour; however, this new Infrastructure-as-a-Service (IaaS) usage mode presents both an opportunity and a challenge. The opportunity is that programmers can potentially trade resources for performance on a much larger scale, and for much shorter periods of time, than before. The challenge lies in finding and traversing the trade-off for heterogeneous IaaS which guarantees that increased resources result in the greatest possible increase in performance. Such a trade-off is Pareto optimal. The Pareto-optimal trade-off for clusters of heterogeneous resources can be found by solving multiple multi-objective optimisation problems, resulting in an optimal allocation of tasks to the available platforms. These optimisation problems can be solved using simple heuristic approaches or formal Mixed Integer Linear Programming (MILP) techniques. When pricing 128 financial options using a Monte Carlo algorithm on a heterogeneous cluster of Multicore CPU, GPU and FPGA platforms, the MILP approach produces a trade-off that is up to 110% faster than a heuristic approach, and over 50% cheaper. These results suggest that high-quality performance-resource trade-offs for heterogeneous IaaS are best realised through a formal optimisation approach.
    Comment: Presented at the Second International Workshop on FPGAs for Software Programmers (FSP 2015) (arXiv:1508.06320)
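    The MILP approach described above can be sketched in outline. The snippet below is only a minimal illustration, not the paper's actual model: it assigns a handful of pricing tasks to CPU, GPU and FPGA platforms so as to minimise makespan under a cost budget, using the open-source PuLP modelling library; the task count, latencies, prices and budget are all invented for the example.

        # Hypothetical MILP sketch: assign tasks to heterogeneous platforms to
        # minimise makespan within a cost budget. All numbers are illustrative.
        import pulp

        tasks = range(8)                                  # e.g. 8 option-pricing tasks
        platforms = {"cpu": (4.0, 0.10), "gpu": (1.0, 0.50), "fpga": (0.5, 0.80)}
        #            name: (latency per task, price per task) -- made-up figures
        budget = 4.0                                      # cost cap for this Pareto point

        prob = pulp.LpProblem("task_allocation", pulp.LpMinimize)
        x = pulp.LpVariable.dicts("x", (tasks, platforms), cat="Binary")
        makespan = pulp.LpVariable("makespan", lowBound=0)

        prob += makespan                                  # objective: minimise the makespan
        for t in tasks:                                   # each task runs on exactly one platform
            prob += pulp.lpSum(x[t][p] for p in platforms) == 1
        for p, (lat, _) in platforms.items():             # each platform's finish time bounds the makespan
            prob += pulp.lpSum(lat * x[t][p] for t in tasks) <= makespan
        prob += pulp.lpSum(price * x[t][p]                # total spend stays within the budget
                           for t in tasks
                           for p, (_, price) in platforms.items()) <= budget

        prob.solve()
        print("makespan:", pulp.value(makespan))

    Sweeping the budget and re-solving traces out (cost, makespan) points, which is one way of exposing the Pareto-optimal trade-off the abstract refers to.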

    Scalable Kernelization for Maximum Independent Sets

    Get PDF
    The most efficient algorithms for finding maximum independent sets, in both theory and practice, use reduction rules to obtain a much smaller problem instance called a kernel. The kernel can then be solved quickly using exact or heuristic algorithms, or by kernelizing recursively in the branch-and-reduce paradigm. It is of critical importance for these algorithms that kernelization is fast and returns a small kernel. Current algorithms are either slow but produce a small kernel, or fast but give a large kernel. We attempt to accomplish both of these goals simultaneously by giving an efficient parallel kernelization algorithm based on graph partitioning and parallel bipartite maximum matching. We combine our parallelization techniques with two further accelerations: dependency checking, which prunes reductions that cannot be applied, and reduction tracking, which allows us to stop kernelization when reductions become less fruitful. Our algorithm produces kernels that are orders of magnitude smaller than those of the fastest kernelization methods, while having a similar execution time. Furthermore, our algorithm is able to compute kernels with size comparable to the smallest known kernels, but up to two orders of magnitude faster than previously possible. Finally, we show that our kernelization algorithm can be used to accelerate existing state-of-the-art heuristic algorithms, allowing us to find larger independent sets faster on large real-world networks and synthetic instances.
    Comment: Extended version
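    As a rough illustration of what a reduction rule does, the snippet below applies two textbook rules, the degree-zero and degree-one reductions, to an adjacency-dict graph until no more apply, returning the kernel together with the vertices forced into the independent set. This is only a sequential sketch; the paper's contribution is a parallel, partition-based kernelization using many more reductions.

        # Sequential sketch of degree-0 and degree-1 reductions for maximum
        # independent set; graphs are dicts mapping a vertex to its neighbour set.
        def kernelize(adj):
            adj = {v: set(nbrs) for v, nbrs in adj.items()}    # work on a copy
            forced = set()                                     # vertices forced into the solution
            changed = True
            while changed:
                changed = False
                for v in list(adj):
                    if v not in adj:                           # already removed this pass
                        continue
                    if not adj[v]:                             # degree-0: take v
                        forced.add(v)
                        del adj[v]
                        changed = True
                    elif len(adj[v]) == 1:                     # degree-1: take v, drop v and its neighbour
                        (u,) = adj[v]
                        forced.add(v)
                        for w in (u, v):
                            for n in adj.pop(w, set()):
                                if n in adj:
                                    adj[n].discard(w)
                        changed = True
            return adj, forced                                 # kernel and forced vertices

        # Example: the path 0-1-2-3-4 reduces to an empty kernel with {0, 2, 4} forced.
        kernel, forced = kernelize({0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}})
        print(len(kernel), forced)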

    A Domain Specific Approach to High Performance Heterogeneous Computing

    Full text link
    Users of heterogeneous computing systems face two problems: firstly, understanding the trade-off relationships between the observable characteristics of their applications, such as latency and quality of the result; and secondly, knowing how to exploit these characteristics to allocate work efficiently to distributed computing platforms. A domain-specific approach addresses both of these problems. By considering a subset of operations or functions, models of the observable characteristics, or domain metrics, may be formulated in advance and populated at run time for task instances. These metric models can then be used to express the allocation of work as a constrained integer program, which can be solved using heuristics, machine learning or Mixed Integer Linear Programming (MILP) frameworks. These claims are illustrated using the example domain of derivatives pricing in computational finance, with the domain metrics of workload latency, or makespan, and pricing accuracy. For a large, varied workload of 128 Black-Scholes and Heston model-based option pricing tasks, running on a diverse array of 16 Multicore CPU, GPU and FPGA platforms, predictions made by the makespan and accuracy models are generally within 10% of the observed run-time performance. When these models are used as inputs to machine learning and MILP-based workload allocation approaches, latency improvements of up to 24 and 270 times over the heuristic approach, respectively, are seen.
    Comment: 14 pages, preprint draft, minor revisions
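    To make the idea of a domain metric model concrete, the hedged sketch below models per-platform latency as a fixed setup cost plus Monte Carlo paths divided by throughput, populated with benchmarked constants at run time, and feeds the predictions to a simple greedy allocator. The names and constants are invented, and the greedy allocator stands in for the heuristic baseline rather than the machine-learning or MILP allocators the abstract evaluates.

        # Sketch of metric-model-driven allocation; all constants are illustrative.
        from dataclasses import dataclass

        @dataclass
        class PlatformModel:
            name: str
            setup: float        # fixed per-task overhead in seconds (assumed benchmarked)
            throughput: float   # Monte Carlo paths per second (assumed benchmarked)

            def predict_latency(self, paths: int) -> float:
                # Domain metric model, populated at run time for a task instance
                return self.setup + paths / self.throughput

        def allocate(task_paths, platforms):
            """Greedy baseline: each task goes to the platform with the smallest
            predicted finish time (current load plus predicted task latency)."""
            load = {p.name: 0.0 for p in platforms}
            plan = []
            for paths in sorted(task_paths, reverse=True):     # largest tasks first
                best = min(platforms, key=lambda p: load[p.name] + p.predict_latency(paths))
                plan.append((paths, best.name))
                load[best.name] += best.predict_latency(paths)
            return plan, max(load.values())                    # allocation and predicted makespan

        platforms = [PlatformModel("cpu", 0.01, 2e6),
                     PlatformModel("gpu", 0.05, 5e7),
                     PlatformModel("fpga", 0.10, 1e8)]
        print(allocate([10**5, 10**6, 10**7, 10**7], platforms))

    The same predicted latencies could equally serve as the coefficients of a constrained integer program, as in the MILP sketch given earlier in this listing.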

    Adapting the interior point method for the solution of linear programs on high performance computers

    Get PDF
    In this paper we describe a unified algorithmic framework for the interior point method (IPM) for solving linear programs (LPs), which allows us to adapt it to a range of high performance computer architectures. We set out the reasons why the IPM makes better use of high performance computer architectures than the sparse simplex method. In the inner iteration of the IPM, a search direction is computed using Newton or higher-order methods. Computationally, this involves solving a sparse symmetric positive definite (SSPD) system of equations. The choice of direct and indirect methods for the solution of this system, and the design of data structures to take advantage of coarse-grain parallel and massively parallel computer architectures, are considered in detail. Finally, we present experimental results from solving NETLIB test problems on examples of these architectures, and put forward arguments as to why integration of the system within the sparse simplex method is beneficial.
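    The inner-iteration solve the abstract refers to can be illustrated with a small dense example; the paper is concerned with the sparse, parallel version of exactly this step, so the snippet below is only a didactic sketch with random data. The Newton direction reduces to a symmetric positive definite normal-equations system A D A^T dy = r, solved here by a direct Cholesky factorisation.

        # Dense, didactic sketch of the IPM normal-equations solve; in practice
        # A is large and sparse and the factorisation is the parallel bottleneck.
        import numpy as np
        from scipy.linalg import cho_factor, cho_solve

        rng = np.random.default_rng(0)
        m, n = 5, 12                        # constraints x variables of a standard-form LP
        A = rng.standard_normal((m, n))     # constraint matrix (dense here, sparse in practice)
        x = rng.uniform(0.5, 2.0, n)        # current primal iterate, strictly positive
        s = rng.uniform(0.5, 2.0, n)        # current dual slacks, strictly positive
        r = rng.standard_normal(m)          # stand-in right-hand side (built from KKT residuals in a real IPM)

        D = np.diag(x / s)                  # positive diagonal scaling from the current iterate
        M = A @ D @ A.T                     # the SSPD normal-equations matrix
        dy = cho_solve(cho_factor(M), r)    # direct method: Cholesky factorise, then solve
        print(dy)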