17,183 research outputs found
Seeing Shapes in Clouds: On the Performance-Cost trade-off for Heterogeneous Infrastructure-as-a-Service
In the near future FPGAs will be available by the hour, however this new
Infrastructure as a Service (IaaS) usage mode presents both an opportunity and
a challenge: The opportunity is that programmers can potentially trade
resources for performance on a much larger scale, for much shorter periods of
time than before. The challenge is in finding and traversing the trade-off for
heterogeneous IaaS that guarantees increased resources result in the greatest
possible increased performance. Such a trade-off is Pareto optimal. The Pareto
optimal trade-off for clusters of heterogeneous resources can be found by
solving multiple, multi-objective optimisation problems, resulting in an
optimal allocation of tasks to the available platforms. Solving these
optimisation programs can be done using simple heuristic approaches or formal
Mixed Integer Linear Programming (MILP) techniques. When pricing 128 financial
options using a Monte Carlo algorithm upon a heterogeneous cluster of Multicore
CPU, GPU and FPGA platforms, the MILP approach produces a trade-off that is up
to 110% faster than a heuristic approach, and over 50% cheaper. These results
suggest that high quality performance-resource trade-offs of heterogeneous IaaS
are best realised through a formal optimisation approach.Comment: Presented at Second International Workshop on FPGAs for Software
Programmers (FSP 2015) (arXiv:1508.06320
Recent Advances in Graph Partitioning
We survey recent trends in practical algorithms for balanced graph
partitioning together with applications and future research directions
Scalable Kernelization for Maximum Independent Sets
The most efficient algorithms for finding maximum independent sets in both
theory and practice use reduction rules to obtain a much smaller problem
instance called a kernel. The kernel can then be solved quickly using exact or
heuristic algorithms---or by repeatedly kernelizing recursively in the
branch-and-reduce paradigm. It is of critical importance for these algorithms
that kernelization is fast and returns a small kernel. Current algorithms are
either slow but produce a small kernel, or fast and give a large kernel. We
attempt to accomplish both of these goals simultaneously, by giving an
efficient parallel kernelization algorithm based on graph partitioning and
parallel bipartite maximum matching. We combine our parallelization techniques
with two techniques to accelerate kernelization further: dependency checking
that prunes reductions that cannot be applied, and reduction tracking that
allows us to stop kernelization when reductions become less fruitful. Our
algorithm produces kernels that are orders of magnitude smaller than the
fastest kernelization methods, while having a similar execution time.
Furthermore, our algorithm is able to compute kernels with size comparable to
the smallest known kernels, but up to two orders of magnitude faster than
previously possible. Finally, we show that our kernelization algorithm can be
used to accelerate existing state-of-the-art heuristic algorithms, allowing us
to find larger independent sets faster on large real-world networks and
synthetic instances.Comment: Extended versio
A Domain Specific Approach to High Performance Heterogeneous Computing
Users of heterogeneous computing systems face two problems: firstly, in
understanding the trade-off relationships between the observable
characteristics of their applications, such as latency and quality of the
result, and secondly, how to exploit knowledge of these characteristics to
allocate work to distributed computing platforms efficiently. A domain specific
approach addresses both of these problems. By considering a subset of
operations or functions, models of the observable characteristics or domain
metrics may be formulated in advance, and populated at run-time for task
instances. These metric models can then be used to express the allocation of
work as a constrained integer program, which can be solved using heuristics,
machine learning or Mixed Integer Linear Programming (MILP) frameworks. These
claims are illustrated using the example domain of derivatives pricing in
computational finance, with the domain metrics of workload latency or makespan
and pricing accuracy. For a large, varied workload of 128 Black-Scholes and
Heston model-based option pricing tasks, running upon a diverse array of 16
Multicore CPUs, GPUs and FPGAs platforms, predictions made by models of both
the makespan and accuracy are generally within 10% of the run-time performance.
When these models are used as inputs to machine learning and MILP-based
workload allocation approaches, a latency improvement of up to 24 and 270 times
over the heuristic approach is seen.Comment: 14 pages, preprint draft, minor revisio
Adapting the interior point method for the solution of linear programs on high performance computers
In this paper we describe a unified algorithmic framework for the interior point method (IPM) of solving Linear Programs (LPs) which allows us to adapt it over a range of high performance computer architectures. We set out the reasons as to why IPM makes better use of high performance computer architecture than the sparse simplex method. In the inner iteration of the IPM a search direction is computed using Newton or higher order methods. Computationally this involves solving a sparse symmetric positive definite (SSPD) system of equations. The choice of direct and indirect methods for the solution of this system and the design of data structures to take advantage of coarse grain parallel and massively parallel computer architectures are considered in detail. Finally, we present experimental results of solving NETLIB test problems on examples of these architectures and put forward arguments as to why integration of the system within sparse simplex is beneficial
- …