Simultaneous Solving of Batched Linear Programs on a GPU
Linear Programs (LPs) appear in a large number of applications, and offloading
them to a GPU is a viable way to gain performance. Existing work on offloading
and solving an LP on a GPU suggests that performance gains are generally
limited to large LPs (typically 500 constraints and 500 variables or more). To
gain performance from a GPU for applications involving small to medium-sized
LPs, we propose solving a large number of LPs in parallel as a batch. In this
paper, we present the design and implementation of a batched LP solver in
CUDA, with coalesced memory access, low CPU-GPU memory transfer latency, and
load balancing as the design goals. The performance of the batched LP solver
is compared against sequential solving on the CPU using the open-source solver
GLPK (GNU Linear Programming Kit) and the CPLEX solver from IBM. The
evaluation on selected LP benchmarks from the Netlib repository shows a
maximum speed-up of 95x and 5x over the CPLEX and GLPK solvers respectively,
for a batch of 1e5 LPs. We demonstrate the application of our batched LP
solver to enhance performance in the domain of state-space exploration of
mathematical
models of control systems design.

Comment: Around 13 figures and 24 pages. arXiv admin note: substantial text
overlap with arXiv:1609.0811
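To make the target workload concrete, the sketch below (not the authors' CUDA implementation) sets up a small batch of LPs of the kind the paper targets and solves them sequentially on the CPU with SciPy's `linprog`; this sequential loop is the baseline that batched GPU solving aims to accelerate. The problem data and the `solve_batch` helper are illustrative assumptions, not from the paper.

```python
# Illustrative sketch: a batch of small LPs solved one-by-one on the CPU.
# The batched-GPU approach described in the abstract replaces this loop
# with simultaneous solving of all instances on the device.
import numpy as np
from scipy.optimize import linprog

def solve_batch(costs, A_ub, b_ubs):
    """Solve a batch of LPs sharing the constraint matrix A_ub:
    minimize c @ x  subject to  A_ub @ x <= b,  x >= 0,
    for each (c, b) pair in the batch. Returns the optimal points."""
    solutions = []
    for c, b in zip(costs, b_ubs):
        res = linprog(c, A_ub=A_ub, b_ub=b, bounds=(0, None), method="highs")
        solutions.append(res.x)
    return np.array(solutions)

# Three tiny 2-variable instances: maximize x + y (i.e., minimize -x - y)
# under per-instance box-like constraints x <= b1, y <= b2.
A = np.array([[1.0, 0.0],
              [0.0, 1.0]])
costs = [np.array([-1.0, -1.0])] * 3
b_ubs = [np.array([1.0, 2.0]),
         np.array([2.0, 1.0]),
         np.array([1.0, 1.0])]

xs = solve_batch(costs, A, b_ubs)
```

For each instance the optimum sits at the upper-bound corner, so the third LP, for example, is solved at (1, 1). For small LPs like these, per-solve overhead dominates, which is precisely why batching many of them onto a GPU pays off.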