2 research outputs found
Effective Implementation of GPU-based Revised Simplex algorithm applying new memory management and cycle avoidance strategies
Graphics Processing Units (GPUs) with high computational capabilities are used
as modern parallel platforms to deal with complex computational problems. We use
this platform to solve large-scale linear programming problems with the revised
simplex algorithm. To implement this algorithm, we propose some new memory
management strategies. In addition, to avoid cycling caused by degeneracy,
we use a tabu rule for entering-variable selection in the revised
simplex algorithm. To evaluate this algorithm, we consider two sets of
benchmark problems and compare the speedup factors for these problems. The
comparisons demonstrate that the proposed method is highly effective and solves
the problems with maximum speedup factors of 165.2 and 65.46 with respect to
the sequential version and the Matlab Linprog solver, respectively.
Comment: 27 pages, 6 tables, 10 figures. Extracted from a PhD research program
in the Department of Computer Science of Amirkabir University of Technology,
Tehran, Iran
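The abstract above does not spell out its tabu rule, so the following is only a minimal sketch of how such a rule could look, under stated assumptions: the entering variable is the one with the most negative reduced cost, a variable that entered through a degenerate pivot (zero step length) is placed on a short tabu list, and the rule falls back to the full candidate set when every improving variable is tabu. The names `select_entering` and `record_pivot`, the tolerance, and the tenure of 3 are hypothetical and are not taken from the paper.

```python
from collections import deque


def select_entering(reduced_costs, tabu, tol=1e-9):
    """Pick an entering variable, skipping tabu indices where possible.

    Returns None when no reduced cost is negative (current basis is optimal).
    """
    candidates = [j for j, d in enumerate(reduced_costs) if d < -tol]
    if not candidates:
        return None
    # Assumption: if every improving variable is tabu, ignore the tabu list
    # rather than stopping, so optimality is never declared wrongly.
    allowed = [j for j in candidates if j not in tabu] or candidates
    return min(allowed, key=lambda j: reduced_costs[j])


def record_pivot(tabu, entering, step_length, tol=1e-9):
    """After a degenerate pivot, forbid the entering variable for a while."""
    if step_length <= tol:
        tabu.append(entering)


tabu = deque(maxlen=3)  # hypothetical tabu tenure of 3 pivots
costs = [-4.0, -7.0, 2.0, -1.0]
first = select_entering(costs, tabu)   # index 1 (reduced cost -7.0)
record_pivot(tabu, first, step_length=0.0)
second = select_entering(costs, tabu)  # index 0, since index 1 is now tabu
```

The bounded `deque` drops the oldest entry automatically, so a variable becomes eligible again after a few pivots.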
Simultaneous Solving of Batched Linear Programs on a GPU
Linear Programs (LPs) appear in a large number of applications, and offloading
them to a GPU is a viable way to gain performance. Existing work on offloading
and solving an LP on a GPU suggests that there is generally a performance gain
on large LPs (typically 500 constraints, 500 variables and above). In order
to gain performance from a GPU, for applications involving small to medium
sized LPs, we propose batched solving of a large number of LPs in parallel. In
this paper, we present the design and implementation of a batched LP solver in
CUDA, with coalesced memory access, low CPU-GPU memory transfer latency and
load balancing as the goals. The performance of the batched LP solver is
compared against sequential solving on the CPU using the open source solver
GLPK (GNU Linear Programming Kit) and the CPLEX solver from IBM. The evaluation
on selected LP benchmarks from the Netlib repository shows maximum
speed-ups of 95x and 5x with respect to the CPLEX and GLPK solvers, respectively, for
a batch of 1e5 LPs. We demonstrate the application of our batched LP solver to
enhance performance in the domain of state-space exploration of mathematical
models of control systems design.
Comment: Around 13 figures and 24 pages. arXiv admin note: substantial text
overlap with arXiv:1609.0811
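The coalesced-access goal named in the abstract above can be illustrated with a small indexing sketch. It assumes one GPU thread per LP and a dense tableau of identical shape for every LP in the batch. This is an illustrative assumption, not the layout used in the paper. Both index functions and the sizes are hypothetical. With the batch index varying fastest, neighbouring threads reading the same tableau element touch consecutive memory offsets, which is the pattern a GPU can coalesce. With one contiguous tableau per LP, the same reads are a full tableau apart.

```python
def interleaved_index(lp, row, col, n_cols, batch):
    """Flat offset of element (row, col) of LP `lp`; batch index varies fastest."""
    return (row * n_cols + col) * batch + lp


def blocked_index(lp, row, col, n_rows, n_cols):
    """Flat offset when each LP's tableau is stored contiguously (row-major)."""
    return lp * n_rows * n_cols + row * n_cols + col


batch, n_rows, n_cols = 4, 3, 5  # hypothetical sizes

# Offsets touched when "threads" 0..3 each read element (1, 2) of their own LP.
interleaved = [interleaved_index(lp, 1, 2, n_cols, batch) for lp in range(batch)]
blocked = [blocked_index(lp, 1, 2, n_rows, n_cols) for lp in range(batch)]

print(interleaved)  # [28, 29, 30, 31] -> consecutive offsets
print(blocked)      # [7, 22, 37, 52]  -> stride of n_rows * n_cols
```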