
    Space-Efficient Interior Point Method, with Applications to Linear Programming and Maximum Weight Bipartite Matching

    On Termination of Integer Linear Loops

    A fundamental problem in program verification concerns the termination of simple linear loops of the form x := u ; while Bx >= b do {x := Ax + a}, where x is a vector of variables, u, a, and b are integer vectors, and A and B are integer matrices. Assuming the matrix A is diagonalisable, we give a decision procedure for the problem of whether such a loop terminates for all initial integer vectors u. The correctness of our algorithm relies on sophisticated tools from algebraic and analytic number theory, Diophantine geometry, and real algebraic geometry. To the best of our knowledge, this is the first substantial advance on a 10-year-old open problem of Tiwari (2004) and Braverman (2006). Comment: Accepted to SODA1
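
    To make the loop form concrete, the following is a minimal Python sketch that simulates x := u ; while Bx >= b do {x := Ax + a} for a single initial vector u. It is not the decision procedure from the abstract: it only observes termination within a step bound for one u. The names run_linear_loop, guard_holds, and max_steps are hypothetical and chosen for illustration.

```python
from typing import List, Optional

Vector = List[int]
Matrix = List[List[int]]


def mat_vec(M: Matrix, x: Vector) -> Vector:
    """Exact integer matrix-vector product M x."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]


def guard_holds(B: Matrix, b: Vector, x: Vector) -> bool:
    """True when every component of Bx is >= the matching component of b."""
    return all(lhs >= rhs for lhs, rhs in zip(mat_vec(B, x), b))


def run_linear_loop(A: Matrix, a: Vector, B: Matrix, b: Vector,
                    u: Vector, max_steps: int = 1000) -> Optional[int]:
    """Simulate the loop from u (hypothetical helper, illustration only).

    Returns the number of loop-body executions if the guard fails within
    max_steps iterations, or None if the bound is reached first. A None
    result does NOT prove non-termination.
    """
    x = list(u)
    steps = 0
    while guard_holds(B, b, x):
        if steps == max_steps:
            return None
        x = [ax_i + a_i for ax_i, a_i in zip(mat_vec(A, x), a)]
        steps += 1
    return steps
```

    For example, with A = [[1]], a = [-1], B = [[1]], b = [0], the loop counts down from u until x drops below 0, while with a = [1] the guard never fails and the simulation hits the step bound.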

    Evaluation of Directive-Based GPU Programming Models on a Block Eigensolver with Consideration of Large Sparse Matrices

    Achieving high performance and performance portability for large-scale scientific applications is a major challenge on heterogeneous computing systems such as many-core CPUs and accelerators like GPUs. In this work, we implement a widely used block eigensolver, Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG), using two popular directive-based programming models (OpenMP and OpenACC) for GPU-accelerated systems. Our work differs from existing work in that it adopts a holistic approach that optimizes the full solver performance rather than narrowing the problem into small kernels (e.g., SpMM, SpMV). Our LOBPCG GPU implementation achieves a 2.8×–4.3× speedup over an optimized CPU implementation when tested with four different input matrices. The evaluated configuration compared one Skylake CPU to one Skylake CPU and one NVIDIA V100 GPU. Our OpenMP and OpenACC LOBPCG GPU implementations gave nearly identical performance. We also consider how to create an efficient LOBPCG solver that can solve problems larger than GPU memory capacity. To this end, we create microbenchmarks representing the two dominant kernels (inner product and SpMM kernel) in LOBPCG and then evaluate performance when using two different programming approaches: tiling the kernels, and using Unified Memory with the original kernels. Our tiled SpMM implementation achieves a 2.9× and 48.2× speedup over the Unified Memory implementation on supercomputers with PCIe Gen3 and NVLink 2.0 CPU to GPU interconnects, respectively.
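
    The idea of tiling an SpMM kernel can be illustrated with a small CPU-only Python sketch. It is not the authors' OpenMP/OpenACC code: it assumes a CSR sparse matrix, tiles by row blocks, and uses the hypothetical names spmm_csr_rows and tiled_spmm. On a GPU, each row tile (and the matching output rows) would be the unit transferred to device memory, so only one tile needs to be resident at a time.

```python
from typing import List

Dense = List[List[float]]


def spmm_csr_rows(indptr: List[int], indices: List[int], data: List[float],
                  X: Dense, row_start: int, row_end: int) -> Dense:
    """Compute rows [row_start, row_end) of Y = S @ X for a CSR matrix S."""
    n_cols = len(X[0])
    out: Dense = []
    for i in range(row_start, row_end):
        acc = [0.0] * n_cols
        # Nonzeros of row i live in data[indptr[i]:indptr[i + 1]].
        for k in range(indptr[i], indptr[i + 1]):
            x_row = X[indices[k]]
            value = data[k]
            for c in range(n_cols):
                acc[c] += value * x_row[c]
        out.append(acc)
    return out


def tiled_spmm(indptr: List[int], indices: List[int], data: List[float],
               X: Dense, tile_rows: int) -> Dense:
    """Row-tiled SpMM (illustrative sketch): process tile_rows rows at a time."""
    if tile_rows <= 0:
        raise ValueError("tile_rows must be positive")
    n_rows = len(indptr) - 1
    Y: Dense = []
    for start in range(0, n_rows, tile_rows):
        end = min(start + tile_rows, n_rows)
        # In a GPU setting, this tile would be copied in, computed, and copied out.
        Y.extend(spmm_csr_rows(indptr, indices, data, X, start, end))
    return Y
```

    The tile size does not change the result, only how much of the sparse matrix is touched per step; choosing tile_rows so that one tile fits in device memory is the design decision the sketch is meant to highlight.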

    Nearly-Linear Time LP Solvers and Rounding Algorithms for Scheduling Problems

    We study nearly-linear time approximation algorithms for non-preemptive scheduling problems in two settings: the unrelated machine setting, and the identical machine with job precedence constraints setting, under well-studied objectives such as makespan and weighted completion time. For many problems, we develop nearly-linear time approximation algorithms with approximation ratios matching the current best ones achieved in polynomial time. Our main technique is linear programming relaxation. For the unrelated machine setting, we formulate mixed packing and covering LP relaxations of nearly-linear size, and solve them approximately using the nearly-linear time solver of Young. For the makespan objective, we develop a rounding algorithm with (2+ε)-approximation ratio. For the weighted completion time objective, we prove the LP is as strong as the rectangle LP used by Im and Li, leading to a nearly-linear time (1.45+ε)-approximation for the problem. For problems in the identical machine with precedence constraints setting, the precedence constraints cannot be formulated as packing or covering constraints. To achieve the nearly-linear running time, we define a polytope for the constraints, and leverage the multiplicative weight update (MWU) method with an oracle which always returns solutions in the polytope.
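
    For readers unfamiliar with MWU, here is a minimal Python sketch of the generic multiplicative weight update rule in its textbook form. It is not the scheduling algorithm or the polytope oracle from the abstract; the names mwu_step and normalize, and the assumption that losses lie in [0, 1] with a step size eta in (0, 1/2], are illustrative choices.

```python
from typing import List


def mwu_step(weights: List[float], losses: List[float], eta: float) -> List[float]:
    """One generic MWU update: w_i <- w_i * (1 - eta * loss_i).

    Assumes each loss is in [0, 1] and 0 < eta <= 0.5 (illustrative setup).
    """
    if not 0.0 < eta <= 0.5:
        raise ValueError("eta must be in (0, 0.5]")
    if len(weights) != len(losses):
        raise ValueError("weights and losses must have the same length")
    return [w * (1.0 - eta * loss) for w, loss in zip(weights, losses)]


def normalize(weights: List[float]) -> List[float]:
    """Turn positive weights into a probability distribution."""
    total = sum(weights)
    return [w / total for w in weights]
```

    In LP-style uses of MWU, the weights typically sit on constraints, and an oracle is asked each round for a solution that satisfies the weighted combination of constraints; the losses then measure how each constraint fared on that solution.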