
    Adapting the interior point method for the solution of linear programs on high performance computers

    In this paper we describe a unified algorithmic framework for the interior point method (IPM) for solving Linear Programs (LPs) that allows us to adapt it across a range of high performance computer architectures. We set out why the IPM makes better use of high performance computer architectures than the sparse simplex method. In the inner iteration of the IPM, a search direction is computed using Newton or higher-order methods. Computationally, this involves solving a sparse symmetric positive definite (SSPD) system of equations. The choice of direct and indirect methods for the solution of this system, and the design of data structures to take advantage of coarse-grain parallel and massively parallel computer architectures, are considered in detail. Finally, we present experimental results of solving NETLIB test problems on examples of these architectures and argue why integrating the system with the sparse simplex method is beneficial.
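    The inner iteration described above reduces to an SSPD solve. As a minimal sketch of that step (not the authors' code; all names are hypothetical), the Python snippet below forms the normal-equations matrix A D Aᵀ that arises in a primal-dual IPM and solves it with a sparse direct factorization:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def ipm_newton_direction(A, x, s, r):
    """One inner IPM step for an LP in standard form: solve the SSPD
    normal equations (A D A^T) dy = r, where D = diag(x / s) is built
    from the current positive primal (x) and dual slack (s) iterates."""
    D = sp.diags(x / s)              # positive diagonal scaling
    M = (A @ D @ A.T).tocsc()        # sparse symmetric positive definite
    # Direct method: sparse LU stands in here for the sparse Cholesky
    # factorization a production IPM would typically use.
    return spla.splu(M).solve(r)

# Tiny random instance, for illustration only.
rng = np.random.default_rng(0)
A = (sp.random(5, 12, density=0.3, random_state=0) + sp.eye(5, 12)).tocsr()
x = rng.uniform(0.5, 2.0, 12)        # primal iterate, strictly positive
s = rng.uniform(0.5, 2.0, 12)        # dual slack iterate, strictly positive
r = rng.standard_normal(5)           # current right-hand side
print(ipm_newton_direction(A, x, s, r))
```

    A direct factorization is the coarse-grain-parallel-friendly choice; an indirect method would replace the sparse LU with, for example, a conjugate gradient solve on the same SSPD matrix.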

    Solution of partial differential equations on vector and parallel computers

    The present status of numerical methods for partial differential equations on vector and parallel computers is reviewed. The relevant aspects of these computers are discussed, and a brief review of their development is included, with particular attention paid to those characteristics that influence algorithm selection. Both direct and iterative methods are covered for elliptic equations, as well as explicit and implicit methods for initial boundary value problems. The intent is to point out attractive methods, as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed.
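    As a minimal concrete instance of the iterative methods such a survey covers (an illustration, not taken from the review), the sketch below applies weighted Jacobi iteration to a 1D Poisson problem; every point updates from the previous iterate only, so each sweep vectorizes and parallelizes trivially, which is exactly the property that suits vector and parallel machines:

```python
import numpy as np

def jacobi_poisson_1d(f, h, sweeps, omega=0.8):
    """Weighted Jacobi iteration for -u'' = f on (0, 1) with u(0) = u(1) = 0,
    discretized with the 3-point stencil on n interior points, spacing h.
    Each point is updated from the previous iterate only, so a sweep has
    no sequential dependencies across grid points."""
    u = np.zeros_like(f)
    for _ in range(sweeps):
        left = np.concatenate(([0.0], u[:-1]))    # u_{i-1}, zero at boundary
        right = np.concatenate((u[1:], [0.0]))    # u_{i+1}, zero at boundary
        u_jac = 0.5 * (left + right + h * h * f)  # plain Jacobi update
        u = (1.0 - omega) * u + omega * u_jac     # weighted (damped) variant
    return u

n = 63
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)
f = np.pi**2 * np.sin(np.pi * x)          # manufactured: exact u = sin(pi x)
u = jacobi_poisson_1d(f, h, sweeps=5000)
print(np.max(np.abs(u - np.sin(np.pi * x))))  # error vs. exact solution
```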

    π-BA: Bundle Adjustment Acceleration on Embedded FPGAs with Co-observation Optimization

    Bundle adjustment (BA) is a fundamental optimization technique used in many crucial applications, including 3D scene reconstruction, robotic localization, camera calibration, autonomous driving, space exploration, and street view map generation. Essentially, BA is a joint non-linear optimization problem, and one that can consume a significant amount of time and power, especially for large optimization problems. Previous approaches to optimizing BA performance rely heavily on parallel processing or distributed computing, which trade higher power consumption for higher performance. In this paper we propose π-BA, the first hardware-software co-designed BA engine on an embedded FPGA-SoC that exploits custom hardware for higher performance and power efficiency. Specifically, based on our key observation that not all points appear on all images in a BA problem, we designed and implemented a Co-Observation Optimization technique to accelerate BA operations with optimized usage of memory and computation resources. Experimental results confirm that π-BA outperforms existing software implementations in terms of performance and power consumption.
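    π-BA itself is a hardware design, but the co-observation structure it exploits is easy to state in software: only observed (camera, point) pairs contribute residual and Jacobian blocks, so the BA problem is structurally sparse. A hypothetical Python sketch of that bookkeeping (illustrative only, unrelated to the FPGA implementation):

```python
import numpy as np

def accumulate_ba_cost(observations, num_cams, num_pts):
    """Walk the observation list of a BA problem: each entry is a
    (camera_id, point_id, 2-vector reprojection residual) triple.
    Only observed camera-point pairs carry residual/Jacobian blocks;
    all other pairs are structurally zero."""
    seen = np.zeros((num_cams, num_pts), dtype=bool)  # visibility mask
    cost = 0.0
    for cam, pt, r in observations:
        seen[cam, pt] = True
        cost += float(r @ r)          # sum of squared reprojection errors
    density = seen.sum() / seen.size  # fraction of pairs that do any work
    return cost, density

obs = [(0, 0, np.array([0.5, -0.2])),
       (0, 2, np.array([0.1, 0.1])),
       (1, 2, np.array([-0.3, 0.4]))]
cost, density = accumulate_ba_cost(obs, num_cams=2, num_pts=3)
print(cost, density)  # only 3 of the 6 camera-point pairs are observed
```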

    Analysis of A Splitting Approach for the Parallel Solution of Linear Systems on GPU Cards

    We discuss an approach for solving sparse or dense banded linear systems Ax = b on a Graphics Processing Unit (GPU) card. The matrix A ∈ ℝ^(N×N) is possibly nonsymmetric and moderately large; i.e., 10000 ≤ N ≤ 500000. The split and parallelize (SaP) approach seeks to partition the matrix A into diagonal sub-blocks A_i, i = 1, …, P, which are independently factored in parallel. The solution may choose to consider or to ignore the matrices that couple the diagonal sub-blocks A_i. This approach, along with the Krylov subspace-based iterative method that it preconditions, is implemented in a solver called SaP::GPU, which is compared in terms of efficiency with three commonly used sparse direct solvers: PARDISO, SuperLU, and MUMPS. SaP::GPU, which runs entirely on the GPU except for several stages involved in preliminary row-column permutations, is robust and compares well in terms of efficiency with the aforementioned direct solvers. In a comparison against Intel's MKL, SaP::GPU also fares well when used to solve dense banded systems that are close to being diagonally dominant. SaP::GPU is publicly available and distributed as open source under a permissive BSD-3 license.
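    When the coupling blocks are ignored entirely, the SaP splitting reduces to a block-Jacobi preconditioner for the Krylov iteration. A small CPU-side Python sketch of that special case (hypothetical names; the actual solver factors the sub-blocks in parallel on the GPU and can also retain the coupling):

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def block_jacobi_preconditioner(A, P):
    """Partition the N x N matrix A into P diagonal sub-blocks A_i and
    factor each one independently, ignoring the coupling blocks.  In SaP,
    the per-block factorizations are what run in parallel."""
    N = A.shape[0]
    cuts = np.linspace(0, N, P + 1, dtype=int)      # block boundaries
    blocks = list(zip(cuts[:-1], cuts[1:]))
    factors = [spla.splu(sp.csc_matrix(A[s:e, s:e])) for s, e in blocks]

    def apply(v):
        y = np.empty_like(v)
        for (s, e), lu in zip(blocks, factors):
            y[s:e] = lu.solve(v[s:e])               # independent solves
        return y

    return spla.LinearOperator(A.shape, matvec=apply, dtype=A.dtype)

# Banded, diagonally dominant test system, solved with preconditioned GMRES.
N = 400
A = sp.diags([-1.0, 4.0, -1.0], offsets=[-2, 0, 2], shape=(N, N), format="csr")
b = np.ones(N)
M = block_jacobi_preconditioner(A, P=4)
x, info = spla.gmres(A, b, M=M)
print(info, np.linalg.norm(A @ x - b))   # info == 0 on convergence
```

    Dropping the coupling trades preconditioner quality for embarrassingly parallel factorizations, which is the trade-off the abstract describes.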

    Alternating-Direction Line-Relaxation Methods on Multicomputers

    We study the multicomputer performance of a three-dimensional Navier–Stokes solver based on alternating-direction line-relaxation methods. We compare several multicomputer implementations, each of which combines a particular line-relaxation method and a particular distributed block-tridiagonal solver. In our experiments, the problem size was determined by the resolution requirements of the application. As a result, the granularity of the computations in our study is finer than is customary in the performance analysis of concurrent block-tridiagonal solvers. Our best results were obtained with a modified half-Gauss–Seidel line-relaxation method implemented by means of a new iterative block-tridiagonal solver that is developed here. Most computations were performed on the Intel Touchstone Delta, but we also used the Intel Paragon XP/S, the Parsytec SC-256, and the Fujitsu S-600 for comparison.
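    Each line relaxation amounts to solving a tridiagonal (in the block case, block-tridiagonal) system along one grid line. The Python sketch below shows the scalar per-line kernel via the standard Thomas algorithm (generic textbook code, not the paper's distributed solver):

```python
import numpy as np

def thomas(a, b, c, d):
    """Solve a tridiagonal system: a is the sub-diagonal, b the diagonal,
    c the super-diagonal, d the right-hand side (all length n; a[0] and
    c[-1] are unused).  This is the per-line kernel inside a line
    relaxation; the paper's solver handles the block analogue, distributed."""
    n = len(d)
    cp, dp = np.empty(n), np.empty(n)
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):                    # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = np.empty(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):           # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

n = 6
a, b, c = -np.ones(n), 2.5 * np.ones(n), -np.ones(n)
d = np.ones(n)
x = thomas(a, b, c, d)
T = np.diag(b) + np.diag(a[1:], -1) + np.diag(c[:-1], 1)
print(np.allclose(T @ x, d))   # True
```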