57,661 research outputs found
Adapting the interior point method for the solution of linear programs on high performance computers
In this paper we describe a unified algorithmic framework for the interior point method (IPM) of solving Linear Programs (LPs) which allows us to adapt it over a range of high performance computer architectures. We set out the reasons as to why IPM makes better use of high performance computer architecture than the sparse simplex method. In the inner iteration of the IPM a search direction is computed using Newton or higher order methods. Computationally this involves solving a sparse symmetric positive definite (SSPD) system of equations. The choice of direct and indirect methods for the solution of this system and the design of data structures to take advantage of coarse grain parallel and massively parallel computer architectures are considered in detail. Finally, we present experimental results of solving NETLIB test problems on examples of these architectures and put forward arguments as to why integration of the system within sparse simplex is beneficial
Solution of partial differential equations on vector and parallel computers
The present status of numerical methods for partial differential equations on vector and parallel computers was reviewed. The relevant aspects of these computers are discussed and a brief review of their development is included, with particular attention paid to those characteristics that influence algorithm selection. Both direct and iterative methods are given for elliptic equations as well as explicit and implicit methods for initial boundary value problems. The intent is to point out attractive methods as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed
PI-BA Bundle Adjustment Acceleration on Embedded FPGAs with Co-observation Optimization
Bundle adjustment (BA) is a fundamental optimization technique used in many
crucial applications, including 3D scene reconstruction, robotic localization,
camera calibration, autonomous driving, space exploration, street view map
generation etc. Essentially, BA is a joint non-linear optimization problem, and
one which can consume a significant amount of time and power, especially for
large optimization problems. Previous approaches of optimizing BA performance
heavily rely on parallel processing or distributed computing, which trade
higher power consumption for higher performance. In this paper we propose
{\pi}-BA, the first hardware-software co-designed BA engine on an embedded
FPGA-SoC that exploits custom hardware for higher performance and power
efficiency. Specifically, based on our key observation that not all points
appear on all images in a BA problem, we designed and implemented a
Co-Observation Optimization technique to accelerate BA operations with
optimized usage of memory and computation resources. Experimental results
confirm that {\pi}-BA outperforms the existing software implementations in
terms of performance and power consumption.Comment: in Proceedings of IEEE FCCM 201
Recommended from our members
Preparing sparse solvers for exascale computing.
Sparse solvers provide essential functionality for a wide variety of scientific applications. Highly parallel sparse solvers are essential for continuing advances in high-fidelity, multi-physics and multi-scale simulations, especially as we target exascale platforms. This paper describes the challenges, strategies and progress of the US Department of Energy Exascale Computing project towards providing sparse solvers for exascale computing platforms. We address the demands of systems with thousands of high-performance node devices where exposing concurrency, hiding latency and creating alternative algorithms become essential. The efforts described here are works in progress, highlighting current success and upcoming challenges. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'
Analysis of A Splitting Approach for the Parallel Solution of Linear Systems on GPU Cards
We discuss an approach for solving sparse or dense banded linear systems
on a Graphics Processing Unit (GPU) card. The
matrix is possibly nonsymmetric and
moderately large; i.e., . The ${\it split\ and\
parallelize}{\tt SaP}{\bf A}{\bf A}_ii=1,\ldots,P{\bf A}_i{\tt SaP::GPU}{\tt PARDISO}{\tt SuperLU}{\tt MUMPS}{\tt SaP::GPU}{\tt MKL}{\tt SaP::GPU}{\tt SaP::GPU}$ is publicly available and distributed as
open source under a permissive BSD3 license.Comment: 38 page
Alternating-Direction Line-Relaxation Methods on Multicomputers
We study the multicom.puter performance of a three-dimensional Navier–Stokes solver based on alternating-direction line-relaxation methods. We compare several multicomputer implementations, each of which combines a particular line-relaxation method and a particular distributed block-tridiagonal solver. In our experiments, the problem size was determined by resolution requirements of the application. As a result, the granularity of the computations of our study is finer than is customary in the performance analysis of concurrent block-tridiagonal solvers. Our best results were obtained with a modified half-Gauss–Seidel line-relaxation method implemented by means of a new iterative block-tridiagonal solver that is developed here. Most computations were performed on the Intel Touchstone Delta, but we also used the Intel Paragon XP/S, the Parsytec SC-256, and the Fujitsu S-600 for comparison
- …