A GPU-accelerated Direct-sum Boundary Integral Poisson-Boltzmann Solver

Author: Geng Weihua
Jacob Ferosh
Publication venue: Elsevier BV
Publication date: 24/01/2013

In this paper, we present a GPU-accelerated direct-sum boundary integral method to solve the linear Poisson-Boltzmann (PB) equation. In our method, a well-posed boundary integral formulation is used to ensure the fast convergence of Krylov-subspace-based linear algebraic solvers such as GMRES. The molecular surfaces are discretized with flat triangles and centroid collocation. To speed up our method, we take advantage of the parallel nature of the boundary integral formulation and parallelize the schemes within the CUDA shared memory architecture on the GPU. The schemes use only $11N + 6N_c$ size-of-double device memory for a biomolecule with $N$ triangular surface elements and $N_c$ partial charges. Numerical tests of these schemes show well-maintained accuracy and fast convergence. The GPU implementation using one GPU card (Nvidia Tesla M2070) achieves a 120-150X speed-up over the implementation using one CPU (Intel L5640 2.27GHz). With our approach, solving PB equations on well-discretized molecular surfaces with up to 300,000 boundary elements takes less than about 10 minutes; hence our approach is particularly suitable for fast electrostatics computations on small to medium biomolecules.
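As a rough worked example of the memory footprint quoted in the abstract, the sketch below evaluates 11N + 6N_c doubles for the largest surface size the abstract mentions. The 8-byte size of a double and the partial-charge count are assumptions made for illustration; neither figure comes from the paper, and the function name is hypothetical.

```python
SIZE_OF_DOUBLE = 8  # bytes; assumes IEEE 754 double precision


def device_memory_bytes(n_elements: int, n_charges: int) -> int:
    """Device memory implied by the abstract's 11N + 6N_c doubles."""
    return (11 * n_elements + 6 * n_charges) * SIZE_OF_DOUBLE


# N = 300,000 is taken from the abstract; N_c = 5,000 is a hypothetical value.
mem = device_memory_bytes(300_000, 5_000)
print(f"{mem / 1e6:.2f} MB")  # prints "26.64 MB"
```

Under these assumptions the working set is a few tens of megabytes, which is consistent with the abstract's point that the direct-sum scheme keeps device memory linear in the problem size.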

arXiv.org e-Print Archive

CiteSeerX

High performance interior point methods for three-dimensional finite element limit analysis

Author: Lyamin Andrei V.
Podlich Nathan
Sloan Scott W.
Publication venue: CIMNE
Publication date: 01/01/2019

The ability to obtain rigorous upper and lower bounds on collapse loads of various structures makes finite element limit analysis an attractive design tool. The increasingly high cost of computing those bounds, however, has limited its application on problems in three dimensions. This work reports on a high-performance homogeneous self-dual primal-dual interior point method developed for three-dimensional finite element limit analysis. This implementation achieves convergence times over 4.5× faster than the leading commercial solver across a set of three-dimensional finite element limit analysis test problems, making investigation of three dimensional limit loads viable. A comparison between a range of iterative linear solvers and direct methods used to determine the search direction is also provided, demonstrating the superiority of direct methods for this application. The components of the interior point solver considered include the elimination of and options for handling remaining free variables, multifrontal and supernodal Cholesky comparison for computing the search direction, differences between approximate minimum degree [1] and nested dissection [13] orderings, dealing with dense columns and fixed variables, and accelerating the linear system solver through parallelization. Each of these areas resulted in an improvement on at least one of the problems in the test set, with many achieving gains across the whole set. The serial implementation achieved runtime performance 1.7× faster than the commercial solver Mosek [5]. Compared with the parallel version of Mosek, the use of parallel BLAS routines in the supernodal solver saw a 1.9× speedup, and with a modified version of the GPU-enabled CHOLMOD [11] and a single NVIDIA Tesla K20c this speedup increased to 4.65×.

UPCommons. Portal del coneixement obert de la UPC

Greedy low-rank algorithm for spatial connectome regression

Author: Benner Peter
Dolgov Sergey
Harris Kameron Decker
Kürschner Patrick
Publication date: 01/01/2019

Recovering brain connectivity from tract tracing data is an important computational problem in the neurosciences. Mesoscopic connectome reconstruction was previously formulated as a structured matrix regression problem (Harris et al., 2016), but existing techniques do not scale to the whole-brain setting. The corresponding matrix equation is challenging to solve due to large scale, ill-conditioning, and a general form that lacks a convergent splitting. We propose a greedy low-rank algorithm for the connectome reconstruction problem in very high dimensions. The algorithm approximates the solution by a sequence of rank-one updates which exploit the sparse and positive definite problem structure. This algorithm was described previously (Kressner and Sirković, 2015) but never implemented for this connectome problem, leading to a number of challenges. We have had to design judicious stopping criteria and employ efficient solvers for the three main sub-problems of the algorithm, including an efficient GPU implementation that alleviates the main bottleneck for large datasets. The performance of the method is evaluated on three examples: an artificial "toy" dataset and two whole-cortex instances using data from the Allen Mouse Brain Connectivity Atlas. We find that the method is significantly faster than previous methods and that moderate ranks offer good approximation. This speedup allows for the estimation of increasingly large-scale connectomes across taxa as these data become available from tracing experiments. The data and code are available online.

arXiv.org e-Print Archive

MPG.PuRe

Fixing Nonconvergence of Algebraic Iterative Reconstruction with an Unmatched Backprojector

Author: Dong Yiqiu
Hansen Per Christian
Hochstenbach Michiel E.
Riis Nicolai Andre Brogaard
Publication date: 01/01/2019

We consider algebraic iterative reconstruction methods with applications in image reconstruction. In particular, we are concerned with methods based on an unmatched projector/backprojector pair; i.e., the backprojector is not the exact adjoint or transpose of the forward projector. Such situations are common in large-scale computed tomography, and we consider the common situation where the method does not converge due to the nonsymmetry of the iteration matrix. We propose a modified algorithm that incorporates a small shift parameter, and we give the conditions that guarantee convergence of this method to a fixed point of a slightly perturbed problem. We also give perturbation bounds for this fixed point. Moreover, we discuss how to use Krylov subspace methods to efficiently estimate the leftmost eigenvalue of a certain matrix to select a proper shift parameter. The modified algorithm is illustrated with test problems from computed tomography
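The shift idea in this abstract can be illustrated on a toy problem. The sketch below assumes the iteration takes the form x <- (1 - alpha*omega) x + omega * B (b - A x), with forward projector A, unmatched backprojector B, relaxation parameter omega and shift alpha; this form, the function names and the 2x2 matrices are assumptions chosen for illustration and are not taken from the abstract. Setting alpha = 0 gives the unshifted iteration.

```python
def matvec(M, v):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def ba_iteration(A, B, b, omega, alpha=0.0, iters=500):
    """Run x <- (1 - alpha*omega) x + omega * B (b - A x), starting from zero.

    alpha = 0 gives the unshifted iteration; alpha > 0 adds the shift.
    """
    x = [0.0] * len(A[0])
    for _ in range(iters):
        residual = [b_i - ax_i for b_i, ax_i in zip(b, matvec(A, x))]
        step = matvec(B, residual)
        x = [(1.0 - alpha * omega) * x_i + omega * s_i
             for x_i, s_i in zip(x, step)]
    return x


# Toy problem: B is not the transpose of A, and BA = diag(1, -0.05)
# has a negative eigenvalue, so the unshifted iteration cannot converge.
A = [[1.0, 0.0], [0.0, 0.1]]
B = [[1.0, 0.0], [0.0, -0.5]]
b = matvec(A, [1.0, 1.0])

x_plain = ba_iteration(A, B, b, omega=1.0)             # second component blows up
x_shift = ba_iteration(A, B, b, omega=1.0, alpha=0.1)  # settles at a fixed point
```

In this toy case the shifted iterate settles at the solution of (BA + alpha*I) x = Bb. The shift here is large relative to the negative eigenvalue, so the fixed point differs noticeably from [1, 1]; the example shows only that convergence is restored and says nothing about reconstruction accuracy.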

arXiv.org e-Print Archive

Repository TU/e

Pure OAI Repository

Online Research Database In Technology

Tensor Computation: A New Framework for High-Dimensional Problems in EDA

Author: Batselier Kim
Daniel Luca
Liu Haotian
Wong Ngai
Zhang Zheng
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/10/2016

Many critical EDA problems suffer from the curse of dimensionality, i.e. the very fast-scaling computational burden produced by a large number of parameters and/or unknown variables. This phenomenon may be caused by multiple spatial or temporal factors (e.g. 3-D field solver discretizations and multi-rate circuit simulation), nonlinearity of devices and circuits, a large number of design or optimization parameters (e.g. full-chip routing/placement and circuit sizing), or extensive process variations (e.g. variability/reliability analysis and design for manufacturability). The computational challenges generated by such high-dimensional problems are generally hard to handle efficiently with traditional EDA core algorithms that are based on matrix and vector computation. This paper presents "tensor computation" as an alternative general framework for the development of efficient EDA algorithms and tools. A tensor is a high-dimensional generalization of a matrix and a vector, and is a natural choice for both storing and efficiently solving high-dimensional EDA problems. This paper gives a basic tutorial on tensors, demonstrates some recent examples of EDA applications (e.g., nonlinear circuit modeling and high-dimensional uncertainty quantification), and suggests further open EDA problems where the use of tensor computation could be of advantage.

Comment: 14 figures. Accepted by IEEE Trans. CAD of Integrated Circuits and Systems
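To make the curse of dimensionality concrete, the sketch below compares the number of entries in a full d-way tensor with n points per dimension against the number of parameters in a rank-R canonical polyadic (CP) factorization, which stores one n-by-R factor matrix per dimension. The CP format is used here only as a familiar example of a low-rank tensor representation; the abstract does not say which formats the paper covers, and the function names and sizes are hypothetical.

```python
def full_tensor_entries(n: int, d: int) -> int:
    """Entries in a dense d-way tensor with n points per dimension."""
    return n ** d


def cp_entries(n: int, d: int, rank: int) -> int:
    """Parameters in a rank-`rank` CP factorization: d factor matrices of size n x rank."""
    return d * n * rank


print(full_tensor_entries(10, 10))  # prints 10000000000
print(cp_entries(10, 10, 5))        # prints 500
```

Dense storage grows exponentially in d, while the factorized count grows linearly, which is the kind of saving a tensor-based framework aims to exploit when a low-rank structure is present.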

arXiv.org e-Print Archive

DSpace@MIT

Crossref

HKU Scholars Hub

Schnelle Löser für partielle Differentialgleichungen [Fast Solvers for Partial Differential Equations]

Publication venue: Zürich : EMS Publ. House
Publication date: 01/01/2008

[no abstract available]

Repositorium für Naturwissenschaften und Technik

High-performance image reconstruction in fluorescence tomography on desktop computers and graphics hardware

Author: Alexandrakis
Arridge
Arridge
Bakushinsky
Dogdas
Egger
Freiberger
Gannot
Godavarty
Göddeke
Hanke
Jiang
Joshi
Kaltenbacher
Keijzer
Landsman
Marquardt
Mordon
Morozov
Prakash
Roy
Schweiger
Schöberl
Sevick
Shives
Zhang
Publication venue: Optical Society of America

Image reconstruction in fluorescence optical tomography is a three-dimensional nonlinear ill-posed problem governed by a system of partial differential equations. In this paper we demonstrate that a combination of state-of-the-art numerical algorithms and a careful hardware-optimized implementation makes it possible to solve this large-scale inverse problem in a few seconds on standard desktop PCs with modern graphics hardware. In particular, we present methods to solve not only the forward but also the nonlinear inverse problem by massively parallel programming on graphics processors. A comparison of optimized CPU and GPU implementations shows that the reconstruction can be accelerated by factors of about 15 through the use of the graphics hardware without compromising the accuracy of the reconstructed images.

Crossref

PubMed Central