Search CORE

1,620 research outputs found

On Algorithms Based on Joint Estimation of Currents and Contrast in Microwave Tomography

Author: Barrière Paul-André
Goussard Yves
Idier Jérôme
Laurin Jean-Jacques
Publication venue
Publication date: 01/01/2009
Field of study

This paper deals with improvements to the contrast source inversion method which is widely used in microwave tomography. First, the method is reviewed and weaknesses of both the criterion form and the optimization strategy are underlined. Then, two new algorithms are proposed. Both of them are based on the same criterion, similar but more robust than the one used in contrast source inversion. The first technique keeps the main characteristics of the contrast source inversion optimization scheme but is based on a better exploitation of the conjugate gradient algorithm. The second technique is based on a preconditioned conjugate gradient algorithm and performs simultaneous updates of sets of unknowns that are normally processed sequentially. Both techniques are shown to be more efficient than original contrast source inversion.Comment: 12 pages, 12 figures, 5 table

arXiv.org e-Print Archive

PolyPublie

Preconditioned conjugate-gradient methods for low-speed flow calculations

Author: Ajmani Kumud
Liou Meng-Sing
Ng Wing-Fai
Publication venue
Publication date
Field of study

An investigation is conducted into the viability of using a generalized Conjugate Gradient-like method as an iterative solver to obtain steady-state solutions of very low-speed fluid flow problems. Low-speed flow at Mach 0.1 over a backward-facing step is chosen as a representative test problem. The unsteady form of the two dimensional, compressible Navier-Stokes equations is integrated in time using discrete time-steps. The Navier-Stokes equations are cast in an implicit, upwind finite-volume, flux split formulation. The new iterative solver is used to solve a linear system of equations at each step of the time-integration. Preconditioning techniques are used with the new solver to enhance the stability and convergence rate of the solver and are found to be critical to the overall success of the solver. A study of various preconditioners reveals that a preconditioner based on the Lower-Upper Successive Symmetric Over-Relaxation iterative scheme is more efficient than a preconditioner based on Incomplete L-U factorizations of the iteration matrix. The performance of the new preconditioned solver is compared with a conventional Line Gauss-Seidel Relaxation (LGSR) solver. Overall speed-up factors of 28 (in terms of global time-steps required to converge to a steady-state solution) and 20 (in terms of total CPU time on one processor of a CRAY-YMP) are found in favor of the new preconditioned solver, when compared with the LGSR solver

NASA Technical Reports Server

A generalized Poisson and Poisson-Boltzmann solver for electrostatic environments

Author: G. Fisicaro
L. Genovese
N. Marzari
O. Andreussi
Pauling L.
Reichardt C.
S. Goedecker
Stern O.
Publication venue: 'AIP Publishing'
Publication date: 02/09/2015
Field of study

The computational study of chemical reactions in complex, wet environments is critical for applications in many fields. It is often essential to study chemical reactions in the presence of applied electrochemical potentials, taking into account the non-trivial electrostatic screening coming from the solvent and the electrolytes. As a consequence the electrostatic potential has to be found by solving the generalized Poisson and the Poisson-Boltzmann equation for neutral and ionic solutions, respectively. In the present work solvers for both problems have been developed. A preconditioned conjugate gradient method has been implemented to the generalized Poisson equation and the linear regime of the Poisson-Boltzmann, allowing to solve iteratively the minimization problem with some ten iterations of a ordinary Poisson equation solver. In addition, a self-consistent procedure enables us to solve the non-linear Poisson-Boltzmann problem. Both solvers exhibit very high accuracy and parallel efficiency, and allow for the treatment of different boundary conditions, as for example surface systems. The solver has been integrated into the BigDFT and Quantum-ESPRESSO electronic-structure packages and will be released as an independent program, suitable for integration in other codes

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Hal - Université Grenoble Alpes

edoc

HAL-CEA

Partitioning strategy for efficient nonlinear finite element dynamic analysis on multiprocessor computers

Author: Noor Ahmed K.
Peters Jeanne M.
Publication venue
Publication date
Field of study

A computational procedure is presented for the nonlinear dynamic analysis of unsymmetric structures on vector multiprocessor systems. The procedure is based on a novel hierarchical partitioning strategy in which the response of the unsymmetric and antisymmetric response vectors (modes), each obtained by using only a fraction of the degrees of freedom of the original finite element model. The three key elements of the procedure which result in high degree of concurrency throughout the solution process are: (1) mixed (or primitive variable) formulation with independent shape functions for the different fields; (2) operator splitting or restructuring of the discrete equations at each time step to delineate the symmetric and antisymmetric vectors constituting the response; and (3) two level iterative process for generating the response of the structure. An assessment is made of the effectiveness of the procedure on the CRAY X-MP/4 computers

NASA Technical Reports Server

Computation of Ground States of the Gross-Pitaevskii Functional via Riemannian Optimization

Author: Danaila Ionut
Protas Bartosz
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 01/01/2017
Field of study

In this paper we combine concepts from Riemannian Optimization and the theory of Sobolev gradients to derive a new conjugate gradient method for direct minimization of the Gross-Pitaevskii energy functional with rotation. The conservation of the number of particles constrains the minimizers to lie on a manifold corresponding to the unit

L^2

norm. The idea developed here is to transform the original constrained optimization problem to an unconstrained problem on this (spherical) Riemannian manifold, so that fast minimization algorithms can be applied as alternatives to more standard constrained formulations. First, we obtain Sobolev gradients using an equivalent definition of an

H^1

inner product which takes into account rotation. Then, the Riemannian gradient (RG) steepest descent method is derived based on projected gradients and retraction of an intermediate solution back to the constraint manifold. Finally, we use the concept of the Riemannian vector transport to propose a Riemannian conjugate gradient (RCG) method for this problem. It is derived at the continuous level based on the "optimize-then-discretize" paradigm instead of the usual "discretize-then-optimize" approach, as this ensures robustness of the method when adaptive mesh refinement is performed in computations. We evaluate various design choices inherent in the formulation of the method and conclude with recommendations concerning selection of the best options. Numerical tests demonstrate that the proposed RCG method outperforms the simple gradient descent (RG) method in terms of rate of convergence. While on simple problems a Newton-type method implemented in the {\tt Ipopt} library exhibits a faster convergence than the (RCG) approach, the two methods perform similarly on more complex problems requiring the use of mesh adaptation. At the same time the (RCG) approach has far fewer tunable parameters.Comment: 28 pages, 13 figure

arXiv.org e-Print Archive

HAL - Normandie Université

HAL Descartes

Hal-Diderot

New Algebraic Formulation of Density Functional Calculation

Author: Arias T. A.
Ismail-Beigi Sohrab
Publication venue: 'Elsevier BV'
Publication date: 01/01/2000
Field of study

This article addresses a fundamental problem faced by the ab initio community: the lack of an effective formalism for the rapid exploration and exchange of new methods. To rectify this, we introduce a novel, basis-set independent, matrix-based formulation of generalized density functional theories which reduces the development, implementation, and dissemination of new ab initio techniques to the derivation and transcription of a few lines of algebra. This new framework enables us to concisely demystify the inner workings of fully functional, highly efficient modern ab initio codes and to give complete instructions for the construction of such for calculations employing arbitrary basis sets. Within this framework, we also discuss in full detail a variety of leading-edge ab initio techniques, minimization algorithms, and highly efficient computational kernels for use with scalar as well as shared and distributed-memory supercomputer architectures

arXiv.org e-Print Archive

CiteSeerX

Crossref

Communication reduction techniques in numerical methods and deep neural networks

Author: Zhuang Sicong
Publication venue: Universitat Politècnica de Catalunya
Publication date: 28/11/2019
Field of study

Inter-node communication has turned out to be one of the determining factors of the performance on modern HPC systems. Furthermore, the situation only gets worse with the ever-incresing size of the cores involved. Hence, this thesis explore the various possible techniques to reduce the communication during the execution of a parallel program. It turned out that there is no one-size-fit-all approach to the challenge. Despite this, the problems in each field, due to their unique characteristics, dispose of distinct opportunities for the communication reduction. The thesis, first devles into numerical linear algebra, develops an evolution of the Pipelined CG called IFCG. It eliminates the synchronizations normally take place towards the end of each iteration to increase the parallelism. Secondly, the thesis draws its attention on reducing the necessity to transfer the parameters between the CPU host and GPUs during a neural network training. It develops two routines: ADT and AWP in order to compress and decompress the weights with a reduced data representation format prior and right after the data transfer takes place. The compress rate is adjusted vis-à-vis the L2-norm of the weights of every layer. In the third contribution, the thesis diminish the communication in model parallelizing a deep neural network. Instead of splitting and distributing the neurons of each layer to the available processes on the system, now it is done every other layers. This results in a 50% percent reduction of the communication whereas it introduces 50% of extra local FP computation.La comunicació entre els nodes de computació multi-core sorgeix com un dels factors principals que impacta el rendiment d’un sistema HPC d’avui en dia. I més, mentre més core es pusa, pitjor la situació. Per tant aquesta tesi explora les possibles tècniques per a reduir la comunicació en l’execució d’un programa paral·lel. Tot i això, resulta que no existeix una sola tècnica que pugui resoldre aquest obstacle. Tot i que els problemes en cada àmbit, com que té els seus propis caracristics, disposa variosos oportunitats per la reducció de comunicació. La tesi, en primer lloc, dins de l’àmbit de l’àlgebra lineal numèriques desenvolupa un algoritme IFCG que és una evolució de Pipelined CG. IFCG elimina les sincronitzacions normalment posa cap al final de cada iteració per augmentar el paral·lelisme. En la segona contribució, la tesi dirigeix l’atenció a reduir la necessitat de transferir els paràmetres entre el CPU i els GPUs durant l’entrenament d’una xarxa neuronal. Desenvolupa rutines ADT i AWP per comprimir i descomprimir els pesos amb una representació de dades reduïda abans i just desprès de la transferència. La representació es decideix dinàmicament segons el L2-norm dels pesos a cada capa. Al final la tesi disminueix la comunicació en paral·lelitzar el model duna xarxa neurona. En lloc de distribuir les neurones de cada capa als processos disponibles en el sistema, es fa cada dues capes. Així que corta com mitja de la comunicació. En canvi, com que distribueix només cada dues capes, les capes restes es repliquen, resulta que incorre en una augmenta de 50% de computació local.Postprint (published version

UPCommons. Portal del coneixement obert de la UPC