Search CORE

14,477 research outputs found

Tackling Exascale Software Challenges in Molecular Dynamics Simulations with GROMACS

Author: A Arnold
A Faradjian
B Hess
C Schütte
G Wilson
JA Anderson
JC Phillips
KJ Bowers
KJ Bowers
L Verlet
M Eleftheriou
M Shirts
MJ Abraham
P Eastman
R Yokota
S Pronk
S Páll
U Essmann
W Humphrey
WM Brown
Y Andoh
Y Sugita
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015
Field of study

GROMACS is a widely used package for biomolecular simulation, and over the last two decades it has evolved from small-scale efficiency to advanced heterogeneous acceleration and multi-level parallelism targeting some of the largest supercomputers in the world. Here, we describe some of the ways we have been able to realize this through the use of parallelization on all levels, combined with a constant focus on absolute performance. Release 4.6 of GROMACS uses SIMD acceleration on a wide range of architectures, GPU offloading acceleration, and both OpenMP and MPI parallelism within and between nodes, respectively. The recent work on acceleration made it necessary to revisit the fundamental algorithms of molecular simulation, including the concept of neighborsearching, and we discuss the present and future challenges we see for exascale simulation - in particular a very fine-grained task parallelism. We also discuss the software management, code peer review and continuous integration testing required for a project of this complexity.Comment: EASC 2014 conference proceedin

arXiv.org e-Print Archive

Publikationer från KTH

Crossref

Digitala Vetenskapliga Arkivet - Academic Archive On-line

MPG.PuRe

Workload Equity in Vehicle Routing Problems: A Survey and Analysis

Author: Hartl Richard F.
Matl Piotr
Vidal Thibaut
Publication venue: 'Institute for Operations Research and the Management Sciences (INFORMS)'
Publication date: 01/01/2018
Field of study

Over the past two decades, equity aspects have been considered in a growing number of models and methods for vehicle routing problems (VRPs). Equity concerns most often relate to fairly allocating workloads and to balancing the utilization of resources, and many practical applications have been reported in the literature. However, there has been only limited discussion about how workload equity should be modeled in VRPs, and various measures for optimizing such objectives have been proposed and implemented without a critical evaluation of their respective merits and consequences. This article addresses this gap with an analysis of classical and alternative equity functions for biobjective VRP models. In our survey, we review and categorize the existing literature on equitable VRPs. In the analysis, we identify a set of axiomatic properties that an ideal equity measure should satisfy, collect six common measures, and point out important connections between their properties and those of the resulting Pareto-optimal solutions. To gauge the extent of these implications, we also conduct a numerical study on small biobjective VRP instances solvable to optimality. Our study reveals two undesirable consequences when optimizing equity with nonmonotonic functions: Pareto-optimal solutions can consist of non-TSP-optimal tours, and even if all tours are TSP optimal, Pareto-optimal solutions can be workload inconsistent, i.e. composed of tours whose workloads are all equal to or longer than those of other Pareto-optimal solutions. We show that the extent of these phenomena should not be underestimated. The results of our biobjective analysis are valid also for weighted sum, constraint-based, or single-objective models. Based on this analysis, we conclude that monotonic equity functions are more appropriate for certain types of VRP models, and suggest promising avenues for further research.Comment: Accepted Manuscrip

arXiv.org e-Print Archive

Crossref

PolyPublie

Achieving Efficient Strong Scaling with PETSc using Hybrid MPI/OpenMP Optimisation

Author: G. Goumas
G. Schubert
G. Wellein
M. Butler
M.D. Piggott
N. Bell
P. Balaji
S. Williams
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013
Field of study

The increasing number of processing elements and decreas- ing memory to core ratio in modern high-performance platforms makes efficient strong scaling a key requirement for numerical algorithms. In order to achieve efficient scalability on massively parallel systems scientific software must evolve across the entire stack to exploit the multiple levels of parallelism exposed in modern architectures. In this paper we demonstrate the use of hybrid MPI/OpenMP parallelisation to optimise parallel sparse matrix-vector multiplication in PETSc, a widely used scientific library for the scalable solution of partial differential equations. Using large matrices generated by Fluidity, an open source CFD application code which uses PETSc as its linear solver engine, we evaluate the effect of explicit communication overlap using task-based parallelism and show how to further improve performance by explicitly load balancing threads within MPI processes. We demonstrate a significant speedup over the pure-MPI mode and efficient strong scaling of sparse matrix-vector multiplication on Fujitsu PRIMEHPC FX10 and Cray XE6 systems

arXiv.org e-Print Archive

CiteSeerX

Crossref

Spiral - Imperial College Digital Repository

Scalable Layer-2/Layer-3 Multistage Switching Architectures for Software Routers

Author: Bianco Andrea
Finochietto J
Galante G
Mazzucchi D
Mellia Marco
Neri Fabio
Publication venue: IEEE
Publication date: 01/01/2006
Field of study

Software routers are becoming an important alternative to proprietary and expensive network devices, because they exploit the economy of scale of the PC market and open-source software. When considering maximum performance in terms of throughput, PC-based routers suffer from limitations stemming from the single PC architecture, e.g., limited bus bandwidth, and high memory access latency. To overcome these limitations, in this paper we present a multistage architecture that combines a layer-2 load-balancer front-end and a layer-3 routing back-end, interconnected by standard Ethernet switches. Both the front-end and the back-end are implemented using standard PCs and open- source software. After describing the architecture, evaluation is performed on a lab test-bed, to show its scalability. While the proposed solution allows to increase performance of PC- based routers, it also allows to distribute packet manipulation functionalities, and to automatically recover from component failures

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Multistage Switching Architectures for Software Routers

Author: Andrea Bianco
Fabio Neri
Giulio Galante
Istituto Superiore
Jorge M. Finochietto
Marco Mellia
Mario Boella
Politecnico Di Torino
Publication venue: IEEE
Publication date: 01/01/2007
Field of study

Software routers based on personal computer (PC) architectures are becoming an important alternative to proprietary and expensive network devices. However, software routers suffer from many limitations of the PC architecture, including, among others, limited bus and central processing unit (CPU) bandwidth, high memory access latency, limited scalability in terms of number of network interface cards, and lack of resilience mechanisms. Multistage PC-based architectures can be an interesting alternative since they permit us to i) increase the performance of single software routers, ii) scale router size, iii) distribute packet manipulation and control functionality, iv) recover from single-component failures, and v) incrementally upgrade router performance. We propose a specific multistage architecture, exploiting PC-based routers as switching elements, to build a high-speed, largesize,scalable, and reliable software router. A small-scale prototype of the multistage router is currently up and running in our labs, and performance evaluation is under wa

CiteSeerX

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Automated problem scheduling and reduction of synchronization delay effects

Author: Saltz Joel H.
Publication venue
Publication date
Field of study

It is anticipated that in order to make effective use of many future high performance architectures, programs will have to exhibit at least a medium grained parallelism. A framework is presented for partitioning very sparse triangular systems of linear equations that is designed to produce favorable preformance results in a wide variety of parallel architectures. Efficient methods for solving these systems are of interest because: (1) they provide a useful model problem for use in exploring heuristics for the aggregation, mapping and scheduling of relatively fine grained computations whose data dependencies are specified by directed acrylic graphs, and (2) because such efficient methods can find direct application in the development of parallel algorithms for scientific computation. Simple expressions are derived that describe how to schedule computational work with varying degrees of granularity. The Encore Multimax was used as a hardware simulator to investigate the performance effects of using the partitioning techniques presented in shared memory architectures with varying relative synchronization costs

NASA Technical Reports Server

PPF - A Parallel Particle Filtering Library

Author: Demirel Ömer
Meijering Erik
Niessen Wiro
Sbalzarini Ivo F.
Smal Ihor
Publication venue
Publication date: 01/01/2014
Field of study

We present the parallel particle filtering (PPF) software library, which enables hybrid shared-memory/distributed-memory parallelization of particle filtering (PF) algorithms combining the Message Passing Interface (MPI) with multithreading for multi-level parallelism. The library is implemented in Java and relies on OpenMPI's Java bindings for inter-process communication. It includes dynamic load balancing, multi-thread balancing, and several algorithmic improvements for PF, such as input-space domain decomposition. The PPF library hides the difficulties of efficient parallel programming of PF algorithms and provides application developers with the necessary tools for parallel implementation of PF methods. We demonstrate the capabilities of the PPF library using two distributed PF algorithms in two scenarios with different numbers of particles. The PPF library runs a 38 million particle problem, corresponding to more than 1.86 GB of particle data, on 192 cores with 67% parallel efficiency. To the best of our knowledge, the PPF library is the first open-source software that offers a parallel framework for PF applications.Comment: 8 pages, 8 figures; will appear in the proceedings of the IET Data Fusion & Target Tracking Conference 201

arXiv.org e-Print Archive

EUR Research Repository

MPG.PuRe