Search CORE

7,868 research outputs found

Recommended from our members

Preparing sparse solvers for exascale computing.

Author: Anzt Hartwig
Boman Erik
Curfman McInnes Lois
Falgout Rob
Ghysels Pieter
Heroux Michael
Li Xiaoye
Meier Yang Ulrike
Rajamanickam Sivasankaran
Rupp Karl
Smith Barry
Tran Mills Richard
Yamazaki Ichitaro
Publication venue: eScholarship, University of California
Publication date: 01/03/2020
Field of study

Sparse solvers provide essential functionality for a wide variety of scientific applications. Highly parallel sparse solvers are essential for continuing advances in high-fidelity, multi-physics and multi-scale simulations, especially as we target exascale platforms. This paper describes the challenges, strategies and progress of the US Department of Energy Exascale Computing project towards providing sparse solvers for exascale computing platforms. We address the demands of systems with thousands of high-performance node devices where exposing concurrency, hiding latency and creating alternative algorithms become essential. The efforts described here are works in progress, highlighting current success and upcoming challenges. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'

eScholarship - University of California

Analytical/ML Mixed Approach for Concurrency Regulation in Software Transactional Memory

Author: Ciciani Bruno
DI SANZO Pierangelo
Quaglia Francesco
Rughetti Diego
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2014
Field of study

In this article we exploit a combination of analytical and Machine Learning (ML) techniques in order to build a performance model allowing to dynamically tune the level of concurrency of applications based on Software Transactional Memory (STM). Our mixed approach has the advantage of reducing the training time of pure machine learning methods, and avoiding approximation errors typically affecting pure analytical approaches. Hence it allows very fast construction of highly reliable performance models, which can be promptly and effectively exploited for optimizing actual application runs. We also present a real implementation of a concurrency regulation architecture, based on the mixed modeling approach, which has been integrated with the open source Tiny STM package, together with experimental data related to runs of applications taken from the STAMP benchmark suite demonstrating the effectiveness of our proposal. © 2014 IEEE

CiteSeerX

Crossref

Archivio della Ricerca - Università di Roma 3

Archivio della ricerca- Università di Roma La Sapienza

DPP-PMRF: Rethinking Optimization for a Probabilistic Graphical Model Using Data-Parallel Primitives

Author: Bethel E. Wes
Camp David
Childs Hank
Heinemann Colleen
Lessley Brenton
Perciano Talita
Publication venue
Publication date: 13/09/2018
Field of study

We present a new parallel algorithm for probabilistic graphical model optimization. The algorithm relies on data-parallel primitives (DPPs), which provide portable performance over hardware architecture. We evaluate results on CPUs and GPUs for an image segmentation problem. Compared to a serial baseline, we observe runtime speedups of up to 13X (CPU) and 44X (GPU). We also compare our performance to a reference, OpenMP-based algorithm, and find speedups of up to 7X (CPU).Comment: LDAV 2018, October 201

arXiv.org e-Print Archive

Crossref

eScholarship - University of California

Multi-threaded Output in CMS using ROOT

Author: Jones Christopher
Riley Daniel
Publication venue: 'EDP Sciences'
Publication date: 01/01/2019
Field of study

CMS has worked aggressively to make use of multi-core architectures, routinely running 4- to 8-core production jobs in 2017. The primary impediment to efficiently scaling beyond 8 cores has been our ROOT-based output module, which has been necessarily single threaded. In this paper we explore the changes made to the CMS framework and our ROOT output module to overcome the previous scaling limits, using two new ROOT features: the \texttt{TBufferMerger} asynchronous file merger, and Implicit Multi-Threading. We examine the architecture of the new parallel output module, the specific accommodations and modifications that were made to ensure compatibility with the CMS framework scheduler, and the performance characteristics of the new output module.Comment: Submitted to CHEP 2018 - 23rd International Conference on Computing in High Energy and Nuclear Physics; 6 pages, 4 figures, uses webofc clas

arXiv.org e-Print Archive

EDP Sciences OAI-PMH repository (1.2.0)

Directory of Open Access Journals