7,868 research outputs found
Recommended from our members
Preparing sparse solvers for exascale computing.
Sparse solvers provide essential functionality for a wide variety of scientific applications. Highly parallel sparse solvers are essential for continuing advances in high-fidelity, multi-physics and multi-scale simulations, especially as we target exascale platforms. This paper describes the challenges, strategies and progress of the US Department of Energy Exascale Computing project towards providing sparse solvers for exascale computing platforms. We address the demands of systems with thousands of high-performance node devices where exposing concurrency, hiding latency and creating alternative algorithms become essential. The efforts described here are works in progress, highlighting current success and upcoming challenges. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'
Analytical/ML Mixed Approach for Concurrency Regulation in Software Transactional Memory
In this article we exploit a combination of analytical and Machine Learning (ML) techniques in order to build a performance model allowing to dynamically tune the level of concurrency of applications based on Software Transactional Memory (STM). Our mixed approach has the advantage of reducing the training time of pure machine learning methods, and avoiding approximation errors typically affecting pure analytical approaches. Hence it allows very fast construction of highly reliable performance models, which can be promptly and effectively exploited for optimizing actual application runs. We also present a real implementation of a concurrency regulation architecture, based on the mixed modeling approach, which has been integrated with the open source Tiny STM package, together with experimental data related to runs of applications taken from the STAMP benchmark suite demonstrating the effectiveness of our proposal. © 2014 IEEE
DPP-PMRF: Rethinking Optimization for a Probabilistic Graphical Model Using Data-Parallel Primitives
We present a new parallel algorithm for probabilistic graphical model
optimization. The algorithm relies on data-parallel primitives (DPPs), which
provide portable performance over hardware architecture. We evaluate results on
CPUs and GPUs for an image segmentation problem. Compared to a serial baseline,
we observe runtime speedups of up to 13X (CPU) and 44X (GPU). We also compare
our performance to a reference, OpenMP-based algorithm, and find speedups of up
to 7X (CPU).Comment: LDAV 2018, October 201
Multi-threaded Output in CMS using ROOT
CMS has worked aggressively to make use of multi-core architectures,
routinely running 4- to 8-core production jobs in 2017. The primary impediment
to efficiently scaling beyond 8 cores has been our ROOT-based output module,
which has been necessarily single threaded. In this paper we explore the
changes made to the CMS framework and our ROOT output module to overcome the
previous scaling limits, using two new ROOT features: the
\texttt{TBufferMerger} asynchronous file merger, and Implicit Multi-Threading.
We examine the architecture of the new parallel output module, the specific
accommodations and modifications that were made to ensure compatibility with
the CMS framework scheduler, and the performance characteristics of the new
output module.Comment: Submitted to CHEP 2018 - 23rd International Conference on Computing
in High Energy and Nuclear Physics; 6 pages, 4 figures, uses webofc clas
- …