Runtime-guided mitigation of manufacturing variability in power-constrained multi-socket NUMA nodes
This work has been supported by the Spanish Government (Severo Ochoa grants SEV2015-0493, SEV-2011-00067), by
the Spanish Ministry of Science and Innovation (contracts TIN2015-65316-P), by Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272), by the RoMoL ERC Advanced Grant (GA 321253) and the European HiPEAC Network of Excellence. M. Moretó has been partially supported by the Ministry of Economy and Competitiveness under Juan de la Cierva postdoctoral fellowship number JCI-2012-15047. M. Casas is supported by the Secretary for Universities and Research of the Ministry of Economy and Knowledge of the Government of Catalonia and the Cofund
programme of the Marie Curie Actions of the 7th R&D Framework Programme of the European Union (Contract 2013 BP B 00243). This work was also partially performed
under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344 (LLNL-CONF-689878).
Finally, the authors are grateful to the reviewers for their valuable comments, to the RoMoL team, to Xavier Teruel and Kallia Chronaki from the Programming Models group
of BSC and the Computation Department of LLNL for their technical support and useful feedback.
Peer Reviewed. Postprint (published version)
Harnessing the Power of Many: Extensible Toolkit for Scalable Ensemble Applications
Many scientific problems require multiple distinct computational tasks to be
executed in order to achieve a desired solution. We introduce the Ensemble
Toolkit (EnTK) to address the challenges of scale, diversity and reliability
they pose. We describe the design and implementation of EnTK, characterize its
performance and integrate it with two distinct exemplar use cases: seismic
inversion and adaptive analog ensembles. We perform nine experiments,
characterizing EnTK overheads, strong and weak scalability, and the performance
of two use case implementations, at scale and on production infrastructures. We
show how EnTK meets the following general requirements: (i)
dedicated abstractions to support the description and execution of ensemble
applications; (ii) support for execution on heterogeneous computing
infrastructures; (iii) efficient scalability up to O(10^4) tasks; and (iv)
fault tolerance. We discuss novel computational capabilities that EnTK enables
and the scientific advantages arising thereof. We propose EnTK as an important
addition to the suite of tools in support of production scientific computing
- …