22,110 research outputs found
Hyper Converged Infrastructures: Beyond virtualization
Hyper Convergence has brought virtualization and IT strategies to a new
level. Datacenters are undergoing a deep paradigm shift from a hardware-centric
to an application-centric approach which leverages on software defined
architectures, while IT is more and more being delivered as services rather
than assets or products. Throughout different evolving phases since the initial
attempts to convergence, the concept has been refined down to a level
where,ultimately, a whole datacenter could be fully managed from a centralized
single point, abstracting the whole hardware layer and exposing it to the
administrators as a transparent pool of resources. This paper analyzes the
evolution of infrastructures and tries to dig into the reality and convenience
of Hyper Convergence
Main memory in HPC: do we need more, or could we live with less?
An important aspect of High-Performance Computing (HPC) system design is the choice of main memory capacity. This choice becomes increasingly important now that 3D-stacked memories are entering the market. Compared with conventional Dual In-line Memory Modules (DIMMs), 3D memory chiplets provide better performance and energy efficiency but lower memory capacities. Therefore, the adoption of 3D-stacked memories in the HPC domain depends on whether we can find use cases that require much less memory than is available now.
This study analyzes the memory capacity requirements of important HPC benchmarks and applications. We find that the High-Performance Conjugate Gradients (HPCG) benchmark could be an important success story for 3D-stacked memories in HPC, but High-Performance Linpack (HPL) is likely to be constrained by 3D memory capacity. The study also emphasizes that the analysis of memory footprints of production HPC applications is complex and that it requires an understanding of application scalability and target category, i.e., whether the users target capability or capacity computing. The results show that most of the HPC applications under study have per-core memory footprints in the range of hundreds of megabytes, but we also detect applications and use cases that require gigabytes per core. Overall, the study identifies the HPC applications and use cases with memory footprints that could be provided by 3D-stacked memory chiplets, making a first step toward adoption of this novel technology in the HPC domain.This work was supported by the Collaboration Agreement between Samsung Electronics Co., Ltd. and BSC, Spanish Government through Severo Ochoa programme (SEV-2015-0493), by the Spanish Ministry of Science and Technology through TIN2015-65316-P project and by the Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272). This work has also received funding from the European Union’s Horizon
2020 research and innovation programme under ExaNoDe project (grant agreement No 671578). Darko Zivanovic holds the Severo Ochoa grant (SVP-2014-068501) of the Ministry of Economy and Competitiveness
of Spain. The authors thank Harald Servat from BSC and Vladimir Marjanovi´c from High Performance Computing Center Stuttgart for their technical support.Postprint (published version
Going through Rough Times: from Non-Equilibrium Surface Growth to Algorithmic Scalability
Efficient and faithful parallel simulation of large asynchronous systems is a
challenging computational problem. It requires using the concept of local
simulated times and a synchronization scheme. We study the scalability of
massively parallel algorithms for discrete-event simulations which employ
conservative synchronization to enforce causality. We do this by looking at the
simulated time horizon as a complex evolving system, and we identify its
universal characteristics. We find that the time horizon for the conservative
parallel discrete-event simulation scheme exhibits Kardar-Parisi-Zhang-like
kinetic roughening. This implies that the algorithm is asymptotically scalable
in the sense that the average progress rate of the simulation approaches a
non-zero constant. It also implies, however, that there are diverging memory
requirements associated with such schemes.Comment: to appear in the Proceedings of the MRS, Fall 200
A Parallel Solver for Graph Laplacians
Problems from graph drawing, spectral clustering, network flow and graph
partitioning can all be expressed in terms of graph Laplacian matrices. There
are a variety of practical approaches to solving these problems in serial.
However, as problem sizes increase and single core speeds stagnate, parallelism
is essential to solve such problems quickly. We present an unsmoothed
aggregation multigrid method for solving graph Laplacians in a distributed
memory setting. We introduce new parallel aggregation and low degree
elimination algorithms targeted specifically at irregular degree graphs. These
algorithms are expressed in terms of sparse matrix-vector products using
generalized sum and product operations. This formulation is amenable to linear
algebra using arbitrary distributions and allows us to operate on a 2D sparse
matrix distribution, which is necessary for parallel scalability. Our solver
outperforms the natural parallel extension of the current state of the art in
an algorithmic comparison. We demonstrate scalability to 576 processes and
graphs with up to 1.7 billion edges.Comment: PASC '18, Code: https://github.com/ligmg/ligm
Harnessing the Power of Many: Extensible Toolkit for Scalable Ensemble Applications
Many scientific problems require multiple distinct computational tasks to be
executed in order to achieve a desired solution. We introduce the Ensemble
Toolkit (EnTK) to address the challenges of scale, diversity and reliability
they pose. We describe the design and implementation of EnTK, characterize its
performance and integrate it with two distinct exemplar use cases: seismic
inversion and adaptive analog ensembles. We perform nine experiments,
characterizing EnTK overheads, strong and weak scalability, and the performance
of two use case implementations, at scale and on production infrastructures. We
show how EnTK meets the following general requirements: (i) implementing
dedicated abstractions to support the description and execution of ensemble
applications; (ii) support for execution on heterogeneous computing
infrastructures; (iii) efficient scalability up to O(10^4) tasks; and (iv)
fault tolerance. We discuss novel computational capabilities that EnTK enables
and the scientific advantages arising thereof. We propose EnTK as an important
addition to the suite of tools in support of production scientific computing
- …