Automatic Energy Saving Schemes for Parallel Applications
Although high-performance computing traditionally focuses on the efficient execution of large-scale applications, both energy and power have become critical concerns when approaching exascale.
Drastic increases in the power consumption of supercomputers significantly affect their operating costs and failure rates. In modern microprocessor architectures, equipped with dynamic voltage and
frequency scaling (DVFS) and CPU clock modulation (throttling),
the power consumption may be controlled in software. Additionally, network interconnects, such as InfiniBand, may be exploited to
maximize energy savings, while the application performance loss and frequency switching overheads must be carefully balanced.
This work first studies two important collective communication operations, all-to-all and allgather, and proposes energy saving strategies on a per-call basis. Next, it targets point-to-point communications, grouping them into phases and applying frequency scaling to those phases to save energy by exploiting architectural and communication stalls. Finally, it proposes an automatic runtime system that combines both collective and point-to-point communications into phases and applies throttling to them, in addition to DVFS, to maximize energy savings. Experimental results are presented for the NAS parallel benchmark problems as well as for realistic parallel electronic structure calculations performed by the widely used quantum chemistry package GAMESS. Close-to-maximum energy savings were obtained with low performance loss on the given platform.
Evaluation of DVFS techniques on modern HPC processors and accelerators for energy-aware applications
Energy efficiency is becoming increasingly important for computing systems,
in particular for large-scale HPC facilities. In this work we evaluate, from a
user perspective, the use of Dynamic Voltage and Frequency Scaling (DVFS)
techniques, assisted by the power and energy monitoring capabilities of modern
processors in order to tune applications for energy efficiency. We run selected
kernels and a full HPC application on two high-end processors widely used in
the HPC context, namely an NVIDIA K80 GPU and an Intel Haswell CPU. We evaluate
the available trade-offs between energy-to-solution and time-to-solution,
attempting a function-by-function frequency tuning. We finally estimate the
benefits obtainable by running the full code on an HPC multi-GPU node, with respect
to default clock frequency governors. We instrument our code to accurately
monitor power consumption and execution time without the need for any additional
hardware, and we enable it to change CPU and GPU clock frequencies while
running. We analyze our results on the different architectures using a simple
energy-performance model, and derive a number of energy saving strategies which
can be easily adopted on recent high-end HPC systems for generic applications.
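The trade-off between energy-to-solution and time-to-solution described in this abstract can be illustrated with a toy model. The sketch below is not the paper's energy-performance model. It assumes that runtime splits into a compute-bound part that scales as 1/f and a memory-bound part that does not depend on the clock, and that power is a static term plus a dynamic term growing as f^3. All function names and parameter values are hypothetical.

```python
# Toy energy-performance model (illustrative assumptions only, not measured data).

def time_to_solution(f_ghz, t_compute_at_1ghz=60.0, t_memory=40.0):
    """Runtime in seconds: the compute part scales as 1/f, the memory part is fixed."""
    return t_compute_at_1ghz / f_ghz + t_memory

def power_watts(f_ghz, p_static=100.0, k_dynamic=5.0):
    """Static power plus a dynamic term that grows as f^3 (voltage assumed ~ f)."""
    return p_static + k_dynamic * f_ghz ** 3

def energy_to_solution(f_ghz):
    """Energy in joules at a fixed clock frequency."""
    return power_watts(f_ghz) * time_to_solution(f_ghz)

def sweep(frequencies):
    """Return (frequency, time, energy) for every candidate frequency."""
    return [(f, time_to_solution(f), energy_to_solution(f)) for f in frequencies]

FREQS = [1.0, 1.5, 2.0, 2.5, 3.0]  # hypothetical available clock steps in GHz
results = sweep(FREQS)
best_energy = min(results, key=lambda r: r[2])  # lowest energy-to-solution
best_time = min(results, key=lambda r: r[1])    # lowest time-to-solution

for f, t, e in results:
    print(f"{f:.1f} GHz  time={t:.1f} s  energy={e:.0f} J")
```

In this toy setting the frequency that minimizes energy differs from the one that minimizes time, which is the kind of trade-off a function-by-function tuning would try to exploit.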
Parallel Implementation of Lossy Data Compression for Temporal Data Sets
Many scientific data sets contain temporal dimensions. These are the data
storing information at the same spatial location but different time stamps.
Some of the biggest temporal datasets are produced by parallel computing
applications such as simulations of climate change and fluid dynamics. Temporal
datasets can be very large and cost a huge amount of time to transfer among
storage locations. With data compression, files can be transferred
faster and take up less storage space. NUMARCK is a lossy data compression algorithm
for temporal data sets that can learn emerging distributions of element-wise
change ratios along the temporal dimension and encodes them into an index table
to be concisely represented. This paper presents a parallel implementation of
NUMARCK. Evaluated with six data sets obtained from climate and astrophysics
simulations, parallel NUMARCK achieved scalable speedups of up to 8788 when
running 12800 MPI processes on a parallel computer. We also compare the
compression ratios against two lossy data compression algorithms, ISABELA and
ZFP. The results show that NUMARCK achieved higher compression ratios than
ISABELA and ZFP.
Comment: 10 pages, HiPC 201
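The idea of encoding element-wise change ratios through an index table can be sketched as follows. This is a minimal illustration, not the NUMARCK implementation: it assumes plain equal-width binning as a stand-in for however the algorithm learns the ratio distribution, it assumes the previous time step contains no zeros, and the function names `compress_step` and `decompress_step` are hypothetical.

```python
# Sketch: store one small bin index per element instead of a full value.

def compress_step(prev, curr, n_bins=16):
    """Encode curr relative to prev as (table of bin centers, per-element indices)."""
    ratios = [(c - p) / p for p, c in zip(prev, curr)]  # assumes no zeros in prev
    lo, hi = min(ratios), max(ratios)
    if hi == lo:
        # All elements changed by the same ratio: one table entry is enough.
        return [lo], [0] * len(ratios)
    width = (hi - lo) / n_bins
    table = [lo + (i + 0.5) * width for i in range(n_bins)]
    # Clamp so the maximum ratio falls into the last bin.
    indices = [min(int((r - lo) / width), n_bins - 1) for r in ratios]
    return table, indices

def decompress_step(prev, table, indices):
    """Rebuild an approximation of curr from prev and the index table."""
    return [p * (1.0 + table[i]) for p, i in zip(prev, indices)]

prev = [10.0, 20.0, 30.0, 40.0]
curr = [10.1, 20.4, 29.7, 40.8]
table, indices = compress_step(prev, curr)
approx = decompress_step(prev, table, indices)
```

With equal-width bins, each reconstructed ratio is off by at most half a bin width, so the number of bins controls the trade-off between index size and error.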
Generalizing Amdahl’s Law for Power and Energy
Extending Amdahl's law to identify optimal power-performance configurations requires considering the interactive effects of power, performance, and parallel overhead.
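To make the interaction between power and performance concrete, here is a small sketch that pairs the classic Amdahl speedup with a simple energy model. It is not the formulation from this paper and it ignores parallel overhead. It assumes that an active core draws one unit of power, that an idle core draws a fraction `k` of that, and that all but one core are idle during the serial phase. The function names and the parameter `k` are hypothetical.

```python
# Amdahl speedup with a simple idle-power energy model (illustrative assumptions).

def speedup(p, n):
    """Classic Amdahl speedup for parallel fraction p on n cores."""
    return 1.0 / ((1.0 - p) + p / n)

def energy(p, n, k):
    """Energy relative to a single-core run (single-core energy = 1).

    Serial phase: one active core plus (n - 1) idle cores at power k each.
    Parallel phase: n active cores for time p / n.
    """
    serial = (1.0 - p) * (1.0 + (n - 1) * k)
    parallel = (p / n) * n
    return serial + parallel

def perf_per_watt(p, n, k):
    """Performance per watt relative to one core; equals 1 / energy here."""
    return 1.0 / energy(p, n, k)

for n in (1, 2, 4, 8, 16):
    print(n, round(speedup(0.9, n), 3), round(energy(0.9, n, 0.3), 3))
```

Under these assumptions, adding cores always raises speedup but also raises energy whenever idle cores draw power, so the best configuration depends on which metric is being optimized.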
Energy-efficient wireless communication
In this chapter we present an energy-efficient, highly adaptive network interface architecture and a novel data link layer protocol for wireless networks that provides Quality of Service (QoS) support for diverse traffic types. Due to the dynamic nature of wireless networks, adaptations in bandwidth scheduling and error control are necessary to achieve energy efficiency and an acceptable quality of service. In our approach we apply adaptability through all layers of the protocol stack, and provide feedback to the applications. In this way the applications can adapt the data streams, and the network protocols can adapt the communication parameters.
The Parallelism Motifs of Genomic Data Analysis
Genomic data sets are growing dramatically as the cost of sequencing
continues to decline and small sequencing devices become available. Enormous
community databases store and share this data with the research community, but
some of these genomic data analysis problems require large-scale computational
platforms to meet both the memory and computational requirements. These
applications differ from scientific simulations that dominate the workload on
high-end parallel systems today and place different requirements on programming
support, software libraries, and parallel architectural design. For example,
they involve irregular communication patterns such as asynchronous updates to
shared data structures. We consider several problems in high-performance
genomics analysis, including alignment, profiling, clustering, and assembly for
both single genomes and metagenomes. We identify some of the common
computational patterns or motifs that help inform parallelization strategies
and compare our motifs to some of the established lists, arguing that at least
two key patterns, sorting and hashing, are missing.
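As an illustration of the hashing pattern mentioned in this abstract, the sketch below counts k-mers and assigns each distinct k-mer to one worker by hashing it. It is a sequential stand-in, not code from the paper: the workers are simulated as separate tables in one process, `zlib.crc32` is chosen only because it is deterministic across runs, and all function names are hypothetical.

```python
# Sketch: hash-partitioned k-mer counting, simulated in a single process.
from collections import Counter
import zlib

def kmers(seq, k):
    """Yield every length-k substring of seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))

def owner(kmer, n_workers):
    """Map a k-mer to the worker that owns its counter (deterministic hash)."""
    return zlib.crc32(kmer.encode("ascii")) % n_workers

def count_kmers_partitioned(reads, k, n_workers):
    """Return one Counter per simulated worker; each k-mer lives in exactly one."""
    tables = [Counter() for _ in range(n_workers)]
    for read in reads:
        for km in kmers(read, k):
            tables[owner(km, n_workers)][km] += 1
    return tables

reads = ["ACGTAC", "GTACGT"]
tables = count_kmers_partitioned(reads, k=3, n_workers=4)
merged = sum(tables, Counter())
print(merged)
```

In a distributed setting each update would be sent to the owning process, which is one source of the irregular, asynchronous communication the abstract describes.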
Exploring Energy Saving Opportunities in Fault Tolerant HPC Systems
Nowadays, improving the energy efficiency of high-performance computing (HPC)
systems is one of the main drivers in scientific and technological research. As
large-scale HPC systems require some fault-tolerant method, the opportunities
to reduce energy consumption should be explored. In particular,
rollback-recovery methods using uncoordinated checkpoints prevent all processes
from re-executing when a failure occurs. In this context, it is possible to
take actions to reduce the energy consumption of the nodes whose processes do
not re-execute. This work is an extension of a previous one, in which we
proposed a series of strategies to manage energy consumption at failure-time.
In this work, we have enriched our simulator and the experimentation by
including non-blocking communications (with and without system buffering) and a
larger number of candidate processes to be analyzed. We call the latter
cascade analysis, because it includes processes that get blocked
by indirect communication with the failed process. The simulations show that
the savings were negligible in the worst case, but in some scenarios, it was
possible to achieve significant ones; the maximum saving achieved was 90% in a
time interval of 16 minutes. As a result, we show the feasibility of improving
energy efficiency in HPC systems in the presence of a failure.
Comment: This is the accepted version of the manuscript that was sent for
review to the Journal of Parallel and Distributed Computing (ISSN 1096-0848).
arXiv admin note: text overlap with arXiv:2012.1139