3,346 research outputs found
Performance and Power Analysis of HPC Workloads on Heterogenous Multi-Node Clusters
Performance analysis tools allow application developers to identify and characterize the inefficiencies that cause performance degradation in their codes, allowing for application optimizations. Due to the increasing interest in the High Performance Computing (HPC) community towards energy-efficiency issues, it is of paramount importance to be able to correlate performance and power figures within the same profiling and analysis tools. For this reason, we present a performance and energy-efficiency study aimed at demonstrating how a single tool can be used to collect most of the relevant metrics. In particular, we show how the same analysis techniques can be applicable on different architectures, analyzing the same HPC application on a high-end and a low-power cluster. The former cluster embeds Intel Haswell CPUs and NVIDIA K80 GPUs, while the latter is made up of NVIDIA Jetson TX1 boards, each hosting an Arm Cortex-A57 CPU and an NVIDIA Tegra X1 Maxwell GPU.The research leading to these results has received funding from the European Community’s Seventh Framework Programme [FP7/2007-2013] and Horizon 2020 under the Mont-Blanc projects [17], grant agreements n. 288777, 610402 and 671697. E.C. was partially founded by “Contributo 5 per mille assegnato all’Università degli Studi di Ferrara-dichiarazione dei redditi dell’anno 2014”. We thank the University of Ferrara and INFN Ferrara for the access to the COKA Cluster. We warmly thank the BSC tools group, supporting us for the smooth integration and test of our setup within Extrae and Paraver.Peer ReviewedPostprint (published version
On Evaluating Commercial Cloud Services: A Systematic Review
Background: Cloud Computing is increasingly booming in industry with many
competing providers and services. Accordingly, evaluation of commercial Cloud
services is necessary. However, the existing evaluation studies are relatively
chaotic. There exists tremendous confusion and gap between practices and theory
about Cloud services evaluation. Aim: To facilitate relieving the
aforementioned chaos, this work aims to synthesize the existing evaluation
implementations to outline the state-of-the-practice and also identify research
opportunities in Cloud services evaluation. Method: Based on a conceptual
evaluation model comprising six steps, the Systematic Literature Review (SLR)
method was employed to collect relevant evidence to investigate the Cloud
services evaluation step by step. Results: This SLR identified 82 relevant
evaluation studies. The overall data collected from these studies essentially
represent the current practical landscape of implementing Cloud services
evaluation, and in turn can be reused to facilitate future evaluation work.
Conclusions: Evaluation of commercial Cloud services has become a world-wide
research topic. Some of the findings of this SLR identify several research gaps
in the area of Cloud services evaluation (e.g., the Elasticity and Security
evaluation of commercial Cloud services could be a long-term challenge), while
some other findings suggest the trend of applying commercial Cloud services
(e.g., compared with PaaS, IaaS seems more suitable for customers and is
particularly important in industry). This SLR study itself also confirms some
previous experiences and reveals new Evidence-Based Software Engineering (EBSE)
lessons
Fibers are not (P)Threads: The Case for Loose Coupling of Asynchronous Programming Models and MPI Through Continuations
Asynchronous programming models (APM) are gaining more and more traction,
allowing applications to expose the available concurrency to a runtime system
tasked with coordinating the execution. While MPI has long provided support for
multi-threaded communication and non-blocking operations, it falls short of
adequately supporting APMs as correctly and efficiently handling MPI
communication in different models is still a challenge. Meanwhile, new
low-level implementations of light-weight, cooperatively scheduled execution
contexts (fibers, aka user-level threads (ULT)) are meant to serve as a basis
for higher-level APMs and their integration in MPI implementations has been
proposed as a replacement for traditional POSIX thread support to alleviate
these challenges.
In this paper, we first establish a taxonomy in an attempt to clearly
distinguish different concepts in the parallel software stack. We argue that
the proposed tight integration of fiber implementations with MPI is neither
warranted nor beneficial and instead is detrimental to the goal of MPI being a
portable communication abstraction. We propose MPI Continuations as an
extension to the MPI standard to provide callback-based notifications on
completed operations, leading to a clear separation of concerns by providing a
loose coupling mechanism between MPI and APMs. We show that this interface is
flexible and interacts well with different APMs, namely OpenMP detached tasks,
OmpSs-2, and Argobots.Comment: 12 pages, 7 figures Published in proceedings of EuroMPI/USA '20,
September 21-24, 2020, Austin, TX, US
Single-Board-Computer Clusters for Cloudlet Computing in Internet of Things
The number of connected sensors and devices is expected to increase to billions in the near
future. However, centralised cloud-computing data centres present various challenges to meet the
requirements inherent to Internet of Things (IoT) workloads, such as low latency, high throughput
and bandwidth constraints. Edge computing is becoming the standard computing paradigm for
latency-sensitive real-time IoT workloads, since it addresses the aforementioned limitations related
to centralised cloud-computing models. Such a paradigm relies on bringing computation close to
the source of data, which presents serious operational challenges for large-scale cloud-computing
providers. In this work, we present an architecture composed of low-cost Single-Board-Computer
clusters near to data sources, and centralised cloud-computing data centres. The proposed
cost-efficient model may be employed as an alternative to fog computing to meet real-time IoT
workload requirements while keeping scalability. We include an extensive empirical analysis to
assess the suitability of single-board-computer clusters as cost-effective edge-computing micro data
centres. Additionally, we compare the proposed architecture with traditional cloudlet and cloud
architectures, and evaluate them through extensive simulation. We finally show that acquisition costs
can be drastically reduced while keeping performance levels in data-intensive IoT use cases.Ministerio de Economía y Competitividad TIN2017-82113-C2-1-RMinisterio de Economía y Competitividad RTI2018-098062-A-I00European Union’s Horizon 2020 No. 754489Science Foundation Ireland grant 13/RC/209
Performance Evaluation of Data-Intensive Computing Applications on a Public IaaS Cloud
[Abstract] The advent of cloud computing technologies, which dynamically provide on-demand access to computational resources over the Internet, is offering new possibilities to many scientists and researchers. Nowadays, Infrastructure as a Service (IaaS) cloud providers can offset the increasing processing requirements of data-intensive computing applications, becoming an emerging alternative to traditional servers and clusters. In this paper, a comprehensive study of the leading public IaaS cloud platform, Amazon EC2, has been conducted in order to assess its suitability for data-intensive computing. One of the key contributions of this work is the analysis of the storage-optimized family of EC2 instances. Furthermore, this study presents a detailed analysis of both performance and cost metrics. More specifically, multiple experiments have been carried out to analyze the full I/O software stack, ranging from the low-level storage devices and cluster file systems up to real-world applications using representative data-intensive parallel codes and MapReduce-based workloads. The analysis of the experimental results has shown that data-intensive applications can benefit from tailored EC2-based virtual clusters, enabling users to obtain the highest performance and cost-effectiveness in the cloud.Ministerio de Economía y Competitividad; TIN2013-42148-PGalicia. Consellería de Cultura, Educación e Ordenación Universitaria; GRC2013/055Ministerio de Educación y Ciencia; AP2010-434
A Global Optimisation Toolbox for Massively Parallel Engineering Optimisation
A software platform for global optimisation, called PaGMO, has been developed
within the Advanced Concepts Team (ACT) at the European Space Agency, and was
recently released as an open-source project. PaGMO is built to tackle
high-dimensional global optimisation problems, and it has been successfully
used to find solutions to real-life engineering problems among which the
preliminary design of interplanetary spacecraft trajectories - both chemical
(including multiple flybys and deep-space maneuvers) and low-thrust (limited,
at the moment, to single phase trajectories), the inverse design of
nano-structured radiators and the design of non-reactive controllers for
planetary rovers. Featuring an arsenal of global and local optimisation
algorithms (including genetic algorithms, differential evolution, simulated
annealing, particle swarm optimisation, compass search, improved harmony
search, and various interfaces to libraries for local optimisation such as
SNOPT, IPOPT, GSL and NLopt), PaGMO is at its core a C++ library which employs
an object-oriented architecture providing a clean and easily-extensible
optimisation framework. Adoption of multi-threaded programming ensures the
efficient exploitation of modern multi-core architectures and allows for a
straightforward implementation of the island model paradigm, in which multiple
populations of candidate solutions asynchronously exchange information in order
to speed-up and improve the optimisation process. In addition to the C++
interface, PaGMO's capabilities are exposed to the high-level language Python,
so that it is possible to easily use PaGMO in an interactive session and take
advantage of the numerous scientific Python libraries available.Comment: To be presented at 'ICATT 2010: International Conference on
Astrodynamics Tools and Techniques
The Parallelism Motifs of Genomic Data Analysis
Genomic data sets are growing dramatically as the cost of sequencing
continues to decline and small sequencing devices become available. Enormous
community databases store and share this data with the research community, but
some of these genomic data analysis problems require large scale computational
platforms to meet both the memory and computational requirements. These
applications differ from scientific simulations that dominate the workload on
high end parallel systems today and place different requirements on programming
support, software libraries, and parallel architectural design. For example,
they involve irregular communication patterns such as asynchronous updates to
shared data structures. We consider several problems in high performance
genomics analysis, including alignment, profiling, clustering, and assembly for
both single genomes and metagenomes. We identify some of the common
computational patterns or motifs that help inform parallelization strategies
and compare our motifs to some of the established lists, arguing that at least
two key patterns, sorting and hashing, are missing
- …