3,346 research outputs found

    Performance and Power Analysis of HPC Workloads on Heterogenous Multi-Node Clusters

    Get PDF
    Performance analysis tools allow application developers to identify and characterize the inefficiencies that cause performance degradation in their codes, allowing for application optimizations. Due to the increasing interest in the High Performance Computing (HPC) community towards energy-efficiency issues, it is of paramount importance to be able to correlate performance and power figures within the same profiling and analysis tools. For this reason, we present a performance and energy-efficiency study aimed at demonstrating how a single tool can be used to collect most of the relevant metrics. In particular, we show how the same analysis techniques can be applicable on different architectures, analyzing the same HPC application on a high-end and a low-power cluster. The former cluster embeds Intel Haswell CPUs and NVIDIA K80 GPUs, while the latter is made up of NVIDIA Jetson TX1 boards, each hosting an Arm Cortex-A57 CPU and an NVIDIA Tegra X1 Maxwell GPU.The research leading to these results has received funding from the European Community’s Seventh Framework Programme [FP7/2007-2013] and Horizon 2020 under the Mont-Blanc projects [17], grant agreements n. 288777, 610402 and 671697. E.C. was partially founded by “Contributo 5 per mille assegnato all’Università degli Studi di Ferrara-dichiarazione dei redditi dell’anno 2014”. We thank the University of Ferrara and INFN Ferrara for the access to the COKA Cluster. We warmly thank the BSC tools group, supporting us for the smooth integration and test of our setup within Extrae and Paraver.Peer ReviewedPostprint (published version

    On Evaluating Commercial Cloud Services: A Systematic Review

    Full text link
    Background: Cloud Computing is increasingly booming in industry with many competing providers and services. Accordingly, evaluation of commercial Cloud services is necessary. However, the existing evaluation studies are relatively chaotic. There exists tremendous confusion and gap between practices and theory about Cloud services evaluation. Aim: To facilitate relieving the aforementioned chaos, this work aims to synthesize the existing evaluation implementations to outline the state-of-the-practice and also identify research opportunities in Cloud services evaluation. Method: Based on a conceptual evaluation model comprising six steps, the Systematic Literature Review (SLR) method was employed to collect relevant evidence to investigate the Cloud services evaluation step by step. Results: This SLR identified 82 relevant evaluation studies. The overall data collected from these studies essentially represent the current practical landscape of implementing Cloud services evaluation, and in turn can be reused to facilitate future evaluation work. Conclusions: Evaluation of commercial Cloud services has become a world-wide research topic. Some of the findings of this SLR identify several research gaps in the area of Cloud services evaluation (e.g., the Elasticity and Security evaluation of commercial Cloud services could be a long-term challenge), while some other findings suggest the trend of applying commercial Cloud services (e.g., compared with PaaS, IaaS seems more suitable for customers and is particularly important in industry). This SLR study itself also confirms some previous experiences and reveals new Evidence-Based Software Engineering (EBSE) lessons

    Fibers are not (P)Threads: The Case for Loose Coupling of Asynchronous Programming Models and MPI Through Continuations

    Full text link
    Asynchronous programming models (APM) are gaining more and more traction, allowing applications to expose the available concurrency to a runtime system tasked with coordinating the execution. While MPI has long provided support for multi-threaded communication and non-blocking operations, it falls short of adequately supporting APMs as correctly and efficiently handling MPI communication in different models is still a challenge. Meanwhile, new low-level implementations of light-weight, cooperatively scheduled execution contexts (fibers, aka user-level threads (ULT)) are meant to serve as a basis for higher-level APMs and their integration in MPI implementations has been proposed as a replacement for traditional POSIX thread support to alleviate these challenges. In this paper, we first establish a taxonomy in an attempt to clearly distinguish different concepts in the parallel software stack. We argue that the proposed tight integration of fiber implementations with MPI is neither warranted nor beneficial and instead is detrimental to the goal of MPI being a portable communication abstraction. We propose MPI Continuations as an extension to the MPI standard to provide callback-based notifications on completed operations, leading to a clear separation of concerns by providing a loose coupling mechanism between MPI and APMs. We show that this interface is flexible and interacts well with different APMs, namely OpenMP detached tasks, OmpSs-2, and Argobots.Comment: 12 pages, 7 figures Published in proceedings of EuroMPI/USA '20, September 21-24, 2020, Austin, TX, US

    Single-Board-Computer Clusters for Cloudlet Computing in Internet of Things

    Get PDF
    The number of connected sensors and devices is expected to increase to billions in the near future. However, centralised cloud-computing data centres present various challenges to meet the requirements inherent to Internet of Things (IoT) workloads, such as low latency, high throughput and bandwidth constraints. Edge computing is becoming the standard computing paradigm for latency-sensitive real-time IoT workloads, since it addresses the aforementioned limitations related to centralised cloud-computing models. Such a paradigm relies on bringing computation close to the source of data, which presents serious operational challenges for large-scale cloud-computing providers. In this work, we present an architecture composed of low-cost Single-Board-Computer clusters near to data sources, and centralised cloud-computing data centres. The proposed cost-efficient model may be employed as an alternative to fog computing to meet real-time IoT workload requirements while keeping scalability. We include an extensive empirical analysis to assess the suitability of single-board-computer clusters as cost-effective edge-computing micro data centres. Additionally, we compare the proposed architecture with traditional cloudlet and cloud architectures, and evaluate them through extensive simulation. We finally show that acquisition costs can be drastically reduced while keeping performance levels in data-intensive IoT use cases.Ministerio de Economía y Competitividad TIN2017-82113-C2-1-RMinisterio de Economía y Competitividad RTI2018-098062-A-I00European Union’s Horizon 2020 No. 754489Science Foundation Ireland grant 13/RC/209

    Performance Evaluation of Data-Intensive Computing Applications on a Public IaaS Cloud

    Get PDF
    [Abstract] The advent of cloud computing technologies, which dynamically provide on-demand access to computational resources over the Internet, is offering new possibilities to many scientists and researchers. Nowadays, Infrastructure as a Service (IaaS) cloud providers can offset the increasing processing requirements of data-intensive computing applications, becoming an emerging alternative to traditional servers and clusters. In this paper, a comprehensive study of the leading public IaaS cloud platform, Amazon EC2, has been conducted in order to assess its suitability for data-intensive computing. One of the key contributions of this work is the analysis of the storage-optimized family of EC2 instances. Furthermore, this study presents a detailed analysis of both performance and cost metrics. More specifically, multiple experiments have been carried out to analyze the full I/O software stack, ranging from the low-level storage devices and cluster file systems up to real-world applications using representative data-intensive parallel codes and MapReduce-based workloads. The analysis of the experimental results has shown that data-intensive applications can benefit from tailored EC2-based virtual clusters, enabling users to obtain the highest performance and cost-effectiveness in the cloud.Ministerio de Economía y Competitividad; TIN2013-42148-PGalicia. Consellería de Cultura, Educación e Ordenación Universitaria; GRC2013/055Ministerio de Educación y Ciencia; AP2010-434

    A Global Optimisation Toolbox for Massively Parallel Engineering Optimisation

    Full text link
    A software platform for global optimisation, called PaGMO, has been developed within the Advanced Concepts Team (ACT) at the European Space Agency, and was recently released as an open-source project. PaGMO is built to tackle high-dimensional global optimisation problems, and it has been successfully used to find solutions to real-life engineering problems among which the preliminary design of interplanetary spacecraft trajectories - both chemical (including multiple flybys and deep-space maneuvers) and low-thrust (limited, at the moment, to single phase trajectories), the inverse design of nano-structured radiators and the design of non-reactive controllers for planetary rovers. Featuring an arsenal of global and local optimisation algorithms (including genetic algorithms, differential evolution, simulated annealing, particle swarm optimisation, compass search, improved harmony search, and various interfaces to libraries for local optimisation such as SNOPT, IPOPT, GSL and NLopt), PaGMO is at its core a C++ library which employs an object-oriented architecture providing a clean and easily-extensible optimisation framework. Adoption of multi-threaded programming ensures the efficient exploitation of modern multi-core architectures and allows for a straightforward implementation of the island model paradigm, in which multiple populations of candidate solutions asynchronously exchange information in order to speed-up and improve the optimisation process. In addition to the C++ interface, PaGMO's capabilities are exposed to the high-level language Python, so that it is possible to easily use PaGMO in an interactive session and take advantage of the numerous scientific Python libraries available.Comment: To be presented at 'ICATT 2010: International Conference on Astrodynamics Tools and Techniques

    The Parallelism Motifs of Genomic Data Analysis

    Get PDF
    Genomic data sets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share this data with the research community, but some of these genomic data analysis problems require large scale computational platforms to meet both the memory and computational requirements. These applications differ from scientific simulations that dominate the workload on high end parallel systems today and place different requirements on programming support, software libraries, and parallel architectural design. For example, they involve irregular communication patterns such as asynchronous updates to shared data structures. We consider several problems in high performance genomics analysis, including alignment, profiling, clustering, and assembly for both single genomes and metagenomes. We identify some of the common computational patterns or motifs that help inform parallelization strategies and compare our motifs to some of the established lists, arguing that at least two key patterns, sorting and hashing, are missing
    corecore