419 research outputs found

    Exploring Scientific Application Performance Using Large Scale Object Storage

    Full text link
    One of the major performance and scalability bottlenecks in large scientific applications is parallel reading and writing to supercomputer I/O systems. The usage of parallel file systems and consistency requirements of POSIX, that all the traditional HPC parallel I/O interfaces adhere to, pose limitations to the scalability of scientific applications. Object storage is a widely used storage technology in cloud computing and is more frequently proposed for HPC workload to address and improve the current scalability and performance of I/O in scientific applications. While object storage is a promising technology, it is still unclear how scientific applications will use object storage and what the main performance benefits will be. This work addresses these questions, by emulating an object storage used by a traditional scientific application and evaluating potential performance benefits. We show that scientific applications can benefit from the usage of object storage on large scales.Comment: Preprint submitted to WOPSSS workshop at ISC 201

    Including the workload effect in the parallel program signature

    Get PDF
    Performance prediction and application behavior modeling have been the subject of exten- sive research that aim to estimate applications performance with an acceptable precision. A novel approach to predict the performance of parallel applications is based in the con- cept of Parallel Application Signatures that consists in extract an application most relevant parts (phases) and the number of times they repeat (weights). Executing these phases in a target machine and multiplying its exeuction time by its weight an estimation of the application total execution time can be made. One of the problems is that the performance of an application depends on the program workload. Every type of workload affects differently how an application performs in a given system and so affects the signature execution time. Since the workloads used in most scientific parallel applications have dimensions and data ranges well known and the behavior of these applications are mostly deterministic, a model of how the programs workload affect its performance can be obtained. We create a new methodology to model how a program's workload affect the parallel application signature. Using regression analysis we are able to generalize each phase time execution and weight function to predict an application performance in a target system for any type of workload within predefined range. We validate our methodology using a synthetic program, benchmarks applications and well known real scientific applications.La predicción del rendimiento y el modelado del comportamiento de las aplicaciones son tópicos ampliamente estudiados y se cuentan con numerosos trabajos de investigación que pretenden estimar el rendimiento de la aplicaciones con una precisión aceptable. Un nuevo enfoque para predecir el rendimiento de aplicaciones paralelas es el basado en el concepto de las firmas de aplicaciones paralelas que consiste en extraer las partes mas relevantes de una aplicación (fases) y el número de veces que se repiten (pesos). Ejecutando estas fases en una máquina destino y multiplicando su tiempo de ejecución por su peso, se puede obtener una estimación del tiempo total de ejecución de la aplicación. Uno de los problemas es que el rendimiento de una aplicación depende de la carga de trabajo de esta. Cada tipo de carga de trabajo afecta de manera distinta el rendimiento que tiene una aplicación en un sistema determinado y por lo tanto el tiempo de ejecución de la firma. Dado que las cargas de trabajo de la mayoría de las aplicaciones científicas paralelas, tienen dimensiones y rango de datos bien conocidos y que el comportamiento de estas aplicaciones es generalmente determinista, se puede obtener un modelo de cómo la carga de trabajo de un programa afecta su rendimiento. Hemos creado una nueva metodología para modelar cómo la carga de trabajo de un programa afecta a la firma de la aplicación paralela. Usando análisis de regresión, hemos podido generalizar las funciones de tiempo de ejecución y peso para cada fase para predecir el rendimiento de una aplicación en un sistema destino para cualquier tipo de carga de trabajo dentro de un rango predefinido. Hemos validado nuestra metodología utilizando un programa sintético, aplicaciones de benchmarks y aplicaciones reales científicas bien conocidas

    Enabling portable I/O analysis of commercially sensitive HPC applications through workload replication

    Get PDF
    Benchmarking and analyzing I/O performance across high performance computing (HPC) platforms is necessary to identify performance bottlenecks and guide effective use of new and existing storage systems. Doing this with large production applications, which can often be commercially sensitive and lack portability, is not a straightforward task and the availability of a representative proxy for I/O workloads can help to provide a solution. We use Darshan I/O characterization and the MACSio proxy application to replicate five production workloads, showing how these can be used effectively to investigate I/O performance when migrating between HPC systems ranging from small local clusters to leadership scale machines. Preliminary results indicate that it is possible to generate datasets that match the target application with a good degree of accuracy. This enables a predictive performance analysis study of a representative workload to be conducted on five different systems. The results of this analysis are used to identify how workloads exhibit different I/O footprints on a file system and what effect file system configuration can have on performance

    Replicating HPC I/O workloads with proxy applications

    Get PDF
    Large scale simulation performance is dependent on a number of components, however the task of investigation and optimization has long favored computational and communication elements above I/O. Manually extracting the pattern of I/O behavior from a parent application is a useful way of working to address performance issues on a per-application basis, but developing workflows with some degree of automation and flexibility provides a more powerful approach to tackling current and future I/O challenges. In this paper we describe a workload replication workflow that extracts the I/O pattern of an application and recreates its behavior with a flexible proxy application. We demonstrate how simple lightweight characterization can be translated to provide an effective representation of a physics application, and show how a proxy replication can be used as a tool for investigating I/O library paradigms

    Performance Evaluation of Data-Intensive Computing Applications on a Public IaaS Cloud

    Get PDF
    [Abstract] The advent of cloud computing technologies, which dynamically provide on-demand access to computational resources over the Internet, is offering new possibilities to many scientists and researchers. Nowadays, Infrastructure as a Service (IaaS) cloud providers can offset the increasing processing requirements of data-intensive computing applications, becoming an emerging alternative to traditional servers and clusters. In this paper, a comprehensive study of the leading public IaaS cloud platform, Amazon EC2, has been conducted in order to assess its suitability for data-intensive computing. One of the key contributions of this work is the analysis of the storage-optimized family of EC2 instances. Furthermore, this study presents a detailed analysis of both performance and cost metrics. More specifically, multiple experiments have been carried out to analyze the full I/O software stack, ranging from the low-level storage devices and cluster file systems up to real-world applications using representative data-intensive parallel codes and MapReduce-based workloads. The analysis of the experimental results has shown that data-intensive applications can benefit from tailored EC2-based virtual clusters, enabling users to obtain the highest performance and cost-effectiveness in the cloud.Ministerio de Economía y Competitividad; TIN2013-42148-PGalicia. Consellería de Cultura, Educación e Ordenación Universitaria; GRC2013/055Ministerio de Educación y Ciencia; AP2010-434

    Analysis of I/O Performance on an Amazon EC2 Cluster Compute and High I/O Platform

    Get PDF
    “This is a post-peer-review, pre-copyedit version of an article published in Journal of Grid Computing. The final authenticated version is available online at: https://doi.org/10.1007/s10723-013-9250-y[Abstract] Cloud computing is currently being explored by the scientific community to assess its suitability for High Performance Computing (HPC) environments. In this novel paradigm, compute and storage resources, as well as applications, can be dynamically provisioned on a pay-per-use basis. This paper presents a thorough evaluation of the I/O storage subsystem using the Amazon EC2 Cluster Compute platform and the recent High I/O instance type, to determine its suitability for I/O-intensive applications. The evaluation has been carried out at different layers using representative benchmarks in order to evaluate the low-level cloud storage devices available in Amazon EC2, ephemeral disks and Elastic Block Store (EBS) volumes, both on local and distributed file systems. In addition, several I/O interfaces (POSIX, MPI-IO and HDF5) commonly used by scientific workloads have also been assessed. Furthermore, the scalability of a representative parallel I/O code has also been analyzed at the application level, taking into account both performance and cost metrics. The analysis of the experimental results has shown that available cloud storage devices can have different performance characteristics and usage constraints. Our comprehensive evaluation can help scientists to increase significantly (up to several times) the performance of I/O-intensive applications in Amazon EC2 cloud. An example of optimal configuration that can maximize I/O performance in this cloud is the use of a RAID 0 of 2 ephemeral disks, TCP with 9,000 bytes MTU, NFS async and MPI-IO on the High I/O instance type, which provides ephemeral disks backed by Solid State Drive (SSD) technology.Ministerio de Ciencia e Innovación; TIN2010-16735Ministerio de Educación; AP2010-4348Galicia. Consellería de Cultura, Educación e Ordenación Universitaria; ref. 2010/

    Characterization and modeling of PIDX parallel I/O for performance optimization

    Get PDF
    pre-printParallel I/O library performance can vary greatly in re- sponse to user-tunable parameter values such as aggregator count, file count, and aggregation strategy. Unfortunately, manual selection of these values is time consuming and dependent on characteristics of the target machine, the underlying file system, and the dataset itself. Some characteristics, such as the amount of memory per core, can also impose hard constraints on the range of viable parameter values. In this work we address these problems by using machine learning techniques to model the performance of the PIDX parallel I/O library and select appropriate tunable parameter values. We characterize both the network and I/O phases of PIDX on a Cray XE6 as well as an IBM Blue Gene/P system. We use the results of this study to develop a machine learning model for parameter space exploration and performance prediction

    Energy aware approach for HPC systems

    Get PDF
    International audienceHigh‐performance computing (HPC) systems require energy during their full life cycle from design and production to transportation to usage and recycling/dismanteling. Because of increase of ecological and cost awareness, energy performance is now a primary focus. This chapter focuses on the usage aspect of HPC and how adapted and optimized software solutions could improve energy efficiency. It provides a detailed explanation of server power consumption, and discusses the application of HPC, phase detection, and phase identification. The chapter also suggests that having the load and memory access profiles is insufficient for an effective evaluation of the power consumed by an application. The available leverages in HPC systems are also shown in detail. The chapter proposes some solutions for modeling the power consumption of servers, which allows designing power prediction models for better decision making.These approaches allow the deployment and usage of a set of available green leverages, permitting energy reduction
    corecore