6 research outputs found

    Clusterfile: a parallel file system for clusters


    Work in progress about enhancing the programmability and energy efficiency of storage in HPC and cloud environments

    Proceedings of the First PhD Symposium on Sustainable Ultrascale Computing Systems (NESUS PhD 2016), Timisoara, Romania, February 8-11, 2016.
    We present the work in progress for the PhD thesis titled “Enhancing the programmability and energy efficiency of storage in HPC and cloud environments”. In this thesis, we focus on studying and optimizing data movement across the layers of the operating system’s I/O stack. We study power consumption during I/O-intensive workloads using software and hardware instrumentation, collecting time series data from the internal ATX power lines that feed every system component, together with several run-time operating system metrics. Data exploration and analysis reveal, for each I/O access pattern, distinct power and performance regimes. These regimes show how the system uses power as data moves through the I/O stack. We use this knowledge to build I/O power models that predict power consumption for different I/O workloads, and to optimize the CPU device driver that manages performance states, obtaining power savings of over 30%. Finally, we develop new mechanisms and abstractions that allow co-located virtual machines to share data with each other more efficiently. Our virtualized data sharing solution reduces data movement among virtual domains, leading to energy savings and I/O performance improvements.
    European Cooperation in Science and Technology. COS

    Making the case for reforming the I/O software stack of extreme-scale systems

    This work was supported in part by the U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research, under Contract No. DE-AC02-05CH11231. This research has been partially funded by the Spanish Ministry of Science and Innovation under grant TIN2010-16497 “Input/Output techniques for distributed and high-performance computing environments”. The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement number 328582

    Surfing the optimization space of a multiple-GPU parallel implementation of a X-ray tomography reconstruction algorithm

    The increasing popularity of massively parallel architectures based on accelerators has opened up the possibility of significantly improving the performance of X-ray computed tomography (CT) applications towards achieving real-time imaging. However, achieving this goal is challenging, as most CT applications have not been designed to exploit the amount of parallelism available in these architectures. In this paper we present the massively parallel implementation and optimization of Mangoose(++), a CT application for reconstructing 3D volumes from 2D images collected by scanners based on cone-beam geometry. The main contributions of this paper are the following. First, we develop a modular application design that exploits the functional parallelism inside the application and facilitates the parallelization of individual application phases. Second, we identify a set of optimizations that can be applied individually and in combination to deploy the application optimally on a massively parallel multi-GPU system. Third, we present a study of surfing the optimization space of the modularized application and demonstrate that a significant benefit can be obtained from employing an adequate combination of application optimizations. (C) 2014 Elsevier Inc. All rights reserved.
    This work was partially funded by the Spanish Ministry of Science and Technology under the grant TIN2010-16497, the AMIT project (CEN-20101014) from the CDTI-CENIT program, RECAVA-RETIC Network (RD07/0014/2009), projects TEC2010-21619-C04-01, TEC2011-28972-C02-01, and PI11/00616 from the Spanish Ministerio de Ciencia e Innovacion, ARTEMIS program (S2009/DPI-1802), from the Comunidad de Madrid

    Analyzing Power Consumption of I/O Operations in HPC Applications

    Data movement is becoming a key issue in terms of performance and energy consumption in high performance computing (HPC) systems, in general, and Exascale systems, in particular. A preliminary step to perform I/O optimization and face the Exascale challenges is to deepen our understanding of energy consumption across the I/O stacks. In this paper, we analyze the power draw of different I/O operations using a new fine-grained internal wattmeter while simultaneously collecting system metrics. Based on correlations between the recorded metrics and the instantaneous internal power consumption, our methodology identifies the significant metrics with respect to power consumption and decides which ones should contribute directly or in a derivative manner. This approach has the advantage of building I/O power models based on a previous set of identified utilization metrics. This technique will be validated using write operations on an Intel Xeon Nehalem server system, as writes exhibit interesting patterns and distinct power regimes.
    The work presented in this paper has been partially supported by the EU Project FP7 318793 “EXA2GREEN” and partially supported by the EU under the COST Programme Action IC1305, “Network for Sustainable Ultrascale Computing (NESUS)” and by the grant TIN2013-41350-P, Scalable Data Management Techniques for High-End Computing Systems from the Spanish Ministry of Economy and Competitiveness.
    European Community's Seventh Framework Progra

    Modular GPU implementation of an FDK-based reconstruction algorithm for X-ray tomography (Implementación modular en GPU de un algoritmo de reconstrucción basado en FDK para tomografía de rayos X)

    Proceedings of: XXIX Congreso Anual de la Sociedad Española de Ingeniería Biomédica (CASEIB 2011). Cáceres, 16-18 November 2011.
    Most small-animal tomographs are based on cone-beam geometry with a flat-panel detector orbiting on a circular trajectory. Reconstruction in these systems is usually performed with a method based on the algorithm proposed by Feldkamp, Davis and Kress (FDK). Faster reconstruction for X-ray computed tomography (CT) is a fundamental requirement for extending its clinical application. This paper presents an efficient implementation of a modular FDK-based reconstruction algorithm that exploits the parallel computing capabilities of graphics processing units (GPUs) and the efficient interpolation that CUDA provides through texture memory. The implemented algorithm, tested on a high-resolution micro-CT scanner, speeds up the backprojection stage by a factor of 40 with respect to a sequential reference implementation written in C, while preserving reconstruction quality at all times.
    This work was partially funded by the projects AMIT (CDTI CENIT program), TEC2007-64731, TEC2008-06715-C02-01, RD07/0014/2009, TRA2009 0175, RECAVA-RETIC, RD09/0077/00087 (Ministerio de Ciencia e Innovación), ARTEMIS S2009/DPI-1802 (Comunidad de Madrid) and TIN2010-16497 (Ministerio de Ciencia e Innovación).