6 research outputs found

    Impacto de la entrada/salida en los computadores paralelos

    The increase in the number of processing units in clusters, advances in the speed and power of those units, and the growing complexity of scientific applications place ever greater demands on the I/O systems of parallel computers. This work proposes a methodology for analyzing I/O in computer clusters that shows how different configurations affect the application and can be used to select the best I/O system configuration. The methodology comprises characterization of the I/O system at three levels (device, system, and application), configuration of the different elements that affect performance, and evaluation that takes into account both the application and the I/O architecture. Presented at the X Workshop on Distributed and Parallel Processing (WPDP). Red de Universidades con Carreras en Informática (RedUNCI).
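    As a concrete illustration of the application-level characterization step, the sketch below (not from the paper; the file name and transfer size are illustrative) times a collective MPI-IO write and reports the aggregate bandwidth achieved under one I/O configuration:

    /* Minimal sketch: measure application-level write bandwidth for one
     * I/O configuration. Each rank writes a disjoint contiguous region. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank, nprocs;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        const size_t chunk = 4 * 1024 * 1024;   /* 4 MiB per process */
        char *buf = calloc(chunk, 1);

        MPI_File fh;
        MPI_File_open(MPI_COMM_WORLD, "io_probe.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        MPI_File_write_at_all(fh, (MPI_Offset)rank * chunk, buf, (int)chunk,
                              MPI_BYTE, MPI_STATUS_IGNORE);
        MPI_Barrier(MPI_COMM_WORLD);
        double t1 = MPI_Wtime();

        MPI_File_close(&fh);
        if (rank == 0)
            printf("aggregate bandwidth: %.2f MiB/s\n",
                   (double)nprocs * chunk / (1024.0 * 1024.0) / (t1 - t0));

        free(buf);
        MPI_Finalize();
        return 0;
    }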

    Characterization and modeling of PIDX parallel I/O for performance optimization

    pre-print

    Parallel I/O library performance can vary greatly in response to user-tunable parameter values such as aggregator count, file count, and aggregation strategy. Unfortunately, manual selection of these values is time consuming and dependent on characteristics of the target machine, the underlying file system, and the dataset itself. Some characteristics, such as the amount of memory per core, can also impose hard constraints on the range of viable parameter values. In this work we address these problems by using machine learning techniques to model the performance of the PIDX parallel I/O library and select appropriate tunable parameter values. We characterize both the network and I/O phases of PIDX on a Cray XE6 as well as an IBM Blue Gene/P system. We use the results of this study to develop a machine learning model for parameter space exploration and performance prediction.
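    A minimal sketch of the parameter-space exploration the authors describe might look like the following. Here run_io_phase is a hypothetical stand-in for a timed PIDX write at the given settings (it is not part of the real PIDX API), and the parameter grids are illustrative; each (parameters, throughput) sample is the kind of training point a performance model would consume:

    #include <stdio.h>

    /* Placeholder: in practice this would perform a timed PIDX write with
     * the given aggregator and file counts and return throughput in MiB/s. */
    static double run_io_phase(int aggregators, int files)
    {
        (void)aggregators; (void)files;
        return 0.0;
    }

    int main(void)
    {
        const int agg_counts[]  = {4, 8, 16, 32};
        const int file_counts[] = {1, 4, 16};

        printf("aggregators,files,MiB_per_s\n");
        for (size_t a = 0; a < sizeof agg_counts / sizeof *agg_counts; a++)
            for (size_t f = 0; f < sizeof file_counts / sizeof *file_counts; f++)
                printf("%d,%d,%.2f\n", agg_counts[a], file_counts[f],
                       run_io_phase(agg_counts[a], file_counts[f]));
        return 0;
    }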

    Configuración de la entrada/salida paralela para el cómputo de altas prestaciones

    Parallel I/O is a research area of growing importance in High Performance Computing (HPC). For years it has been the bottleneck of parallel computers, and with the large increase in computing power the problem has only grown; the HPC community therefore considers improving the I/O systems of parallel computers essential to meet the demands of the scientific applications that use HPC. Because the configuration of parallel I/O strongly influences both performance and availability, the main goal of this work is to analyze parallel I/O configurations in order to identify the key factors that influence the performance and availability of I/O for scientific applications running on a cluster. To carry out this analysis, a methodology is proposed that identifies I/O factors and evaluates their influence under different I/O configurations; it consists of three phases: Characterization, Configuration, and Evaluation. The methodology analyzes the parallel computer at the level of the scientific application, the I/O libraries, and the I/O architecture, all from the I/O point of view. Experiments performed for different I/O configurations, and the results obtained, show the complexity of analyzing I/O factors and their differing degrees of influence on I/O system performance. Finally, future work is discussed: the design of a model to support the process of configuring the parallel I/O system for scientific applications, and, to identify and evaluate the I/O factors associated with data availability, the use of the RADIC fault-tolerant architecture.
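    As one example of the Configuration phase, I/O factors such as the file system stripe count or the number of collective-buffering aggregators can be passed to the I/O stack through MPI-IO hints. The sketch below assumes a ROMIO-based MPI implementation on a Lustre-like file system; the hint keys are standard ROMIO names, the values are illustrative, and the underlying file system must honor them:

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        MPI_Info info;
        MPI_Info_create(&info);
        MPI_Info_set(info, "striping_factor", "8");  /* stripe over 8 storage targets */
        MPI_Info_set(info, "cb_nodes", "4");         /* 4 collective-buffering aggregators */

        MPI_File fh;
        MPI_File_open(MPI_COMM_WORLD, "configured.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);
        /* ... the timed write phase of the Evaluation step would go here ... */
        MPI_File_close(&fh);

        MPI_Info_free(&info);
        MPI_Finalize();
        return 0;
    }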

    Uma arquitetura paralela para o armazenamento de imagens médicas em sistemas de arquivos distribuídos

    Dissertation (Master's) - Universidade Federal de Santa Catarina, Centro Tecnológico, Graduate Program in Computer Science. With the deployment of the Catarinense Telemedicine Network there has been a significant increase in the volume of DICOM-standard medical images generated by the medical devices interconnected on this network. To handle these images, a previous project developed a server known as CyclopsDCMServer, which manages DICOM images using the Hierarchical Data Format (HDF5). However, this approach is expected to hit bottlenecks as the data volume and the number of simultaneous operations submitted to the server grow. Continuing the effort to give CyclopsDCMServer better scalability, this dissertation investigates introducing a parallel paradigm into the server for storing and retrieving DICOM images. To this end, a module was developed on top of high-performance parallel I/O libraries; it communicates with the server and is responsible for performing parallel access to the hierarchical data format. To evaluate the performance of the parallel approach, experiments were run on different distributed file systems, focused mainly on the storage and retrieval of medical images. The average execution time of each operation was compared between the serial and parallel versions, and the I/O time of each operation was also collected, in order to measure only the performance of reading and writing data and to discard any delay that could interfere with the results. The empirical results show that, regardless of the file system, the parallel approach is not yet appreciably efficient compared with the serial architecture: performance drops on average by about 45% for retrieval and 71% for storage. It was also observed that increasing the number of parallel processes can cause an even larger loss of performance in this approach.
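    A minimal sketch of the kind of parallel HDF5 access such a module performs is shown below. This is not the CyclopsDCMServer code; the dataset name and image size are illustrative, and it assumes an HDF5 build with parallel (MPI-IO) support. Each MPI rank writes one image-sized block of a shared file collectively:

    #include <hdf5.h>
    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank, nprocs;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* Open one shared file through the MPI-IO virtual file driver. */
        hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
        H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
        hid_t file = H5Fcreate("images.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

        const hsize_t pixels = 512 * 512;        /* one image per rank */
        hsize_t dims[1] = { pixels * (hsize_t)nprocs };
        hid_t filespace = H5Screate_simple(1, dims, NULL);
        hid_t dset = H5Dcreate2(file, "dicom_pixels", H5T_NATIVE_SHORT,
                                filespace, H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

        /* Select this rank's disjoint slab of the dataset. */
        hsize_t offset[1] = { (hsize_t)rank * pixels }, count[1] = { pixels };
        H5Sselect_hyperslab(filespace, H5S_SELECT_SET, offset, NULL, count, NULL);
        hid_t memspace = H5Screate_simple(1, count, NULL);

        hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
        H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);   /* collective write */

        short *img = calloc(pixels, sizeof *img);       /* stand-in pixel data */
        H5Dwrite(dset, H5T_NATIVE_SHORT, memspace, filespace, dxpl, img);

        free(img);
        H5Pclose(dxpl); H5Sclose(memspace); H5Sclose(filespace);
        H5Dclose(dset); H5Fclose(file); H5Pclose(fapl);
        MPI_Finalize();
        return 0;
    }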

    Doctor of Philosophy

    dissertation

    The increase in computational power of supercomputers is enabling complex scientific phenomena to be simulated at ever-increasing resolution and fidelity. With these simulations routinely producing large volumes of data, performing efficient I/O at this scale has become a very difficult task. Large-scale parallel writes are challenging due to the complex interdependencies between I/O middleware and hardware. Analytic-appropriate reads are traditionally hindered by bottlenecks in I/O access. Moreover, the two components of I/O, data generation from simulations (writes) and data exploration for analysis and visualization (reads), have substantially different data access requirements. Parallel writes, performed on supercomputers, often deploy aggregation strategies to permit large-sized contiguous access. Analysis and visualization tasks, usually performed on computationally modest resources, require fast access to localized subsets or multiresolution representations of the data. This dissertation tackles the problem of parallel I/O while bridging the gap between large-scale writes and analytics-appropriate reads. The focus of this work is to develop an end-to-end adaptive-resolution data movement framework that provides efficient I/O while supporting the full spectrum of modern HPC hardware. This is achieved by developing technology for highly scalable and tunable parallel I/O, applicable to both traditional parallel data formats and multiresolution data formats, which are directly appropriate for analysis and visualization. To demonstrate the efficacy of the approach, a novel library (PIDX) is developed that is highly tunable and capable of adaptive-resolution parallel I/O to a multiresolution data format. Adaptive-resolution storage and I/O, which allow subsets of a simulation to be accessed at varying spatial resolutions, can yield significant improvements to both storage performance and I/O time. The library provides a set of parameters that controls the storage format and the nature of data aggregation across the network; further, a machine learning-based model is constructed that tunes these parameters for maximum throughput. This work is empirically demonstrated by showing parallel I/O scaling up to 768K cores within a framework flexible enough to handle adaptive-resolution I/O.
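    The two-phase aggregation idea behind such tunable parallel writes can be sketched generically. This is a deliberate simplification, not the PIDX implementation; the block size and aggregation factor are illustrative. A few aggregator ranks gather the blocks of their group, then each issues one large contiguous write:

    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank, nprocs;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        const int block = 1 << 20;           /* 1 MiB per rank */
        const int ranks_per_agg = 4;         /* tunable aggregation factor */
        char *mine = calloc(block, 1);

        /* Phase 1: gather blocks within each aggregation group; the
         * lowest rank of each group (grank 0) acts as aggregator. */
        MPI_Comm group;
        MPI_Comm_split(MPI_COMM_WORLD, rank / ranks_per_agg, rank, &group);
        int grank, gsize;
        MPI_Comm_rank(group, &grank);
        MPI_Comm_size(group, &gsize);
        char *agg = (grank == 0) ? malloc((size_t)block * gsize) : NULL;
        MPI_Gather(mine, block, MPI_BYTE, agg, block, MPI_BYTE, 0, group);

        /* Phase 2: only aggregators write, each a large contiguous region
         * starting at its own global offset (its group's first block). */
        MPI_File fh;
        MPI_File_open(MPI_COMM_WORLD, "aggregated.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
        if (grank == 0)
            MPI_File_write_at(fh, (MPI_Offset)rank * block, agg,
                              block * gsize, MPI_BYTE, MPI_STATUS_IGNORE);
        MPI_File_close(&fh);

        free(mine); free(agg);
        MPI_Comm_free(&group);
        MPI_Finalize();
        return 0;
    }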