24 research outputs found

    Optimizing NEURON Simulation Environment Using Remote Memory Access with Recursive Doubling on Distributed Memory Systems

    The growing complexity of neuronal network models has driven efforts to make the NEURON simulation environment more efficient. Computational neuroscientists divide the equations into subnets distributed across multiple processors to achieve better hardware performance. On parallel machines, interprocessor spike exchange consumes a large share of the overall simulation time. NEURON uses the Message Passing Interface (MPI) for communication between processors, and the MPI_Allgather collective is used to exchange spikes after each interval across distributed memory systems. Increasing the number of processors improves concurrency and performance, but it also increases the communication time of MPI_Allgather. This makes it necessary to improve the communication method in order to reduce spike exchange time on distributed memory systems. This work improves the MPI_Allgather method using Remote Memory Access (RMA), moving from two-sided to one-sided communication, and uses a recursive doubling mechanism to achieve efficient communication between processors in a precise sequence of steps. This approach enhanced communication concurrency and improved overall runtime, making NEURON more efficient for simulating large neuronal network models.
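    The recursive doubling pattern mentioned in this abstract can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions: it does not use MPI or RMA, the function name `recursive_doubling_allgather` is hypothetical, and it assumes a power-of-two number of ranks. It is not the paper's implementation; it only shows why each rank holds every contribution after log2(p) pairwise exchange steps.

```python
def recursive_doubling_allgather(local_items):
    """Simulate an allgather over p = len(local_items) ranks.

    local_items[r] is the contribution of rank r (for example, its spikes).
    Assumption: p is a power of two. Returns (gathered, steps), where
    gathered[r] is the full list of contributions as seen by rank r.
    """
    p = len(local_items)
    if p == 0 or p & (p - 1):
        raise ValueError("this sketch assumes a power-of-two number of ranks")

    # buffers[r] maps a source rank to the data that rank r currently holds.
    buffers = [{r: local_items[r]} for r in range(p)]

    steps = 0
    distance = 1
    while distance < p:
        # Snapshot the state so every exchange in this step uses the data
        # each rank held at the start of the step.
        snapshot = [dict(buf) for buf in buffers]
        for rank in range(p):
            partner = rank ^ distance  # partner differs in exactly one bit
            buffers[rank].update(snapshot[partner])
        distance *= 2  # the amount of data held doubles each step
        steps += 1

    gathered = [[buf[src] for src in range(p)] for buf in buffers]
    return gathered, steps


if __name__ == "__main__":
    spikes = [[1], [2, 3], [], [4]]
    result, steps = recursive_doubling_allgather(spikes)
    print(steps, result[0])
```

    In this simulation each rank exchanges its whole buffer with the partner whose rank differs in one bit, so the data held doubles per step and the loop ends after log2(p) steps.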

    Optimizing MPI Collectives on Shared Memory Multi-cores

    Computación eficiente del alineamiento de secuencias de ADN sobre cluster de multicores

    One of the areas of greatest interest and growth in recent years within parallel processing is the handling of large volumes of data, such as DNA sequences. The extensive comparison processing needed to analyze genetic patterns requires significant effort in developing efficient parallel algorithms. DNA sequence alignment is one of the most important operations in bioinformatics. In 1981, Smith and Waterman developed a method for local sequence alignment. In practice, however, various heuristics are used instead, because of the processing and memory requirements of the Smith-Waterman algorithm. Although they are faster, heuristics do not guarantee that the optimal alignment will be found. It is therefore worthwhile to study how to apply the computing power of current parallel platforms to accelerate sequence alignment without losing accuracy in the results. The unsustainable levels of heat generation and energy consumption that arise when single-core processor speed is scaled to its maximum motivated the emergence of multi-core (multicore) processors. A multicore processor integrates two or more computational cores within a single chip and, although these cores are simpler and slower, combining them improves the overall performance of the processor while making it more energy efficient. Incorporating this type of processor into conventional clusters gives rise to an architecture known as a cluster of multicores, which combines shared and distributed memory, and in which communication between the different processing units is heterogeneous.
This work presents a distributed parallel algorithm for DNA sequence alignment based on the Smith-Waterman method, designed to run on current cluster architectures. A performance analysis of the algorithm is also carried out. Finally, conclusions and possible lines of future work are presented. Facultad de Informátic
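For readers unfamiliar with the Smith-Waterman method named in this abstract, the following is a minimal sequential sketch of its scoring recurrence. The scoring values (`match=2`, `mismatch=-1`, `gap=-1`), the linear gap penalty, and the function name `smith_waterman_score` are assumptions chosen for illustration; the sketch does not reproduce the distributed parallel algorithm the work describes.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b.

    Uses the standard recurrence with a linear gap penalty (an assumption):
    H[i][j] = max(0, H[i-1][j-1] + s(a_i, b_j), H[i-1][j] + gap, H[i][j-1] + gap)
    Only two rows are kept in memory, since only the score is needed.
    """
    cols = len(b) + 1
    prev = [0] * cols  # row i-1 of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * cols  # row i; column 0 stays 0
        for j in range(1, cols):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                        # start a new local alignment
                prev[j - 1] + substitution,  # align a[i-1] with b[j-1]
                prev[j] + gap,            # gap in b
                curr[j - 1] + gap,        # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

The quadratic number of cell updates in this recurrence is the processing cost the abstract refers to when it explains why heuristics are often preferred in practice.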

    High-Performance Modelling and Simulation for Big Data Applications

    This open access book was prepared as a Final Publication of the COST Action IC1406 “High-Performance Modelling and Simulation for Big Data Applications (cHiPSet)” project. Long considered important pillars of the scientific method, Modelling and Simulation have evolved from traditional discrete numerical methods to complex data-intensive continuous analytical optimisations. Resolution, scale, and accuracy have become essential to predict and analyse natural and complex systems in science and engineering. When their level of abstraction rises to give a better discernment of the domain at hand, their representation becomes increasingly demanding of computational and data resources. On the other hand, High Performance Computing typically entails the effective use of parallel and distributed processing units coupled with efficient storage, communication and visualisation systems to underpin complex data-intensive applications in distinct scientific and technical domains. A seamless interaction of High Performance Computing with Modelling and Simulation is therefore arguably required in order to store, compute, analyse, and visualise large data sets in science and engineering. Funded by the European Commission, cHiPSet has provided a dynamic trans-European forum for its members and distinguished guests to openly discuss novel perspectives and topics of interest for these two communities. This cHiPSet compendium presents a set of selected case studies related to healthcare, biological data, computational advertising, multimedia, finance, bioinformatics, and telecommunications.

    A multi-tier cached I/O architecture for massively parallel supercomputers

    Recent advances in storage technologies and high performance interconnects have made it possible in recent years to build increasingly powerful storage systems that serve thousands of nodes. The majority of storage systems of clusters and supercomputers on the Top 500 list are managed by one of three scalable parallel file systems: GPFS, PVFS, and Lustre. Most large-scale scientific parallel applications are written using the Message Passing Interface (MPI), which has become the de facto standard for scalable distributed memory machines. One part of the MPI standard is related to I/O and has among its main goals the portability and efficiency of file system accesses. All of the above-mentioned parallel file systems may also be accessed through the MPI-IO interface. The I/O access patterns of scientific parallel applications often consist of accesses to a large number of small, non-contiguous pieces of data. For small file accesses the performance is dominated by the latency of network transfers and disks. Parallel scientific applications lead to interleaved file access patterns with high interprocess spatial locality at the I/O nodes. Additionally, scientific applications exhibit repetitive behaviour when a loop or a function with loops issues I/O requests. When I/O access patterns are repetitive, caching and prefetching can effectively mask their access latency. These characteristics of the access patterns motivated several researchers to propose parallel I/O optimizations at both the library and file system levels. However, these optimizations are not always integrated across different layers in the systems. In this dissertation we propose a novel generic parallel I/O architecture for clusters and supercomputers. Our design is aimed at large-scale parallel architectures with thousands of compute nodes. Besides acting as middleware for existing parallel file systems, our architecture provides on-line virtualization of storage resources. 
Another objective of this thesis is to factor out the common parallel I/O functionality from clusters and supercomputers into generic modules in order to facilitate porting of scientific applications across these platforms. Our solution is based on a multi-tier cache architecture, collective I/O, and asynchronous data staging strategies that hide the latency of data transfer between cache tiers. The thesis aims to reduce the file access latency perceived by data-intensive parallel scientific applications through multi-layer asynchronous data transfers. To accomplish this objective, our techniques leverage multi-core architectures by overlapping computation with communication and I/O in parallel threads. Prototypes of our solutions have been deployed on both clusters and Blue Gene supercomputers. Performance evaluation shows that the combination of collective strategies with overlapping of computation, communication, and I/O may bring a substantial performance benefit for access patterns common in parallel scientific applications.

Recent years have seen a substantial increase in the amount of data produced by parallel scientific applications and in the need to store these data persistently. Parallel file systems such as PVFS, Lustre, and GPFS have offered a scalable solution to this growing demand for storage. Most scientific applications are written using the Message Passing Interface (MPI), which has become a de facto programming standard for distributed memory architectures. Parallel applications that use MPI can access parallel file systems through the interface offered by MPI-IO. 
The access patterns of parallel scientific applications consist of a large number of small, non-contiguous accesses. For small access sizes, performance is limited by the latency of network and disk transfers. In addition, scientific applications perform accesses with high spatial locality among the different processes at the I/O nodes. Scientific applications also typically exhibit repetitive behaviour. When I/O access patterns are repetitive, techniques such as write-behind and read-ahead can effectively mask the latency of I/O accesses. These characteristics have motivated many researchers to propose I/O optimizations at both the library level and the file system level. However, these optimizations are currently not always integrated across the different layers of the system. The main objective of this thesis is to propose a new generic parallel I/O architecture for clusters and supercomputers. Our solution is based on a multi-tier cache architecture, a collective I/O technique, and asynchronous access strategies that hide the latency of data transfer between the different cache tiers. Our design targets scalable parallel architectures with thousands of compute nodes. Besides acting as middleware for existing parallel file systems, our architecture must provide on-line virtualization of storage resources. Another objective set for this thesis is to factor the functionality common to clusters and supercomputers into generic modules that facilitate the deployment of scientific applications across these platforms. Several prototypes of our solutions have been deployed on both clusters and supercomputers. 
Performance evaluations show that, thanks to the combination of collective I/O strategies and the overlapping of computation, communication, and I/O, a substantial performance improvement can be obtained for the access patterns described above, which are very common in parallel scientific applications.
