54 research outputs found

    Parallel Computational Steering and Analysis for HPC Applications using a ParaView Interface and the HDF5 DSM Virtual File Driver

    Honourable Mention Award. We present a framework for interfacing an arbitrary HPC simulation code with an interactive ParaView session using the HDF5 parallel IO library as the API. The implementation allows a flexible combination of parallel simulation, concurrent parallel analysis and GUI client, all of which may be on the same or separate machines. Data transfer between the simulation and the ParaView server takes place using a virtual file driver for HDF5 that bypasses the disk entirely and instead communicates directly between the coupled applications in parallel. The simulation and ParaView tasks run as separate MPI jobs and may therefore use different core counts and/or hardware configurations/platforms, making it possible to carefully tailor the amount of resources dedicated to each part of the workload. The coupled applications write and read datasets to the shared virtual HDF5 file layer, which allows the user to read data representing any aspect of the simulation and modify it using ParaView pipelines, then write it back, to be reread by the simulation (or vice versa). This allows not only simple parameter changes, but complete remeshing of grids, or operations involving regeneration of field values over the entire domain, to be carried out. To avoid the problem of manually customizing the GUI for each application that is to be steered, we make use of XML templates that describe outputs from the simulation, inputs back to it, and what user interactions are permitted on the controlled elements. This XML is used to generate GUI and 3D controls for manipulation of the simulation without requiring explicit knowledge of the underlying model.

    Dataflow methods in HPC, visualisation and analysis

    The processing power available to scientists and engineers using supercomputers has grown exponentially over the last few decades, permitting significantly more sophisticated simulations and, as a consequence, generating proportionally larger output datasets. This growth has taken place in tandem with a gradual change in the design and implementation of simulation and post-processing software, moving from simulation as a first step and visualisation/analysis as a second towards in-situ, on-the-fly methods that provide immediate visual feedback, place less strain on file systems and reduce overall data movement and copying. Concurrently, processor speed increases have slowed dramatically, and multi-core and many-core architectures have instead become the norm for virtually all High Performance Computing (HPC) machines. This in turn has led to a shift away from the traditional distributed one rank per node model, to one rank per process, using multiple processes per multicore node, and then back towards one rank per node again, using distributed and multi-threaded frameworks combined. This thesis consists of a series of publications that demonstrate how software design for analysis and visualisation has tracked these architectural changes and pushed the boundaries of HPC visualisation using dataflow techniques in distributed environments. The first publication shows how support for the time dimension in parallel pipelines can be implemented, demonstrating how information flow within an application can be leveraged to optimise performance and add features such as analysis of time-dependent flows and comparison of datasets at different timesteps. A method of integrating dataflow pipelines with in-situ visualisation is subsequently presented, using asynchronous coupling of user-driven GUI controls and a live simulation running on a supercomputer.
The loose coupling of analysis and simulation allows for reduced IO, immediate feedback and the ability to change simulation parameters on the fly. A significant drawback of parallel pipelines is the inefficiency caused by improper load-balancing, particularly during interactive analysis where the user may select between different features of interest. This problem is addressed in the fourth publication by integrating a high performance partitioning library into the visualization pipeline and extending the information flow up and down the pipeline to support it. This extension is demonstrated in the third publication (published earlier) on massive meshes with extremely high complexity and shows that general purpose visualization tools such as ParaView can be made to compete with bespoke software written for a dedicated task. The future of software running on many-core architectures will involve task-based runtimes, with dynamic load-balancing, asynchronous execution based on dataflow graphs, work stealing and concurrent data sharing between simulation and analysis. The final paper of this thesis presents an optimisation for one such runtime, in support of these future HPC applications.

    Data Redistribution using One-sided Transfers to In-memory HDF5 Files

    Outputs of simulation codes that use the HDF5 file format are mainly composed of several different attributes and datasets, storing either lightweight pieces of information or heavy parts of the data. These objects, when written or read through the HDF5 layer, create metadata and data IO operations of different block sizes, which depend on the precision and dimension of the arrays being manipulated. By making use of simple block redistribution strategies, we present in this paper a case study showing HDF5 IO performance improvements for "in-memory" files stored in a distributed shared memory buffer using one-sided communications through the HDF5 API.
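The abstract does not say which redistribution strategies were studied, so the following is only a minimal sketch of one common option, a block-cyclic layout, in which a byte range of the in-memory file is cut into fixed-size blocks dealt out to servers in round-robin order. The function name `split_write` and its parameters are hypothetical and are not taken from the paper or from the HDF5 API.

```python
def split_write(offset, length, block_size, num_servers):
    """Map a byte range of a virtual file onto (server, local_offset, nbytes) pieces.

    Assumed layout (illustrative only): block b of the file lives on server
    b % num_servers, at local block index b // num_servers.
    """
    pieces = []
    end = offset + length
    while offset < end:
        block = offset // block_size          # global block index
        within = offset % block_size          # position inside that block
        nbytes = min(block_size - within, end - offset)
        server = block % num_servers
        local_offset = (block // num_servers) * block_size + within
        pieces.append((server, local_offset, nbytes))
        offset += nbytes
    return pieces


if __name__ == "__main__":
    # A 10-byte write starting at byte 6, with 4-byte blocks over 2 servers.
    for piece in split_write(6, 10, block_size=4, num_servers=2):
        print(piece)
```

Small metadata writes touch a single server under this layout, while large dataset writes are spread over all of them, which is the kind of trade-off a block size choice controls.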

    A Flexible Framework for Asynchronous In Situ and In Transit Analytics for Scientific Simulations

    High performance computing systems are today composed of tens of thousands of processors and deep memory hierarchies. The next generation of machines will further increase the imbalance between I/O capabilities and processing power. To reduce the pressure on I/O, the in situ analytics paradigm proposes to process the data as closely as possible to where and when the data are produced. Processing can be embedded in the simulation code, executed asynchronously on helper cores on the same nodes, or performed in transit on staging nodes dedicated to analytics. Today, software environments as well as usage scenarios still need to be investigated before in situ analytics become a standard practice. In this paper we introduce a framework for designing, deploying and executing in situ scenarios. Based on a component model, the scientist designs analytics workflows by first developing processing components that are next assembled in a dataflow graph through a Python script. At runtime the graph is instantiated according to the execution context, the framework taking care of deploying the application on the target architecture and coordinating the analytics workflows with the simulation execution. Component coordination, zero-copy intra-node communications and inter-node data transfers rely on per-node distributed daemons. We evaluate various scenarios performing in situ and in transit analytics on large molecular dynamics systems simulated with Gromacs using up to 1664 cores. We show in particular that analytics processing can be performed on the fraction of resources the simulation does not use well, resulting in a limited impact on the simulation performance (less than 6%). Our more advanced scenario combines in situ and in transit processing to compute a molecular surface based on the Quicksurf algorithm.
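The idea of assembling processing components into a dataflow graph from a Python script can be illustrated with a toy, single-process sketch. It is not the framework's API: `run_dataflow` and the three component functions are hypothetical names, and real in situ deployments add placement, asynchrony and inter-node transfers that are left out here.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_dataflow(components, dependencies, source_data):
    """Run every component after the components it depends on.

    components:   name -> function taking a list of input values
    dependencies: name -> set of upstream component names
    Components without upstream dependencies receive [source_data].
    """
    graph = {name: set(dependencies.get(name, ())) for name in components}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        # Upstream outputs are passed in alphabetical order of component name.
        upstream = [results[dep] for dep in sorted(graph[name])]
        results[name] = components[name](upstream if upstream else [source_data])
    return results


def center_of_mass(inputs):
    positions = inputs[0]
    return tuple(sum(p[i] for p in positions) / len(positions) for i in range(3))


def bounding_box(inputs):
    positions = inputs[0]
    lows = tuple(min(p[i] for p in positions) for i in range(3))
    highs = tuple(max(p[i] for p in positions) for i in range(3))
    return lows, highs


def report(inputs):
    box, com = inputs  # sorted upstream names: bounding_box, center_of_mass
    return {"bounding_box": box, "center_of_mass": com}


if __name__ == "__main__":
    components = {"center_of_mass": center_of_mass,
                  "bounding_box": bounding_box,
                  "report": report}
    dependencies = {"report": {"center_of_mass", "bounding_box"}}
    atoms = [(0.0, 0.0, 0.0), (2.0, 4.0, 6.0)]  # stand-in for one simulation frame
    print(run_dataflow(components, dependencies, atoms)["report"])
```

Keeping the graph description separate from the component code is what lets the same components be wired into different scenarios without being rewritten.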

    A framework for multidimensional indexes on distributed and highly-available data stores

    Spatial Big Data is considered an essential trend in future scientific and business applications. Indeed, research instruments, medical devices, and social networks generate hundreds of petabytes of spatial data per year. However, as many authors have pointed out, the lack of specialized frameworks dealing with this kind of data is limiting possible applications and probably precluding many scientific breakthroughs. In this thesis, we describe three HPC scientific applications, covering molecular dynamics, neuroscience analysis, and physics simulations, where we experienced first hand the limits of the existing technologies. Thanks to our experience, we define the desirable missing functionalities, and we focus on two features that, when combined, significantly improve the way scientific data is analyzed. On one side, scientific simulations generate complex datasets where multiple correlated characteristics describe each item. For instance, a particle might have a space position (x,y,z) at a given time (t). If we want to find all elements within the same area and period, we either have to scan the whole dataset, or we must organize the data so that all items in the same space and time are stored together. The second approach is called Multidimensional Indexing (MI), and it uses different techniques to cluster and to organize similar data together. On the other side, approximate analytics has often been indicated as a smart and flexible way to explore large datasets in a short period. Approximate analytics includes a broad family of algorithms which aim to speed up analytical workloads by relaxing the precision of the results within a specific confidence interval. For instance, if we want to know the average age in a group with 1-year precision, we can consider just a random fraction of all the people, thus reducing the amount of calculation.
But if we also want fewer I/O operations, we need efficient data sampling, which means organizing data in a way that we do not need to scan the whole data set to generate a random sample of it. According to our analysis, combining Multidimensional Indexing with efficient data Sampling (MIS) is a vital missing feature not available in the current distributed data management solutions. This thesis aims to solve this shortcoming and provides novel scalable solutions. First, we describe the existing data management alternatives; then we motivate our preference for NoSQL key-value databases. Secondly, we propose an analytical model to study the influence of data models on the scalability and performance of this kind of distributed database. Thirdly, we use the analytical model to design two novel multidimensional indexes with efficient data sampling: the D8tree and the AOTree. Our first solution, the D8tree, improves the state of the art for approximate spatial queries on static and mostly read datasets. Later, we enhanced the data ingestion capability of our approach by introducing the AOTree, an algorithm that preserves the query performance of the D8tree even for HPC write-intensive applications. We compared our solution with PostgreSQL and plain storage, and we demonstrate that our proposal has better performance and scalability. Finally, we describe Qbeast, the novel distributed system that implements the D8tree and the AOTree using NoSQL technologies, and we illustrate how Qbeast simplifies the workflow of scientists in various HPC applications, providing a scalable and integrated solution for data analysis and management. Postprint (published version).
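The average-age example in this abstract can be made concrete with a minimal sketch of sampling-based approximate analytics. It only illustrates the general idea and is not the D8tree, the AOTree or Qbeast; the function name `approximate_mean` and the synthetic ages are assumptions made for the example, and the interval uses the usual normal approximation.

```python
import random
import statistics


def approximate_mean(values, sample_size, seed=0):
    """Estimate the mean from a uniform random sample.

    Returns (estimate, half_width), where half_width is an approximate
    95% confidence half-width based on the normal approximation.
    """
    rng = random.Random(seed)
    sample = rng.sample(values, sample_size)   # sampling without replacement
    estimate = statistics.fmean(sample)
    std_error = statistics.stdev(sample) / sample_size ** 0.5
    return estimate, 1.96 * std_error


if __name__ == "__main__":
    rng = random.Random(42)
    ages = [rng.randint(0, 90) for _ in range(100_000)]   # synthetic data
    estimate, half_width = approximate_mean(ages, sample_size=10_000)
    print(f"exact mean:   {statistics.fmean(ages):.2f}")
    print(f"sampled mean: {estimate:.2f} +/- {half_width:.2f}")
```

Here the sample is drawn from a list already held in memory, so the saving is only in computation. Avoiding the full scan on disk is the separate problem the abstract assigns to efficient data sampling.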

    Coupling streaming AI and HPC ensembles to achieve 100-1000x faster biomolecular simulations

    Machine learning (ML)-based steering can improve the performance of ensemble-based simulations by allowing for online selection of more scientifically meaningful computations. We present DeepDriveMD, a framework for ML-driven steering of scientific simulations that we have used to achieve orders-of-magnitude improvements in molecular dynamics (MD) performance via effective coupling of ML and HPC on large parallel computers. We discuss the design of DeepDriveMD and characterize its performance. We demonstrate that DeepDriveMD can achieve 100-1000x acceleration for protein folding simulations relative to other methods, as measured by the amount of simulated time performed, while covering the same conformational landscape as quantified by the states sampled during a simulation. Experiments are performed on leadership-class platforms on up to 1020 nodes. The results establish DeepDriveMD as a high-performance framework for ML-driven HPC simulation scenarios that supports diverse MD simulation and ML back-ends and enables new scientific insights by improving the length and time scales accessible with current computing capacity.

    Modeling High-throughput Applications for in situ Analytics

    With the goal of performing exascale computing, I/O management becomes more and more critical to maintaining system performance. While the computing capacities of machines are getting higher, the I/O capabilities of systems do not increase as fast. We are able to generate more data but unable to manage them efficiently due to variability of I/O performance. Limiting the requests to the Parallel File System (PFS) becomes necessary. To address this issue, new strategies are being developed such as online in situ analysis. The idea is to overcome the limitations of basic post-mortem data analysis, where the data have to be stored on the PFS first and processed later. There are several software solutions that allow users to specifically dedicate nodes for analysis of data and distribute the computation tasks over different sets of nodes. Thus far, they rely on manual resource partitioning and allocation of tasks (simulations, analysis) by the user. In this work, we propose a memory-constrained model for in situ analysis. We use this model to provide different scheduling policies that determine both the number of resources that should be dedicated to analysis functions and how to schedule these functions efficiently. We evaluate them and show the importance of considering memory constraints in the model. Finally, we discuss the different challenges that have to be addressed in order to build automatic tools for in situ analytics.

    Méthodes In-Situ et In-Transit : vers un continuum entre les applications interactives et offline à grande échelle.

    Parallel simulation has become a very useful tool in various scientific areas. Such simulations require large parallel machines, whose computational power continues to grow, allowing scientists to construct larger and larger models. However, the I/O systems used to store the data produced by simulations have not improved at the same pace. It is already difficult for scientists to store all the accumulated data and to have enough computational power later on to process them. Yet these data are the key to major scientific discoveries. In-situ treatments are a promising solution to this problem. The idea is to analyze the data while the simulation is still running and the data are still in memory. This approach avoids the I/O bottleneck and takes advantage of the computational power of a supercomputer to perform the analysis. In this thesis, we propose to use the dataflow paradigm to construct complex asynchronous in-situ applications. We use the FlowVR middleware to couple heterogeneous parallel codes into a graph. Our approach provides enough flexibility to support various placement strategies for the analytics, whether on the simulation nodes, on dedicated cores or on dedicated nodes, and in-situ treatments can run asynchronously, which keeps their impact on simulation performance low. To demonstrate this flexibility, we applied our approach to Gromacs, a molecular dynamics simulation code commonly used by biologists that scales to several thousand cores. In close collaboration with biology experts, we designed several realistic scenarios in which we evaluated both the flexibility of our approach and the capability of our infrastructure to support each step of the biologists' analysis workflow.