
    Dataflow Programming Paradigms for Computational Chemistry Methods

    The transition to multicore and heterogeneous architectures has shaped the High Performance Computing (HPC) landscape over the past decades. With the increase in scale, complexity, and heterogeneity of modern HPC platforms, one of the grim challenges for traditional programming models is to sustain the expected performance at scale. By contrast, dataflow programming models have been growing in popularity as a means to deliver a good balance between performance and portability in the post-petascale era. This work introduces dataflow programming models for computational chemistry methods and compares different dataflow executions in terms of programmability, resource utilization, and scalability. The effort is driven by computational chemistry applications, as they are one of the driving forces of HPC. In particular, many-body methods, such as Coupled Cluster (CC) methods, which are the gold standard for computing energies in quantum chemistry, are of particular interest to the applied chemistry community. On that account, the latest developments in CC methods are used as the primary vehicle for this research, but the effort is not limited to CC and can be applied to other application domains. Two programming paradigms for expressing CC methods in dataflow form, so that they can utilize task scheduling systems, are presented. Explicit dataflow, the programming model in which the dataflow is explicitly specified by the developer, is contrasted with implicit dataflow, in which a task scheduling runtime derives the dataflow. An abstract model is derived to explore the limits of the different dataflow programming paradigms.
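    As an illustration of the implicit paradigm (not the thesis's own implementation, whose runtime is not named in this abstract), the following C sketch uses OpenMP task dependences: the developer declares only the data each task reads and writes, and the runtime derives the dataflow ordering. The three-task chain and the scalar "tensor blocks" A, B, C are hypothetical stand-ins for CC tensor contractions.

```c
#include <stdio.h>

/* Implicit dataflow: the runtime derives the task graph from declared
 * data accesses (in/out), here via OpenMP depend clauses. A, B, C are
 * illustrative placeholders for tensor blocks in a CC-like chain. */
int main(void) {
    double A = 1.0, B = 0.0, C = 0.0;
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task depend(out: B) shared(A, B)
        B = 2.0 * A;            /* task 1: produces B */
        #pragma omp task depend(in: B) depend(out: C) shared(B, C)
        C = B + 1.0;            /* task 2: runtime orders it after task 1 */
        #pragma omp task depend(in: C) shared(C)
        printf("C = %f\n", C);  /* task 3: runs once C is ready */
    }
    return 0;
}
```

    In an explicit-dataflow version of the same chain, the developer would instead hand the scheduler the task graph edges directly rather than letting the runtime infer them from data accesses.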

    Animation of a process for identifying and merging raster polygon areas


    Evaluating techniques for parallelization tuning in MPI, OmpSs and MPI/OmpSs

    Parallel programming is used to partition a computational problem among multiple processing units and to define how they interact (communicate and synchronize) in order to guarantee a correct result. The performance achieved when executing a parallel program on a parallel architecture is usually far from optimal: computational imbalance and excessive interaction among processing units often cause lost cycles, reducing the efficiency of the parallel computation. In this thesis we propose techniques to better exploit the parallelism in parallel applications, with an emphasis on techniques that increase asynchronism. In theory, this type of parallelization tuning promises multiple benefits. First, it should mitigate communication and synchronization delays, thus increasing overall performance. Furthermore, parallelization tuning should expose additional parallelism and therefore increase the scalability of execution. Finally, increased asynchronism should provide higher tolerance to slower networks and external noise. In the first part of this thesis, we study the potential for tuning MPI parallelism. More specifically, we explore automatic techniques for overlapping communication and computation. We propose a speculative messaging technique that increases the overlap and requires no changes to the original MPI application. Our technique automatically identifies the application's MPI activity and reinterprets it using optimally placed non-blocking MPI requests. We demonstrate that this overlapping technique increases the asynchronism of MPI messages, maximizing the overlap and consequently leading to execution speedup and higher tolerance to bandwidth reduction. However, for realistic scientific workloads, we show that the overlapping potential is significantly limited by the pattern by which each MPI process locally operates on MPI messages. In the second part of this thesis, we study the potential for tuning hybrid MPI/OmpSs parallelism. We try to gain a better understanding of the parallelism of hybrid MPI/OmpSs applications in order to evaluate how these applications would execute on future machines and to predict the execution bottlenecks that are likely to emerge. We explore how MPI/OmpSs applications could scale on a parallel machine with hundreds of cores per node. Furthermore, we investigate how this high parallelism within each node would reflect on the network constraints. We especially focus on identifying critical code sections in MPI/OmpSs. We devised a technique that quickly evaluates, for a given MPI/OmpSs application and a selected target machine, which code section should be optimized in order to gain the highest performance benefits. This thesis also studies techniques to quickly explore the potential OmpSs parallelism inherent in applications. We provide mechanisms to easily evaluate the potential parallelism of any task decomposition. Furthermore, we describe an iterative trial-and-error approach to search for a task decomposition that will expose sufficient parallelism for a given target machine. Finally, we explore the potential of automating the iterative approach by capturing the programmers' experience in an expert system that can autonomously lead the search process. Throughout the work on this thesis, we also designed development tools that can be useful to other researchers in the field. The most advanced of these tools is Tareador, a tool that helps port MPI applications to the MPI/OmpSs programming model. Tareador provides a simple interface for proposing a decomposition of a code into OmpSs tasks. Tareador dynamically calculates data dependencies among the annotated tasks and automatically estimates the potential OmpSs parallelization. Furthermore, Tareador gives additional hints on how to complete the process of porting the application to OmpSs. Tareador has already proved itself useful by being included in the academic classes on parallel programming at UPC.
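    To make the overlapping idea concrete, here is a minimal C sketch of the kind of transformation the thesis describes: a blocking exchange rewritten with non-blocking MPI requests so that independent computation proceeds while messages are in flight. The halo-exchange shape and the compute_interior/compute_boundary helpers are hypothetical, and the thesis applies the rewrite automatically and speculatively, which this hand-written sketch does not capture.

```c
#include <mpi.h>

static void compute_interior(void) { /* halo-independent work */ }
static void compute_boundary(const double *halo) { (void)halo; }

/* Overlap communication and computation: post non-blocking requests as
 * early as possible, complete them as late as possible, and do useful
 * work in between. */
void exchange_and_compute(double *halo_in, double *halo_out, int n,
                          int left, int right, MPI_Comm comm) {
    MPI_Request reqs[2];
    MPI_Irecv(halo_in,  n, MPI_DOUBLE, left,  0, comm, &reqs[0]);
    MPI_Isend(halo_out, n, MPI_DOUBLE, right, 0, comm, &reqs[1]);

    compute_interior();   /* overlaps with the in-flight messages */

    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);  /* complete only when needed */
    compute_boundary(halo_in);
}
```

    The limiting pattern the thesis identifies shows up here too: if the process touches the message buffers immediately after posting the requests, the window for overlap collapses.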

    Generalized techniques for using system execution traces to support software performance analysis

    This dissertation proposes generalized techniques to support software performance analysis using system execution traces in the absence of software development artifacts such as source code. The proposed techniques are non-intrusive: they require no modifications to the source code or to the software binaries of the system being analyzed. They are also not tightly coupled to the architecture-specific details of that system. The dissertation extends current techniques for using system execution traces to evaluate software performance properties, such as response times and service times. It also proposes a novel technique to automatically construct a dataflow model from a system execution trace, which is useful for evaluating software performance properties. Finally, it showcases how execution traces can be used in a novel technique to detect the Excessive Dynamic Memory Allocations software performance anti-pattern. To the author's best knowledge, this is the first technique to automatically detect the excessive dynamic memory allocations anti-pattern. The contributions of this dissertation ease the laborious process of software performance analysis and provide a foundation for helping software developers quickly locate the causes of negative performance results via execution traces.
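    As a hedged illustration of evaluating one such property from a trace, the C sketch below computes mean service time from per-request enter/exit timestamps. The record layout (span_t) and the three sample records are hypothetical, since the dissertation's actual trace format is not given in this abstract.

```c
#include <stdio.h>

/* A trace record: request id plus timestamps for entering and leaving
 * a component. Hypothetical layout for illustration only. */
typedef struct { int request; double enter_ts; double exit_ts; } span_t;

/* Service time = time spent inside the component (exit - enter);
 * response time would additionally include queueing before entry. */
static double mean_service_time(const span_t *spans, int n) {
    double sum = 0.0;
    for (int i = 0; i < n; ++i)
        sum += spans[i].exit_ts - spans[i].enter_ts;
    return n > 0 ? sum / n : 0.0;
}

int main(void) {
    span_t trace[] = { {1, 0.10, 0.35}, {2, 0.40, 0.52}, {3, 0.60, 0.91} };
    printf("mean service time: %.3f s\n", mean_service_time(trace, 3));
    return 0;
}
```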

    Improving Model-Based Software Synthesis: A Focus on Mathematical Structures

    Computer hardware keeps increasing in complexity, and software design needs to keep up. The right models and abstractions empower developers to leverage the novelties of modern hardware. This thesis deals primarily with Models of Computation as a basis for software design, in a family of methods called software synthesis. We focus on Kahn Process Networks and dataflow applications as abstractions, both for programming and for deriving an efficient execution on heterogeneous multicores. The latter we accomplish by exploring the design space of possible mappings of computation and data to hardware resources. Mapping algorithms are not at the center of this thesis, however. Instead, we examine the mathematical structure of the mapping space, leveraging its inherent symmetries or geometric properties to improve mapping methods in general. This thesis thoroughly explores the process of model-based design, aiming to go beyond the more established software synthesis on dataflow applications. We start with the problem of assessing these methods through benchmarking and go on to formally examine the general goals of benchmarks. In this context, we also consider the role modern machine learning methods play in benchmarking. We explore different established semantics, stretching the limits of Kahn Process Networks. We also discuss novel models, like Reactors, which are designed to be a deterministic, adaptive model with time as a first-class citizen. By investigating abstractions and transformations in the Ohua language for implicit dataflow programming, we also focus on programmability. The focus of the thesis is on the models and methods, but we evaluate them in diverse use-cases, generally centered around Cyber-Physical Systems. These include the 5G telecommunication standard and the automotive and signal processing domains. We even go beyond embedded systems and discuss use-cases in GPU programming and microservice-based architectures.
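    A minimal sketch of the kind of symmetry involved, under the simplifying assumption of fully interchangeable cores (the thesis's actual platforms are heterogeneous and richer than this): two mappings that differ only by a renaming of cores are equivalent, and relabeling cores in order of first use yields a canonical representative that lets an explorer prune symmetric duplicates. The task and core counts here are arbitrary.

```c
#include <stdio.h>

#define NTASKS 5
#define NCORES 4

/* Canonical form of a task-to-core mapping under core permutation:
 * relabel cores in order of first use. Equal canonical forms mean the
 * mappings are symmetric duplicates in the design space. */
static void canonicalize(const int map[NTASKS], int out[NTASKS]) {
    int relabel[NCORES], next = 0;
    for (int c = 0; c < NCORES; ++c) relabel[c] = -1;
    for (int t = 0; t < NTASKS; ++t) {
        if (relabel[map[t]] < 0) relabel[map[t]] = next++;
        out[t] = relabel[map[t]];
    }
}

int main(void) {
    int a[NTASKS] = {2, 0, 2, 3, 0};   /* same mapping up to core renaming */
    int b[NTASKS] = {1, 3, 1, 0, 3};
    int ca[NTASKS], cb[NTASKS];
    canonicalize(a, ca);
    canonicalize(b, cb);
    for (int t = 0; t < NTASKS; ++t)   /* prints matching labels: 0 1 0 2 1 */
        printf("%d %d\n", ca[t], cb[t]);
    return 0;
}
```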

    Cognitive dimensions usability assessment of textual and visual VHDL environments

    Visual programming languages promise to make programming easier with simpler graphical methods, broadening access to computing by lessening the need for would-be users to become proficient with textual programming languages, whose somewhat arcane grammars and methods are removed from the problem space of the user. However, after more than forty years of research in the field, visual methods remain in the margins of use, and programming remains the bailiwick of people devoted to the endeavor. VPL designers need to understand the mechanisms of usability that pertain to complex systems like programming language environments. Effective research tools for studying usability, and sufficiently constrained, mature subjects for investigation, are scarce. This study applies a usability research tool, with its origins in applied psychology, to a programming language surrogate from the hardware description language class of notations. The substitution is reasonable because of the great similarity between hardware description languages and programming languages. Considering VHDL (the VHSIC Hardware Description Language) is especially worthwhile for several reasons, but primarily because significant numbers of digital designers regularly employ both textual and visual VHDL environments to meet the same real-world design challenges. A comparative analysis of Cognitive Dimensions assessments of textual and visual VHDL environments should further understanding of the usability issues specifically related to visual methods – in many cases, the same visual methods used in visual programming languages. Furthermore, with this real-world ‘field lab’ better understood, it should be possible to design experiments to pursue the formalization of the CDs framework as a theory.

    Compilation of Heterogeneous Models: Motivations and Challenges

    The widespread use of model-driven engineering in the development of software-intensive systems, including high-integrity embedded systems, has given rise to a "Tower of Babel" of modeling languages. System architects may use languages such as OMG SysML and MARTE, SAE AADL, or EAST-ADL; control and command engineers tend to use graphical tools such as MathWorks Simulink/Stateflow or Esterel Technologies SCADE, or textual languages such as MathWorks Embedded Matlab; software engineers usually rely on OMG UML; and, of course, many in-house domain-specific languages are equally used at any step of the development process. This heterogeneity of modeling formalisms raises several questions about verification and code generation for systems described using heterogeneous models: How can we ensure consistency across multiple modeling views? How can we generate code that is optimized with respect to multiple modeling views? How can we ensure that model-level verification is consistent with the run-time behavior of the generated executable application? In this position paper we describe the motivations and challenges of analysis and code generation from heterogeneous models when intra-view consistency, optimization, and safety are major concerns. We then introduce Project P and Hi-MoCo, respectively FUI- and Eurostars-funded collaborative projects tackling the challenges above. This work continues and extends, in a wider context, the work carried out by the Gene-Auto project [1], [2]. We present the key elements of Project P and Hi-MoCo, in particular: (i) the philosophy for the identification of safe and minimal practical subsets of input modeling languages; (ii) the overall architecture of the toolsets, the supported analysis techniques, and the target languages for code generation; and finally, (iii) the approach to cross-domain qualification for an open-source, community-driven toolset.

    A Fortran Kernel Generation Framework for Scientific Legacy Code

    Quality assurance procedures are very important for software development. The complexity of modules and structure in software impedes the testing procedure and further development. For complex and poorly designed scientific software, module developers and software testers need to invest a lot of extra effort to monitor the impacts of unrelated modules and to test the whole system's constraints. In addition, widely used benchmarks cannot help programmers with accurate, program-specific system performance evaluation. In this situation, generated kernels can provide considerable insight for performance tuning. Therefore, in order to greatly improve the productivity of various scientific software engineering tasks such as performance tuning, debugging, and verification of simulation results, we developed an automatic compute kernel extraction prototype platform for complex legacy scientific code. In addition, considering that scientific research and experiments require long-term simulation and huge data transfers, we apply message-passing-based parallelization and I/O behavior optimization to improve the performance of the kernel extractor framework, and then use profiling tools to guide the parallel distribution. Abnormal event detection is another important aspect of scientific research; dealing with huge observational datasets combined with simulation results makes it not only essential but also extremely difficult. In this dissertation, to detect both high-frequency and low-frequency events, we reconfigured this framework with an in-situ data transfer infrastructure. By combining signal processing pre-processing (decimation) with a machine learning detection model trained on the stream data, our framework can significantly decrease the amount of transferred data required for concurrent data analysis (between distributed computing CPU/GPU nodes). Finally, the dissertation presents the implementation of the framework and a case study of the ACME Land Model (ALM) for demonstration. The generated compute kernels, at lower cost, can be used in performance tuning experiments and quality assurance, including debugging legacy code, verifying simulation results through single-point and multi-point variable tracking, collaborating with compiler vendors, and generating custom benchmark tests.
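    As a hedged sketch of the decimation pre-processing step, the C code below low-pass filters with a simple block average (the dissertation's actual filter is not specified in this abstract) and keeps one sample per block, cutting the transferred data volume by the decimation factor. The factor and signal are illustrative.

```c
#include <stdio.h>

#define M 4  /* decimation factor (illustrative) */

/* Decimate a signal: average each block of M samples (a crude low-pass
 * filter to limit aliasing) and keep one value per block, reducing the
 * data shipped to the analysis nodes by a factor of M. */
static int decimate(const double *in, int n, double *out) {
    int k = 0;
    for (int i = 0; i + M <= n; i += M) {
        double avg = 0.0;
        for (int j = 0; j < M; ++j) avg += in[i + j];
        out[k++] = avg / M;  /* averaged block = filtered + downsampled */
    }
    return k;  /* number of output samples */
}

int main(void) {
    double signal[16], reduced[4];
    for (int i = 0; i < 16; ++i) signal[i] = (double)i;
    int k = decimate(signal, 16, reduced);
    for (int i = 0; i < k; ++i) printf("%.2f ", reduced[i]);
    printf("\n");  /* prints: 1.50 5.50 9.50 13.50 */
    return 0;
}
```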