3 research outputs found

    Workflow models for heterogeneous distributed systems

    Get PDF
    The role of data in modern scientific workflows becomes more and more crucial. The unprecedented amount of data available in the digital era, combined with the recent advancements in Machine Learning and High-Performance Computing (HPC), let computers surpass human performances in a wide range of fields, such as Computer Vision, Natural Language Processing and Bioinformatics. However, a solid data management strategy becomes crucial for key aspects like performance optimisation, privacy preservation and security. Most modern programming paradigms for Big Data analysis adhere to the principle of data locality: moving computation closer to the data to remove transfer-related overheads and risks. Still, there are scenarios in which it is worth, or even unavoidable, to transfer data between different steps of a complex workflow. The contribution of this dissertation is twofold. First, it defines a novel methodology for distributed modular applications, allowing topology-aware scheduling and data management while separating business logic, data dependencies, parallel patterns and execution environments. In addition, it introduces computational notebooks as a high-level and user-friendly interface to this new kind of workflow, aiming to flatten the learning curve and improve the adoption of such methodology. Each of these contributions is accompanied by a full-fledged, Open Source implementation, which has been used for evaluation purposes and allows the interested reader to experience the related methodology first-hand. The validity of the proposed approaches has been demonstrated on a total of five real scientific applications in the domains of Deep Learning, Bioinformatics and Molecular Dynamics Simulation, executing them on large-scale mixed cloud-High-Performance Computing (HPC) infrastructures

    Applications Development for the Computational Grid

    Get PDF

    Architectural patterns for Parallel Programming: models for performance estimation.

    Get PDF
    Parallel Programming relies on the coordination of computing resources, so that they simultaneously work towards a common objective. Achieving this requires extra effort from the software designer, because of the increased complexity involved. Furthermore, as Parallel Programming is considered a means to improve performance, the software designer has to consider sophisticated and cost-effective practices and techniques for performance measurement and analysis. In particular, it is of great interest to obtain performance information during design stages and before implementation, since this enables the software developer to select the organisation of computations and communications between components. The Architectural Performance Modelling Method is presented as a criteria for selecting the organisation of a parallel program based on estimating its probable per formance. By considering a parallel program as an instance of a software architecture, it can be described in terms of interacting software components. Such components can be classified depending on their particular objective and their rate of change, for example, as components associated with the hardware and software environment (or Platform), components representing the fundamental structural organisation for execution and communication (or Coordination), and so on. The performance of a parallel program can be estimated as the result of the contribution of each one of those kinds of components. An Architectural Performance Model is based on selecting from the Architectural Patterns for Parallel Programming (descriptions of coodinations commonly used in Parallel Programming), a component simulator (representing a simulation of a processing component's behaviour), and a performance analysis of parallel applications (in which the information on system performance is examined). Parallel programs simulated using the Architectural Performance Modelling Method range from a complete parallel pro gram to a partially implemented program design. The simulation of parallel systems, using the information about the problem to be solved, the available resources, and architectural patterns describing overall coordinations of the parallel programs, makes it possible to identify the best performing architectural solution for the system being built
    corecore