68 research outputs found

    Contributions to Desktop Grid Computing : From High Throughput Computing to Data-Intensive Sciences on Hybrid Distributed Computing Infrastructures

    Get PDF
    Since the mid 90’s, Desktop Grid Computing - i.e the idea of using a large number of remote PCs distributed on the Internet to execute large parallel applications - has proved to be an efficient paradigm to provide a large computational power at the fraction of the cost of a dedicated computing infrastructure.This document presents my contributions over the last decade to broaden the scope of Desktop Grid Computing. My research has followed three different directions. The first direction has established new methods to observe and characterize Desktop Grid resources and developed experimental platforms to test and validate our approach in conditions close to reality. The second line of research has focused on integrating Desk- top Grids in e-science Grid infrastructure (e.g. EGI), which requires to address many challenges such as security, scheduling, quality of service, and more. The third direction has investigated how to support large-scale data management and data intensive applica- tions on such infrastructures, including support for the new and emerging data-oriented programming models.This manuscript not only reports on the scientific achievements and the technologies developed to support our objectives, but also on the international collaborations and projects I have been involved in, as well as the scientific mentoring which motivates my candidature for the Habilitation `a Diriger les Recherches

    Programming models for mobile environments

    Get PDF
    Premi extraordinari doctorat UPC curs 2017-2018. Àmbit d’Enginyeria de les TICFor the last decade, mobile devices have grown in popularity and became the best-selling computing devices. Despite their high capabilities for user interactions and network connectivity, the computing power of mobile devices is low and the lifetime of the application running on them limited by the battery. Mobile Cloud Computing (MCC) is a technology that tackles the limitations of mobile devices by bringing together their mobility with the vast computing power of the Cloud. Programming applications for Mobile Cloud Computing (MCC) environments is not as straightforward as coding monolithic applications. Developers have to deal with the issues related to parallel programming for distributed infrastructures while considering the battery lifetime and the variability of the network produced by the high mobility of this kind of devices. As with any other distributed environment, developers turn to programming models to improve their productivity by avoiding the complexity of manually dealing with these issues and delegate on the corresponding model all the management of these concerns. This thesis contributes to the current state of the art with an adaptation of the COMPSs programming model for MCC environments. COMPSs allows application programmers to code their applications in a sequential, infrastructure-agnostic fashion without calls to any COMPSs-specific API using the native language for the target platform as if they were to run on the mobile device. At execution time, a runtime system automatically partitions the application into tasks and orchestrates their execution on top of the available resources. This thesis contributes with an extension to the programming model to allow task polymorphism and let the runtime exploit computational resources other than the CPU of the resources. Besides, the runtime architecture has been redesigned with the characteristics of MCC in mind, and it runs as a common service which all the applications running simultaneously on the mobile device contact for submitting the execution of their tasks. For collaboratively exploiting both, local and remote resources, the runtime clusters the computational devices into Computing Platforms according to the mechanisms required to provide the processing elements with the necessary input values, launch the task execution avoiding resource oversubscription and fetching the results back from them. The CPU Platform run tasks on the cores of the CPU. The GPU Platform leverages on OpenCL to run tasks as kernels on GPUs or other accelerators embedded in the mobile device. Finally, the Cloud Platform offloads the execution of tasks onto remote resources. To holistically decide whether is worth running a task on embedded or on remote resources, the runtime considers the the costs -- time, energy and money -- of running the computation on each of the platforms and picks the best. Each platform manages internally its resources and orchestrates the execution of tasks on them using different scheduling policies. Using local and remote computing devices forces the runtime to share data values among the nodes of the infrastructure. This data is potentially privacy-sensitive, and the runtime exposes it to possible attackers when transferring it through the network. To protect the application user from data leaks, the runtime has to provide communications with secrecy, integrity and authenticity. In the extreme case of a network breakdown that isolates the mobile device from the remote nodes, the runtime has to ensure that the execution continues to provide the application user with the expected result even if the connection never re-establishes. The mobile device has to respond using only the resources embedded in it, what could incur in the re-execution of computations already ran on the remote resources. Remote workers have to continue with the execution so that, in case of reconnection, both parts synchronize its progress to reduce the impact of the disruption.Els últims anys, els dispositius mòbils han guanyat en popularitat i s'han convertit en els dispositius més venuts. Tot i la connectivitat i la bona interacció amb l'usuari que ofereixen, la seva capacitat de càlcul is baixa i limitada per la vida de la bateria. El Mobile Cloud Computing (MCC) és una tecnologia que soluciona les limitacions d'aquests dispositius ajuntant la seva mobilitat amb la gran capacitat de còmput del Cloud. Programar aplicacions per entorns MCC no és tan directe com fer aplicacions monolítiques. Els desenvolupadors han de tractar amb els problemes relacionats amb la programació paral·lela mentre tenen en compte la duració de la bateria i la variabilitat de la xarxa degut a la mobilitat inherent a aquest tipus de dispositius. Com per qualsevol altre entorn distribuït, els desenvolupadors recorren a models de programació que millorin la seva productivitat i els evitin tractar manualment amb aquests problemes delegant la seva gestió en el model. Aquesta tesis contribueix a l'estat de l'art actual amb una adaptació del model de programació COMPSs als entorns MCC. COMPSs permet als desenvolupadors programar les aplicacions de forma agnòstica a la infraestructura i seqüencial sense necessitat d'invocar cap API específica utilitzant el llenguatge natiu de la platforma com si l'aplicació s'executés directament en el mòbil. En temps d'execució, una eina (runtime) automàticament divideix l'aplicació en tasques i n'orquestra la seva execució sobre els recursos disponibles. Aquesta tesis estèn el model de programació per tal de permetre polimorfisme a nivell de tasca i deixar al runtime explotar els recursos computacionals dels que disposa el mòbil a part de la CPU. A més a més, l'arquitectura del runtime s'ha redissenyat tenint en compte les característiques pròpies del MCC, i aquest s'executa com un servei comú al que totes les aplicacions del mòbil contacten per tal d'executar les seves tasques. Per explotar col·laborativament tots els recursos, locals i remots, el runtime agrupa els recursos en Computing Platforms en funció dels mecanismes necessaris per proveir el recurs amb les dades d'entrada necessàries, llançar l'execució i recuperar-ne els resultats. La CPU Platform executa tasques en els nuclis de la CPU. La GPU Platform utilitza OpenCL per executar tasques en forma de kernels a la GPU o altres acceleradors integrats en el mòbil. Finalment, la Cloud Platform descàrrega l'execució de tasques en recursos remots. Per decidir holisticament si és millor executar una tasca en un recurs local o en un remot, el runtime considera els costs (temporal, energètic econòmic) d'executar la tasca en cada una de les plataformes i n'escull la millor. Cada plataforma gestiona internament els seus recursos i orquestra l'execució de les tasques en ells seguint diferents polítiques de planificació. L'ús de recursos locals i remots força la compartició de dades entre els nodes de la infraestructura. Aquestes dades són potencialment sensibles i de caràcter privat i el runtime les exposa a possibles atacs que les transfereix per la xarxa. Per tal de protegir l'usuari de possibles fuites de dades, el runtime ha de dotar les comunicacions amb confidencialitat, integritat i autenticitat. En el cas extrem en que un error de xarxa aïlli el dispositiu mòbil dels nodes remots, el runtime ha d'assegurar que l'execució continua i que eventualment l'usuari rebrà el resultat esperat fins i tot en cas de que la connexió no és restableixi mai. El mòbil ha de ser capaç d'executar l'aplicació utilitzant únicament les dades i recursos disponibles en aquell moment, la qual cosa pot forçar la re-execució d'algunes tasques ja calculades en els recursos remots. Els recursos remots han de continuar l'execució per tal que en cas de reconnexió, ambdues parts sincronitzin el seu progrés i es minimitzi l'impacte de la desconnexió.Award-winningPostprint (published version

    Implementation and Evaluation of Algorithmic Skeletons: Parallelisation of Computer Algebra Algorithms

    Get PDF
    This thesis presents design and implementation approaches for the parallel algorithms of computer algebra. We use algorithmic skeletons and also further approaches, like data parallel arithmetic and actors. We have implemented skeletons for divide and conquer algorithms and some special parallel loops, that we call ‘repeated computation with a possibility of premature termination’. We introduce in this thesis a rational data parallel arithmetic. We focus on parallel symbolic computation algorithms, for these algorithms our arithmetic provides a generic parallelisation approach. The implementation is carried out in Eden, a parallel functional programming language based on Haskell. This choice enables us to encode both the skeletons and the programs in the same language. Moreover, it allows us to refrain from using two different languages—one for the implementation and one for the interface—for our implementation of computer algebra algorithms. Further, this thesis presents methods for evaluation and estimation of parallel execution times. We partition the parallel execution time into two components. One of them accounts for the quality of the parallelisation, we call it the ‘parallel penalty’. The other is the sequential execution time. For the estimation, we predict both components separately, using statistical methods. This enables very confident estimations, although using drastically less measurement points than other methods. We have applied both our evaluation and estimation approaches to the parallel programs presented in this thesis. We haven also used existing estimation methods. We developed divide and conquer skeletons for the implementation of fast parallel multiplication. We have implemented the Karatsuba algorithm, Strassen’s matrix multiplication algorithm and the fast Fourier transform. The latter was used to implement polynomial convolution that leads to a further fast multiplication algorithm. Specially for our implementation of Strassen algorithm we have designed and implemented a divide and conquer skeleton basing on actors. We have implemented the parallel fast Fourier transform, and not only did we use new divide and conquer skeletons, but also developed a map-and-transpose skeleton. It enables good parallelisation of the Fourier transform. The parallelisation of Karatsuba multiplication shows a very good performance. We have analysed the parallel penalty of our programs and compared it to the serial fraction—an approach, known from literature. We also performed execution time estimations of our divide and conquer programs. This thesis presents a parallel map+reduce skeleton scheme. It allows us to combine the usual parallel map skeletons, like parMap, farm, workpool, with a premature termination property. We use this to implement the so-called ‘parallel repeated computation’, a special form of a speculative parallel loop. We have implemented two probabilistic primality tests: the Rabin–Miller test and the Jacobi sum test. We parallelised both with our approach. We analysed the task distribution and stated the fitting configurations of the Jacobi sum test. We have shown formally that the Jacobi sum test can be implemented in parallel. Subsequently, we parallelised it, analysed the load balancing issues, and produced an optimisation. The latter enabled a good implementation, as verified using the parallel penalty. We have also estimated the performance of the tests for further input sizes and numbers of processing elements. Parallelisation of the Jacobi sum test and our generic parallelisation scheme for the repeated computation is our original contribution. The data parallel arithmetic was defined not only for integers, which is already known, but also for rationals. We handled the common factors of the numerator or denominator of the fraction with the modulus in a novel manner. This is required to obtain a true multiple-residue arithmetic, a novel result of our research. Using these mathematical advances, we have parallelised the determinant computation using the Gauß elimination. As always, we have performed task distribution analysis and estimation of the parallel execution time of our implementation. A similar computation in Maple emphasised the potential of our approach. Data parallel arithmetic enables parallelisation of entire classes of computer algebra algorithms. Summarising, this thesis presents and thoroughly evaluates new and existing design decisions for high-level parallelisations of computer algebra algorithms

    Optimización del rendimiento y la eficiencia energética en sistemas masivamente paralelos

    Get PDF
    RESUMEN Los sistemas heterogéneos son cada vez más relevantes, debido a sus capacidades de rendimiento y eficiencia energética, estando presentes en todo tipo de plataformas de cómputo, desde dispositivos embebidos y servidores, hasta nodos HPC de grandes centros de datos. Su complejidad hace que sean habitualmente usados bajo el paradigma de tareas y el modelo de programación host-device. Esto penaliza fuertemente el aprovechamiento de los aceleradores y el consumo energético del sistema, además de dificultar la adaptación de las aplicaciones. La co-ejecución permite que todos los dispositivos cooperen para computar el mismo problema, consumiendo menos tiempo y energía. No obstante, los programadores deben encargarse de toda la gestión de los dispositivos, la distribución de la carga y la portabilidad del código entre sistemas, complicando notablemente su programación. Esta tesis ofrece contribuciones para mejorar el rendimiento y la eficiencia energética en estos sistemas masivamente paralelos. Se realizan propuestas que abordan objetivos generalmente contrapuestos: se mejora la usabilidad y la programabilidad, a la vez que se garantiza una mayor abstracción y extensibilidad del sistema, y al mismo tiempo se aumenta el rendimiento, la escalabilidad y la eficiencia energética. Para ello, se proponen dos motores de ejecución con enfoques completamente distintos. EngineCL, centrado en OpenCL y con una API de alto nivel, favorece la máxima compatibilidad entre todo tipo de dispositivos y proporciona un sistema modular extensible. Su versatilidad permite adaptarlo a entornos para los que no fue concebido, como aplicaciones con ejecuciones restringidas por tiempo o simuladores HPC de dinámica molecular, como el utilizado en un centro de investigación internacional. Considerando las tendencias industriales y enfatizando la aplicabilidad profesional, CoexecutorRuntime proporciona un sistema flexible centrado en C++/SYCL que dota de soporte a la co-ejecución a la tecnología oneAPI. Este runtime acerca a los programadores al dominio del problema, posibilitando la explotación de estrategias dinámicas adaptativas que mejoran la eficiencia en todo tipo de aplicaciones.ABSTRACT Heterogeneous systems are becoming increasingly relevant, due to their performance and energy efficiency capabilities, being present in all types of computing platforms, from embedded devices and servers to HPC nodes in large data centers. Their complexity implies that they are usually used under the task paradigm and the host-device programming model. This strongly penalizes accelerator utilization and system energy consumption, as well as making it difficult to adapt applications. Co-execution allows all devices to simultaneously compute the same problem, cooperating to consume less time and energy. However, programmers must handle all device management, workload distribution and code portability between systems, significantly complicating their programming. This thesis offers contributions to improve performance and energy efficiency in these massively parallel systems. The proposals address the following generally conflicting objectives: usability and programmability are improved, while ensuring enhanced system abstraction and extensibility, and at the same time performance, scalability and energy efficiency are increased. To achieve this, two runtime systems with completely different approaches are proposed. EngineCL, focused on OpenCL and with a high-level API, provides an extensible modular system and favors maximum compatibility between all types of devices. Its versatility allows it to be adapted to environments for which it was not originally designed, including applications with time-constrained executions or molecular dynamics HPC simulators, such as the one used in an international research center. Considering industrial trends and emphasizing professional applicability, CoexecutorRuntime provides a flexible C++/SYCL-based system that provides co-execution support for oneAPI technology. This runtime brings programmers closer to the problem domain, enabling the exploitation of dynamic adaptive strategies that improve efficiency in all types of applications.Funding: This PhD has been supported by the Spanish Ministry of Education (FPU16/03299 grant), the Spanish Science and Technology Commission under contracts TIN2016-76635-C2-2-R and PID2019-105660RB-C22. This work has also been partially supported by the Mont-Blanc 3: European Scalable and Power Efficient HPC Platform based on Low-Power Embedded Technology project (G.A. No. 671697) from the European Union’s Horizon 2020 Research and Innovation Programme (H2020 Programme). Some activities have also been funded by the Spanish Science and Technology Commission under contract TIN2016-81840-REDT (CAPAP-H6 network). The Integration II: Hybrid programming models of Chapter 4 has been partially performed under the Project HPC-EUROPA3 (INFRAIA-2016-1-730897), with the support of the EC Research Innovation Action under the H2020 Programme. In particular, the author gratefully acknowledges the support of the SPMT Department of the High Performance Computing Center Stuttgart (HLRS)

    Processor-In-Memory (PIM) Based Architectures for PetaFlops Potential Massively Parallel Processing

    Get PDF
    The report summarizes the work performed at the University of Notre Dame under a NASA grant from July 15, 1995 through July 14, 1996. Researchers involved in the work included the PI, Dr. Peter M. Kogge, and three graduate students under his direction in the Computer Science and Engineering Department: Stephen Dartt, Costin Iancu, and Lakshmi Narayanaswany. The organization of this report is as follows. Section 2 is a summary of the problem addressed by this work. Section 3 is a summary of the project's objectives and approach. Section 4 summarizes PIM technology briefly. Section 5 overviews the main results of the work. Section 6 then discusses the importance of the results and future directions. Also attached to this report are copies of several technical reports and publications whose contents directly reflect results developed during this study

    A grid and cloud-based framework for high throughput bioinformatics

    Get PDF
    Recent advances in genome sequencing technologies have unleashed a flood of new data. As a result, the computational analysis of bioinformatics data sets has been rapidly moving from a labbased desktop computer environment to exhaustive analyses performed by large dedicated computing resources. Traditionally, large computational problems have been performed on dedicated clusters of high performance machines that are typically local to, and owned by, a particular institution. The current trend in Grid computing has seen institutions pooling their computational resources in order to offload excess computational work to remote locations during busy periods. In the last year or so, commercial Cloud computing initiatives have matured enough to offer a viable remote source of reliable computational power. Collections of idle desktop computers have also been used as a source of computational power in the form of ‘volunteer Grids’. The field of bioinformatics is highly dynamic, with new or updated versions of software tools and databases continually being developed. Several different tools and datasets must often be combined into a coherent, automated workflow or pipeline. While existing solutions are available for constructing workflows, there is a clear need for long-lived analyses consisting of many interconnected steps to be able to migrate among Grid and cloud computational resources dynamically. This project involved research into the principles underlying the design and architecture of flexible, high-throughput bioinformatics processes. Following extensive research into requirements gathering, a novel Grid-based platform, Microbase, has been implemented that is based on service-oriented architectures and peer-to-peer data transfer technology. This platform has been shown to be amenable to utilising a wide range of hardware from commodity desktop computers, to high-performance cloud infrastructure. The system has been shown to drastically reduce the bandwidth requirements of bioinformatics data distribution, and therefore reduces both the financial and computational costs associated with cloud computing. The system is inherently modular in nature, comprising a service based notification system, a data storage system scheduler and a job manager. In keeping with e-Science principles, each module can operate in physical isolation from each other, distributed within an intranet or Internet. Moreover, since each module is loosely coupled via Web services, modules have the potential to be used in combination with external service oriented components or in isolation as part of another system. In order to demonstrate the utility of such an open source system to the bioinformatics community, a pipeline of inter-connected bioinformatics applications was developed using the Microbase system to form a high throughput application for the comparative and visual analysis of microbial genomes. This application, Automated Genome Analyser (AGA) has been developed to operate without user interaction. AGA exposes its results via Web-services which can be used by further analytical stages within Microbase, by external computational resources via a Web service interface or which can be queried by users via an interactive genome browser. In addition to providing the necessary infrastructure for scalable Grid applications, a modular development framework has been provided, which simplifies the process of writing Grid applications. Microbase has been adopted by a number of projects ranging from comparative genomics to synthetic biology simulations.EThOS - Electronic Theses Online ServiceGBUnited Kingdo
    corecore