21 research outputs found

    PySke: Algorithmic Skeletons for Python

    PySke is a library of parallel algorithmic skeletons in Python designed for list and tree data structures. Such algorithmic skeletons are higher-order functions implemented in parallel. An application developed with PySke is a composition of skeletons. To ease the writing of parallel programs, PySke does not follow the Single Program Multiple Data (SPMD) paradigm but offers users a global view of parallel programs. This approach aims at making scalable programs easy to write. In addition to the library, we present experiments performed on a high-performance computing cluster (distributed memory) on a set of example applications developed with PySke.
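
    The "composition of skeletons" idea can be sketched in a few lines. The class `ChunkedList` and its methods below are hypothetical names chosen for illustration and are not PySke's API; the chunks stand in for the per-processor pieces of a distributed list, and the sketch runs sequentially.

```python
from functools import reduce
from operator import add


class ChunkedList:
    """Hypothetical stand-in for a distributed list (not PySke's API)."""

    def __init__(self, chunks):
        self.chunks = [list(c) for c in chunks]

    @classmethod
    def from_seq(cls, seq, parts):
        seq = list(seq)
        # Ceiling division, with a minimum chunk size of 1 so that
        # an empty input does not produce a zero step.
        size = max(1, -(-len(seq) // parts))
        return cls(seq[i:i + size] for i in range(0, len(seq), size))

    def map(self, f):
        # Each chunk could be handled by a different worker.
        return ChunkedList([f(x) for x in c] for c in self.chunks)

    def reduce(self, op, unit):
        # Local reduction per chunk, then a global reduction of the
        # partial results. `op` must be associative with identity `unit`.
        partials = [reduce(op, c, unit) for c in self.chunks]
        return reduce(op, partials, unit)


def sum_of_squares(xs, parts=4):
    # The program is written once, as a composition of skeletons,
    # with no per-processor code.
    return ChunkedList.from_seq(xs, parts).map(lambda x: x * x).reduce(add, 0)
```

    The caller never mentions chunks or workers, which is the global view the abstract contrasts with SPMD.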

    Optimal program variant generation for hybrid manycore systems

    Field Programmable Gate Arrays (FPGAs) promise to deliver superior energy efficiency in heterogeneous high-performance computing compared to multicore CPUs and GPUs. Their adoption is, however, hampered by the relative difficulty of programming FPGAs. High-level synthesis (HLS) tools such as Xilinx Vivado, Altera OpenCL or Intel's HLS address a large part of the programmability issue by synthesizing a Hardware Description Language representation from a high-level specification of the application, given in programming languages such as OpenCL C that are typically used to program CPUs and GPUs. Although HLS solutions make programming easier, they do not lighten the burden of optimization. Application developers must rely on expert knowledge to manually optimize their applications for each target device, so traditional HLS solutions do not solve the problem of performance portability. This state of affairs prompted the development of compiler frameworks such as TyTra, which operate at an even higher level of abstraction that is amenable to Design Space Exploration (DSE). With DSE, the initial program specification can be seen as the starting point in a search space of correct-by-construction program transformations. In TyTra, the search space is generated from the transitive closure of term-level transformations derived from type-level transformations. Compiler frameworks such as TyTra theoretically solve the issue of performance portability by providing a way to automatically generate alternative correct program variants. In practice, however, the generated space is often too large to explore fully, so the globally optimal solution may be overlooked. In this work we provide a novel solution to the issue of performance portability by deriving an efficient yet effective DSE strategy for the TyTra compiler framework. We use categorical data types to derive categorical semantics for the formal languages that describe the terms, types, cost-performance estimates and their transformations. From these we define a category of interpretations for TyTra applications, from which we derive a DSE strategy that finds the globally optimal transformation sequence in polynomial time. This is achieved by reducing the size of the generated search space. We formally state and prove a theorem for this claim and then show that the polynomial run-time of our DSE strategy has practically negligible coefficients, leading to sub-second exploration times for realistic applications.

    Programming models for mobile environments

    Extraordinary Doctoral Award, UPC, academic year 2017-2018. Field: ICT Engineering.
    For the last decade, mobile devices have grown in popularity and become the best-selling computing devices. Despite their high capabilities for user interaction and network connectivity, the computing power of mobile devices is low and the lifetime of the applications running on them is limited by the battery. Mobile Cloud Computing (MCC) is a technology that tackles the limitations of mobile devices by bringing together their mobility with the vast computing power of the Cloud. Programming applications for MCC environments is not as straightforward as coding monolithic applications. Developers have to deal with the issues related to parallel programming for distributed infrastructures while considering the battery lifetime and the network variability produced by the high mobility of this kind of device. As with any other distributed environment, developers turn to programming models to improve their productivity, avoiding the complexity of dealing with these issues manually and delegating the management of these concerns to the corresponding model. This thesis contributes to the current state of the art with an adaptation of the COMPSs programming model for MCC environments. COMPSs allows application programmers to code their applications in a sequential, infrastructure-agnostic fashion, without calls to any COMPSs-specific API, using the native language of the target platform as if the applications were to run on the mobile device. At execution time, a runtime system automatically partitions the application into tasks and orchestrates their execution on top of the available resources. This thesis extends the programming model to allow task polymorphism and to let the runtime exploit computational resources other than the CPU of the device. In addition, the runtime architecture has been redesigned with the characteristics of MCC in mind, and it runs as a common service that all the applications running simultaneously on the mobile device contact to submit their tasks for execution. To collaboratively exploit both local and remote resources, the runtime clusters the computational devices into Computing Platforms according to the mechanisms required to provide the processing elements with the necessary input values, launch the task execution while avoiding resource oversubscription, and fetch the results back. The CPU Platform runs tasks on the cores of the CPU. The GPU Platform leverages OpenCL to run tasks as kernels on GPUs or other accelerators embedded in the mobile device. Finally, the Cloud Platform offloads the execution of tasks onto remote resources. To decide holistically whether it is worth running a task on embedded or on remote resources, the runtime considers the costs (time, energy and money) of running the computation on each of the platforms and picks the best. Each platform manages its resources internally and orchestrates the execution of tasks on them using different scheduling policies. Using local and remote computing devices forces the runtime to share data values among the nodes of the infrastructure. This data is potentially privacy-sensitive, and the runtime exposes it to possible attackers when transferring it through the network. To protect the application user from data leaks, the runtime has to provide communications with secrecy, integrity and authenticity. In the extreme case of a network breakdown that isolates the mobile device from the remote nodes, the runtime has to ensure that the execution continues and provides the application user with the expected result even if the connection is never re-established. The mobile device has to respond using only the resources embedded in it, which could incur the re-execution of computations already run on the remote resources. Remote workers have to continue with the execution so that, in case of reconnection, both sides synchronize their progress to reduce the impact of the disruption.
    Postprint (published version).
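
    The platform choice described above (weighing time, energy and money per platform and picking the best) can be sketched as follows. The abstract does not say how the three costs are combined; the weighted sum, the `Estimate` type, the function name and the weights below are all assumptions made for illustration, not the COMPSs runtime's actual policy or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Estimate:
    """Hypothetical per-platform cost estimate for one task."""
    platform: str
    time_s: float
    energy_j: float
    money: float


def pick_platform(estimates, w_time=1.0, w_energy=1.0, w_money=1.0):
    """Return the name of the platform with the lowest weighted cost.

    The weighted sum is an assumed way of combining the three costs.
    """
    if not estimates:
        raise ValueError("no platform available")

    def cost(e):
        return w_time * e.time_s + w_energy * e.energy_j + w_money * e.money

    return min(estimates, key=cost).platform
```

    Under this sketch, a network breakdown corresponds to calling `pick_platform` with the Cloud estimate removed from the list, so that only embedded platforms remain candidates.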

    Vers une arithmétique efficace pour le chiffrement homomorphe basé sur le Ring-LWE (Towards efficient arithmetic for homomorphic encryption based on Ring-LWE)

    Fully homomorphic encryption is a kind of encryption that makes it possible to manipulate encrypted data directly through their ciphertexts. Sensitive data can thus be processed without being decrypted beforehand, which preserves their confidentiality. In the era of digital and cloud computing, this kind of encryption has the potential to considerably enhance privacy protection. However, because it was discovered only recently, by Gentry in 2009, we do not yet have enough hindsight about it. Several uncertainties therefore remain, in particular concerning its security and its efficiency in practice, and they should be clarified before any widespread use. This thesis deals with this issue and focuses on improving the practical performance of this kind of encryption. To that end, we have studied the optimization of the arithmetic used by these schemes: both the arithmetic underlying the Ring Learning With Errors problem, on which the security of these schemes is based, and the arithmetic specific to the computations required by the procedures of some of these schemes. We have also considered the optimization of the computations required by some specific applications of homomorphic encryption, in particular the classification of private data, and we propose innovative techniques and methods to perform these computations efficiently. We illustrate the efficiency of our methods through software implementations and comparisons with the state of the art.
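
    As a concrete picture of "the arithmetic underlying Ring-LWE", the sketch below multiplies two polynomials in the ring Z_q[x]/(x^n + 1), a ring commonly used for Ring-LWE schemes. It is an assumption that this is the ring of interest here, the function name is illustrative, and the quadratic schoolbook loop is chosen for clarity; it is not one of the optimized methods the thesis proposes.

```python
def negacyclic_mul(a, b, q):
    """Multiply a and b in Z_q[x]/(x^n + 1).

    a and b are coefficient lists of equal length n, lowest degree first.
    """
    n = len(a)
    if len(b) != n:
        raise ValueError("operands must have the same length")
    result = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < n:
                result[k] = (result[k] + ai * bj) % q
            else:
                # x^n = -1 in this ring, so the term wraps around
                # to degree k - n with its sign flipped.
                result[k - n] = (result[k - n] - ai * bj) % q
    return result
```

    For example, with n = 4 and q = 17, multiplying x^3 by x gives x^4 = -1, that is, the coefficient list `[16, 0, 0, 0]`.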

    Harnessing the power of GPUs for problems in real algebraic geometry

    This thesis presents novel parallel algorithms that leverage the power of GPUs (Graphics Processing Units) for exact computations with polynomials having large integer coefficients. The significance of such computations, especially in real algebraic geometry, is hard to overstate. On massively parallel architectures such as GPUs, the degree of data-level parallelism exposed by an algorithm is the main performance factor. We attain high efficiency by using structured matrix theory to realize the relevant operations on polynomials on the graphics hardware. A detailed complexity analysis, assuming the PRAM model, also confirms that our approach achieves a substantially better parallel complexity than the classical algorithms used for symbolic computations. Aside from the theoretical considerations, a large portion of this work is dedicated to the actual algorithm development and optimization techniques, where we pay close attention to the specifics of the graphics hardware. As a byproduct of this work, we have developed high-throughput modular arithmetic, which we expect to be useful for other GPU applications, in particular public-key cryptography. We further discuss algorithms for solving systems of polynomial equations, computing the topology of algebraic curves, and visualizing curves, all of which can take full advantage of GPU acceleration. Extensive benchmarking on real data demonstrates the superiority of our algorithms over several state-of-the-art approaches available to date. This thesis is written in English.

    Homomorphic Encryption for Machine Learning in Medicine and Bioinformatics

    Machine learning techniques are an excellent tool for the medical community to analyze large amounts of medical and genomic data. On the other hand, ethical concerns and privacy regulations prevent the free sharing of this data. Encryption methods such as fully homomorphic encryption (FHE) provide a way to evaluate computations over encrypted data. Using FHE, machine learning models such as deep learning, decision trees, and naive Bayes have been implemented for private prediction using medical data. FHE has also been shown to enable secure genomic algorithms, such as paternity testing, and secure application of genome-wide association studies. This survey provides an overview of fully homomorphic encryption and its applications in medicine and bioinformatics. The high-level concepts behind FHE and its history are introduced. Details on current open-source implementations are provided, as are the state of FHE for privacy-preserving techniques in machine learning and bioinformatics and future growth opportunities for FHE.

    Finding parallel functional pearls : automatic parallel recursion scheme detection in Haskell functions via anti-unification

    This paper describes a new technique for identifying potentially parallelisable code structures in functional programs. Higher-order functions enable simple and easily understood abstractions that can be used to implement a variety of common recursion schemes, such as maps and folds over traversable data structures. Many of these recursion schemes have natural parallel implementations in the form of algorithmic skeletons. This paper presents a technique that detects instances of potentially parallelisable recursion schemes in Haskell 98 functions. Unusually, we exploit anti-unification to expose these recursion schemes from source-level definitions whose structures match a recursion scheme, but which are not necessarily written directly in terms of maps, folds, etc. This allows us to automatically introduce parallelism, without requiring the programmer to structure their code a priori in terms of specific higher-order functions. We have implemented our approach in the Haskell refactoring tool, HaRe, and demonstrated its use on a range of common benchmarking examples. Using our technique, we show that recursion schemes can be easily detected, that parallel implementations can be easily introduced, and that we can achieve real parallel speedups (up to 23.79× the sequential performance on 28 physical cores, or 32.93× the sequential performance with hyper-threading enabled).
    This work has been partially supported by the EU H2020 grant “RePhrase: Refactoring Parallel Heterogeneous Resource-Aware Applications–a Software Engineering Approach” (ICT-644235), by COST Action IC1202 (TACLe), supported by COST (European Cooperation in Science and Technology), by EPSRC grant “Discovery: Pattern Discovery and Program Shaping for Manycore Systems” (EP/P020631/1), and by Scottish Enterprise PS7305CA44.
    Postprint. Peer reviewed.
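
    To illustrate what "a definition whose structure matches a recursion scheme" means, the sketch below shows an explicitly recursive function, the equivalent map, and a parallel variant. It is written in Python rather than Haskell to keep this listing's examples in one language, the function names are illustrative, and the rewrite is done by hand; it does not reproduce HaRe or the anti-unification step. Threads are used only to keep the sketch self-contained and are not expected to speed up CPU-bound work in standard CPython.

```python
from concurrent.futures import ThreadPoolExecutor


def squares_rec(xs):
    # Explicit recursion: one result per element, no dependency between
    # elements. Structurally this is an instance of the map scheme.
    if not xs:
        return []
    return [xs[0] * xs[0]] + squares_rec(xs[1:])


def squares_map(xs):
    # The same function written directly in terms of map.
    return list(map(lambda x: x * x, xs))


def squares_par(xs, workers=4):
    # Once the map scheme is exposed, a parallel map can be substituted.
    # Executor.map returns results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda x: x * x, xs))
```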