15 research outputs found

    Dynamic Configuration of CUDA Runtime Variables for CDP-based Divide-and-Conquer Algorithms

    CUDA Dynamic Parallelism (CDP) is an extension of the GPGPU programming model proposed to better address irregular applications and recursive patterns of computation. However, processing memory-demanding problems with CDP is not straightforward because of its particular memory organization. This work presents an algorithm to address this issue. It dynamically calculates and configures the CDP runtime variables and the GPU heap based on an analysis of the partial backtracking tree. The proposed algorithm was implemented for solving permutation combinatorial problems and evaluated on two test cases: N-Queens and the Asymmetric Travelling Salesman Problem. It allows different CDP-based backtracking algorithms from the literature to solve memory-demanding problems, adapting to the number of recursive kernel generations and to the presence of dynamic allocations on the GPU.
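
The abstract says the runtime variables and the heap are configured from an analysis of the partial backtracking tree. As a minimal sketch, assuming that analysis amounts to counting the feasible nodes at each depth of a partially expanded N-Queens tree, the frontier size can be computed on the host as below. The functions `frontier_profile` and `estimate_limits`, the cutoff depth and the `bytes_per_node` figure are hypothetical illustrations, not the authors' algorithm. In a real CUDA host program, values derived this way would be passed to `cudaDeviceSetLimit` with limits such as `cudaLimitDevRuntimePendingLaunchCount` and `cudaLimitMallocHeapSize`.

```python
def is_safe(cols, col):
    """Return True if a queen can go in the next row at column `col`,
    given queens already placed at columns `cols` in rows 0..len(cols)-1."""
    row = len(cols)
    for r, c in enumerate(cols):
        # Same column, or same diagonal (column distance equals row distance).
        if c == col or abs(c - col) == row - r:
            return False
    return True


def frontier_profile(n, cutoff_depth):
    """Expand the N-Queens backtracking tree breadth-first down to
    `cutoff_depth` and return the number of feasible nodes at each depth."""
    level = [()]  # depth 0: the empty board
    counts = [1]
    for _ in range(cutoff_depth):
        level = [cols + (c,) for cols in level for c in range(n) if is_safe(cols, c)]
        counts.append(len(level))
    return counts


def estimate_limits(n, cutoff_depth, bytes_per_node=64):
    """Hypothetical heuristic: one child kernel launch per frontier node and a
    heap large enough to store every frontier node. `bytes_per_node` is an
    illustrative assumption, not a value taken from the paper."""
    frontier = frontier_profile(n, cutoff_depth)[-1]
    return {
        "pending_launch_count": frontier,
        "heap_bytes": frontier * bytes_per_node,
    }


if __name__ == "__main__":
    print(frontier_profile(8, 3))
    print(estimate_limits(8, 3))
```

Counting the frontier on the host before launching is cheap for shallow cutoffs, which is why a partial rather than a full tree is analyzed in this sketch.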

    Grid and high performance computing applied to bioinformatics

    Recent advances in genome sequencing technologies and in the modern biological data analysis technologies used in bioinformatics have led to a fast and continuous increase in biological data. The difficulty of managing the huge amounts of data now available to researchers, and the need to obtain results within a reasonable time, have led to the use of distributed and parallel computing infrastructures for their analysis. In this context, Grid computing has been used successfully. Grid computing is based on a distributed system that interconnects several computers and/or clusters to access global-scale resources. This infrastructure is flexible, highly scalable and can achieve high performance with data- and compute-intensive algorithms. More recently, bioinformatics has been exploring new approaches based on hardware accelerators such as Graphics Processing Units (GPUs). Originally developed as graphics cards, GPUs have recently been adopted for scientific purposes because of their performance per watt and their better cost/performance ratio, in terms of throughput and response time, compared to other high-performance computing solutions. Although developers need in-depth knowledge of GPU programming and hardware to use them effectively, GPU accelerators have produced many impressive results. The use of high-performance computing infrastructures raises the question of how to parallelize algorithms while limiting data-dependency issues, in order to accelerate computations on massively parallel hardware. In this context, the research activity in this dissertation focused on assessing and testing the impact of these innovative high-performance computing technologies on computational biology. To achieve high levels of parallelism and, ultimately, high performance, several bioinformatics algorithms applicable to genome data analysis were selected, analyzed and implemented.
These algorithms were highly parallelized and optimized to make the most of the GPU hardware resources. The overall results show that the proposed parallel algorithms achieve high performance, which justifies the use of this technology. In addition, a software infrastructure for workflow management was devised to support CPU and GPU computation on a distributed GPU-based infrastructure. This software infrastructure also enables a further coarse-grained, data-parallel parallelization across multiple GPUs. Results show that the speed-up of the proposed application increases with the number of GPUs.
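
The coarse-grained, data-parallel distribution over several GPUs can be pictured as splitting the input into one contiguous chunk per device and processing the chunks independently. The sketch below is a hypothetical illustration, not the dissertation's workflow software: `process_chunk` stands in for a per-GPU kernel (here it simply computes the GC fraction of each sequence), and a thread pool stands in for the devices.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(n_items, n_devices):
    """Split range(n_items) into n_devices contiguous, near-equal (start, end) slices."""
    base, extra = divmod(n_items, n_devices)
    bounds, start = [], 0
    for d in range(n_devices):
        size = base + (1 if d < extra else 0)  # first `extra` devices get one more item
        bounds.append((start, start + size))
        start += size
    return bounds


def process_chunk(sequences):
    """Hypothetical per-device work: GC fraction of each sequence."""
    return [
        (s.count("G") + s.count("C")) / len(s) if s else 0.0
        for s in sequences
    ]


def run_on_devices(sequences, n_devices):
    """Dispatch one chunk per simulated device and concatenate results in input order."""
    bounds = partition(len(sequences), n_devices)
    with ThreadPoolExecutor(max_workers=n_devices) as pool:
        # pool.map yields results in the order of `bounds`, so order is preserved.
        parts = pool.map(lambda b: process_chunk(sequences[b[0]:b[1]]), bounds)
        return [x for part in parts for x in part]
```

Contiguous chunks keep the merge step a plain concatenation in the original input order.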

    Techniques for optimizing dynamic parallelism on graphics processing units

    Dynamic parallelism is a feature of general-purpose graphics processing units (GPUs) whereby threads running on a GPU can spawn other threads without CPU intervention. This feature is useful for programming applications with nested parallelism, where threads executing in parallel may each identify additional work that can itself be parallelized. Unfortunately, current GPU microarchitectures do not efficiently support dynamic parallelism for accelerating such applications, because of the high overhead of grid launches, the limited number of grids that can execute simultaneously, and the limited supported depth of the dynamic call stack. The compiler techniques presented herein improve the performance of applications with nested parallelism that use dynamic parallelism by mitigating these microarchitectural limitations. Horizontal aggregation fuses grids launched by threads in the same warp, block, or grid into a single aggregated grid, thereby reducing the total number of grids launched and increasing the amount of work per grid to improve occupancy. Vertical aggregation fuses grids down the call stack with their descendant grids, again reducing the total number of grids launched, but also reducing the depth of the call stack and removing grid launches from the application's critical path. Evaluation of these compiler techniques shows that they yield substantial performance improvements over regular dynamic parallelism on benchmarks representing common nested parallelism patterns. This result has held across multiple architecture generations, showing the continued relevance of these techniques. This work shows that, to make dynamic parallelism practical for accelerating applications with nested parallelism, compiler transformations can aggregate dynamically launched grids, amortizing their launch overhead and improving their occupancy without the need for additional hardware support.
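
Horizontal aggregation can be illustrated with a host-side simulation. In the sketch below, assumed for illustration only, each parent thread i would otherwise launch its own child grid of `sizes[i]` threads; instead, an exclusive prefix sum over the sizes yields one aggregated grid, and each aggregated thread recovers its parent and its local index by binary search over the offsets. The function names are hypothetical, and this mirrors the general idea described in the abstract rather than the exact code the compiler generates.

```python
from bisect import bisect_right


def aggregate_launches(sizes):
    """Fuse per-parent child grids into one aggregated grid.

    sizes[i] is the number of child threads parent thread i would have
    launched on its own. Returns the exclusive prefix-sum offsets (the first
    aggregated thread id owned by each parent) and the aggregated grid size.
    """
    offsets, running = [], 0
    for s in sizes:
        offsets.append(running)
        running += s
    return offsets, running


def map_aggregated_thread(tid, offsets):
    """Map a thread id of the aggregated grid back to
    (parent index, thread index inside that parent's original child grid).

    bisect_right returns the last parent whose offset is <= tid. Parents that
    requested zero threads share their offset with the next parent, so the
    last such parent is always the one that actually owns tid.
    """
    parent = bisect_right(offsets, tid) - 1
    return parent, tid - offsets[parent]


def run_aggregated(sizes, child_body):
    """Simulate a single aggregated launch sequentially on the host.

    child_body(parent, local_tid) plays the role of the child kernel body.
    """
    offsets, total = aggregate_launches(sizes)
    return [child_body(*map_aggregated_thread(t, offsets)) for t in range(total)]
```

One launch of `total` threads replaces `len(sizes)` separate launches, which is the reduction in launch count that the abstract attributes to horizontal aggregation.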

    Streaming Multi-core Sample-based Bayesian Analysis

    Sequential Monte Carlo (SMC) methods are a well-established family of Bayesian inference algorithms for state estimation in non-linear, non-Gaussian models. As models become more accurate, the run time of SMC applications increases. Parallel computing can be used to compensate for this side effect. However, an efficient parallelisation of SMC is hard to achieve because of the challenges involved in parallelising its bottleneck, resampling, and in particular its constituent redistribute step. While redistribution can be performed in O((N/T) log N) time on a Shared Memory Architecture (SMA) using T parallel threads (e.g. a GPU or mainstream CPUs), the state-of-the-art redistribute takes O((log N)^2) computations on Distributed Memory Architectures (DMAs), of which most supercomputers are composed. This thesis pursues three major goals. First, it proposes a novel parallel redistribute for DMAs that achieves O(log N) time complexity. On Message Passing Interface (MPI), the novel redistribute is shown to be up to eight times faster than the O((log N)^2) one. On a cluster of 256 cores, an SMC method employing the O((log N)^2) redistribute becomes up to six times faster when switched to the novel redistribute, which is also proved to no longer be the bottleneck. For the same number of cores, the maximum reported speed-up over a single-core SMC method is 160. A patent application for this invention has been filed. Second, the thesis describes a novel parallel redistribute for SMAs that takes O((N/T) + log N) steps and fully exploits the computational power of SMAs. This approach is up to six times faster than the O((N/T) log N) one. The shared-memory implementation is then combined with the MPI O(log N) redistribute to obtain a hybrid distributed-shared memory parallel redistribute that fully exploits the large parallelism offered by modern supercomputers.
Finally, to make these advances widely available, this thesis presents Streaming-Stan and SMC-Stan, two extension packages for Stan, a popular statistical programming language. Streaming-Stan and SMC-Stan let users describe models with the same intuitive syntax as regular Stan, while also providing the aforementioned High Performance Computing (HPC) SMC method in the form of Fixed-Lag SMC and an SMC sampler, respectively. These SMC methods also offer a wide choice of proposal distributions, including (in Streaming-Stan) two novel ones presented in this thesis, which combine the main features of Fixed-Lag SMC methods with Hamiltonian Monte Carlo (HMC) and the No-U-Turn Sampler (NUTS).
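
The redistribute step takes, for each of the N particles, the number of copies `ncopies[i]` chosen by resampling, and writes each particle that many times into a new array of N particles. Below is a minimal sequential sketch, together with the exclusive prefix-sum formulation that is a common building block for parallel versions. The function names are hypothetical, and this is an illustrative baseline under those assumptions, not the thesis's O(log N) distributed or O((N/T) + log N) shared-memory algorithm.

```python
from itertools import accumulate


def redistribute_sequential(particles, ncopies):
    """Copy particles[i] exactly ncopies[i] times, preserving order. O(N)."""
    assert sum(ncopies) == len(particles), "resampling must preserve N"
    out = []
    for p, k in zip(particles, ncopies):
        out.extend([p] * k)
    return out


def redistribute_prefix_sum(particles, ncopies):
    """Same result via an exclusive prefix sum of ncopies.

    Particle i owns the output slots [starts[i], starts[i] + ncopies[i]).
    Once the prefix sum is known, every particle can write its copies
    independently, which is what makes this form amenable to parallel
    (e.g. one-thread-per-particle) execution.
    """
    n = len(particles)
    assert sum(ncopies) == n, "resampling must preserve N"
    starts = [0] + list(accumulate(ncopies))[:-1]  # exclusive prefix sum
    out = [None] * n
    for i, (p, k) in enumerate(zip(particles, ncopies)):  # iterations are independent
        for j in range(k):
            out[starts[i] + j] = p
    return out
```

In this sketch the prefix sum is the only step with a cross-particle dependency, so it is the step a parallel redistribute has to organize carefully.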

    Tools and Algorithms for the Construction and Analysis of Systems

    This open access two-volume set constitutes the proceedings of the 27th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS 2021, which was held from March 27 to April 1, 2021, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2021. The conference was planned to take place in Luxembourg but was moved to an online format due to the COVID-19 pandemic. A total of 41 full papers presented in the proceedings were carefully reviewed and selected from 141 submissions. The set also contains 7 tool papers, 6 Tool Demo papers and 9 SV-Comp Competition Papers. The papers are organized in the following topical sections. Part I: Game Theory; SMT Verification; Probabilities; Timed Systems; Neural Networks; Analysis of Network Communication. Part II: Verification Techniques (not SMT); Case Studies; Proof Generation/Validation; Tool Papers; Tool Demo Papers; SV-Comp Tool Competition Papers.

    LIPIcs, Volume 274, ESA 2023, Complete Volume

    LIPIcs, Volume 274, ESA 2023, Complete Volume

    Proceedings of the XIV Jornadas de Ingeniería Telemática (JITEL 2019), Zaragoza (Spain), 22-24 October 2019

    This time, the city of Zaragoza hosts the XIV Jornadas de Ingeniería Telemática (JITEL 2019), to be held from 22 to 24 October 2019. The Jornadas de Ingeniería Telemática (JITEL), organized by the Asociación de Telemática (ATEL), provide a forum for meeting, debate and dissemination for the groups that teach and conduct research on topics related to telematic networks and services. The event aims to foster the exchange of experiences and results, as well as communication and cooperation among research groups working on telematics-related topics. Alongside the traditional sessions that characterize scientific conferences, the organizers wish to promote more open activities that stimulate the exchange of ideas between experienced and early-career researchers, and the creation of links and meeting points among the different research groups and teams. To this end, in addition to inviting leading figures in the relevant fields, the program will include sessions for presenting and discussing the active research lines and projects of these teams.