37 research outputs found

    The TheLMA project: Multi-GPU Implementation of the Lattice Boltzmann Method

    In this paper, we describe the implementation of a multi-graphics processing unit (GPU) fluid flow solver based on the lattice Boltzmann method (LBM). The LBM is a novel approach in computational fluid dynamics, with numerous interesting features from computational, numerical, and physical standpoints. Our program is based on CUDA and uses POSIX threads to manage multiple computation devices. Using recently released hardware, our solver may therefore run on eight GPUs in parallel, which allows us to perform simulations at a rather large scale. Performance and scalability are excellent, with speedups over sequential implementations of at least two orders of magnitude. In addition, we discuss tiling and communication issues for present and forthcoming implementations.
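
    A minimal structural sketch of the one-thread-per-GPU design the abstract describes (CUDA kernels driven by POSIX threads): each thread owns one device and one tile of the domain, and all threads meet at a barrier where boundary layers would be exchanged. This is plain C with no CUDA calls; device binding and kernel launches are only marked in comments, and the step count and tile handling are illustrative assumptions, not the TheLMA code.

        /* One POSIX thread per GPU/tile; barriers stand in for the
         * inter-device halo exchange of the real solver. */
        #include <pthread.h>
        #include <stdio.h>

        #define NUM_DEVICES 8   /* the abstract reports runs on up to eight GPUs */
        #define NUM_STEPS   4   /* illustrative */

        static pthread_barrier_t halo_barrier;

        struct worker { int device_id; };

        static void *run_tile(void *arg)
        {
            struct worker *w = arg;
            /* Real solver: cudaSetDevice(w->device_id); allocate this
             * tile's lattice in that device's memory. */
            for (int step = 0; step < NUM_STEPS; ++step) {
                /* Real solver: launch collision/streaming kernels here. */
                printf("device %d: step %d\n", w->device_id, step);
                pthread_barrier_wait(&halo_barrier); /* all tiles finish the step */
                /* Real solver: copy boundary (halo) layers between devices. */
                pthread_barrier_wait(&halo_barrier); /* halos in place, next step */
            }
            return NULL;
        }

        int main(void)
        {
            pthread_t t[NUM_DEVICES];
            struct worker w[NUM_DEVICES];
            pthread_barrier_init(&halo_barrier, NULL, NUM_DEVICES);
            for (int i = 0; i < NUM_DEVICES; ++i) {
                w[i].device_id = i;
                pthread_create(&t[i], NULL, run_tile, &w[i]);
            }
            for (int i = 0; i < NUM_DEVICES; ++i)
                pthread_join(t[i], NULL);
            pthread_barrier_destroy(&halo_barrier);
            return 0;
        }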

    A reduced-reference perceptual image and video quality metric based on edge preservation

    In image and video compression and transmission, it is important to rely on an objective image/video quality metric which accurately represents the subjective quality of processed images and video sequences. In some scenarios, it is also important to evaluate the quality of the received video sequence with minimal reference to the transmitted one. For instance, for quality improvement of video transmission through closed-loop optimisation, the video quality measure can be evaluated at the receiver and provided as feedback information to the system controller. The original image/video sequence, prior to compression and transmission, is not usually available at the receiver side, so the receiver must rely on an objective video quality metric that needs no reference, or only minimal reference, to the original video sequence. The observation that the human eye is very sensitive to edge and contour information of an image underpins the proposal of our reduced-reference (RR) quality metric, which compares edge information between the distorted and the original image. Results highlight that the metric correlates well with subjective observations, also in comparison with commonly used full-reference metrics and with a state-of-the-art RR metric.
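
    A toy sketch of the edge-comparison idea, under stated assumptions: the abstract does not specify the edge detector or the pooling, so this version uses Sobel gradient magnitudes and a normalized correlation between the two edge maps, and it compares full maps, whereas a true reduced-reference deployment would transmit only a compact edge descriptor alongside the video. The image size and test data are invented.

        /* Edge-preservation score between a reference and a distorted image:
         * extract edge maps, then correlate them (1.0 = edges preserved). */
        #include <math.h>
        #include <stdio.h>

        #define W 8
        #define H 8

        /* Sobel gradient magnitude at interior pixels; borders left at 0. */
        static void edge_map(const double img[H][W], double out[H][W])
        {
            for (int y = 0; y < H; ++y)
                for (int x = 0; x < W; ++x)
                    out[y][x] = 0.0;
            for (int y = 1; y < H - 1; ++y) {
                for (int x = 1; x < W - 1; ++x) {
                    double gx = img[y-1][x+1] + 2*img[y][x+1] + img[y+1][x+1]
                              - img[y-1][x-1] - 2*img[y][x-1] - img[y+1][x-1];
                    double gy = img[y+1][x-1] + 2*img[y+1][x] + img[y+1][x+1]
                              - img[y-1][x-1] - 2*img[y-1][x] - img[y-1][x+1];
                    out[y][x] = sqrt(gx * gx + gy * gy);
                }
            }
        }

        /* Normalized correlation between two edge maps; lower values
         * indicate that distortion has degraded edge structure. */
        static double edge_preservation(const double a[H][W], const double b[H][W])
        {
            double dot = 0.0, na = 0.0, nb = 0.0;
            for (int y = 0; y < H; ++y)
                for (int x = 0; x < W; ++x) {
                    dot += a[y][x] * b[y][x];
                    na  += a[y][x] * a[y][x];
                    nb  += b[y][x] * b[y][x];
                }
            return (na > 0 && nb > 0) ? dot / sqrt(na * nb) : 1.0;
        }

        int main(void)
        {
            double ref[H][W], dist[H][W], eref[H][W], edist[H][W];
            /* Toy reference: a vertical step edge; "distorted" = blurred copy. */
            for (int y = 0; y < H; ++y)
                for (int x = 0; x < W; ++x)
                    ref[y][x] = (x < W / 2) ? 0.0 : 1.0;
            for (int y = 0; y < H; ++y)
                for (int x = 0; x < W; ++x) {
                    int xl = x > 0 ? x - 1 : x, xr = x < W - 1 ? x + 1 : x;
                    dist[y][x] = (ref[y][xl] + ref[y][x] + ref[y][xr]) / 3.0;
                }
            edge_map(ref, eref);
            edge_map(dist, edist);
            printf("edge preservation score: %.3f\n",
                   edge_preservation(eref, edist));
            return 0;
        }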

    The All-Data-Based Evolutionary Hypothesis of Ciliated Protists with a Revised Classification of the Phylum Ciliophora (Eukaryota, Alveolata)


    Performance Study of LU Factorization with Low Communication Overhead on Multiprocessors

    In this paper, we make efficient use of asynchronous communications in the LU decomposition algorithm with pivoting and a column-scattered data decomposition to derive precise computational complexities. We then compare these results with experiments on the Intel iPSC/860 and Paragon machines and show that very good performance can be obtained on a ring with asynchronous communications.
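
    The kernel analyzed in this entry (and in its pipelined companion in the next one) is LU decomposition with partial pivoting. Below is a minimal sequential C version whose comments mark where a column-scattered (column-cyclic) distribution would place each column and where the pivot information would be sent asynchronously around the ring; the matrix, its contents, and the processor count are illustrative.

        /* LU decomposition with partial pivoting, stored in place:
         * L below the diagonal (unit diagonal implied), U on and above. */
        #include <math.h>
        #include <stdio.h>

        #define N 4
        #define P 2   /* processors in the hypothetical ring */

        int main(void)
        {
            double a[N][N] = {
                {2, 1, 1, 0}, {4, 3, 3, 1}, {8, 7, 9, 5}, {6, 7, 9, 8}
            };
            int piv[N];

            for (int k = 0; k < N; ++k) {
                /* Under a column-scattered decomposition, column k lives on
                 * processor k % P; that owner selects the pivot and would
                 * send the pivot index and multipliers around the ring,
                 * overlapping the send with its next local update. */
                int p = k;
                for (int i = k + 1; i < N; ++i)
                    if (fabs(a[i][k]) > fabs(a[p][k])) p = i;
                piv[k] = p;
                for (int j = 0; j < N; ++j) {  /* swap rows k and p */
                    double t = a[k][j]; a[k][j] = a[p][j]; a[p][j] = t;
                }
                for (int i = k + 1; i < N; ++i) {
                    a[i][k] /= a[k][k];                /* multiplier, stored in L */
                    for (int j = k + 1; j < N; ++j)    /* trailing update: each   */
                        a[i][j] -= a[i][k] * a[k][j];  /* owner updates its cols  */
                }
            }
            for (int i = 0; i < N; ++i) {
                for (int j = 0; j < N; ++j) printf("%7.3f ", a[i][j]);
                printf("  piv[%d]=%d\n", i, piv[i]);
            }
            return 0;
        }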

    Performance complexity of LU factorization with efficient pipelining and overlap on a multiprocessor

    In this paper, we make efficient use of pipelining in LU decomposition with pivoting and a column-scattered data decomposition to derive precise computational complexity estimates. We then compare these results with experiments on the Intel iPSC/860 and Paragon machines.

    On the Convergence of Computational and Data Grids

    Great advances in high-performance computing have given rise to scientific applications that place large demands on software and hardware infrastructures for both computational and data services. These trends have made it necessary for distributed systems developers, who once treated these elements separately, to acknowledge that computational and data services are tightly coupled and need to be addressed simultaneously. In this article, we compile and discuss several strategies and techniques, such as co-scheduling and co-allocation of computational and data services, dynamic storage capabilities, and quality-of-service, that can be used to help resolve some of the aforementioned issues. We present our interactions with a distributed computing system, NetSolve, and a distributed storage infrastructure, IBP, as a case study of how some of these techniques can be effectively deployed, and offer experimental evidence from early prototypes that validates our motivation and direction.
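
    A toy illustration of the co-scheduling/co-allocation idea discussed here: choose a compute node for a job by weighing compute speed against the cost of staging the job's input from the storage depot that holds it. The cost model, node names, and all numbers are invented for illustration; NetSolve and IBP expose far richer interfaces than this.

        /* Co-scheduling sketch: total time = data staging + computation;
         * the fastest node is not necessarily the best choice. */
        #include <stdio.h>

        struct node {
            const char *name;
            double gflops;               /* compute throughput */
            double mb_per_s_from_depot;  /* bandwidth to the depot holding the data */
        };

        int main(void)
        {
            const double work_gflop = 500.0;   /* job size (assumed) */
            const double input_mb   = 2000.0;  /* input data size (assumed) */
            struct node nodes[] = {
                {"fast-but-far",  8.0,  10.0},
                {"slow-but-near", 2.0, 200.0},
                {"balanced",      4.0,  80.0},
            };
            int best = 0;
            double best_t = 1e300;
            for (int i = 0; i < 3; ++i) {
                double t = input_mb / nodes[i].mb_per_s_from_depot
                         + work_gflop / nodes[i].gflops;
                printf("%-14s estimated %.1f s\n", nodes[i].name, t);
                if (t < best_t) { best_t = t; best = i; }
            }
            printf("schedule on %s (%.1f s)\n", nodes[best].name, best_t);
            return 0;
        }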