3 research outputs found

    Hybrid Vector Library: From Memory Bound to Compute Bound with NVVM

    Existing source code usually interleaves data management, error checking, text processing, and actual computation. On general-purpose processors, this mixture of tasks is not necessarily an issue, and performance is often satisfactory as is. However, when targeting a GPU, this hybrid computing turns into a coding challenge: each individual computing task does not provide sufficient workload, and porting the whole application requires a significant investment in the software asset. We propose an alternative approach, with runtime compilation based on function calls to a compute library. Hybrid Vector Library operates on vectors, in a manner similar to BLAS level 1 routines extended with other functions such as square root or exponential, or to MKL routines. In essence, all operations are performed on a vector of values. We illustrate the performance results of this approach on a typical financial benchmark. Existing solutions such as ArrayFire do not allow custom device functions to be called in the middle of a sequence of level 1 routines. We address that issue by also processing these functions. We follow the call graph from the main compute routine, and generate cubin files for user-defined device functions. These functions are then linked at runtime to the hvl call sequence, and usually generate a JCAL instruction in SASS, in a similar way to sqrt. Our approach gives user code benefits similar to ArrayFire's, with the added flexibility of custom device functions.
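    The idea of recording a sequence of vector calls and compiling it at runtime into a single fused pass can be illustrated with a minimal CPU-only sketch. This is not the Hybrid Vector Library API: the `Vec` class, its methods, and the generated `fused` function are hypothetical names invented for illustration, and Python's built-in `compile`/`exec` stands in for the NVVM-based GPU code generation described in the abstract.

```python
import math


class Vec:
    """Lazy vector expression (hypothetical): records operations, runs nothing yet."""

    def __init__(self, expr, inputs):
        self.expr = expr      # Python source computing one element at index i
        self.inputs = inputs  # maps an input name to its list of floats

    @staticmethod
    def of(name, data):
        # Assumption: `name` is a valid Python identifier.
        return Vec(f"{name}[i]", {name: data})

    def _binary(self, other, op):
        return Vec(f"({self.expr} {op} {other.expr})",
                   {**self.inputs, **other.inputs})

    def __add__(self, other):
        return self._binary(other, "+")

    def __mul__(self, other):
        return self._binary(other, "*")

    def sqrt(self):
        return Vec(f"math.sqrt({self.expr})", self.inputs)

    def evaluate(self):
        """Generate one fused loop for the whole expression and run it."""
        names = sorted(self.inputs)
        n = len(self.inputs[names[0]])
        src = (f"def fused({', '.join(names)}):\n"
               f"    return [{self.expr} for i in range({n})]\n")
        scope = {"math": math}
        exec(compile(src, "<fused>", "exec"), scope)  # runtime compilation
        return scope["fused"](*(self.inputs[k] for k in names)), src


x = Vec.of("x", [3.0, 0.0])
y = Vec.of("y", [16.0, 4.0])
result, source = (x * x + y).sqrt().evaluate()
print(result)  # [5.0, 2.0]
print(source)
```

    Each input is read once and no intermediate vector is written back to memory, which is the property that moves a chain of level 1 calls away from being memory bound.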

    Using CLANG/LLVM Vectorization to Generate Mixed Precision Source Code


    Lancer de rayons dans un octree : Utilisation de la connectivité via les coordonnées de Plücker [Ray tracing in an octree: using connectivity via Plücker coordinates]

    Most of the computation time in ray-tracing algorithms is spent traversing the acceleration structure. The usual bottlenecks are memory latency and branching. We present an octree traversal algorithm that offers the best of both octrees and regular grids. The algorithm relies mostly on the connectivity between the tree's leaves. Using Plücker coordinates, it can be implemented in a way that keeps memory accesses coherent while drastically limiting branching operations.
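    The abstract does not detail the traversal itself, so the following is only a sketch of the standard Plücker side test that such algorithms typically build on, not the authors' implementation. A directed line is stored as a direction and a moment; the sign of the permuted inner product of two lines gives their relative orientation, and a line passes through a convex face when it has the same orientation with respect to every edge of that face. All function names here are assumptions chosen for illustration.

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def plucker(p, q):
    """Plücker coordinates (direction, moment) of the directed line from p to q."""
    d = (q[0] - p[0], q[1] - p[1], q[2] - p[2])
    return d, cross(p, d)


def side(line_a, line_b):
    """Permuted inner product; its sign is the relative orientation of the lines."""
    d_a, m_a = line_a
    d_b, m_b = line_b
    return dot(d_a, m_b) + dot(d_b, m_a)


def line_crosses_face(origin, direction, corners):
    """True if the infinite line through `origin` along `direction` passes
    through the convex face whose `corners` are listed in consistent order.
    Note: this tests the full line, not only the forward half of the ray."""
    ray = (direction, cross(origin, direction))
    n = len(corners)
    signs = [side(ray, plucker(corners[i], corners[(i + 1) % n]))
             for i in range(n)]
    return all(s >= 0 for s in signs) or all(s <= 0 for s in signs)


# Top face (z = 1) of a unit cell, corners listed in order around the face.
top = [(0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1)]
print(line_crosses_face((0.5, 0.5, 0), (0, 0, 1), top))  # True
print(line_crosses_face((2.0, 0.5, 0), (0, 0, 1), top))  # False
```

    Because the test reduces to a handful of multiply-adds followed by sign comparisons, the face through which a ray leaves a leaf, and hence the neighbouring leaf to visit, can be selected with few data-dependent branches.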