50 research outputs found
IBM Power™ 8 experiments
International audience
Altimesh Hybridizer
International audience
How Pascal And Power 8 Will Accelerate Counterparty Risk Calculations
International audience
Using CLANG/LLVM Vectorization to Generate Mixed Precision Source Code
International audience
Hybrid Vector Library-From Memory Bound to Compute Bound with NVVM
International audience. Existing source code usually interleaves data management, error checking, text processing and actual computation. On general-purpose processors, this mixture of tasks is not necessarily an issue, and performance is often satisfactory as is. However, when targeting a GPU, this hybrid computing turns into a coding challenge: each individual computing task does not provide sufficient workload, and porting the whole application requires a significant investment in the software asset. We propose an alternative approach with runtime compilation based on function calls to a compute library. Hybrid Vector Library operates on vectors, in a manner similar to BLAS level 1 routines, with other functions such as square root or exponential, or MKL routines. In essence, all operations are performed on a vector of values. We illustrate the performance results of this approach on a typical financial benchmark. Existing solutions such as ArrayFire do not allow a custom device function to be called in the middle of a sequence of level 1 routines. We address that issue by also processing these functions. We follow the call graph from the main compute routine and generate cubin files for user-defined device functions. These functions are then linked at runtime to the hvl call sequence, and usually generate a JCAL instruction in SASS, in a similar way to sqrt. Our approach gives user code benefits similar to those of ArrayFire, with the flexibility of custom device functions.
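The idea of recording a sequence of vector calls and running it as one pass, with a user-defined function in the middle, can be illustrated on the CPU. This is only a minimal sketch: the `LazyVector` class, its method names and the `payoff` function are hypothetical and do not reflect the Hybrid Vector Library API, and the single Python loop merely stands in for a fused GPU kernel.

```python
import math


class LazyVector:
    """Records element-wise operations instead of running them immediately.

    Hypothetical illustration only; not the Hybrid Vector Library API.
    """

    def __init__(self, data, ops=None):
        self.data = list(data)
        self.ops = list(ops or [])  # deferred per-element functions

    def apply(self, fn):
        # Queue fn (built-in or user-defined); nothing is computed yet.
        return LazyVector(self.data, self.ops + [fn])

    def scale(self, a):
        return self.apply(lambda x: a * x)

    def sqrt(self):
        return self.apply(math.sqrt)

    def evaluate(self):
        # Single pass over the data: all queued operations are applied
        # per element, standing in for one fused kernel launch.
        out = []
        for x in self.data:
            for fn in self.ops:
                x = fn(x)
            out.append(x)
        return out


def payoff(x):
    # Hypothetical user-defined function inserted mid-sequence.
    return max(x - 1.0, 0.0)


v = LazyVector([1.0, 4.0, 9.0])
result = v.sqrt().scale(2.0).apply(payoff).evaluate()
```

Deferring evaluation is what lets the whole sequence, including the custom function, be treated as a single unit of work rather than one memory-bound pass per call.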
Image Processing in Java Running on GPU
International audience
Altimesh Hybridizer™ Enabling Accelerators in .Net and more
International audience
Shadow Computations using Robust Epsilon Visibility
Analytic visibility algorithms, for example methods which compute a subdivided mesh to represent shadows, are notoriously non-robust and hard to use in practice. We present a new method based on a generalized definition of extremal stabbing lines, which are the extremities of shadow boundaries. We treat scenes containing multiple edges or vertices in degenerate configurations (e.g., collinear or coplanar). We introduce a robust epsilon method to determine whether each generalized extremal stabbing line is blocked, or is touched by these scene elements, and thus added to the line's generators. We develop robust blocker predicates for polygons which are smaller than epsilon. For larger values, small shadow features merge and eventually disappear. We can thus robustly connect generalized extremal stabbing lines in degenerate scenes to form shadow boundaries. We show that our approach is consistent, and that shadow boundary connectivity is preserved when features merge. We have implemented our algorithm, and show that we can robustly compute analytic shadow boundaries to the precision of our chosen epsilon threshold for non-trivial models containing numerous degeneracies.
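As a rough illustration of what an epsilon-tolerant degeneracy test can look like, the sketch below treats a point as collinear with a line when its distance to the line is at most epsilon. It is a generic tolerance test written under that assumption, not the blocker or touch predicates of the paper, and all function names are hypothetical.

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def norm(a):
    return math.sqrt(a[0] ** 2 + a[1] ** 2 + a[2] ** 2)


def distance_to_line(p, a, b):
    """Distance from point p to the infinite line through a and b."""
    ab = sub(b, a)
    length = norm(ab)
    if length == 0.0:
        # a and b coincide: fall back to point-to-point distance.
        return norm(sub(p, a))
    return norm(cross(sub(p, a), ab)) / length


def collinear_eps(p, a, b, eps):
    # p counts as "on" the line when it lies within eps of it.
    return distance_to_line(p, a, b) <= eps


a, b = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
p = (0.5, 1e-6, 0.0)
loose = collinear_eps(p, a, b, 1e-4)
strict = collinear_eps(p, a, b, 1e-8)
```

With the larger threshold the nearly collinear point is classified as degenerate, while the smaller threshold treats it as a distinct feature, which mirrors how the choice of epsilon controls when small features merge.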
Flexible Point-Based Rendering on Mobile Devices
Point-based rendering is a compact and efficient means of displaying complex geometry. For mobile devices which typically have limited CPU or floating point speed, limited memory, no graphics hardware and a small display, a hierarchical packed point-based representation of objects is particularly well adapted. We introduce -grids, which are a generalization of previous octree-based representations and analyse their memory and rendering efficiency. By storing intermediate node attributes, our structure allows flexible rendering, permitting efficient local image refinement, required for example when zooming into very complex scenes. We also introduce a novel and efficient one-pass shadow mapping algorithm using this data structure. We show an implementation of our method on a PDA, which can render objects sampled by 1.3 million points at 2.1 frames per second; the model was originally made up of 4.7 million polygons.
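The role of intermediate node attributes can be sketched with a plain octree, i.e. the kind of representation the abstract says the paper generalizes. The sketch does not implement the generalized grids themselves; the dictionary layout, the use of a centroid as the stored attribute, and the `build` and `collect` names are assumptions made for illustration.

```python
def build(points, origin, size, depth, max_depth):
    """Build an octree node over points inside the cube (origin, size)."""
    n = len(points)
    node = {
        # Intermediate attribute: one representative sample per node.
        "centroid": tuple(sum(p[i] for p in points) / n for i in range(3)),
        "count": n,
        "children": [],
    }
    if depth < max_depth and n > 1:
        half = size / 2.0
        buckets = {}
        for p in points:
            # One bit per axis selects the octant containing p.
            key = tuple(int(p[i] >= origin[i] + half) for i in range(3))
            buckets.setdefault(key, []).append(p)
        for key, pts in sorted(buckets.items()):
            child_origin = tuple(origin[i] + key[i] * half for i in range(3))
            node["children"].append(
                build(pts, child_origin, half, depth + 1, max_depth))
    return node


def collect(node, depth_limit, depth=0):
    """Return the samples to draw when traversal stops at depth_limit."""
    if depth == depth_limit or not node["children"]:
        return [node["centroid"]]
    samples = []
    for child in node["children"]:
        samples.extend(collect(child, depth_limit, depth + 1))
    return samples


points = [(0.1, 0.1, 0.1), (0.2, 0.2, 0.2), (0.9, 0.9, 0.9)]
root = build(points, (0.0, 0.0, 0.0), 1.0, 0, 3)
coarse = collect(root, 0)
medium = collect(root, 1)
fine = collect(root, 3)
```

Because every interior node carries its own sample, traversal can stop at any depth, and a deeper limit can be applied only where the image needs refinement, for example in a zoomed-in region.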