70 research outputs found

    Exploiting Graphics Processing Units for Massively Parallel Multi-Dimensional Indexing

    Get PDF
    Department of Computer EngineeringScientific applications process truly large amounts of multi-dimensional datasets. To efficiently navigate such datasets, various multi-dimensional indexing structures, such as the R-tree, have been extensively studied for the past couple of decades. Since the GPU has emerged as a new cost-effective performance accelerator, now it is common to leverage the massive parallelism of the GPU in various applications such as medical image processing, computational chemistry, and particle physics. However, hierarchical multi-dimensional indexing structures are inherently not well suited for parallel processing because their irregular memory access patterns make it difficult to exploit massive parallelism. Moreover, recursive tree traversal often fails due to the small run-time stack and cache memory in the GPU. First, we propose Massively Parallel Three-phase Scanning (MPTS) R-tree traversal algorithm to avoid the irregular memory access patterns and recursive tree traversal so that the GPU can access tree nodes in a sequential manner. The experimental study shows that MPTS R-tree traversal algorithm consistently outperforms traditional recursive R-Tree search algorithm for multi-dimensional range query processing. Next, we focus on reducing the query response time and extending n-ary multi-dimensional indexing structures - R-tree, so that a large number of GPU threads cooperate to process a single query in parallel. Because the number of submitted concurrent queries in scientific data analysis applications is relatively smaller than that of enterprise database systems and ray tracing in computer graphics. Hence, we propose a novel variant of R-trees Massively Parallel Hilbert R-Tree (MPHR-Tree), which is designed for a novel parallel tree traversal algorithm Massively Parallel Restart Scanning (MPRS). The MPRS algorithm traverses the MPHR-Tree in mostly contiguous memory access patterns without recursion, which offers more chances to optimize the parallel SIMD algorithm. Our extensive experimental results show that the MPRS algorithm outperforms the other stackless tree traversal algorithms, which are designed for efficient ray tracing in computer graphics community. Furthermore, we develop query co-processing scheme that makes use of both the CPU and GPU. In this approach, we store the internal and leaf nodes of upper tree in CPU host memory and GPU device memory, respectively. We let the CPU traverse internal nodes because the conditional branches in hierarchical tree structures often cause a serious warp divergence problem in the GPU. For leaf nodes, the GPU scans a large number of leaf nodes in parallel based on the selection ratio of a given range query. It is well known that the GPU is superior to the CPU for parallel scanning. The experimental results show that our proposed multi-dimensional range query co-processing scheme improves the query response time by up to 12x and query throughput by up to 4x compared to the state-of-the-art GPU tree traversal algorithm.ope

    Algorithms and data structures for interactive ray tracing on commodity hardware

    Get PDF
    Rendering methods based on ray tracing provide high image realism, but have been historically regarded as offline only. This has changed in the past decade, due to significant advances in the construction and traversal performance of acceleration structures and the efficient use of data-parallel processing. Today, all major graphics companies offer real-time ray tracing solutions. The following work has contributed to this development with some key insights. We first address the limited support of dynamic scenes in previous work, by proposing two new parallel-friendly construction algorithms for KD-trees and BVHs. By approximating the cost function, we accelerate construction by up to an order of magnitude (especially for BVHs), at the expense of only tiny degradation to traversal performance. For the static portions of the scene, we also address the topic of creating the “perfect” acceleration structure. We develop a polynomial time non-greedy BVH construction algorithm. We then modify it to produce a new type of acceleration structure that inherits both the high performance of KD-trees and the small size of BVHs. Finally, we focus on bringing real-time ray tracing to commodity desktop computers. We develop several new KD-tree and BVH traversal algorithms specifically tailored for the GPU. With them, we show for the first time that GPU ray tracing is indeed feasible, and it can outperform CPU ray tracing by almost an order of magnitude, even on large CAD models.Ray-Tracing basierte Bildsynthese-Verfahren bieten einen hohen Grad an Realismus, wurden allerdings in der Vergangenheit ausschließlich als nicht echtzeitfĂ€hig betrachtet. Dies hat sich innerhalb des letzten Jahrzehnts geĂ€ndert durch signifikante Fortschritte sowohl im Bereich der Erstellung und Traversierung von Beschleunigungs-Strukturen, als auch im effizienten Einsatz paralleler Berechnung. Heute bieten alle großen Grafik-Firmen Echtzeit-Ray-Tracing Lösungen an. Die vorliegende Dissertation behandelt BetrĂ€ge zu dieser Entwicklung in mehreren Kernaspekten. Der erste Teil beschĂ€ftigt sich mit der eingeschrĂ€nkten UnterstĂŒtzung von dynamischen Szenen in bisherigen Verfahren. Hierbei behandeln wir zwei zur Parallelisierung geeignete Algorithmen zur Erstellung von KD-BĂ€umen und Bounding-Volume-Hierarchien. Durch Approximation von Kosten-Funktionen kann eine Verbesserung der Konstruktionszeit von bis zu einer GrĂ¶ĂŸenordnung erreicht werden (speziell fĂŒr BVH-Strukturen), bei nur geringem Verlust von Traversierungs-Effizienz. Mit Blick auf den statischen Teil einer Szene beschĂ€ftigen wir uns mit der Erstellung “perfekter” Beschleunigungs-Strukturen. Wir entwickeln einen Algorithmus zur BVH-Erstellung, der ein globales Optimum in polynomialer Zeit liefert. Dies fĂŒhrt zu einer neuartigen Beschleunigungs-Struktur, welche sowohl die hohe Leistung von KD-BĂ€umen, als auch den geringen Platzbedarf von BVH-Strukturen in sich vereinigt. Abschließend betrachten wir Echtzeit-Ray-Tracing auf Desktop-Computern. Wir entwickeln neuartige KD-Baum- und BVH-Traversierungs-Algorithmen, die speziell auf den Einsatz von Grafikprozessoren zugeschnitten sind. Wir zeigen damit zum ersten Mal, dass GPU-Ray-Tracing nicht nur praktikabel ist, sondern auch mehr als eine GrĂ¶ĂŸenordnung effizienter sein kann als CPU basierte Ray-Tracing-Verfahren, selbst bei der Darstellung großer CAD Modelle

    Volumetric cloud generation using a Chinese brush calligraphy style

    Get PDF
    Includes bibliographical references.Clouds are an important feature of any real or simulated environment in which the sky is visible. Their amorphous, ever-changing and illuminated features make the sky vivid and beautiful. However, these features increase both the complexity of real time rendering and modelling. It is difficult to design and build volumetric clouds in an easy and intuitive way, particularly if the interface is intended for artists rather than programmers. We propose a novel modelling system motivated by an ancient painting style, Chinese Landscape Painting, to address this problem. With the use of only one brush and one colour, an artist can paint a vivid and detailed landscape efficiently. In this research, we develop three emulations of a Chinese brush: a skeleton-based brush, a 2D texture footprint and a dynamic 3D footprint, all driven by the motion and pressure of a stylus pen. We propose a hybrid mapping to generate both the body and surface of volumetric clouds from the brush footprints. Our interface integrates these components along with 3D canvas control and GPU-based volumetric rendering into an interactive cloud modelling system. Our cloud modelling system is able to create various types of clouds occurring in nature. User tests indicate that our brush calligraphy approach is preferred to conventional volumetric cloud modelling and that it produces convincing 3D cloud formations in an intuitive and interactive fashion. While traditional modelling systems focus on surface generation of 3D objects, our brush calligraphy technique constructs the interior structure. This forms the basis of a new modelling style for objects with amorphous shape

    Generating renderers

    Get PDF
    Most production renderers developed for the film industry are huge pieces of software that are able to render extremely complex scenes. Unfortunately, they are implemented using the currently available programming models that are not well suited to modern computing hardware like CPUs with vector units or GPUs. Thus, they have to deal with the added complexity of expressing parallelism and using hardware features in those models. Since compilers cannot alone optimize and generate efficient programs for any type of hardware, because of the large optimization spaces and the complexity of the underlying compiler problems, programmers have to rely on compiler-specific hardware intrinsics or write non-portable code. The consequence of these limitations is that programmers resort to writing the same code twice when they need to port their algorithm on a different architecture, and that the code itself becomes difficult to maintain, as algorithmic details are buried under hardware details. Thankfully, there are solutions to this problem, taking the form of Domain-Specific Lan- guages. As their name suggests, these languages are tailored for one domain, and compilers can therefore use domain-specific knowledge to optimize algorithms and choose the best execution policy for a given target hardware. In this thesis, we opt for another way of encoding domain- specific knowledge: We implement a generic, high-level, and declarative rendering and traversal library in a functional language, and later refine it for a target machine by providing partial evaluation annotations. The partial evaluator then specializes the entire renderer according to the available knowledge of the scene: Shaders are specialized when their inputs are known, and in general, all redundant computations are eliminated. Our results show that the generated renderers are faster and more portable than renderers written with state-of-the-art competing libraries, and that in comparison, our rendering library requires less implementation effort.Die meisten in der Filmindustrie zum Einsatz kommenden Renderer sind riesige Softwaresysteme, die in der Lage sind, extrem aufwendige Szenen zu rendern. Leider sind diese mit den aktuell verfĂŒgbaren Programmiermodellen implementiert, welche nicht gut geeignet sind fĂŒr moderne Rechenhardware wie CPUs mit Vektoreinheiten oder GPUs. Deshalb mĂŒssen Entwickler sich mit der zusĂ€tzlichen KomplexitĂ€t auseinandersetzen, Parallelismus und Hardwarefunktionen in diesen Programmiermodellen auszudrĂŒcken. Da Compiler nicht selbstĂ€ndig optimieren und effiziente Programme fĂŒr jeglichen Typ Hardware generieren können, wegen des großen Optimierungsraumes und der KomplexitĂ€t des unterliegenden Kompilierungsproblems, mĂŒssen Programmierer auf Compiler-spezifische Hardware-“Intrinsics” zurĂŒckgreifen, oder nicht portierbaren Code schreiben. Die Konsequenzen dieser Limitierungen sind, dass Programmierer darauf zurĂŒckgreifen den gleichen Code zweimal zu schreiben, wenn sie ihre Algorithmen fĂŒr eine andere Architektur portieren mĂŒssen, und dass der Code selbst schwer zu warten wird, da algorithmische Details unter Hardwaredetails verloren gehen. GlĂŒcklicherweise gibt es Lösungen fĂŒr dieses Problem, in der Form von DSLs. Diese Sprachen sind maßgeschneidert fĂŒr eine DomĂ€ne und Compiler können deshalb DomĂ€nenspezifisches Wissen nutzen, um Algorithmen zu optimieren und die beste AusfĂŒhrungsstrategie fĂŒr eine gegebene Zielhardware zu wĂ€hlen. In dieser Dissertation wĂ€hlen wir einen anderen Weg, DomĂ€nenspezifisches Wissen zu enkodieren: Wir implementieren eine generische, high-level und deklarative Rendering- und Traversierungsbibliothek in einer funktionalen Programmiersprache, und verfeinern sie spĂ€ter fĂŒr eine Zielmaschine durch Bereitstellung von Annotationen fĂŒr die partielle Auswertung. Der “Partial Evaluator” spezialisiert dann den kompletten Renderer, basierend auf dem verfĂŒgbaren Wissen ĂŒber die Szene: Shader werden spezialisiert, wenn ihre Eingaben bekannt sind, und generell werden alle redundanten Berechnungen eliminiert. Unsere Ergebnisse zeigen, dass die generierten Renderer schneller und portierbarer sind, als Renderer geschrieben mit den aktuellen Techniken konkurrierender Bibliotheken und dass, im Vergleich, unsere Rendering Bibliothek weniger Implementierungsaufwand erfordert.This work was supported by the Federal Ministry of Education and Research (BMBF) as part of the Metacca and ProThOS projects as well as by the Intel Visual Computing Institute (IVCI) and Cluster of Excellence on Multimodal Computing and Interaction (MMCI) at Saarland University. Parts of it were also co-funded by the European Union(EU), as part of the Dreamspace project

    Ray-Traced Collision Detection : Interpenetration Control and Multi-GPU Performance

    Get PDF
    International audienceWe proposed [LGA13] an iterative ray-traced collision detection algorithm (IRTCD) that exploits spatial and temporal coherency and proved to be computationally efficient but at the price of some geometrical approximations that allow more interpenetration than needed. In this paper, we present two methods to efficiently control and reduce the interpenetration without noticeable computation overhead. The first method predicts the next potentially colliding vertices. These predictions are used to make our IRTCD algorithm more robust to the above-mentioned approximations, therefore reducing the errors up to 91%. We also present a ray re-projection algorithm that improves the physical response of ray-traced collision detection algorithm. This algorithm also reduces, up to 52%, the interpenetration between objects in a virtual environment. Our last contribution shows that our algorithm, when implemented on multi-GPUs architectures, is far faster

    View-dependent Exploration of Massive Volumetric Models on Large Scale Light Field Displays

    Get PDF
    We report on a light-field display based virtual environment enabling multiple naked-eye users to perceive detailed multi-gigavoxel volumetric models as floating in space, responsive to their actions, and delivering different information in different areas of the workspace. Our contributions include a set of specialized interactive illustrative techniques able to provide different contextual information in different areas of the display, as well as an out-of-core CUDA based raycasting engine with a number of improvements over current GPU volume raycasters. The possibilities of the system are demonstrated by the multi-user interactive exploration of 64GVoxels datasets on a 35MPixel light field display driven by a cluster of PCs.1037-1047Pubblicat

    A parallel algorithm for construction of uniform grids

    Full text link
    • 

    corecore