
    Hierarchical N-Body problem on graphics processor unit

    Galactic simulation is an important cosmological computation and represents a classical N-body problem suited to implementation on vector processors. The Barnes-Hut algorithm is a hierarchical N-body method used to simulate such galactic evolution systems. Stream processing architectures expose the data locality and concurrency available in multimedia applications. Beyond multimedia, numerous compute-intensive scientific and engineering applications could also benefit from these computation and communication models; such applications have traditionally been implemented on vector processors. Graphics processing units (GPUs) built on stream architectures offer a novel computational alternative for implementing these high-performance applications efficiently. Rendering on a stream architecture sustains high performance, while user-programmable modules allow complex algorithms to be implemented efficiently. Over the years, GPUs have evolved from fixed-function pipelines into user-programmable processors. In this thesis, we focus on implementing the Barnes-Hut algorithm on typical current-generation programmable GPUs. We exploit the computation and communication characteristics of the Barnes-Hut algorithm to demonstrate its suitability for user-programmable GPUs. Our implementation of the Barnes-Hut algorithm is formulated as a fragment shader targeting the selected GPU. We discuss implementation details, design issues, results, and the challenges encountered in programming the fragment shader.
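The heart of Barnes-Hut is a spatial tree whose cells store their total mass and centre of mass; a distant cell is treated as a single point mass when its size s and its distance d from the body satisfy s/d < θ. The sketch below illustrates this idea in 2D on the CPU. It assumes G = 1, a small softening length `eps`, and an opening angle `theta`; all names are illustrative, and it is not the thesis's fragment-shader formulation.

```python
import math
import random
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Vec = Tuple[float, float]
MAX_DEPTH = 40  # stop splitting if bodies (nearly) coincide

@dataclass
class Cell:
    cx: float    # centre of the square cell
    cy: float
    half: float  # half of the cell's side length
    mass: float = 0.0
    mx: float = 0.0  # sum of m_j * x_j; centre of mass is (mx, my) / mass
    my: float = 0.0
    bodies: List[int] = field(default_factory=list)  # body indices (leaves only)
    children: Optional[List["Cell"]] = None

def build(idx, pos, m, cx, cy, half, depth=0) -> Cell:
    """Recursively build a quadtree over the bodies listed in idx."""
    cell = Cell(cx, cy, half)
    for i in idx:
        cell.mass += m[i]
        cell.mx += m[i] * pos[i][0]
        cell.my += m[i] * pos[i][1]
    if len(idx) <= 1 or depth >= MAX_DEPTH:
        cell.bodies = list(idx)
        return cell
    quads: List[List[int]] = [[], [], [], []]
    for i in idx:  # quadrant bit 0: right half, bit 1: upper half
        quads[(pos[i][0] >= cx) + 2 * (pos[i][1] >= cy)].append(i)
    h = half / 2
    cell.children = [
        build(sub, pos, m, cx + (h if q & 1 else -h), cy + (h if q & 2 else -h), h, depth + 1)
        for q, sub in enumerate(quads) if sub
    ]
    return cell

def make_tree(pos, m) -> Cell:
    """Root cell: the smallest square (plus a tiny margin) enclosing all bodies."""
    xs, ys = [p[0] for p in pos], [p[1] for p in pos]
    half = max(max(xs) - min(xs), max(ys) - min(ys)) / 2 + 1e-12
    return build(list(range(len(pos))), pos, m,
                 (min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2, half)

def pull(dx, dy, mass, G, eps) -> Vec:
    """Softened gravitational acceleration towards `mass` at offset (dx, dy)."""
    r2 = dx * dx + dy * dy + eps * eps
    s = G * mass / (r2 * math.sqrt(r2))
    return dx * s, dy * s

def accel_tree(root, i, pos, m, theta=0.5, G=1.0, eps=1e-3) -> Vec:
    """Barnes-Hut acceleration on body i (keep theta below ~0.7 so a cell
    never approximates a body it contains)."""
    ax = ay = 0.0
    stack = [root]
    while stack:
        c = stack.pop()
        if c.children is None:  # leaf: exact pairwise sum
            for j in c.bodies:
                if j != i:
                    gx, gy = pull(pos[j][0] - pos[i][0], pos[j][1] - pos[i][1], m[j], G, eps)
                    ax, ay = ax + gx, ay + gy
            continue
        dx, dy = c.mx / c.mass - pos[i][0], c.my / c.mass - pos[i][1]
        if 2 * c.half < theta * math.hypot(dx, dy):  # s/d < theta: use centre of mass
            gx, gy = pull(dx, dy, c.mass, G, eps)
            ax, ay = ax + gx, ay + gy
        else:  # cell too close: open it
            stack.extend(c.children)
    return ax, ay

def accel_direct(i, pos, m, G=1.0, eps=1e-3) -> Vec:
    """Reference direct summation (O(n^2) over all bodies)."""
    ax = ay = 0.0
    for j in range(len(pos)):
        if j != i:
            gx, gy = pull(pos[j][0] - pos[i][0], pos[j][1] - pos[i][1], m[j], G, eps)
            ax, ay = ax + gx, ay + gy
    return ax, ay

# Usage: compare the tree code against direct summation on random bodies.
random.seed(1)
pos = [(random.random(), random.random()) for _ in range(300)]
m = [1.0] * len(pos)
root = make_tree(pos, m)
approx = [accel_tree(root, i, pos, m, theta=0.5) for i in range(len(pos))]
exact = [accel_direct(i, pos, m) for i in range(len(pos))]
num = sum((a[0] - e[0]) ** 2 + (a[1] - e[1]) ** 2 for a, e in zip(approx, exact))
den = sum(e[0] ** 2 + e[1] ** 2 for e in exact)
rel_err = math.sqrt(num / den)  # aggregate relative error of the approximation
```

Setting `theta = 0` opens every cell and reproduces direct summation. Larger values trade accuracy for fewer interactions, which is what makes the method O(n log n) rather than O(n²).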

    Parallel Hierarchical Radiosity on Hybrid Platforms

    The final publication is available at Springer via http://dx.doi.org/10.1007/s11227-011-0592-6. Achieving efficient, realistic illumination is an important aim of research in computer graphics. In this paper, a new parallel global illumination method for hybrid systems, based on the hierarchical radiosity method, is presented. Our solution allows the exploitation of systems that combine independent nodes with multiple cores per node. Thus, multiple nodes work in parallel on the computation of the global illumination of the same scene. Within each node, all the available computational cores are used through a shared-memory multithreading approach. The good speedups obtained on several distributed-memory and shared-memory configurations show the versatility of our hybrid proposal. Funding: Xunta de Galicia (INCITE08PXIB105161PR); Ministerio de Educación y Ciencia (MEC TIN 2010-16735); Xunta de Galicia (08TIC001206P).
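The abstract describes a two-level decomposition: work is spread across independent nodes, and within each node across cores through shared-memory multithreading. The Python sketch below mimics that structure for one Jacobi sweep of a plain (non-hierarchical) radiosity system. The block partitioning, the helper names, and the loop standing in for distributed nodes are assumptions for illustration, not the paper's scheme. In CPython the global interpreter lock limits thread speedup, so the sketch shows the decomposition rather than the performance.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence

def split(items: Sequence[int], parts: int) -> List[List[int]]:
    """Contiguous block partition into `parts` chunks whose sizes differ by at most one."""
    q, r = divmod(len(items), parts)
    chunks, start = [], 0
    for k in range(parts):
        end = start + q + (1 if k < r else 0)
        chunks.append(list(items[start:end]))
        start = end
    return chunks

def gather(i, E, rho, F, B) -> float:
    """Radiosity gathered by patch i: B_i = E_i + rho_i * sum_j F_ij * B_j."""
    return E[i] + rho[i] * sum(F[i][j] * B[j] for j in range(len(B)))

def hybrid_sweep(E, rho, F, B, nodes: int, threads: int) -> List[float]:
    """One Jacobi sweep with a two-level decomposition: patches are divided
    among `nodes` (simulated here by a loop; a real system would use separate
    processes, e.g. MPI ranks, and exchange B after every sweep), and each
    node's share is divided again among `threads` shared-memory threads."""
    new = [0.0] * len(B)

    def work(chunk: List[int]) -> None:
        for i in chunk:  # each thread writes a disjoint set of entries
            new[i] = gather(i, E, rho, F, B)

    for node_patches in split(range(len(B)), nodes):
        with ThreadPoolExecutor(max_workers=threads) as pool:
            list(pool.map(work, split(node_patches, threads)))
    return new
```

Because every patch is owned by exactly one thread on exactly one node, the result is identical to a serial sweep; only the placement of work changes.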

    Efficient Object-Based Hierarchical Radiosity Methods

    The efficient generation of photorealistic images is one of the main subjects in the field of computer graphics. In contrast to simple image generation, which is directly supported by standard 3D graphics hardware, photorealistic image synthesis strictly adheres to the physics describing the flow of light in a given environment. By simulating the energy flow in a 3D scene, global effects such as shadows and inter-reflections can be rendered accurately. The hierarchical radiosity method is one way of computing the global illumination in a scene. Because the method is limited to purely diffuse surfaces, its solutions are view-independent and can be examined in real-time walkthroughs. Additionally, its physically based nature makes the algorithm well suited for lighting design and architectural visualization. The focus of this thesis is the application of object-oriented methods to the radiosity problem. By consistently keeping and using object information throughout all stages of the algorithms, several contributions to the field of radiosity rendering could be made. By introducing a new meshing scheme, it is shown how curved objects can be treated efficiently by hierarchical radiosity algorithms. Using the same paradigm, the radiosity computation can be distributed over a network of computers. A parallel implementation is presented that minimizes communication costs while obtaining an efficient speedup. Radiosity solutions for very large scenes become possible only through clustering algorithms, in which groups of objects are combined into clusters to simulate the energy exchange at a higher level of abstraction. It is shown how the clustering technique can be improved without loss of image quality by applying the same data structure to both the visibility computations and the efficient radiosity simulation.
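Restricting the method to diffuse surfaces leads to the classical linear system B_i = E_i + ρ_i Σ_j F_ij B_j. Here B is radiosity, E is emission, ρ is reflectance, and F holds the form factors. Because none of these terms depends on the viewer, the solution is view-independent. The Python sketch below solves the system by Jacobi (gathering) iteration. It is the plain O(n²) formulation, not the object-based hierarchical or clustering method of the thesis, and the two-patch scene is made up.

```python
from typing import List, Sequence

def solve_radiosity(E: Sequence[float], rho: Sequence[float],
                    F: Sequence[Sequence[float]],
                    tol: float = 1e-12, max_iter: int = 1000) -> List[float]:
    """Jacobi iteration for B_i = E_i + rho_i * sum_j F_ij * B_j.

    Assumes reflectances below 1 and form-factor rows summing to at most 1,
    which makes the iteration a contraction, so it converges.
    """
    n = len(E)
    B = list(E)
    for _ in range(max_iter):
        new = [E[i] + rho[i] * sum(F[i][j] * B[j] for j in range(n)) for i in range(n)]
        if max(abs(a - b) for a, b in zip(new, B)) < tol:
            return new
        B = new
    return B

# Hypothetical scene: patch 0 emits light, patch 1 only reflects;
# each patch sees half of its hemisphere covered by the other.
B = solve_radiosity(E=[1.0, 0.0], rho=[0.5, 0.5], F=[[0.0, 0.5], [0.5, 0.0]])
```

For this scene the exact answer is B_0 = 1 / 0.9375 ≈ 1.0667 and B_1 = 0.25 / 0.9375 ≈ 0.2667. The iteration reaches these values because each sweep shrinks the error by a factor of ρ·F = 0.25.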

    A framework for realistic real-time walkthroughs in a VR distributed environment

    Virtual and augmented reality (VR/AR) are increasingly being used in various business scenarios and are important driving forces in technology development. However, their use in the home environment is restricted by several factors, including the lack of low-cost (from the client's point of view) high-performance solutions. In this paper we present a general client/server rendering architecture based on real-time concepts, including support for a wide range of client platforms and applications. Focusing on the real-time behaviour of all components involved in distributed IP-based VR scenarios is a new idea that has not been addressed before, except in simple partial solutions. This is considered “the most significant problem with the IP environment” [1]. Thus, the most important contribution of this research will be its holistic approach, in which networking, end-system and rendering aspects are integrated into a cost-effective infrastructure for building distributed real-time VR applications on IP-based networks.

    PHR: A parallel hierarchical radiosity system with dynamic load balancing

    In this paper, we present a parallel system called PHR for computing hierarchical radiosity solutions of complex scenes. The system targets multiprocessor architectures with distributed memory. It evaluates and subdivides the interactions level by level in breadth-first fashion, and redistributes the interactions at the end of each level to keep the load balanced. To allow interactions to travel freely across processors, all patch data is replicated on all processors. Hence, the system favors load balancing at the expense of increased communication volume. However, the results show that the communication overhead is negligible compared with the total execution time. We obtained a speed-up of 25 on 32 processors in our test scenes. © 2005 Springer Science + Business Media, Inc.
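The abstract does not say how PHR chooses the new owner of each interaction when it rebalances between levels. As an illustration only, the Python sketch below uses greedy longest-processing-time (LPT) assignment, a standard list-scheduling heuristic, inside a breadth-first loop. The interaction costs and the "split into two halves" refinement rule are invented for the example.

```python
import heapq
from typing import List, Sequence

def balance(costs: Sequence[float], p: int) -> List[List[int]]:
    """Greedy LPT assignment: take interactions in decreasing cost order and
    give each one to the currently least-loaded processor."""
    heap = [(0.0, k) for k in range(p)]  # (load, processor); already a valid heap
    bins: List[List[int]] = [[] for _ in range(p)]
    for i in sorted(range(len(costs)), key=lambda j: -costs[j]):
        load, k = heapq.heappop(heap)
        bins[k].append(i)
        heapq.heappush(heap, (load + costs[i], k))
    return bins

def run_levels(costs: Sequence[float], p: int, max_levels: int = 10) -> List[List[float]]:
    """Breadth-first refinement sketch: all interactions of one level are
    rebalanced and 'evaluated'; then (hypothetically) every interaction with
    cost > 1 is subdivided into two children of half the cost for the next
    level. Returns the per-processor load of every level."""
    history: List[List[float]] = []
    level = list(costs)
    for _ in range(max_levels):
        if not level:
            break
        bins = balance(level, p)
        history.append([sum(level[i] for i in b) for b in bins])
        level = [c / 2 for c in level if c > 1 for _ in range(2)]
    return history
```

Rebalancing the whole level at once is only possible because, as in PHR, any processor may take any interaction. In this sketch that corresponds to every processor being able to read all patch data.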

    Parallel Wavelet Radiosity

    Conference with proceedings and peer review. This paper presents parallel versions of a wavelet radiosity algorithm. Wavelet radiosity is based on a general framework of projection methods and wavelet theory. The resulting algorithm has a cost of O(n), versus the O(n^2) complexity of classical radiosity algorithms. However, designing a parallel wavelet radiosity algorithm is challenging because of its irregular and dynamic nature. Since explicit message-passing approaches fail to handle such applications, we experimented with various parallel implementations on a hardware ccNUMA architecture, the SGI Origin2000. Our experiments show that load balancing is a crucial performance issue for handling the dynamic distribution of work and communication, while we still make every reasonable effort to exploit data locality. Our best results yield a speed-up of 24 with 36 processors, even when dealing with extremely complex models.

    Pipelining the Fast Multipole Method over a Runtime System

    Fast Multipole Methods (FMM) are a fundamental operation for the simulation of many physical problems. A high-performance design of such methods usually requires carefully tuning the algorithm for both the targeted physics and the hardware. In this paper, we propose a new approach that achieves high performance across architectures. Our method expresses the FMM algorithm as a task flow and employs a state-of-the-art runtime system, StarPU, to process the tasks on the different processing units. We carefully design the task flow, the mathematical operators, their Central Processing Unit (CPU) and Graphics Processing Unit (GPU) implementations, and the scheduling schemes. We compute potentials and forces of 200 million particles in 48.7 seconds on a homogeneous 160-core SGI Altix UV 100, and of 38 million particles in 13.34 seconds on a heterogeneous 12-core Intel Nehalem processor enhanced with 3 Nvidia M2090 Fermi GPUs. Comment: No. RR-7981 (2012)
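StarPU itself is a C library, and its API is not shown here. The Python sketch below only illustrates the task-flow idea using the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The usual FMM operators (P2M, M2M, M2L, L2L, L2P, P2P) become tasks with dependencies, and a simple dispatcher hands ready tasks to a fixed number of workers. The two-level tree, the task names, and the FIFO dispatch policy are hypothetical; a real runtime would also map each task to a CPU or GPU implementation.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

def fmm_task_graph() -> Dict[str, Set[str]]:
    """Hypothetical task flow for a tiny tree: leaves 0..3, parent cells
    A = {0, 1} and B = {2, 3}. Maps each task to its prerequisite tasks."""
    parents = {"A": [0, 1], "B": [2, 3]}
    sibling = {"A": "B", "B": "A"}
    g: Dict[str, Set[str]] = {}
    for p, leaves in parents.items():
        for leaf in leaves:
            g[f"P2M{leaf}"] = set()                  # particles -> multipole
            g[f"P2P{leaf}"] = set()                  # near field, independent
        g[f"M2M{p}"] = {f"P2M{l}" for l in leaves}  # upward pass
        g[f"M2L{p}"] = {f"M2M{sibling[p]}"}         # far field from the sibling cell
        for leaf in leaves:
            g[f"L2L{leaf}"] = {f"M2L{p}"}           # downward pass
            g[f"L2P{leaf}"] = {f"L2L{leaf}"}        # local expansion -> particles
    return g

def simulate(graph: Dict[str, Set[str]], workers: int = 2) -> List[List[str]]:
    """Dispatch ready tasks FIFO, at most `workers` per time step, and return
    the tasks executed in each step."""
    ts = TopologicalSorter(graph)
    ts.prepare()
    queue: List[str] = []
    steps: List[List[str]] = []
    while ts.is_active():
        queue.extend(sorted(ts.get_ready()))  # tasks released since the last step
        batch, queue = queue[:workers], queue[workers:]
        steps.append(batch)
        for task in batch:
            ts.done(task)  # completing a task may release its successors
    return steps

steps = simulate(fmm_task_graph(), workers=2)
```

The point of the task-flow formulation is that the near-field P2P tasks have no dependencies on the tree passes, so the dispatcher can overlap them with the upward and downward passes without any hand-written synchronization.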

    Fast and Accurate Wavelet Radiosity Computations Using High-End Platforms

    International conference with proceedings and peer review; international audience. In this paper, we show how to fully exploit the capabilities of high-end SGI graphics and parallel machines to perform radiosity computations on scenes made of complex shapes both quickly and accurately. Overlapping multiprocessing and multi-pipeline graphics acceleration on the one hand, and incorporating recent research on wavelet radiosity on the other, together allow radiosity to become a practical tool for interactive design.

    High-fidelity rendering on shared computational resources

    The generation of high-fidelity imagery is a computationally expensive process, and parallel computing has traditionally been employed to alleviate this cost. However, traditional parallel rendering has been restricted to expensive shared-memory machines or dedicated distributed processors. In contrast, parallel computing on shared resources, such as a computational or desktop grid, offers a low-cost alternative. Yet prevalent rendering systems are currently incapable of seamlessly handling such shared resources, which suffer from high latencies, restricted bandwidth and volatility. The conventional approach of rescheduling failed jobs in a volatile environment hurts performance through redundant computation. Instead, clever task subdivision combined with image reconstruction techniques provides an unrestrictive fault-tolerance mechanism that is highly suitable for high-fidelity rendering. This thesis presents novel fault-tolerant parallel rendering algorithms for effectively tapping the enormous, inexpensive computational power provided by shared resources. A first-of-its-kind system for fully dynamic, high-fidelity interactive rendering on idle resources is presented, which is key to providing immediate feedback on changes made by a user. The system achieves interactivity by monitoring run-time variations in computational power and adapting computations accordingly, and it employs a spatio-temporal image reconstruction technique to enhance visual fidelity. Furthermore, the algorithms described for time-constrained offline rendering of still images and animation sequences make it possible to deliver results within a user-defined time limit. These novel methods enable the use of variable resources in deadline-driven environments.
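The abstract attributes fault tolerance to task subdivision plus image reconstruction rather than to re-executing failed jobs. One simple way to get that effect is to interleave pixels across tasks, so that a lost task leaves isolated holes that can be filled from rendered neighbours. The Python sketch below shows this under that assumption; it is a hypothetical illustration, not the thesis's spatio-temporal reconstruction. The stride `k`, the 3×3 averaging filter, and the shading function are all assumptions.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Task = Tuple[int, int]

def interleaved_tasks(w: int, h: int, k: int) -> Dict[Task, List[Tuple[int, int]]]:
    """Task (a, b) owns every pixel with x % k == a and y % k == b, so losing
    one task leaves scattered single-pixel holes instead of a missing tile."""
    return {(a, b): [(x, y) for y in range(b, h, k) for x in range(a, w, k)]
            for a in range(k) for b in range(k)}

def render(w: int, h: int, k: int, shade: Callable[[int, int], float],
           failed: Set[Task]) -> List[List[float]]:
    """Render every task except the failed ones, then fill the holes."""
    img: List[List[Optional[float]]] = [[None] * w for _ in range(h)]
    for task, pixels in interleaved_tasks(w, h, k).items():
        if task in failed:  # simulate a volatile node whose result never arrives
            continue
        for x, y in pixels:
            img[y][x] = shade(x, y)
    out: List[List[float]] = []
    for y in range(h):
        row = []
        for x in range(w):
            v = img[y][x]
            if v is None:  # reconstruct from the available 3x3 neighbours
                nb = [img[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))
                      if img[j][i] is not None]
                v = sum(nb) / len(nb) if nb else 0.0
            row.append(v)
        out.append(row)
    return out
```

With `k = 2` and one failed task, a quarter of the pixels are missing, but every hole still has rendered neighbours. The frame can therefore be delivered on time instead of waiting for the task to be rescheduled.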