    Experiences with Mesh-like computations using Prediction Binary Trees

    In this paper we exploit the temporal coherence among successive phases of a computation to implement a load-balancing technique for mesh-like computations mapped onto a cluster of processors. A key concept on which the load-balancing scheme is built is a Predictor component that estimates the imbalance between successive phases. Using this information, our method partitions the computation into balanced tasks through the Prediction Binary Tree (PBT). At each new phase, the current PBT is updated by using each task's computing time in the previous phase as its cost estimate for the next phase. The PBT is designed to balance the load across tasks and to reduce dependency among processors for higher performance. Dependency is reduced by using rectangular tiles of the mesh with an almost-square shape (i.e., one dimension is at most twice the other). Reducing dependency makes it possible to reduce inter-processor communication or to exploit local dependencies among tasks (such as data locality). Furthermore, we provide two heuristics that take advantage of data locality. Our strategy has been assessed on a significant problem, Parallel Ray Tracing. Our implementation shows good scalability and improves performance on both cheaper commodity clusters and high-performance clusters with low-latency networks. We report different measurements showing that task granularity is a key factor in the performance of our decomposition/mapping strategy.
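
The tiling idea described above can be illustrated with a minimal sketch. It is not the paper's PBT implementation: it assumes a hypothetical 2D grid `cost` holding the per-cell time measured in the previous phase, it rebuilds the tiling from scratch instead of updating an existing tree, and the function names and the greedy "split the most expensive tile first" rule are choices made only for this example. Cuts that keep both children almost square (longer side at most twice the shorter) are preferred, and among those the cut with the smallest predicted imbalance wins.

```python
def region_cost(cost, x0, y0, x1, y1):
    """Sum the predicted cost over the half-open tile [x0, x1) x [y0, y1)."""
    return sum(cost[y][x] for y in range(y0, y1) for x in range(x0, x1))


def almost_square(w, h):
    """True when the longer side is at most twice the shorter side."""
    return max(w, h) <= 2 * min(w, h)


def split_tile(cost, tile):
    """Cut a tile across its longer side; return two children or None for a 1x1 tile."""
    x0, y0, x1, y1 = tile
    w, h = x1 - x0, y1 - y0
    cut_along_x = w >= h
    lo, hi = (x0, x1) if cut_along_x else (y0, y1)
    candidates = []
    for cut in range(lo + 1, hi):
        if cut_along_x:
            a, b = (x0, y0, cut, y1), (cut, y0, x1, y1)
        else:
            a, b = (x0, y0, x1, cut), (x0, cut, x1, y1)
        imbalance = abs(region_cost(cost, *a) - region_cost(cost, *b))
        shapes_ok = all(almost_square(t[2] - t[0], t[3] - t[1]) for t in (a, b))
        # Sort key: almost-square cuts first, then smallest imbalance.
        candidates.append((not shapes_ok, imbalance, a, b))
    if not candidates:
        return None
    _, _, a, b = min(candidates)
    return a, b


def build_tiles(cost, n_tasks):
    """Greedily split the most expensive tile until n_tasks tiles exist."""
    leaves = [(0, 0, len(cost[0]), len(cost))]
    while len(leaves) < n_tasks:
        leaves.sort(key=lambda t: region_cost(cost, *t), reverse=True)
        for i, tile in enumerate(leaves):
            children = split_tile(cost, tile)
            if children:
                leaves[i:i + 1] = children
                break
        else:
            break  # nothing left to split
    return leaves
```

Calling `build_tiles` again with the times measured in the latest phase gives the tiling for the next phase; a real implementation would instead adjust the existing tree, as the abstract describes.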

    Efficient distributed load balancing for parallel algorithms

    2009 - 2010. With the advent of massively parallel processing technology, exploiting the power offered by hundreds or even thousands of processors is anything but a trivial task. Computing with multi-processor, multi-core, or many-core systems adds a number of challenges related to the cooperation and communication of multiple processing units. The uneven distribution of data among the various processors, i.e., load imbalance, is one of the major problems in data-parallel applications. Without good load distribution strategies, we cannot reach good speedup, and thus good efficiency. Load balancing strategies can be classified in several ways, according to the methods used to balance the workload. For instance, dynamic load balancing algorithms make scheduling decisions during execution and commonly result in better performance than static approaches, where tasks are assigned before execution. Even more important is the difference between centralized and distributed load balancing approaches. Although centralized algorithms have a wider view of the computation, and hence may exploit smarter balancing techniques, they expose global synchronization and communication bottlenecks involving the master node. This does not ensure scalability with the number of processors. This dissertation studies the impact of different load balancing strategies. In particular, one of the key observations driving our work is that distributed algorithms work better than centralized ones in the context of load balancing for multi-processors (and likewise for multi-cores and many-cores). We first show a centralized approach to load balancing, then we propose several distributed approaches for problems with different parallelization, workload distribution, and communication patterns. 
We try to combine several approaches efficiently to improve performance, in particular using predictive metrics to obtain a per-task compute-time estimate, using adaptive subdivision, improving dynamic load balancing, and addressing distributed balancing schemes. The main challenge tackled in this thesis has been to combine all these approaches in new and efficient load balancing schemes. We assess the proposed balancing techniques, from centralized approaches to distributed ones, in distinct real-world scenarios: mesh-like computation, Parallel Ray Tracing, and agent-based simulations. Moreover, we test our algorithms on parallel hardware such as clusters of workstations, multi-core processors, and SIMD vector instruction sets. Finally, we conclude the thesis with several remarks about the impact of distributed techniques, the effect of the communication pattern and workload distribution, the use of cost estimation for adaptive partitioning, the trade-off between speed and accuracy in prediction-based approaches, the effectiveness of work stealing combined with sorting, and a non-trivial way to exploit hybrid CPU-GPU computations. [edited by author] IX n.s.

    Realtime ray tracing and interactive global illumination

    One of the most sought-after goals in computer graphics is to generate "realism in real time", i.e., the generation of realistic-looking images at realtime frame rates. Today, virtually all approaches to realtime rendering use graphics hardware, which is based almost exclusively on triangle rasterization. Although this technology has seen tremendous progress over the last few years, for many applications it is currently reaching its limits in model complexity, supported features, and achievable realism. An alternative to triangle rasterization is the ray tracing algorithm, which is well known for its higher flexibility, its generally higher achievable realism, and its superior scalability in both model size and compute power. However, ray tracing is also computationally demanding and has therefore so far been used almost exclusively for high-quality offline rendering tasks. This dissertation focuses on the question of why ray tracing is likely to soon play a larger role in interactive applications, and how this scenario can be reached. To this end, we discuss the RTRT/OpenRT realtime ray tracing system, a software-based ray tracing system that achieves interactive to realtime frame rates on today's commodity CPUs. In particular, we discuss the overall system design, the efficient implementation of the core ray tracing algorithms, techniques for handling dynamic scenes, an efficient parallelization framework, and an OpenGL-like low-level API. Taken together, these techniques form a complete realtime rendering engine that supports massively complex scenes, highly realistic and physically correct shading, and even physically based lighting simulation at interactive rates. 
In the last part of this thesis we then discuss the implications and potential of realtime ray tracing for global illumination, and how the availability of this new technology can be leveraged to finally achieve interactive global illumination - the physically correct simulation of light transport at interactive rates.
One of the most important goals of computer graphics is the generation of "realism in real time": the creation of realistic-looking, computer-generated images in real time. Today's real-time graphics applications are predominantly realized with fast graphics hardware, which at the current state of the art is based almost exclusively on the triangle rasterization algorithm. Although this rasterization technology has made increasingly impressive progress in recent years, it is now visibly reaching its limits, especially with regard to model complexity, supported lighting effects, and achievable realism. An alternative to triangle rasterization is ray tracing, which is widely known for its higher flexibility, its generally higher achievable realism, and its better scalability in both scene size and computing capacity. However, ray tracing is equally known for its high computational demand and is therefore used today almost exclusively for high-quality, non-interactive image synthesis. This dissertation addresses the reasons why ray tracing is likely to play a larger role in interactive graphics applications in the near future, and investigates how this scenario of real-time ray tracing can be reached. To this end, we present the RTRT/OpenRT real-time ray tracing system, a software-based ray tracing system that achieves interactive performance on today's standard PC processors. 
In particular, we discuss the basic system design, the efficient implementation of the core algorithms, techniques for supporting dynamic scenes, an efficient parallelization framework, and an OpenGL-like application programming interface. Taken as a whole, these techniques form a complete real-time rendering system that can compute extremely complex scenes, highly realistic and physically correct effects, and even physically based lighting simulation interactively. In the last part of the dissertation we then address the implications and the potential that real-time ray tracing offers for global illumination simulation, and how the availability of this new technology can be used to ultimately compute global illumination, the physically correct simulation of light transport, interactively as well.

    Parallel Rendering and Large Data Visualization

    We are living in the big data age: an ever-increasing amount of data is being produced through data acquisition and computer simulations. While large-scale analysis and simulations have received significant attention for cloud and high-performance computing, software to efficiently visualise large data sets is struggling to keep up. Visualization has proven to be an efficient tool for understanding data; in particular, visual analysis is a powerful tool for gaining intuitive insight into the spatial structure and relations of 3D data sets. Large-scale visualization setups are becoming ever more affordable, and high-resolution tiled display walls are within reach even for small institutions. Virtual reality has arrived in the consumer space, making it accessible to a large audience. This thesis addresses these developments by advancing the field of parallel rendering. We formalise the design of system software for large data visualization through parallel rendering, provide a reference implementation of a parallel rendering framework, introduce novel algorithms to accelerate the rendering of large amounts of data, and validate this research and development with new applications for large data visualization. Applications built using our framework enable domain scientists and large data engineers to better extract meaning from their data, making it feasible to explore more data and enabling the use of high-fidelity visualization installations to see more detail in the data. Comment: PhD thesis

    Hypergraph-partitioning-based remapping models for image-space-parallel direct volume rendering of unstructured grids

    In this work, image-space-parallel direct volume rendering (DVR) of unstructured grids is investigated for distributed-memory architectures. A hypergraph-partitioning-based model is proposed for the adaptive screen partitioning problem in this context. The proposed model aims to balance the rendering loads of processors while trying to minimize the amount of data replication. In the parallel DVR framework we adopted, each data primitive is statically owned by its home processor, which is responsible for replicating its primitives on other processors. Two appropriate remapping models are proposed by enhancing the above model for use within this framework. These two remapping models aim to minimize the total volume of communication in data replication while balancing the rendering loads of processors. Based on the proposed models, a parallel DVR algorithm is developed. Experiments conducted on a PC cluster show that the proposed remapping models achieve better speedup values than the remapping models previously suggested for image-space-parallel DVR. © 2007 IEEE

    Adaptive remote visualization system with optimized network performance for large scale scientific data

    This dissertation discusses algorithmic and implementation aspects of an automatically configurable remote visualization system, which optimally decomposes and adaptively maps the visualization pipeline to a wide-area network. The first node typically serves as a data server that generates or stores raw data sets, and a remote client resides on the last node, equipped with a display device ranging from a personal desktop to a powerwall. Intermediate nodes can be located anywhere on the network and often include workstations, clusters, or custom rendering engines. We employ a regression-model-based network daemon to estimate the effective bandwidth and minimal delay of a transport path using active traffic measurement. Data processing time is predicted for various visualization algorithms using block partitioning and statistical techniques. Based on the link measurements, node characteristics, and module properties, we strategically organize visualization pipeline modules such as filtering, geometry generation, rendering, and display into groups, and dynamically assign them to appropriate network nodes to achieve minimal total delay for post-processing or maximal frame rate for streaming applications. We propose polynomial-time algorithms using dynamic programming to compute the optimal solutions to the problems of pipeline decomposition and network mapping under different constraints. A parallel remote visualization system, which comprises a logical group of autonomous nodes that cooperate to enable sharing, selection, and aggregation of various types of resources distributed over a network, is implemented and deployed at geographically distributed nodes for experimental testing. Our system is capable of handling a complete spectrum of remote visualization tasks, including post-processing, computational steering, and wireless sensor network monitoring. 
Visualization functionalities such as isosurface extraction, ray casting, streamlines, and line integral convolution (LIC) are supported in our system. The proposed decomposition and mapping scheme is generic and can be applied to other network-oriented computation applications whose computing components form a linear arrangement.
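
The dynamic-programming idea for mapping a linear pipeline onto a network path can be sketched as follows. This is an illustrative formulation under simplifying assumptions, not the dissertation's algorithm: the node path is fixed, every node on it must run at least one contiguous group of modules, the first module runs on the data server, and the last runs on the client. The parameter names (`compute`, `out_size`, `power`, `bandwidth`) are hypothetical. The table entry `T[j][i]` holds the minimal delay for finishing modules `0..j` with module `j` placed on node `i`.

```python
import math


def min_total_delay(compute, out_size, power, bandwidth):
    """Minimal end-to-end delay of a linear pipeline mapped onto a node path.

    compute[j]   -- work of module j (assumed unit: operations)
    out_size[j]  -- size of the data module j sends to module j + 1
    power[i]     -- processing speed of node i (operations per second)
    bandwidth[i] -- bandwidth of the link between node i and node i + 1
    """
    n, k = len(compute), len(power)
    T = [[math.inf] * k for _ in range(n)]
    T[0][0] = compute[0] / power[0]  # first module runs on the data server
    for j in range(1, n):
        # Module j can be on node i only if at least i modules precede it.
        for i in range(min(j, k - 1) + 1):
            stay = T[j - 1][i]  # module j - 1 ran on the same node
            if i > 0:
                # Module j - 1 ran on the previous node; ship its output over the link.
                move = T[j - 1][i - 1] + out_size[j - 1] / bandwidth[i - 1]
            else:
                move = math.inf
            T[j][i] = min(stay, move) + compute[j] / power[i]
    return T[n - 1][k - 1]  # last module runs on the client node
```

For example, with `compute=[4, 8, 2]`, `out_size=[10, 4, 0]`, `power=[2, 4]`, and `bandwidth=[2]`, keeping the first two modules on the server and shipping the smaller intermediate result gives a delay of 8.5, against 9.5 when the cut is made after the first module. The function returns `math.inf` when there are fewer modules than nodes, since the assumed constraint cannot be met.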

    Ray tracing techniques for computer games and isosurface visualization

    Ray tracing is a powerful image synthesis technique that has been used for high-quality offline rendering for decades. In recent years, this technique has become more important for realtime applications, but it still plays only a minor role in many areas. Some of the reasons are that ray tracing is compute-intensive and has to rely on preprocessed data structures to achieve fast performance. This dissertation investigates methods to broaden the applicability of ray tracing and is divided into two parts. The first part explores the opportunities offered by ray-tracing-based game technology in the context of current and expected future performance levels. In this regard, novel methods are developed to efficiently support certain kinds of dynamic scenes while avoiding the burden of fully recomputing the required data structures. Furthermore, today's ray tracing performance levels are below what is needed for 3D games. Therefore, the multi-core CPU of the Playstation 3 is investigated, and an optimized ray tracing architecture is presented to take steps towards the required performance. In part two, the focus shifts to isosurface ray tracing. Isosurfaces are particularly important for understanding the distribution of certain values in volumetric data. Since the structure of volumetric data sets is diverse, optimized algorithms and data structures are developed for rectilinear as well as unstructured data sets, which allow for realtime rendering of isosurfaces including advanced shading and visualization effects. This also includes techniques for out-of-core and time-varying data sets.
Ray tracing is a flexible image synthesis method that has been used for decades for high-quality but slow image generation. In recent years ray tracing has also become increasingly interesting for real-time applications, but it still plays a subordinate role in many application areas. 
Some of the reasons are the computational intensity of ray tracing and its dependence on precomputed data structures to achieve high speed. This dissertation investigates methods to increase the applicability of ray tracing in two different areas. The first part examines the possibilities offered by ray-tracing-based game technology in the context of current and expected future performance. In addition, methods are developed in this context to render certain time-varying scenes without having to rebuild the required data structures from scratch. Since the speed of ray tracing has so far been insufficient for games, the multi-core CPU of the Playstation 3 is investigated, and an optimized ray tracing system is described that brings ray tracing closer to the required speed. The second part deals with the rendering of isosurfaces by means of ray tracing. Isosurfaces are particularly important for understanding the distribution of individual values in volumetric data sets. Since these data sets can be structured differently, optimized algorithms and data structures are developed for grid-shaped and unstructured data sets that allow isosurfaces to be rendered in real time. This also includes extensions for extremely large and time-varying data sets.

    Architectures for ubiquitous 3D on heterogeneous computing platforms

    Today, a wide scope for 3D graphics applications exists, including domains such as scientific visualization, 3D-enabled web pages, and entertainment. At the same time, the devices and platforms that run and display the applications are more heterogeneous than ever. Display environments range from mobile devices to desktop systems and ultimately to distributed displays that facilitate collaborative interaction. While the capability of the client devices may vary considerably, the visualization experiences running on them should be consistent. The field of application should dictate how and on what devices users access the application, not the technical requirements to realize the 3D output. The goal of this thesis is to examine the diverse challenges involved in providing consistent and scalable visualization experiences to heterogeneous computing platforms and display setups. While we could not address the myriad of possible use cases, we developed a comprehensive set of rendering architectures in the major domains of scientific and medical visualization, web-based 3D applications, and movie virtual production. To provide the required service quality, performance, and scalability for different client devices and displays, our architectures focus on the efficient utilization and combination of the available client, server, and network resources. We present innovative solutions that incorporate methods for hybrid and distributed rendering as well as means to manage data sets and stream rendering results. We establish the browser as a promising platform for accessible and portable visualization services. We collaborated with experts from the medical field and the movie industry to evaluate the usability of our technology in real-world scenarios. The presented architectures achieve a wide coverage of display and rendering setups and at the same time share major components and concepts. 
Thus, they build a strong foundation for a unified system that supports a variety of use cases.
Today there is a wide range of applications for 3D graphics, such as scientific visualization, 3D content in web pages, and entertainment software. At the same time, the devices and platforms that run and display these applications are more heterogeneous than ever. Display devices range from mobile devices to desktop systems and on to distributed display environments that favor collaborative use. While the performance of the devices can vary greatly, the visualizations running on them should be consistent. The field of application should determine how and on which device users access the application, not the technical prerequisites for generating the 3D graphics. The goal of this thesis is to investigate the various challenges involved in providing consistent and scalable visualization applications on heterogeneous platforms. While we could not cover the multitude of possible use cases, we developed a representative selection of rendering architectures in the core areas of scientific visualization, web-based 3D applications, and virtual film production. To guarantee the required quality, performance, and scalability for different client devices and displays, our architectures focus on the efficient use and combination of the available client, server, and network resources. We present innovative solutions that encompass hybrid and distributed rendering as well as the management of data sets and the streaming of the 3D output. We establish the web browser as a promising platform for accessible and portable visualization services. 
To test the usability of our technology in realistic scenarios, we collaborated with experts from medicine and the film industry. Our architectures achieve comprehensive coverage of display and rendering scenarios while sharing essential components and concepts. They therefore form a strong foundation for a unified system that supports a multitude of use cases.

    A hypergraph-partitioning based remapping model for image-space parallel volume rendering

    Ankara: The Department of Computer Engineering and the Institute of Engineering and Science of Bilkent Univ., 2000. Thesis (Master's) -- Bilkent University, 2000. Includes bibliographical references, leaves 72-76. Cambazoğlu, Berkant Barla. M.S.

    Lattice-Boltzmann simulations of cerebral blood flow

    Computational haemodynamics plays a central role in the understanding of blood behaviour in the cerebral vasculature, increasing our knowledge of the onset of vascular diseases and their progression, improving diagnosis, and ultimately providing better patient prognosis. Computer simulations hold the potential to accurately characterise the motion of blood and its interaction with the vessel wall, providing the capability to assess surgical treatments with no danger to the patient. These aspects contribute considerably to a better understanding of blood circulation processes as well as to improved pre-treatment planning. Existing software environments for treatment planning consist of several stages, each requiring significant user interaction and processing time, which significantly limits their use in clinical scenarios. The aim of this PhD is to provide clinicians and researchers with a tool to aid in the understanding of human cerebral haemodynamics. This tool employs a high-performance fluid solver based on the lattice-Boltzmann method (coined HemeLB), high-performance distributed computing and grid computing, and various advanced software applications for efficiently setting up and running patient-specific simulations. A graphical tool is used to segment the vasculature from patient-specific CT or MR data and configure boundary conditions with ease, creating models of the vasculature in real time. Blood flow visualisation is done in real time using in situ rendering techniques implemented within the parallel fluid solver and aided by steering capabilities; these programming strategies allow the clinician to interactively display the simulation results on a local workstation. A separate software application is used to numerically compare simulation results obtained at different spatial resolutions, providing a strategy for approaching numerical validation. 
The developed software and supporting computational infrastructure were used to study various patient-specific intracranial aneurysms with the collaborating interventionalists at the National Hospital for Neurology and Neuroscience (London), using three-dimensional rotational angiography data to define the patient-specific vasculature. Blood flow motion was depicted in detail by the visualisation capabilities, clearly showing vortex fluid flow features and stress distribution at the inner surface of the aneurysms and their surrounding vasculature. These investigations permitted the clinicians to rapidly assess the risk associated with the growth and rupture of each aneurysm. The ultimate goal of this work is to aid clinical practice with an efficient, easy-to-use toolkit for real-time decision support.
