127 research outputs found

    Hardware acceleration of photon mapping

    PhD Thesis. The quest for realism in computer-generated graphics has yielded a range of algorithmic techniques, the most advanced of which are capable of rendering images at close to photorealistic quality. Due to the realism available, computer graphics are now commonly used in the creation of movie sequences, architectural renderings, medical imagery and product visualisations. This work concentrates on the photon mapping algorithm [1, 2], a physically based global illumination rendering algorithm. Photon mapping excels in producing highly realistic, physically accurate images. A drawback of photon mapping, however, is its rendering time, which can be significantly longer than that of other, albeit less realistic, algorithms. Not surprisingly, this increase in execution time is associated with a high computational cost. This computation is usually performed using the general purpose central processing unit (CPU) of a personal computer (PC), with the algorithm implemented as a software routine. Other options available for processing these algorithms include desktop PC graphics processing units (GPUs) and custom designed acceleration hardware devices. GPUs tend to be efficient when dealing with less realistic rendering solutions such as rasterisation; however, with their recent drive towards increased programmability they can also be used to process more realistic algorithms. A drawback to the use of GPUs is that these algorithms often have to be reworked to make optimal use of the limited resources available. There are very few custom hardware devices available for acceleration of the photon mapping algorithm. Ray-tracing is the predecessor to photon mapping, and although it is not capable of producing the same physical accuracy and therefore realism, there are similarities between the algorithms. There have been several hardware prototypes, and at least one commercial offering, created with the goal of accelerating ray-trace rendering [3]. However, the properties that make many of these proposals suitable for the acceleration of ray-tracing are not shared by photon mapping. There are even fewer proposals for acceleration of the additional functions found only in photon mapping. All of these approaches to algorithm acceleration offer limited scalability. GPUs are inherently difficult to scale, while many of the custom hardware devices available thus far make use of large processing elements and complex acceleration data structures.
    In this work we make use of three novel approaches in the design of highly scalable specialised hardware structures for the acceleration of the photon mapping algorithm. Increased scalability is gained through:
    • The use of a brute-force approach in place of the commonly used smart approach, thus eliminating much of the data pre-processing, complex data structures and large processing units often required.
    • The use of Logarithmic Number System (LNS) arithmetic computation, which facilitates a reduction in processing area requirement.
    • A novel redesign of the photon inclusion test, used within the photon search method of the photon mapping algorithm. This allows an intelligent memory structure to be used for the search.
    The design uses two hardware structures, each of which accelerates one core rendering function. Renderings produced using field programmable gate array (FPGA) based prototypes are presented, along with details of 90nm synthesised versions of the designs which show that close to an order-of-magnitude speedup over a software implementation is possible. Due to the scalable nature of the design, it is likely that any advantage can be maintained in the face of improving processor speeds. Significantly, due to the brute-force approach adopted, it is possible to eliminate an often-used software acceleration method. This means that the device can interface almost directly to a front-end modelling package, minimising much of the pre-processing required by most other proposals.
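As a rough software illustration of two ideas named in the abstract above, the sketch below shows a brute-force photon search (every photon is tested, with no kd-tree or other acceleration structure) and an LNS-style multiplication (a multiply becomes an addition of logarithms). It uses the textbook sphere inclusion test and a simplified density estimate, not the redesigned inclusion test or the hardware described in the thesis; all function names and the photon layout are assumptions made for this example.

```python
import math

def gather_photons(photons, point, radius):
    """Brute-force search: test every photon against a sphere around `point`.

    `photons` is a list of (position, power) pairs, an assumed layout.
    """
    r2 = radius * radius
    hits = []
    for pos, power in photons:
        # Standard inclusion test: squared distance compared with squared radius.
        d2 = sum((p - q) ** 2 for p, q in zip(pos, point))
        if d2 <= r2:
            hits.append((pos, power))
    return hits

def density_estimate(photons, point, radius):
    """Simplified estimate: gathered power divided by the disc area (no BRDF term)."""
    hits = gather_photons(photons, point, radius)
    return sum(power for _, power in hits) / (math.pi * radius * radius)

def lns_multiply(a, b):
    """LNS-style product of two positive numbers: add their base-2 logarithms."""
    return 2.0 ** (math.log2(a) + math.log2(b))
```

The brute-force loop does more work per query than a tree search, but each test is independent and identical, which is the property that makes this style of search easy to replicate across many small processing units.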

    Volume visualization of time-varying data using parallel, multiresolution and adaptive-resolution techniques

    This paper presents a parallel rendering approach that allows high-quality visualization of large time-varying volume datasets. Multiresolution and adaptive-resolution techniques are also incorporated to improve the efficiency of the rendering. Three basic steps are needed to implement this kind of application. First, the task is divided by decomposing the data. This decomposition can be temporal, spatial, or a mix of both. After the data has been divided, each portion is rendered by a separate processor to create sub-images or frames. Finally, these sub-images or frames are assembled into a final image or animation. After developing this application, several experiments were performed to show that this approach indeed saves time when a reasonable number of processors are used. We also conclude that the optimal number of processors depends on the size of the dataset used.
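The three steps described above (decompose, render each portion, assemble) can be sketched for the temporal case as follows. This is a minimal, sequential illustration: `render_timestep` is a hypothetical placeholder for a real volume renderer, and in practice the chunks would be handed to separate processors rather than looped over.

```python
def split_range(n_items, n_parts):
    """Step 1: split range(n_items) into n_parts contiguous, near-equal chunks."""
    base, extra = divmod(n_items, n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        size = base + (1 if i < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks

def render_timestep(t):
    """Hypothetical stand-in for rendering one time step of the volume."""
    return f"frame-{t}"

def render_animation(n_timesteps, n_processors):
    chunks = split_range(n_timesteps, n_processors)
    # Step 2: each chunk would be rendered by its own processor.
    per_processor = [[render_timestep(t) for t in chunk] for chunk in chunks]
    # Step 3: assemble the frames back into one animation, in time order.
    return [frame for frames in per_processor for frame in frames]
```

A spatial decomposition would follow the same pattern, with bricks of one volume in place of time steps and an image compositing step in place of the simple concatenation.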

    Distributed Shared Memory for Roaming Large Volumes

    We present a cluster-based volume rendering system for roaming very large volumes. This system allows a gigabyte-sized probe to be moved in real time inside a total volume of several tens or hundreds of gigabytes. While the size of the probe is limited by the total amount of texture memory on the cluster, the size of the total data set has no theoretical limit. The cluster is used as a distributed graphics processing unit that aggregates both graphics power and graphics memory. A hardware-accelerated volume renderer runs in parallel on the cluster nodes, and the final image compositing is implemented using a pipelined sort-last rendering algorithm. Meanwhile, volume bricking and volume paging allow efficient data caching. On each rendering node, a distributed hierarchical cache system implements a global software-based distributed shared memory on the cluster. In case of a cache miss, this system first checks page residency on the other cluster nodes instead of directly accessing local disks. Using two Gigabit Ethernet network interfaces per node, we accelerate data fetching by a factor of 4 compared to directly accessing local disks. The system also implements asynchronous disk access and texture loading, which makes it possible to overlap data loading, volume slicing and rendering for optimal volume roaming.
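The cache-miss behaviour described above (check the local cache, then the other nodes, and only then the disk) can be illustrated with a small sketch. The `Node` class, its fields and the returned source labels are assumptions made for this example; the real system also handles eviction, networking and asynchronous loading, none of which is modelled here.

```python
class Node:
    """One rendering node with an in-memory brick cache (illustrative only)."""

    def __init__(self, name, disk):
        self.name = name
        self.cache = {}   # brick_id -> data held in this node's memory
        self.disk = disk  # brick_id -> data, standing in for local disk storage
        self.peers = []   # other nodes whose caches can be queried

    def fetch(self, brick_id):
        """Return (data, source) following the local / remote / disk order."""
        if brick_id in self.cache:
            return self.cache[brick_id], "local-cache"
        # On a miss, check residency on the other nodes before touching disk.
        for peer in self.peers:
            if brick_id in peer.cache:
                data = peer.cache[brick_id]
                self.cache[brick_id] = data
                return data, f"remote:{peer.name}"
        data = self.disk[brick_id]
        self.cache[brick_id] = data
        return data, "disk"
```

For example, if node `b` has already loaded brick 0 from disk, a later `fetch(0)` on node `a` is served from `b`'s memory, and a repeated `fetch(0)` on `a` is served from `a`'s own cache.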

    Hardware acceleration of photon mapping

    EThOS - Electronic Theses Online Service, GB, United Kingdom

    COTS Cluster-based Sort-last Rendering: Performance Evaluation and Pipelined Implementation

    Sort-last parallel rendering is an efficient technique to visualize huge datasets on COTS clusters. The dataset is subdivided and distributed across the cluster nodes. For every frame, each node renders a full-resolution image of its data using its local GPU, and the images are composited together using a parallel image compositing algorithm. In this paper, we present a performance evaluation of standard sort-last parallel rendering methods and of the different improvements proposed in the literature. This evaluation is based on a detailed analysis of the different hardware and software components. We present a new implementation of sort-last rendering that fully overlaps CPU(s), GPU and network usage throughout the algorithm. We present experiments on a 3-year-old 32-node PC cluster and on a 1.5-year-old 5-node PC cluster, both with Gigabit interconnect, showing volume rendering at 13 and 31 frames per second respectively and polygon rendering at 8 and 17 frames per second respectively on a 1024×768 render area, and we show that our implementation outperforms or equals many other implementations and specialized visualization clusters.
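The compositing step of sort-last rendering can be sketched for the opaque (polygon) case as a per-pixel depth comparison across the full-resolution images produced by the nodes. This is a sequential illustration with an assumed pixel layout of (color, depth) pairs; it is not the pipelined parallel algorithm evaluated in the paper, and volume rendering would blend the images in visibility order instead of taking the nearest sample.

```python
def composite_depth(images):
    """Merge per-node images by keeping, for each pixel, the nearest sample.

    `images` is a list with one entry per node; each entry is a list of
    (color, depth) pixels in the same row-major order (assumed layout).
    """
    n_pixels = len(images[0])
    result = []
    for i in range(n_pixels):
        # The sample with the smallest depth is the one visible to the viewer.
        color, depth = min((img[i] for img in images), key=lambda px: px[1])
        result.append((color, depth))
    return result
```

Because the comparison is associative, the nodes can merge images pairwise or in stages, which is what parallel compositing schemes exploit to spread the work and the network traffic across the cluster.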

    Tiled shading


    High performance computer simulated bronchoscopy with interactive navigation.

    by Ping-Fu Fung. Thesis (M.Phil.), Chinese University of Hong Kong, 1998. Includes bibliographical references (leaves 98-102). Abstract also in Chinese.
    Contents:
    Abstract
    Acknowledgements
    Chapter 1. Introduction
        1.1 Medical Visualization System
            1.1.1 Data Acquisition
            1.1.2 Computer-aided Medical Visualization
            1.1.3 Existing Systems
        1.2 Research Goal
            1.2.1 System Architecture
        1.3 Organization of this Thesis
    Chapter 2. Volume Visualization
        2.1 Sampling Grid and Volume Representation
        2.2 Priori Work in Volume Rendering
            2.2.1 Surface VS Direct
            2.2.2 Image-order VS Object-order
            2.2.3 Orthogonal VS Perspective
            2.2.4 Hardware Acceleration VS Software Acceleration
        2.3 Chapter Summary
    Chapter 3. IsoRegion Leaping Technique for Perspective Volume Rendering
        3.1 Compositing Projection in Direct Volume Rendering
        3.2 IsoRegion Leaping Acceleration
            3.2.1 IsoRegion Definition
            3.2.2 IsoRegion Construction
            3.2.3 IsoRegion Step Table
            3.2.4 Ray Traversal Scheme
        3.3 Experiment Result
        3.4 Improvement
        3.5 Chapter Summary
    Chapter 4. Parallel Volume Rendering by Distributed Processing
        4.1 Multi-platform Loosely-coupled Parallel Environment Shell
        4.2 Distributed Rendering Pipeline (DRP)
            4.2.1 Network Architecture of a Loosely-Coupled System
            4.2.2 Data and Task Partitioning
            4.2.3 Communication Pattern and Analysis
        4.3 Load Balancing
        4.4 Heterogeneous Rendering
        4.5 Chapter Summary
    Chapter 5. User Interface
        5.1 System Design
        5.2 3D Pen Input Device
        5.3 Visualization Environment Integration
        5.4 User Interaction: Interactive Navigation
            5.4.1 Camera Model
            5.4.2 Zooming
            5.4.3 Image View
            5.4.4 User Control
        5.5 Chapter Summary
    Chapter 6. Conclusion
        6.1 Final Summary
        6.2 Deficiency and Improvement
        6.3 Future Research Aspect
    Appendix
        A. Common Error in Pre-multiplying Color and Opacity
        B. Binary Factorization of the Sample Composition Equation

    Doctor of Philosophy

    Dissertation. Balancing the trade-off between the spatial and temporal quality of interactive computer graphics imagery is one of the fundamental design challenges in the construction of rendering systems. Inexpensive interactive rendering hardware may deliver a high level of temporal performance if the level of spatial image quality is sufficiently constrained. In these cases, the spatial fidelity level is an independent parameter of the system and temporal performance is a dependent variable. The spatial quality parameter is selected for the system by the designer based on the anticipated graphics workload. Interactive ray tracing is one example; the algorithm is often selected due to its ability to deliver a high level of spatial fidelity, and the relatively lower level of temporal performance is readily accepted. This dissertation proposes an algorithm to perform fine-grained adjustments to the trade-off between the spatial quality of images produced by an interactive renderer and the temporal performance or quality of the rendered image sequence. The approach first determines the minimum amount of sampling work necessary to achieve a certain fidelity level, and then allows the surplus capacity to be directed towards spatial or temporal fidelity improvement. The algorithm consists of an efficient parallel spatial and temporal adaptive rendering mechanism and a control optimization problem which adjusts the sampling rate based on a characterization of the rendered imagery and constraints on the capacity of the rendering system.
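The idea of first meeting a minimum sampling workload and then directing the surplus capacity towards spatial or temporal fidelity can be illustrated with a toy budget calculation. This is not the dissertation's control optimization: the linear blend, the `spatial_share` knob and every parameter name below are assumptions chosen only to make the trade-off concrete.

```python
def allocate(capacity, min_samples_per_frame, base_fps, spatial_share):
    """Split a fixed sampling capacity between image quality and frame rate.

    capacity              -- samples the renderer can compute per second
    min_samples_per_frame -- samples needed to reach the required fidelity
    base_fps              -- lowest acceptable frame rate
    spatial_share         -- 0.0 sends all surplus to frame rate,
                             1.0 sends all surplus to samples per frame
    Returns (samples_per_frame, frames_per_second).
    """
    if min_samples_per_frame * base_fps > capacity:
        raise ValueError("capacity cannot meet the minimum workload")
    # Most samples per frame that still sustains base_fps.
    max_samples_per_frame = capacity / base_fps
    samples = min_samples_per_frame + spatial_share * (
        max_samples_per_frame - min_samples_per_frame
    )
    # Whatever capacity is not spent per frame is spent on more frames.
    return samples, capacity / samples
```

With a capacity of 1,000,000 samples per second, a minimum of 10,000 samples per frame and a 20 fps floor, a share of 0.0 gives 10,000 samples at 100 fps, while a share of 1.0 gives 50,000 samples at 20 fps; intermediate values trade one for the other while always using the full capacity.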