
    Memory architecture for efficient utilization of SDRAM: a case study of the computation/memory access trade-off

    This paper discusses the trade-off between calculations and memory accesses in a 3D graphics tile renderer for visualization of data from medical scanners. The performance requirement of this application is a frame rate of 25 frames per second when rendering 3D models with 2 million triangles, i.e. 50 million triangles per second, sustained (not peak). At present, a software implementation is capable of 3-4 frames per second for a 1 million triangle model.
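
    As a quick sanity check of the stated requirement, the arithmetic works out to a 20 ns budget per triangle. A minimal C++ sketch (the numbers come from the abstract; everything else is illustrative):

        #include <cstdio>

        int main() {
            // Figures taken from the abstract above.
            const double frames_per_second   = 25.0;
            const double triangles_per_frame = 2e6;

            const double sustained_rate  = frames_per_second * triangles_per_frame;
            const double ns_per_triangle = 1e9 / sustained_rate;

            std::printf("sustained rate: %.0f Mtri/s\n", sustained_rate / 1e6); // 50
            std::printf("budget per triangle: %.0f ns\n", ns_per_triangle);     // 20
            return 0;
        }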

    Fast Volume Rendering and Deformation Algorithms

    Volume rendering is a technique for the simultaneous visualization of surfaces and inner structures of objects. However, the huge number of volume primitives (voxels) in a volume leads to high computational cost. In this dissertation I developed two algorithms for the acceleration of volume rendering and volume deformation. The first algorithm accelerates ray casting of volumes. Previous ray casting acceleration techniques like space-leaping and early-ray-termination are only efficient when most voxels in a volume are either opaque or transparent; when many voxels are semi-transparent, the rendering time increases considerably. Our new algorithm improves the performance of ray casting of semi-transparently mapped volumes by exploiting the opacity coherency in object space, leading to a speedup factor between 1.90 and 3.49 when rendering semi-transparent volumes. The acceleration is realized with the help of pre-computed coherency distances. We developed an efficient algorithm to encode the coherency information, which requires less than 12 seconds for data sets with about 8 million voxels. The second algorithm is for volume deformation. Unlike traditional methods, our method incorporates the two stages of volume deformation, i.e. deformation and rendering, into a unified process. Instead of deforming each voxel to generate an intermediate deformed volume, the algorithm follows inversely deformed rays to generate the desired deformation; the calculations and memory for generating the intermediate volume are thus saved. Deformation continuity is achieved by adaptive ray division, which matches the amplitude of the local deformation. We also propose approaches for shading and opacity adjustment which guarantee the visual plausibility of deformation results. By incorporating early-ray-termination, space-leaping and the coherency acceleration technique into the new deformation algorithm, we achieve an additional deformation speedup factor of 2.34 to 6.58.
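
    A minimal sketch of how a precomputed coherency distance can drive a front-to-back ray caster with early ray termination; the CoherencyVolume type, its member names and the grey-scale compositing are illustrative assumptions, not the dissertation's actual data structures:

        #include <algorithm>
        #include <cmath>

        struct Rgba { float r, g, b, a; };

        // Illustrative volume: per-sample opacity plus a precomputed "coherency
        // distance" -- how far a ray may advance while opacity stays nearly
        // constant (standing in for the dissertation's encoded coherency data).
        struct CoherencyVolume {
            float opacityAt(float x, float y, float z) const;   // trilinear lookup
            float coherencyDistAt(float x, float y, float z) const;
        };

        Rgba castRay(const CoherencyVolume& vol,
                     float px, float py, float pz,   // ray entry point
                     float dx, float dy, float dz,   // unit ray direction
                     float tExit, float baseStep) {
            Rgba acc{0, 0, 0, 0};
            float t = 0.0f;
            while (t < tExit && acc.a < 0.99f) {     // early ray termination
                float x = px + t * dx, y = py + t * dy, z = pz + t * dz;
                float alpha = vol.opacityAt(x, y, z);
                // Within a coherent region opacity is ~constant, so one composite
                // may cover several base steps at once: space leaping generalised
                // to semi-transparent regions.
                float leap  = std::max(baseStep, vol.coherencyDistAt(x, y, z));
                float steps = leap / baseStep;
                float segA  = 1.0f - std::pow(1.0f - alpha, steps);
                float w = (1.0f - acc.a) * segA;     // front-to-back compositing
                acc.r += w; acc.g += w; acc.b += w;  // grey emission for brevity
                acc.a += w;
                t += leap;
            }
            return acc;
        }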

    Multiresolution Techniques for Real–Time Visualization of Urban Environments and Terrains

    In recent times we are witnessing a steep increase in the availability of data coming from real-life environments. Nowadays, virtually everyone connected to the Internet has instant access to a tremendous amount of data coming from satellite elevation maps, airborne time-of-flight scanners, digital cameras, street-level photographs and even cadastral maps. As with other, more traditional types of media such as pictures and videos, users of digital exploration software expect commodity hardware to exhibit good performance for interactive purposes, regardless of the dataset size. In this thesis we propose novel solutions to the problem of rendering large terrain and urban models on commodity platforms, both for local and remote exploration. Our solutions build on the concept of multiresolution representation, where alternative representations of the same data with different accuracy are used to selectively distribute the computational power, and consequently the visual accuracy, where it is most needed, based on the user's point of view. In particular, we introduce an efficient multiresolution data compression technique for planar and spherical surfaces, applied to terrain datasets, that is able to handle huge amounts of information at a planetary scale. We also describe a novel data structure for compact storage and rendering of urban entities such as buildings, allowing real-time exploration of cityscapes from a remote online repository. Moreover, we show how recent technologies can be exploited to transparently integrate virtual exploration and general computer graphics techniques with web applications.
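
    The selective refinement underlying such multiresolution representations can be made concrete with a short sketch: a hierarchy node is drawn as-is when its error, projected to screen space from the current viewpoint, falls below a pixel tolerance, and refined otherwise. The Node layout and the error metric are assumptions for illustration, not the thesis's actual structures:

        #include <algorithm>
        #include <cmath>
        #include <vector>

        // Illustrative multiresolution node: a patch of geometry with a
        // precomputed object-space error bound.
        struct Node {
            float cx, cy, cz;                // patch centre
            float objectSpaceError;          // deviation from full resolution
            std::vector<Node*> children;
            void draw() const;               // submit this patch's geometry
        };

        // Refine while the projected error exceeds a pixel tolerance, so detail
        // is spent where the current viewpoint demands it.
        void render(const Node* n, float ex, float ey, float ez,
                    float projScale,         // viewport-dependent constant
                    float pixelTolerance) {
            float dx = n->cx - ex, dy = n->cy - ey, dz = n->cz - ez;
            float dist = std::sqrt(dx * dx + dy * dy + dz * dz);
            float screenErr = projScale * n->objectSpaceError
                              / std::max(dist, 1e-3f);
            if (screenErr <= pixelTolerance || n->children.empty()) {
                n->draw();                   // coarse version good enough here
            } else {
                for (const Node* c : n->children)
                    render(c, ex, ey, ez, projScale, pixelTolerance);
            }
        }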

    Hardware acceleration of photon mapping

    PhD thesis. The quest for realism in computer-generated graphics has yielded a range of algorithmic techniques, the most advanced of which are capable of rendering images at close to photorealistic quality. Given the realism available, it is now commonplace for computer graphics to be used in the creation of movie sequences, architectural renderings, medical imagery and product visualisations.

    This work concentrates on the photon mapping algorithm [1, 2], a physically based global illumination rendering algorithm. Photon mapping excels in producing highly realistic, physically accurate images. A drawback of photon mapping, however, is its rendering time, which can be significantly longer than that of other, albeit less realistic, algorithms. Not surprisingly, this increase in execution time is associated with a high computational cost. This computation is usually performed using the general purpose central processing unit (CPU) of a personal computer (PC), with the algorithm implemented as a software routine. Other options available for processing these algorithms include desktop PC graphics processing units (GPUs) and custom designed acceleration hardware devices. GPUs tend to be efficient when dealing with less realistic rendering solutions such as rasterisation, but with their recent drive towards increased programmability they can also be used to process more realistic algorithms. A drawback to the use of GPUs is that these algorithms often have to be reworked to make optimal use of the limited resources available. There are very few custom hardware devices available for acceleration of the photon mapping algorithm. Ray-tracing is the predecessor to photon mapping, and although not capable of producing the same physical accuracy, and therefore realism, there are similarities between the algorithms. There have been several hardware prototypes, and at least one commercial offering, created with the goal of accelerating ray-trace rendering [3]. However, properties making many of these proposals suitable for the acceleration of ray-tracing are not shared by photon mapping, and there are even fewer proposals for acceleration of the additional functions found only in photon mapping. All of these approaches to algorithm acceleration offer limited scalability: GPUs are inherently difficult to scale, while many of the custom hardware devices available thus far make use of large processing elements and complex acceleration data structures.

    In this work we make use of three novel approaches in the design of highly scalable specialised hardware structures for the acceleration of the photon mapping algorithm. Increased scalability is gained through:

    • The use of a brute-force approach in place of the commonly used smart approach, thus eliminating much of the data pre-processing, the complex data structures and the large processing units often required.
    • The use of Logarithmic Number System (LNS) arithmetic, which facilitates a reduction in the processing area requirement.
    • A novel redesign of the photon inclusion test used within the photon search method of the photon mapping algorithm, which allows an intelligent memory structure to be used for the search.

    The design uses two hardware structures, each of which accelerates one core rendering function. Renderings produced using field programmable gate array (FPGA) based prototypes are presented, along with details of 90nm synthesised versions of the designs, which show that close to an order-of-magnitude speedup over a software implementation is possible. Due to the scalable nature of the design, it is likely that this advantage can be maintained in the face of improving processor speeds. Significantly, due to the brute-force approach adopted, it is possible to eliminate an often-used software acceleration method, meaning the device can interface almost directly to a front-end modelling package, minimising much of the pre-processing required by most other proposals.
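
    To make the brute-force trade concrete, the sketch below reduces the photon search to a plain inclusion test against every photon in the map: no kd-tree and no pre-processing, at the cost of more arithmetic, and the independent distance tests are exactly what specialised hardware can pipeline. Plain C++ standing in for the hardware structures; the types are illustrative:

        #include <vector>

        struct Photon { float x, y, z; float power[3]; };

        // Brute-force photon inclusion test: every photon in the map is checked
        // against the query sphere centred at (qx, qy, qz).
        std::vector<const Photon*> gather(const std::vector<Photon>& map,
                                          float qx, float qy, float qz,
                                          float radius) {
            std::vector<const Photon*> hits;
            const float r2 = radius * radius;
            for (const Photon& p : map) {
                float dx = p.x - qx, dy = p.y - qy, dz = p.z - qz;
                if (dx * dx + dy * dy + dz * dz <= r2)    // inclusion test
                    hits.push_back(&p);
            }
            return hits;      // feeds the radiance estimate at the query point
        }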

    Improved Encoding for Compressed Textures

    For the past few decades, graphics hardware has supported mapping a two-dimensional image, or texture, onto a three-dimensional surface to add detail during rendering. The complexity of modern applications using interactive graphics hardware has created an explosion in the amount of data needed to represent these images. In order to reduce the amount of memory required to store and transmit textures, graphics hardware manufacturers have introduced hardware decompression units into the texturing pipeline. Textures may now be stored compressed in memory and decoded at run time in order to access the pixel data. To encode images for use with these hardware features, compression algorithms are run offline as a preprocessing step, often the most time-consuming step in the asset preparation pipeline. This research presents several techniques for quickly producing compressed texture data. With the goal of interactive compression rates while maintaining compression quality, three algorithms are presented in the class of endpoint compression formats. The first uses intensity dilation to estimate compression parameters for low-frequency signal-modulated compressed textures and offers up to a 3X improvement in compression speed. The second, FasTC, shows that by estimating the final compression parameters, partition-based formats can choose an approximate partitioning and offer orders-of-magnitude faster encoding speed. The third, SegTC, improves further on partition selection by using a global segmentation to find the boundaries between image features; this segmentation offers an additional 2X improvement over FasTC while maintaining similar compressed quality. Also presented is a case study in using texture compression to benefit two-dimensional concave path rendering: compressing the pixel-coverage textures used for compositing yields both an increase in rendering speed and a decrease in storage overhead. Additionally, an algorithm is presented that uses a single layer of indirection to adaptively select the block size compressed for each texture, giving a 2X increase in compression ratio for textures of mixed detail. Finally, a texture storage representation that is decoded at run time on the GPU is presented; the decoded texture is still compressed for graphics hardware but uses 2X fewer bytes for storage and network bandwidth.
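
    As background for the term "endpoint compression format", here is a toy encoder for a 4x4 greyscale block in the spirit of BC1: two endpoints plus a 2-bit index per texel, decoded by interpolation in hardware. This is a simplified illustration, not the intensity-dilation, FasTC or SegTC algorithms themselves:

        #include <algorithm>
        #include <array>
        #include <cstdint>

        // Toy endpoint encoding of a 4x4 greyscale block: two 8-bit endpoints
        // and a 2-bit index per texel selecting one of four values interpolated
        // between them -- 6 bytes in place of 16.
        struct CompressedBlock {
            std::uint8_t  lo, hi;       // endpoints
            std::uint32_t indices;      // 16 texels x 2 bits
        };

        CompressedBlock encode(const std::array<std::uint8_t, 16>& texels) {
            CompressedBlock b{255, 0, 0};
            for (std::uint8_t t : texels) {           // endpoints span the block
                b.lo = std::min(b.lo, t);
                b.hi = std::max(b.hi, t);
            }
            for (int i = 0; i < 16; ++i) {
                int best = 0, bestErr = 1 << 30;
                for (int j = 0; j < 4; ++j) {         // 4-entry implicit palette
                    int pal = b.lo + (b.hi - b.lo) * j / 3;
                    int err = (texels[i] - pal) * (texels[i] - pal);
                    if (err < bestErr) { bestErr = err; best = j; }
                }
                b.indices |= std::uint32_t(best) << (2 * i);
            }
            return b;
        }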

    Parallel For Loops on Heterogeneous Resources

    In recent years, Graphics Processing Units (GPUs) have piqued the interest of researchers in scientific computing. Their immense floating-point throughput and massive parallelism make them ideal not just for graphical applications, but for many general algorithms as well. Load balancing an application so that it takes advantage of all computational resources in a machine is a difficult challenge, especially when the resources are heterogeneous. This dissertation presents the clUtil library, which vastly simplifies developing OpenCL applications for heterogeneous systems. The core focus of this dissertation lies in clUtil's ParallelFor construct and our novel PINA scheduler, which can efficiently load balance work onto multiple GPUs and CPUs simultaneously.
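
    The load-balancing idea can be sketched as a self-scheduling parallel for: each worker (imagine one per CPU or GPU device) repeatedly claims a chunk of iterations from a shared counter, so faster devices naturally take more work. This is a generic illustration, not clUtil's actual ParallelFor API or the PINA scheduler:

        #include <algorithm>
        #include <atomic>
        #include <cstddef>
        #include <functional>
        #include <thread>
        #include <vector>

        // Self-scheduling parallel for: workers pull fixed-size chunks of the
        // iteration space from a shared atomic counter until it is exhausted.
        void parallelFor(std::size_t begin, std::size_t end, std::size_t chunk,
                         unsigned workers,
                         const std::function<void(std::size_t)>& body) {
            std::atomic<std::size_t> next{begin};
            std::vector<std::thread> pool;
            for (unsigned w = 0; w < workers; ++w) {
                pool.emplace_back([&] {
                    for (;;) {
                        std::size_t lo = next.fetch_add(chunk);
                        if (lo >= end) return;                   // nothing left
                        std::size_t hi = std::min(lo + chunk, end);
                        for (std::size_t i = lo; i < hi; ++i) body(i);
                    }
                });
            }
            for (auto& t : pool) t.join();
        }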

    Efficient Many-Light Rendering of Scenes with Participating Media

    We present several approaches based on virtual lights that aim to capture the light transport without compromising quality, while preserving the elegance and efficiency of many-light rendering. By reformulating the integration scheme, we obtain two numerically efficient techniques: one tailored specifically for interactive, high-quality lighting on surfaces, and one for handling scenes with participating media.
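
    The reformulated integration scheme is not reproduced here; as context, the sketch below shows the baseline many-light estimator such methods build on: radiance at a shading point as a sum over virtual point lights, with the geometry term clamped to bound its singularity at small distances. The types and the scalar flux are illustrative assumptions:

        #include <algorithm>
        #include <cmath>
        #include <vector>

        struct Vec3 { float x, y, z; };

        struct VPL {                        // virtual point light
            Vec3 pos, normal;
            float flux;                     // scalar power for brevity
        };

        // Baseline many-light estimate: sum the (clamped) contribution of every
        // virtual point light at a surface point p with normal n. Visibility
        // tests are omitted for brevity.
        float shade(const Vec3& p, const Vec3& n,
                    const std::vector<VPL>& lights, float clampDist) {
            float L = 0.0f;
            for (const VPL& v : lights) {
                Vec3 d{v.pos.x - p.x, v.pos.y - p.y, v.pos.z - p.z};
                float dist2 = d.x * d.x + d.y * d.y + d.z * d.z;
                if (dist2 < 1e-12f) continue;            // degenerate placement
                float dist = std::sqrt(dist2);
                Vec3 w{d.x / dist, d.y / dist, d.z / dist};
                float cosP = std::max(0.0f, n.x * w.x + n.y * w.y + n.z * w.z);
                float cosL = std::max(0.0f, -(v.normal.x * w.x
                                              + v.normal.y * w.y
                                              + v.normal.z * w.z));
                // Clamping the geometry term trades a little bias for bounded
                // variance -- the standard VPL compromise.
                L += v.flux * cosP * cosL / std::max(dist2, clampDist * clampDist);
            }
            return L;
        }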