2,847 research outputs found

    Hydra: An Accelerator for Real-Time Edge-Aware Permeability Filtering in 65nm CMOS

    Full text link
    Many modern video processing pipelines rely on edge-aware (EA) filtering methods. However, recent high-quality methods are challenging to run in real-time on embedded hardware due to their computational load. To this end, we propose an area-efficient and real-time capable hardware implementation of a high quality EA method. In particular, we focus on the recently proposed permeability filter (PF) that delivers promising quality and performance in the domains of HDR tone mapping, disparity and optical flow estimation. We present an efficient hardware accelerator that implements a tiled variant of the PF with low on-chip memory requirements and a significantly reduced external memory bandwidth (6.4x w.r.t. the non-tiled PF). The design has been taped out in 65 nm CMOS technology, is able to filter 720p grayscale video at 24.8 Hz and achieves a high compute density of 6.7 GFLOPS/mm2 (12x higher than embedded GPUs when scaled to the same technology node). The low area and bandwidth requirements make the accelerator highly suitable for integration into SoCs where silicon area budget is constrained and external memory is typically a heavily contended resource

    Scalable wavelet-based coding of irregular meshes with interactive region-of-interest support

    Get PDF
    This paper proposes a novel functionality in wavelet-based irregular mesh coding, which is interactive region-of-interest (ROI) support. The proposed approach enables the user to define the arbitrary ROIs at the decoder side and to prioritize and decode these regions at arbitrarily high-granularity levels. In this context, a novel adaptive wavelet transform for irregular meshes is proposed, which enables: 1) varying the resolution across the surface at arbitrarily fine-granularity levels and 2) dynamic tiling, which adapts the tile sizes to the local sampling densities at each resolution level. The proposed tiling approach enables a rate-distortion-optimal distribution of rate across spatial regions. When limiting the highest resolution ROI to the visible regions, the fine granularity of the proposed adaptive wavelet transform reduces the required amount of graphics memory by up to 50%. Furthermore, the required graphics memory for an arbitrary small ROI becomes negligible compared to rendering without ROI support, independent of any tiling decisions. Random access is provided by a novel dynamic tiling approach, which proves to be particularly beneficial for large models of over 10(6) similar to 10(7) vertices. The experiments show that the dynamic tiling introduces a limited lossless rate penalty compared to an equivalent codec without ROI support. Additionally, rate savings up to 85% are observed while decoding ROIs of tens of thousands of vertices

    Scalable Interactive Volume Rendering Using Off-the-shelf Components

    Get PDF
    This paper describes an application of a second generation implementation of the Sepia architecture (Sepia-2) to interactive volu-metric visualization of large rectilinear scalar fields. By employingpipelined associative blending operators in a sort-last configuration a demonstration system with 8 rendering computers sustains 24 to 28 frames per second while interactively rendering large data volumes (1024x256x256 voxels, and 512x512x512 voxels). We believe interactive performance at these frame rates and data sizes is unprecedented. We also believe these results can be extended to other types of structured and unstructured grids and a variety of GL rendering techniques including surface rendering and shadow map-ping. We show how to extend our single-stage crossbar demonstration system to multi-stage networks in order to support much larger data sizes and higher image resolutions. This requires solving a dynamic mapping problem for a class of blending operators that includes Porter-Duff compositing operators

    Developing efficient web-based GIS applications

    Get PDF
    There is an increase in the number of web-based GIS applications over the recent years. This paper describes different mapping technologies, database standards, and web application development standards that are relevant to the development of web-based GIS applications. Different mapping technologies for displaying geo-referenced data are available and can be used in different situations. This paper also explains why Oracle is the system of choice for geospatial applications that need to handle large amounts of data. Wireframing and design patterns have been shown to be useful in making GIS web applications efficient, scalable and usable, and should be an important part of every web-based GIS application. A range of different development technologies are available, and their use in different operating environments has been discussed here in some detail

    Memory architecture for efficient utilization of SDRAM: a case study of the computation/memory access trade-off

    Get PDF
    This paper discusses the trade-off between calculations and memory accesses in a 3D graphics tile renderer for visualization of data from medical scanners. The performance requirement of this application is a frame rate of 25 frames per second when rendering 3D models with 2 million triangles, i.e. 50 million triangles per second, sustained (not peak). At present, a software implementation is capable of 3-4 frames per second for a 1 million triangle model

    A Parallel Rendering Algorithm for MIMD Architectures

    Get PDF
    Applications such as animation and scientific visualization demand high performance rendering of complex three dimensional scenes. To deliver the necessary rendering rates, highly parallel hardware architectures are required. The challenge is then to design algorithms and software which effectively use the hardware parallelism. A rendering algorithm targeted to distributed memory MIMD architectures is described. For maximum performance, the algorithm exploits both object-level and pixel-level parallelism. The behavior of the algorithm is examined both analytically and experimentally. Its performance for large numbers of processors is found to be limited primarily by communication overheads. An experimental implementation for the Intel iPSC/860 shows increasing performance from 1 to 128 processors across a wide range of scene complexities. It is shown that minimal modifications to the algorithm will adapt it for use on shared memory architectures as well
    corecore