29 research outputs found

    Acceleration techniques and evaluation on multi-core CPU, GPU and FPGA for image processing and super-resolution

    No full text
    Super-resolution (SR) techniques constitute a key element in image applications which need high-resolution reconstruction while, in the worst case, only a single low-resolution observation is available. SR techniques involve computationally demanding processes, and thus researchers are currently focusing on SR performance acceleration. Aiming at improving the SR performance, the current paper builds upon the characteristics of the L-SEABI SR method to introduce parallelization techniques for GPUs and FPGAs. The proposed techniques accelerate GPU reconstruction of ultra-high definition content, achieving three times (3×) faster than real-time performance on mid-range and previous-generation devices and at least nine times (9×) faster than real-time performance on high-end GPUs. The FPGA design leads to a scalable architecture performing four times (4×) faster than real time on low-end Xilinx Virtex 5 devices and 69 times (69×) faster than real time on the Virtex 2000t. Moreover, we confirm the benefits of the proposed acceleration techniques by employing them on a different category of image processing algorithms: window-based disparity functions, for which the proposed GPU technique shows an improvement over the CPU performance ranging from 14 times (14×) to 64 times (64×), while the proposed FPGA architecture provides 29× acceleration. © 2016, Springer-Verlag Berlin Heidelberg
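    For context on the class of algorithms being accelerated, the sketch below shows a generic iterative back-projection loop for single-image SR at a 2× scale factor, written in C. It is only an illustration of back-projection-based SR under simple assumptions (box-filter downscaling, nearest-neighbour upscaling, a fixed correction gain lambda); it is not the L-SEABI algorithm or the GPU/FPGA kernels of the paper.

        /* Minimal sketch of generic iterative back-projection super-resolution.
         * Images are grayscale, row-major float buffers; the scale factor is 2. */
        #include <stdlib.h>

        static void upscale2x(const float *src, float *dst, int w, int h) {
            /* nearest-neighbour upscaling used as a simple initial estimate */
            for (int y = 0; y < 2 * h; y++)
                for (int x = 0; x < 2 * w; x++)
                    dst[y * 2 * w + x] = src[(y / 2) * w + (x / 2)];
        }

        static void downscale2x(const float *src, float *dst, int w, int h) {
            /* 2x2 box averaging models the acquisition (blur + decimation) */
            for (int y = 0; y < h; y++)
                for (int x = 0; x < w; x++)
                    dst[y * w + x] = 0.25f * (src[(2*y) * 2*w + 2*x]   + src[(2*y) * 2*w + 2*x+1] +
                                              src[(2*y+1) * 2*w + 2*x] + src[(2*y+1) * 2*w + 2*x+1]);
        }

        /* lr: w x h observation, hr: 2w x 2h output buffer (caller-allocated) */
        void ibp_sr(const float *lr, float *hr, int w, int h, int iters, float lambda) {
            float *sim = malloc(sizeof(float) * w * h);        /* simulated LR image   */
            float *err = malloc(sizeof(float) * 4 * w * h);    /* back-projected error */
            upscale2x(lr, hr, w, h);                           /* initial HR estimate  */
            for (int it = 0; it < iters; it++) {
                downscale2x(hr, sim, w, h);                    /* re-simulate the LR observation */
                for (int i = 0; i < w * h; i++) sim[i] = lr[i] - sim[i];
                upscale2x(sim, err, w, h);                     /* spread the residual back to HR */
                for (int i = 0; i < 4 * w * h; i++) hr[i] += lambda * err[i];
            }
            free(sim); free(err);
        }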

    A graphics parallel memory organization exploiting request correlations

    No full text
    Real-time graphics applications require memory organizations featuring parallel pixel access and low-cost implementation. This work builds on a nonlinear skew mapping scheme and exploits the correlation between consecutive requests for pixels to design an efficient parallel memory organization. The mapping achieves parallel access, of mn pixels in various shapes, to a memory organized with mn banks. The proposed design technique combines the mapping properties and the spatial correlations among pixel requests to eliminate conflicts by spending at most one extra cycle every mn consecutive parallel pixel accesses. Consequently, the technique ensures that any pixel pattern, among those commonly used in graphics, can be accessed in a single cycle from any image location. The address computations become straightforward, as the numbers of the requested pixels and the banks, apart from being equal, can be powers of 2. © 2006 IEEE
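    As a rough illustration of how such a parallel memory organization derives a bank and an in-bank offset from pixel coordinates, the C sketch below uses a simple linear row-skew over B = mn banks, with m, n and the image width W assumed to be powers of 2 (and W no smaller than B). The paper's nonlinear skew mapping and its conflict-elimination guarantees are not reproduced here.

        /* Simplified linear-skew mapping from pixel coordinates to (bank, offset),
         * given only to illustrate distributing neighbouring pixels over B = m*n
         * banks; this is not the paper's nonlinear scheme. */
        typedef struct { unsigned bank; unsigned offset; } bank_addr;

        static bank_addr map_pixel(unsigned x, unsigned y, unsigned m, unsigned n, unsigned W) {
            unsigned B = m * n;                      /* number of memory banks        */
            bank_addr a;
            a.bank   = (x + n * y) & (B - 1);        /* skew each row by n banks      */
            a.offset = (y * W + x) / B;              /* word position inside the bank */
            return a;
        }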

    Reduced Complexity Super-Resolution for Low-Bitrate Video Compression

    No full text
    Evolving video applications impose requirements for high image quality, low bitrate, and/or small computational cost. This paper combines state-of-the-art coding and super-resolution (SR) techniques to improve video compression both in terms of coding efficiency and complexity. The proposed approach improves a generic decimation-quantization compression scheme by introducing low-complexity single-image SR techniques for rescaling the data at the decoder side and by jointly exploring/optimizing the downsampling/upsampling processes. The enhanced scheme improves both quality and system complexity compared with conventional codecs and can be easily modified to meet various diverse requirements, such as effectively supporting any off-the-shelf video codec, for instance H.264/Advanced Video Coding or High Efficiency Video Coding. Our approach builds on studying the generic scheme's parameterization with common rescaling techniques to achieve 2.4-dB peak signal-to-noise ratio (PSNR) quality improvement at low bitrates compared with conventional codecs, and proposes a novel SR algorithm to advance the critical bitrate to the level of 10 Mb/s. The evaluation of the SR algorithm includes the comparison of its performance to other image rescaling solutions of the literature. The results show quality improvement by 5-dB PSNR over straightforward interpolation techniques and computational time reduction by three orders of magnitude when compared with the highly involved methods of the field. Therefore, our algorithm proves to be most suitable for use in reduced-complexity downsampled compression schemes.
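    The quality gains above are reported in dB PSNR; the C helper below shows the standard computation of that metric for 8-bit frames, PSNR = 10·log10(255²/MSE). It is a generic utility, not code from the paper.

        #include <math.h>
        #include <stddef.h>

        /* Standard PSNR between two 8-bit frames of npix pixels each.
         * Returns INFINITY for identical frames (MSE = 0). */
        double psnr_8bit(const unsigned char *ref, const unsigned char *test, size_t npix) {
            double mse = 0.0;
            for (size_t i = 0; i < npix; i++) {
                double d = (double)ref[i] - (double)test[i];
                mse += d * d;
            }
            mse /= (double)npix;
            return mse > 0.0 ? 10.0 * log10(255.0 * 255.0 / mse) : INFINITY;
        }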

    Study of interpolation filters for motion estimation with application in H.264/AVC encoders

    Get PDF
    Image super-resolution plays an important role in a plethora of applications, including video compression and motion estimation. Detecting fractional displacements among frames facilitates the removal of temporal redundancy and improves the video quality by 2-4 dB PSNR [1] [2]. However, the increased complexity of the Fractional Motion Estimation (FME) process adds a significant computational load to the encoder and sets constraints to real-time designs. Timing analysis shows that FME accounts for almost half of the entire motion estimation period, which in turn accounts for 60-90% of the total encoding time depending on the design configuration. FME is based on an interpolation procedure that increases the resolution of any frame region by generating sub-pixels between the original pixels. Modern compression standards specify the exact filter to use in the Motion Compensation module, allowing the encoder and the decoder to create and use identical reference frames. In particular, H.264/AVC specifies a 6-tap filter for computing the luma values of half-pixels and a low-cost 2-tap filter for computing quarter-pixels. Even though it is common practice for encoder designers to integrate the standard 6-tap filter also in the Estimation module (before Compensation), the interpolation technique used for detecting the displacements (not computing their residual) is an open choice subject to certain performance trade-offs. Aiming at speeding up the Estimation, a process of considerably higher computational demand than the Compensation, this work builds on the potential to implement a lower-complexity interpolation technique instead of the H.264 6-tap filter. We integrate in the Estimation module several distinct interpolation techniques not included in the H.264 standard, while keeping the standard H.264/AVC Compensation to measure their impact on the outcome of the prediction engine. The related bibliography includes both ideas to avoid/replace the standard computations and architectures targeting the efficient implementation of the H.264 6-tap filtering procedure and the support of its increased memory requirements. To this end, we note that H.264 specifies a kernel with coefficients ⟨1,−5,20,20,−5,1⟩ to be multiplied with six consecutive pixels of the frame (either in column or row format). The resulting six products are accumulated and normalized for the generation of a single half-pixel (between the 3rd and 4th tap). The operation must be repeated for each “horizontal” and “vertical” half-pixel by sliding the kernel on the frame, both in row and column order. Moreover, there exist as many “diagonal” half-pixels to be generated by applying the kernel on previously computed horizontal or vertical half-pixels. That is to say, depending on its position, we must process 6 or 36 frame pixels to compute a single half-pixel. To avoid the costly H.264 filter in the Estimation module, we study similar interpolation techniques using fewer than 6 taps, possibly exploiting gradients in the image. Section II presents three commonly used interpolation techniques and introduces three novel ones to point out the differences of the proposed techniques. Section III reports the performance results of these techniques and Section IV concludes the paper.
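    As a concrete reference for the filtering step described above, the C sketch below computes one luma half-pixel with the standard ⟨1,−5,20,20,−5,1⟩ kernel and the (+16, >>5) rounding normalization. Frame-border handling is omitted, and diagonal half-pixels, which reuse the kernel on intermediate values kept at higher precision, are not shown.

        #include <stdint.h>

        /* Clip an intermediate value to the 8-bit pixel range. */
        static inline uint8_t clip255(int v) { return v < 0 ? 0 : (v > 255 ? 255 : (uint8_t)v); }

        /* H.264 luma half-pixel between row[x+2] and row[x+3], using the 6-tap
         * kernel <1,-5,20,20,-5,1> followed by rounding normalization (+16, >>5).
         * The caller must guarantee that row[x]..row[x+5] are valid samples;
         * border padding is omitted in this sketch. */
        static uint8_t half_pixel_h(const uint8_t *row, int x) {
            int acc = 1 * row[x]      - 5 * row[x + 1] + 20 * row[x + 2]
                    + 20 * row[x + 3] - 5 * row[x + 4] + 1 * row[x + 5];
            return clip255((acc + 16) >> 5);
        }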

    Single-image super-resolution using low complexity adaptive iterative back-projection.

    No full text

    Low Complexity Interpolation Filters for Motion Estimation and Application to the H.264 Encoders

    No full text
    Techniques for image super-resolution play an important role in a plethora of applications, which include video compression and motion estimation. The detection of fractional displacements among frames facilitates the removal of temporal redundancy and improves the video quality by 2-4 dB PSNR. However, the increased complexity of the Fractional Motion Estimation (FME) process adds a significant computational load to the encoder and sets constraints to real-time designs. Researchers have performed timing analysis of the motion estimation process and report that FME accounts for almost half of the entire motion estimation period, which in turn accounts for 60-90% of the total encoding time depending on the design configuration.

    Device and circuit-level performance of carbon nanotube field-effect transistor with benchmarking against a nano-MOSFET

    No full text
    The performance of a semiconducting carbon nanotube (CNT) is assessed, and its parameters are tabulated against those of a metal-oxide-semiconductor field-effect transistor (MOSFET). Both the CNT and MOSFET models considered agree well with the trends in the available experimental data. The results obtained show that nanotubes, used as a silicon channel replacement, can significantly reduce the drain-induced barrier lowering effect and the subthreshold swing while sustaining a smaller channel area at higher current density. Performance metrics of both devices, such as current drive strength, current on-off ratio (Ion/Ioff), energy-delay product, and power-delay product for logic gates, namely NAND and NOR, are presented. The design rules used for the carbon nanotube field-effect transistors (CNTFETs) are compatible with the 45-nm MOSFET technology. The parasitics associated with interconnects are also incorporated in the model. Interconnects can affect the propagation delay in a CNTFET; shorter interconnects result in a higher cutoff frequency. © 2012 Tan et al
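    The figures of merit named above are simple products of device-level quantities; the short C example below computes the on/off current ratio, power-delay product and energy-delay product from placeholder values. The numbers are illustrative only, not results from the paper.

        #include <stdio.h>

        /* Standard figure-of-merit definitions used when benchmarking logic devices:
         * on/off current ratio, power-delay product (switching energy) and
         * energy-delay product. The values below are arbitrary placeholders. */
        int main(void) {
            double ion   = 40e-6;    /* on-state drive current  [A] (placeholder) */
            double ioff  = 10e-9;    /* off-state leakage       [A] (placeholder) */
            double vdd   = 0.9;      /* supply voltage          [V] (placeholder) */
            double delay = 5e-12;    /* gate propagation delay  [s] (placeholder) */

            double on_off_ratio = ion / ioff;      /* dimensionless              */
            double pdp = ion * vdd * delay;        /* power-delay product  [J]   */
            double edp = pdp * delay;              /* energy-delay product [J*s] */

            printf("Ion/Ioff = %.1e, PDP = %.3e J, EDP = %.3e J*s\n",
                   on_off_ratio, pdp, edp);
            return 0;
        }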

    Acceleration Techniques and Evaluation on Multicore CPU, GPU and FPGA for Image Processing and Super-Resolution

    Get PDF
    Super-Resolution (SR) techniques constitute a key element in image applications which need high-resolution reconstruction while, in the worst case, only a single low-resolution observation is available. SR techniques involve computationally demanding processes and thus researchers are currently focusing on SR performance acceleration. Aiming at improving the SR performance, the current paper builds upon the characteristics of the L-SEABI SR method to introduce parallelization techniques for GPUs and FPGAs. The proposed techniques accelerate GPU reconstruction of Ultra-High Definition content, achieving three times (3x) faster than real-time performance on mid-range and previous-generation devices and at least nine times (9x) faster than real-time performance on high-end GPUs. The FPGA design leads to a scalable architecture performing four times (4x) faster than real time on low-end Xilinx Virtex 5 devices and sixty-nine times (69x) faster than real time on the Virtex 2000t. Moreover, we confirm the benefits of the proposed acceleration techniques by employing them on a different category of image-processing algorithms: window-based disparity functions, for which the proposed GPU technique shows an improvement over the CPU performance ranging from 14 times (14x) to 64 times (64x), while the proposed FPGA architecture provides 29x acceleration.
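    To make concrete what a window-based disparity function computes, the C sketch below estimates the disparity of a single pixel of a rectified stereo pair by SAD block matching over a horizontal search range. Window size, search range and border handling are simplifying assumptions; this is not the exact cost function or the parallel kernels evaluated in the paper.

        #include <stdlib.h>

        /* Window-based disparity estimate for pixel (x, y) of a rectified stereo
         * pair, using the sum of absolute differences (SAD) over a
         * (2r+1) x (2r+1) window and a horizontal search range of max_d pixels.
         * The caller must keep the window and the search inside the images. */
        int disparity_sad(const unsigned char *left, const unsigned char *right,
                          int width, int x, int y, int r, int max_d) {
            int best_d = 0;
            long best_cost = -1;
            for (int d = 0; d <= max_d; d++) {               /* candidate disparities */
                long cost = 0;
                for (int dy = -r; dy <= r; dy++)
                    for (int dx = -r; dx <= r; dx++) {
                        int l  = left [(y + dy) * width + (x + dx)];
                        int rr = right[(y + dy) * width + (x + dx - d)];
                        cost += labs(l - rr);
                    }
                if (best_cost < 0 || cost < best_cost) {     /* keep the best match   */
                    best_cost = cost;
                    best_d = d;
                }
            }
            return best_d;
        }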

    A control-theoretic approach for efficient design of filters in DAC and digital audio amplifiers

    No full text
    A control-theoretic approach to designing Digital-to-Analogue Converters and Digital Amplifiers, which leads to improved performance in Audio and Multimedia applications, is presented in this paper. The design involves an over-sampling and a pulse modulation component, the latter driven by a pulse generation algorithm based on the characteristics of the output filter. The theoretical model results in a family of digital circuits whose operation is verified by computer simulations, achieving a Signal-to-Noise Ratio of 147 dB at a switching rate of 90 MHz. Implementation and hardware complexity issues are discussed based on an FPGA realization of the algorithm. © 2010 Springer Science+Business Media, LLC
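    The paper's pulse-generation algorithm is tied to the characteristics of the output filter and is not reproduced here; as a generic illustration of the oversampling-plus-pulse-modulation structure it refers to, the C sketch below implements a first-order sigma-delta modulator that turns an oversampled input into a ±1 pulse stream.

        /* Generic first-order sigma-delta modulator: quantizes an oversampled input
         * in [-1, 1] to a +/-1 pulse stream while feeding the quantization decision
         * back through an integrator. This only illustrates the kind of pulse
         * modulation component described; the paper's filter-driven algorithm differs. */
        void sigma_delta_1st(const double *in, int *out, int nsamples) {
            double integrator = 0.0;
            double feedback = 0.0;
            for (int i = 0; i < nsamples; i++) {
                integrator += in[i] - feedback;          /* accumulate the error   */
                out[i] = (integrator >= 0.0) ? 1 : -1;   /* 1-bit quantizer        */
                feedback = (double)out[i];               /* feed the decision back */
            }
        }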