46 research outputs found

    High throughput image compression and decompression on GPUs

    Get PDF
    This work investigates possibilities to create a high-throughput, GPU-friendly, intra-only, wavelet-based video compression algorithm optimized for visually lossless applications. Addressing the key observation that JPEG 2000's entropy coder is a bottleneck and might be overly complex for a high-bit-rate scenario, various algorithmic alterations are proposed. First, JPEG 2000's Selective Arithmetic Coding mode is realized on the GPU, but the gains in throughput are shown to be limited. Instead, two independent alterations, not compliant with the standard, are proposed: (1) processing each bit plane in a single pass (single-pass mode), which gives up the concept of intra-bit-plane truncation points, and (2) a true raw-coding mode that is sample-wise parallelizable and does not require any context modeling. Next, an alternative block coder from the literature, the Bitplane Coder with Parallel Coefficient Processing (BPC-PaCo), is evaluated. Since it trades signal adaptiveness for increased parallelism, it is shown here how a stationary probability model averaged from a set of test sequences yields competitive compression efficiency.
    A combination of BPC-PaCo with the single-pass mode is proposed and shown to increase the speedup with respect to the original JPEG 2000 entropy coder from 2.15x (BPC-PaCo with two passes) to 2.6x (proposed BPC-PaCo with single-pass mode), at the marginal cost of increasing the PSNR penalty by 0.3 dB, to at most 1 dB. Furthermore, a parallel algorithm is presented that determines the optimal code-block bit-stream truncation points (given an available bit-rate budget) and builds the entire code stream on the GPU, reducing the amount of data that has to be transferred back into host memory to a minimum. A theoretical runtime model is formulated that allows, based on benchmarking results on one GPU, the runtime of a kernel on another GPU to be predicted. Lastly, the first JPEG XS GPU decoder realization is presented. JPEG XS was designed to be a low-complexity codec and, for the first time, explicitly demanded GPU-friendliness in its call for proposals. At bit rates above 1 bpp, the decoder is around 2x faster than the original JPEG 2000 and 1.5x faster than JPEG 2000 with the fastest evaluated entropy coder (BPC-PaCo with single-pass mode). With a GeForce GTX 1080, a decoding throughput of around 200 fps is achieved for a UHD 4:4:4 sequence.
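    The abstract does not give the runtime model's exact form. As a rough, hedged illustration only, one common formulation is a roofline-style bound in which a kernel's time on a target GPU is dominated by either its memory traffic or its arithmetic work; the sketch below assumes exactly that. The struct names, the helper predict_runtime, and every throughput figure are hypothetical, not taken from the dissertation.

        // Illustrative sketch only: assumes a simple roofline-style model in
        // which a kernel is either bandwidth-bound or compute-bound. All
        // structs, helpers, and throughput figures here are hypothetical.
        #include <algorithm>
        #include <cstdio>

        struct GpuSpec {
            double mem_bandwidth_gbs;  // sustained memory bandwidth (GB/s)
            double compute_gflops;     // sustained arithmetic throughput (GFLOP/s)
        };

        struct KernelProfile {         // measured once on the reference GPU
            double bytes_moved;        // total DRAM traffic of the kernel
            double flops;              // total arithmetic work of the kernel
        };

        // Predict the runtime (seconds) of a profiled kernel on a target GPU:
        // whichever resource (memory or compute) is slower dominates.
        double predict_runtime(const KernelProfile& k, const GpuSpec& target) {
            double t_mem     = k.bytes_moved / (target.mem_bandwidth_gbs * 1e9);
            double t_compute = k.flops / (target.compute_gflops * 1e9);
            return std::max(t_mem, t_compute);
        }

        int main() {
            GpuSpec reference{320.0, 8000.0};  // hypothetical sustained figures
            GpuSpec target{448.0, 10000.0};    // for two GPUs, not vendor data
            KernelProfile dwt{512e6, 2e9};     // e.g. a wavelet-transform kernel
            std::printf("%.3f ms on reference, %.3f ms predicted on target\n",
                        predict_runtime(dwt, reference) * 1e3,
                        predict_runtime(dwt, target) * 1e3);
            return 0;
        }

    Under this assumed formulation, calibrating bytes_moved and flops from benchmarks on one GPU and substituting the target GPU's sustained figures gives the kind of cross-GPU prediction the abstract describes.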

    The Research and Implementation of the Wireless Optical Communications Transceiver (II)

    Get PDF
    Project number: NSC94-2745-E032-002-URD. Research period: 2005/08–2006/07. Research budget: 739,000. Sponsor: National Science Council, Executive Yuan.

    Efficient JPEG 2000 Image Compression Scheme for Multihop Wireless Networks

    Get PDF
    When using wireless sensor networks for real-time data transmission, some critical points should be considered. Restricted computational power, limited memory, narrow bandwidth, and a constrained energy supply impose strong limits on sensor nodes. Therefore, maximizing network lifetime and minimizing energy consumption are always optimization goals. To overcome the computation and energy limitations of individual sensor nodes during image transmission, an energy-efficient image transport scheme is proposed, taking advantage of the JPEG2000 still-image compression standard, implemented in MATLAB and in C using JasPer. JPEG2000 provides a practical set of features not necessarily available in previous standards. These features are achieved using two key techniques: the discrete wavelet transform (DWT) and embedded block coding with optimized truncation (EBCOT). Performance of the proposed image transport scheme is investigated with respect to image quality and energy consumption. Simulation results are presented and show that, by analyzing the functional influence of each parameter of this distributed image compression algorithm, the proposed scheme extends network lifetime and significantly reduces the amount of required memory.
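    As a small, self-contained illustration of the first of these two techniques, one 1-D analysis step of the reversible 5/3 wavelet that JPEG2000 uses for lossless coding can be written as below. This is a plain C++ sketch with whole-sample symmetric boundary extension, not code from the JasPer-based implementation the abstract mentions.

        // Sketch: one 1-D analysis step of the reversible 5/3 wavelet used by
        // JPEG2000 for lossless coding. Assumes an even-length signal and an
        // arithmetic right shift for the floor divisions (guaranteed in C++20).
        #include <vector>

        void dwt53_forward(const std::vector<int>& x,
                           std::vector<int>& low, std::vector<int>& high) {
            const int len = (int)x.size();
            const int n = len / 2;
            low.assign(n, 0);
            high.assign(n, 0);
            auto at = [&](int i) {              // mirror indices at both borders
                if (i < 0) i = -i;
                if (i >= len) i = 2 * len - 2 - i;
                return x[i];
            };
            for (int k = 0; k < n; ++k)         // predict step: detail coefficients
                high[k] = at(2 * k + 1) - ((at(2 * k) + at(2 * k + 2)) >> 1);
            for (int k = 0; k < n; ++k) {       // update step: smooth coefficients
                int d_prev = (k > 0) ? high[k - 1] : high[0];  // mirrored detail
                low[k] = at(2 * k) + ((d_prev + high[k] + 2) >> 2);
            }
        }

    A 2-D transform applies this step to every row and then to every column of the result; the high-pass outputs feed the EBCOT coder as subband samples.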

    On the design of architecture-aware algorithms for emerging applications

    Get PDF
    This dissertation maps various kernels and applications to a spectrum of programming models and architectures and also presents architecture-aware algorithms for different systems. The kernels and applications discussed here have widely varying computational characteristics; for example, we consider both dense numerical computations and sparse graph algorithms. The dissertation also covers emerging applications from image processing, complex network analysis, and computational biology. We map these problems to diverse multicore processors and manycore accelerators, and we use new programming models (such as Transactional Memory, MapReduce, and Intel TBB) to address the performance and productivity challenges in these problems. Our experiences highlight the importance of mapping applications to appropriate programming models and architectures. We also identify several limitations of current system software and architectures, along with directions for improving them. The discussion focuses on system software and architectural support for nested irregular parallelism, Transactional Memory, and hybrid data transfer mechanisms. We believe that the complexity of parallel programming can be significantly reduced via collaborative efforts among researchers and practitioners from different domains. This dissertation contributes to those efforts by providing benchmarks and suggestions to improve system software and architectures.

    Parallel architectural design space exploration for real-time image compression

    Get PDF
    Embedded block coding with optimized truncation (EBCOT) is a coding algorithm used in JPEG2000. EBCOT operates on the wavelet-transformed data to generate a highly scalable compressed bit stream. Sub-band samples obtained from the wavelet transform are partitioned into smaller blocks called code-blocks. EBCOT encoding is done on blocks to avoid error propagation through the bands and to increase robustness. Block-wise encoding provides flexibility for a parallel hardware implementation of EBCOT. The encoding process in JPEG2000 is divided into two phases: Tier 1 coding (entropy encoding) and Tier 2 coding (tag tree coding). This thesis deals with the design space exploration and implementation of a parallel hardware architecture for the Tier 1 encoder used in JPEG2000. The parallel capabilities of the Tier 1 encoder motivate the exploration of a high-performance, real-time image compression architecture in hardware. The design space covers the following investigations:
    - the effect of block size in terms of resources, speed, and compression performance,
    - computational performance.
    The key computational performance parameters targeted by the architecture are:
    - significant speedup compared to a sequential implementation,
    - minimum processing latency, and
    - minimum logic resource utilization.
    The proposed architecture is developed for an embedded application system, coded in VHDL, and synthesized for implementation on a Xilinx FPGA system.
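    To make the code-block partitioning concrete, the short sketch below computes the block grid for a subband and the number of magnitude bit planes a block must encode. All dimensions and the peak magnitude are made-up example values, not parameters from the thesis.

        // Sketch: how a subband is partitioned into code-blocks, and how many
        // magnitude bit planes a block must encode for a given peak coefficient.
        #include <cstdio>

        int main() {
            const int w = 1024, h = 768;       // subband dimensions (example)
            const int cbw = 64, cbh = 64;      // typical code-block size
            const int blocks_x = (w + cbw - 1) / cbw;  // ceiling division
            const int blocks_y = (h + cbh - 1) / cbh;

            const int peak = 18345;            // largest |coefficient| in a block
            int bit_planes = 0;
            for (int m = peak; m > 0; m >>= 1) // position of the highest set bit
                ++bit_planes;

            std::printf("%d x %d = %d code-blocks, %d bit planes (max)\n",
                        blocks_x, blocks_y, blocks_x * blocks_y, bit_planes);
            return 0;
        }

    Because each code-block is coded independently, one hardware coder instance (or one GPU thread block) can be assigned per block, which is the parallelism the architecture exploits.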

    A digital signature and watermarking based authentication system for JPEG2000 images

    Get PDF
    In this thesis, a digital-signature-based authentication system is introduced that can protect JPEG2000 images in different flavors, including fragile authentication and semi-fragile authentication. Fragile authentication protects the image at the code-stream level, while semi-fragile authentication protects it at the content level. Semi-fragile authentication can be further classified into lossy and lossless variants. With lossless authentication, the original image can be recovered after verification. Lossless authentication and the new image compression standard, JPEG2000, are the main focus of this thesis.

    An efficient technique for lossless address data compression using adaptive SPIHT Algorithm in WSN

    Get PDF
    Computers are becoming more powerful day by day. Data compression is a popular approach to reducing data volumes and hence lowering disk I/O and network data transfer times. While several lossy data compression techniques have demonstrated excellent compression ratios, lossless data compression techniques are still among the most popular ones. Sensor networks represent a non-traditional source of information, as readings generated by sensors flow continuously, leading to an infinite stream of data. Sensors are non-reactive elements used to monitor real-life phenomena, such as live weather conditions and network traffic. They are usually organized into networks where their readings are transmitted using low-level protocols.

    GPU-oriented architecture for an end-to-end image/video codec based on JPEG2000

    Get PDF
    Modern image and video compression standards employ computationally intensive algorithms that provide advanced features to the coding system. Current standards often need to be implemented in hardware or using expensive solutions to meet the real-time requirements of some environments. Contrary to this trend, this paper proposes an end-to-end codec architecture running on inexpensive Graphics Processing Units (GPUs) that is based on, though not compatible with, the JPEG2000 international standard for image and video compression. When executed on a commodity Nvidia GPU, it achieves real-time processing of 12K video. The proposed software architecture utilizes four CUDA kernels that minimize memory transfers, use registers instead of shared memory, and employ a double-buffer strategy to optimize the streaming of data. The analysis of throughput indicates that the proposed codec yields results at least 10× superior on average to those achieved with JPEG2000 implementations devised for CPUs, and approximately 4× superior to those achieved with hardwired solutions of the HEVC/H.265 video compression standard.
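    The paper's four kernels are not reproduced here, but the double-buffer strategy it mentions is a standard CUDA pattern: while one buffer is being processed, the other buffer's transfers are in flight between host and device. The sketch below shows that pattern with two streams and pinned host memory; the kernel process_tile and all sizes are hypothetical placeholders, not the paper's code.

        #include <cuda_runtime.h>
        #include <cstdio>

        // Hypothetical stand-in for one of the codec's four kernels.
        __global__ void process_tile(const float* in, float* out, int n) {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n) out[i] = in[i] * 0.5f;   // placeholder work
        }

        int main() {
            const int tile = 1 << 20;           // samples per tile (example)
            const int tiles = 8;
            float *h_in, *h_out, *d_in[2], *d_out[2];
            cudaStream_t s[2];
            cudaMallocHost(&h_in, (size_t)tiles * tile * sizeof(float));   // pinned
            cudaMallocHost(&h_out, (size_t)tiles * tile * sizeof(float));  // memory
            for (int b = 0; b < 2; ++b) {
                cudaMalloc(&d_in[b], tile * sizeof(float));
                cudaMalloc(&d_out[b], tile * sizeof(float));
                cudaStreamCreate(&s[b]);
            }
            // Double buffering: tile t uses buffer t % 2, so while one buffer
            // is being processed, the other buffer's copies are in flight,
            // overlapping PCIe traffic with kernel execution.
            for (int t = 0; t < tiles; ++t) {
                const int b = t & 1;
                cudaMemcpyAsync(d_in[b], h_in + (size_t)t * tile,
                                tile * sizeof(float), cudaMemcpyHostToDevice, s[b]);
                process_tile<<<(tile + 255) / 256, 256, 0, s[b]>>>(
                    d_in[b], d_out[b], tile);
                cudaMemcpyAsync(h_out + (size_t)t * tile, d_out[b],
                                tile * sizeof(float), cudaMemcpyDeviceToHost, s[b]);
            }
            cudaDeviceSynchronize();
            std::printf("processed %d tiles\n", tiles);
            return 0;
        }

    Because operations issued to different streams may overlap, the upload for tile t+1 can proceed while tile t is still computing, which is the effect a double-buffer streaming strategy targets.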

    JPEG2000 image coding system theory and applications

    Get PDF
    JPEG2000, the new standard for still-image coding, provides a new framework and an integrated toolbox to better address increasing needs for compression. It offers a wide range of functionalities, such as lossless and lossy coding, embedded lossy-to-lossless coding, progression by resolution and quality, high compression efficiency, error resilience, and region-of-interest (ROI) coding. Comparative results have shown that JPEG2000 is indeed superior to established image compression standards. Overall, the JPEG2000 standard offers the richest set of features in a very efficient way and within a unified algorithm. The price of this is additional complexity, but this should not be perceived as a disadvantage, as the technology evolves rapidly.