
    Fast Huffman decoding by exploiting data level parallelism

    The frame rates and resolutions of digital videos keep rising, pushing the compression ratios of video coding standards to their limits and resulting in more complex and computationally demanding algorithms. Programmable solutions are gaining interest as a way to keep pace with evolving video coding standards, since they reduce the time-to-market of upcoming video products. To compete with hardwired solutions, however, parallelism needs to be exploited on as many levels as possible. This paper focuses on data-level parallelism. Huffman coding is proven to be very efficient and is therefore commonly applied in many coding standards; however, due to its inherently sequential nature, parallelizing Huffman decoding is considered hard. The proposed fully flexible and programmable accelerator exploits the data-level parallelism available in Huffman decoding. Our implementation achieves a decoding speed of 106 Mbit/s running on a 250 MHz processor, a speed-up of 24× over our sequential reference implementation.
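
    As background, here is a minimal sketch of sequential Huffman decoding, illustrating the loop-carried dependency that any data-level parallelization has to overcome; the code table and bitstream are illustrative, not taken from the paper.

        # Minimal sequential Huffman decoder (illustrative, not the paper's
        # accelerator). The bit position after one symbol determines where
        # the next one starts, so each iteration depends on the previous
        # one: this is the serial dependency that makes data-parallel
        # decoding hard.

        CODES = {"0": "a", "10": "b", "110": "c", "111": "d"}  # prefix-free

        def decode(bits: str) -> str:
            out, cur = [], ""
            for b in bits:
                cur += b
                if cur in CODES:        # reached a codeword boundary
                    out.append(CODES[cur])
                    cur = ""            # next symbol starts right here
            return "".join(out)

        print(decode("0101100111"))     # -> abcad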

    Data and resource management in wireless networks via data compression, GPS-free dissemination, and learning

    This research proposes several innovative approaches to collecting data efficiently from large-scale WSNs. First, it proposes Z-compression, an algorithm that exploits the temporal locality of multi-dimensional sensing data and adapts Z-order encoding to map multi-dimensional data to a one-dimensional data stream. An extended version of Z-compression adapts itself to low-power WSNs running under low power listening (LPL) mode, and its compression performance is analyzed comprehensively on both real-world and synthetic datasets. Second, it proposes an efficient geospatial data collection scheme for IoT that reduces redundant rebroadcasts by up to 95% by collecting only the data of interest. Since most low-cost wireless sensors are not equipped with a GPS module, virtual coordinates are used to estimate locations: the scheme uses an anchor-based virtual coordinate system and DV-Hop (distance vector of hops to anchors) to estimate the position of nodes relative to the anchors. It also uses circle and hyperbola constraints to encode the position of interest (POI) and any user-defined trajectory into a data request message, so that only the sensors in the POI and along the routing trajectory collect and route data, and it provides location anonymity by avoiding the use and transmission of GPS location information. The scheme has further been extended to heterogeneous WSNs, with a refined encoding algorithm that replaces the circle constraints with ellipse constraints. Last, the research proposes a framework that predicts the trajectory of a moving object using a sequence-to-sequence (Seq2Seq) learning model and wakes up only the sensors that fall within the predicted trajectory, using a specially designed control packet. It reduces the computation time of encoding a geospatial trajectory by more than 90% and preserves location anonymity for the local edge servers.
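
    The Z-order (Morton) mapping that Z-compression builds on can be sketched in a few lines of Python; the dimension count and bit width below are illustrative assumptions, not parameters from the thesis.

        # Z-order (Morton) encoding: interleave the bits of each dimension so
        # that a multi-dimensional sensor reading becomes one integer in a
        # one-dimensional stream, preserving locality between nearby readings.

        def morton_encode(values, bits=8):
            """Interleave the bits of non-negative ints, MSB first."""
            code = 0
            for bit in reversed(range(bits)):
                for v in values:            # one bit from each dimension
                    code = (code << 1) | ((v >> bit) & 1)
            return code

        # Example: a 3-dimensional reading (e.g. temperature, humidity, light)
        print(bin(morton_encode([3, 5, 6], bits=3)))   # -> 0b11101110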

    High throughput image compression and decompression on GPUs

    This work investigates possibilities to create a high throughput, GPU-friendly, intra-only, Wavelet-based video compression algorithm optimized for visually lossless applications. Addressing the key observation that JPEG 2000’s entropy coder is a bottleneck and might be overly complex for a high bit rate scenario, various algorithmic alterations are proposed. First, JPEG 2000’s Selective Arithmetic Coding mode is realized on the GPU, but the gains in terms of an increased throughput are shown to be limited. Instead, two independent alterations not compliant with the standard are proposed, that (1) give up the concept of intra-bit plane truncation points and (2) introduce a true raw-coding mode that is fully parallelizable and does not require any context modeling. Next, an alternative block coder from the literature, the Bitplane Coder with Parallel Coefficient Processing (BPC-PaCo), is evaluated. Since it trades signal adaptiveness for increased parallelism, it is shown here how a stationary probability model averaged from a set of test sequences yields competitive compression efficiency.
    A combination of BPC-PaCo with the single-pass mode is proposed and shown to increase the speedup with respect to the original JPEG 2000 entropy coder from 2.15x (BPC-PaCo with two passes) to 2.6x (proposed BPC-PaCo with single-pass mode) at the marginal cost of increasing the PSNR penalty by 0.3 dB to at most 1 dB. Furthermore, a parallel algorithm is presented that determines the optimal code block bit stream truncation points (given an available bit rate budget) and builds the entire code stream on the GPU, reducing the amount of data that has to be transferred back into host memory to a minimum. A theoretical runtime model is formulated that allows, based on benchmarking results on one GPU, predicting the runtime of a kernel on another GPU. Lastly, the first JPEG XS GPU decoder realization is presented. JPEG XS was designed to be a low complexity codec and, for the first time, explicitly demanded GPU-friendliness already in the call for proposals. Starting at bit rates above 1 bpp, the decoder is around 2x faster than the original JPEG 2000 and 1.5x faster than JPEG 2000 with the fastest evaluated entropy coder (BPC-PaCo with single-pass mode). With a GeForce GTX 1080, a decoding throughput of around 200 fps is achieved for a UHD 4:4:4 sequence.
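
    As a companion to the text, the bitplane representation that both JPEG 2000's entropy coder and BPC-PaCo operate on can be sketched as follows; the coefficients are made up, and the pass structure is summarized from the abstract rather than reproduced from the thesis.

        import numpy as np

        # Split quantized coefficient magnitudes into bitplanes. JPEG 2000
        # scans each plane in three coding passes; the single-pass mode
        # described above visits each plane only once.

        coeffs = np.array([[5, -3], [0, 7]], dtype=np.int32)   # illustrative
        mags = np.abs(coeffs)
        n_planes = int(mags.max()).bit_length()

        for p in range(n_planes - 1, -1, -1):   # most significant plane first
            plane = (mags >> p) & 1             # one bit per coefficient
            print(f"plane {p}:\n{plane}")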

    Low delay video coding

    Analogue wireless cameras have been employed for decades, yet they have never become a universal solution because they are difficult to set up and use. The main problem is link robustness, which depends chiefly on a line-of-sight path between transmitter and receiver, a working condition that cannot always be met. Even with a tracking antenna system such as the Portable Intelligent Tracking Antenna (PITA [1]), the picture rapidly falls apart under strong multipath fading (e.g., obstacles between transmitter and receiver).

    Digital wireless cameras based on Orthogonal Frequency Division Multiplexing (OFDM) modulation schemes offer a valid solution to this problem. OFDM provides strong multipath protection thanks to the insertion of a guard interval; in particular, the OFDM-based DVB-T standard has proven to offer excellent performance for broadcasting multimedia streams at bit rates above 10 Mbps over difficult terrestrial propagation channels, for fixed and portable applications. However, in typical conditions, the latency needed to compress and decompress a digital video signal at Standard Definition (SD) resolution is on the order of 15 frames, which corresponds to ≃ 0.5 sec. This delay becomes a serious problem when wireless and wired cameras have to be interfaced. Cabled cameras do not use compression, because the cable that directly links transmitter and receiver imposes no restrictive bandwidth constraints, so the only latency affecting a cabled camera link is the propagation delay on the cable, which is negligible. When switching between wired and wireless cameras, the residual latency of the wireless link therefore makes audio-video synchronization impossible, with disagreeable effects.

    A way to solve this problem is a low-delay digital processing scheme based on a video coding algorithm that avoids massive intermediate data storage. The analysis of recent MPEG-based coding standards highlights a series of problems that limit the real performance of a low-delay MPEG coding system. The first effort of this work is to study the MPEG standard and understand its limits from both the coding-delay and implementation-complexity points of view. The thesis then investigates an alternative solution based on HERMES, a proprietary codec that is described, implemented, and evaluated. HERMES achieves better results than MPEG in terms of latency and implementation complexity, at the price of lower compression ratios, i.e., higher output bit rates. The use of the HERMES codec together with an enhanced OFDM system [2] leads to a competitive solution for professional wireless digital video applications.
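
    The guard-interval mechanism behind OFDM's multipath robustness can be sketched in a few lines; the subcarrier count and prefix length below are illustrative, not DVB-T parameters.

        import numpy as np

        # A cyclic prefix longer than the channel's delay spread turns the
        # channel's linear convolution into a circular one, so echoes of one
        # OFDM symbol no longer smear into the next.

        N, CP = 64, 16                            # subcarriers, guard length
        rng = np.random.default_rng(0)
        symbols = np.exp(2j * np.pi * rng.integers(4, size=N) / 4)   # QPSK
        time_sig = np.fft.ifft(symbols)           # OFDM symbol in time domain
        tx = np.concatenate([time_sig[-CP:], time_sig])  # prepend the prefix

        rx = np.fft.fft(tx[CP:])                  # receiver: drop prefix, FFT
        print(np.allclose(rx, symbols))           # True over an ideal channel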

    An Introduction to Neural Data Compression

    Neural compression is the application of neural networks and other machine learning methods to data compression. Recent advances in statistical machine learning have opened up new possibilities for data compression, allowing compression algorithms to be learned end-to-end from data using powerful generative models such as normalizing flows, variational autoencoders, diffusion probabilistic models, and generative adversarial networks. The present article aims to introduce this field of research to a broader machine learning audience by reviewing the necessary background in information theory (e.g., entropy coding, rate-distortion theory) and computer vision (e.g., image quality assessment, perceptual metrics), and by providing a curated guide through the essential ideas and methods in the literature thus far.
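
    As a taste of the information-theoretic background the article reviews, here is a minimal sketch of Shannon entropy, the bits-per-symbol floor that any lossless entropy coder can approach for an i.i.d. source; the distribution is illustrative.

        import numpy as np

        # Shannon entropy H(p) = -sum p log2 p: the lower bound, in bits per
        # symbol, on the average code length of any lossless entropy coder.

        def entropy(probs):
            p = np.asarray(probs, dtype=float)
            p = p[p > 0]                  # convention: 0 * log 0 = 0
            return -np.sum(p * np.log2(p))

        print(entropy([0.5, 0.25, 0.25]))  # -> 1.5 bits/symbol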

    Advanced constellation and demapper schemes for next generation digital terrestrial television broadcasting systems

    This thesis presents a new type of constellations, called non-uniform constellations. These schemes are up to 1.8 dB more efficient than those used in the latest digital terrestrial television systems and can be carried over to any other communication system (satellite, mobile, cable, ...). In addition, this work contributes a new constellation design methodology that reduces optimization time from days/hours (current methodologies) to hours/minutes with the same efficiency. All designed constellations are tested on a platform created in this thesis that simulates the most advanced terrestrial broadcasting standard to date (ATSC 3.0) under realistic operating conditions. Furthermore, to reduce the decoding latency of these constellations, the thesis proposes two detection/demapping techniques. The first, for two-dimensional non-uniform constellations, reduces demapping complexity by up to 99.7% without degrading system performance; the second focuses on one-dimensional non-uniform constellations and achieves up to an 87.5% reduction in receiver complexity with no loss in performance. Finally, this work presents a comprehensive state of the art covering constellation types, system models, and constellation design/demapping, the first such study in this field.
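
    A standard max-log demapper (background, not the thesis's proposed detectors) shows where one-dimensional constellations save receiver complexity: an I/Q-separable constellation lets each component be demapped over sqrt(M) points instead of M. The 4-PAM points and Gray labels below are illustrative assumptions.

        import numpy as np

        points = np.array([-3.0, -1.0, 1.0, 3.0])            # 4-PAM, illustrative
        labels = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])  # Gray mapping

        def maxlog_llrs(y, noise_var):
            """Max-log LLR per bit: difference of the best squared distances
            among points whose label has that bit = 1 versus = 0."""
            d2 = (y - points) ** 2
            llrs = []
            for b in range(labels.shape[1]):
                d0 = d2[labels[:, b] == 0].min()   # best point with bit = 0
                d1 = d2[labels[:, b] == 1].min()   # best point with bit = 1
                llrs.append((d1 - d0) / (2 * noise_var))
            return llrs

        print(maxlog_llrs(0.8, noise_var=0.5))     # -> [-3.2, -4.8]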

    FPGA-based architectures for next generation communications networks

    This engineering doctorate concerns the application of Field Programmable Gate Array (FPGA) technology to some of the challenges faced in the design of next generation communications networks. The growth and convergence of such networks has fuelled demand for higher bandwidth systems and a requirement to support a diverse range of payloads across the network span. The research that follows focuses on the development of FPGA-based architectures for two important paradigms in contemporary networking: Forward Error Correction and Packet Classification. The work seeks to combine analysis of the underlying algorithms and mathematical techniques that drive these applications with an informed approach to the design of efficient FPGA-based circuits.

    Transmission of compressed images over power line channel

    In the telecommunications industry, the use of existing power lines for data transmission has drawn the attention of many researchers in recent years. Power line communication (PLC) suffers from impulsive noise that can corrupt data transmission by causing bit or burst errors. In this thesis, the PLC channel was used as a transmission medium for compressed still images using FFT-OFDM.

    When lossy compression is applied to an image, a small loss of quality in the compressed image is tolerated. One of the challenging tasks in image compression and transmission is the trade-off between compression ratio and image quality. We therefore used a recent quality assessment technique, the structural similarity index (SSIM), to adaptively optimize this trade-off for the type of application the compression serves. A comparison between different compression techniques, namely the discrete cosine transform (DCT), the discrete wavelet transform (DWT), and block truncation coding (BTC), was carried out. The performance criteria include the compression ratio, the relative root-mean-squared (RMS) error of the received data, and image quality evaluated via SSIM.

    Every link in a power line network has its own attenuation profile depending on its length, layout, and cable types; multipath fading due to reflections at branching points further varies the attenuation profile of the link. We therefore observed how two PLC channel parameters, the number of paths and the link length, affect image quality. Simulations showed that image quality is strongly affected by the interaction between the PLC link distance and the number of multipath reflections.

    The PLC channel is assumed to be subject to Gaussian and impulsive noise. There are two types of impulsive noise: asynchronous impulsive noise and periodic impulsive noise synchronous to the mains frequency. A BER analysis compared the channel performance for the two types of impulsive noise under three scenarios. The first, "heavily disturbed", was measured during evening hours in a transformer substation in an industrial area. The second, "moderately disturbed", was recorded in a transformer substation in a residential area with detached and terraced houses. The third, "weakly disturbed", was recorded at night in an apartment in a large building. The experiments showed that both types of noise behaved similarly in the three scenarios.

    We implemented Bose-Chaudhuri-Hocquenghem (BCH) coding to study the performance of the PLC channel impaired by impulsive noise and AWGN; BCH and RS codes are related and their decoding algorithms are quite similar. A comparison was made between the uncoded system and the BCH-coded system, assessing performance by the quality of the received image for different BCH encoder sizes in the three impulsive environments. Simulation results showed that BCH coding improves the performance of the PLC system dramatically in all three impulsive scenarios.
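
    The asynchronous impulsive component can be illustrated with the common Bernoulli-Gaussian model; this model and its parameters are an illustrative assumption here, whereas the thesis works with measured noise scenarios.

        import numpy as np

        # Bernoulli-Gaussian impulsive noise: background AWGN plus rare,
        # much stronger Gaussian impulses, a common model for PLC channels.

        rng = np.random.default_rng(0)

        def plc_noise(n, bg_var=1.0, imp_prob=0.01, imp_var=100.0):
            background = rng.normal(0, np.sqrt(bg_var), n)   # always present
            hits = rng.random(n) < imp_prob                  # impulse arrivals
            impulses = rng.normal(0, np.sqrt(imp_var), n) * hits
            return background + impulses

        noise = plc_noise(10_000)
        print(f"samples above 5 sigma of background: {(np.abs(noise) > 5).sum()}")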

    Fast Motion Estimation Algorithms for Block-Based Video Coding Encoders

    The objective of my research is to reduce the complexity of video coding standards in real-time scalable and multi-view applications.
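
    For background, here is a sketch of the exhaustive full-search block matching that fast motion estimation algorithms accelerate; block size, search radius, and frame content below are illustrative.

        import numpy as np

        # Full-search block matching: test every motion vector in a window
        # and keep the one minimizing the sum of absolute differences (SAD).
        # Fast algorithms prune this O((2R+1)^2) candidate set.

        def full_search(block, ref, top, left, radius=4):
            h, w = block.shape
            best, best_sad = (0, 0), np.inf
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    y, x = top + dy, left + dx
                    if y < 0 or x < 0 or y + h > ref.shape[0] or x + w > ref.shape[1]:
                        continue                 # candidate leaves the frame
                    cand = ref[y:y + h, x:x + w].astype(int)
                    sad = np.abs(block.astype(int) - cand).sum()
                    if sad < best_sad:
                        best_sad, best = sad, (dy, dx)
            return best, best_sad

        ref = np.random.default_rng(1).integers(0, 256, (32, 32), dtype=np.uint8)
        cur = ref[10:18, 12:20]                  # true offset from (8, 8): (2, 4)
        print(full_search(cur, ref, top=8, left=8))   # -> ((2, 4), 0)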