954 research outputs found

    An fpga-based loco-ans implementation for lossless and near-lossless image compression using high-level synthesis

    Full text link
    MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliationsIn this work, we present and evaluate a hardware architecture for the LOCO-ANS (Low Complexity Lossless Compression with Asymmetric Numeral Systems) lossless and near-lossless image compressor, which is based on JPEG-LS standard. The design is implemented in two FPGA generations, evaluating its performance for different codec configurations. The tests show that the design is capable of up to 40.5 MPixels/s and 124 MPixels/s per lane for Zynq 7020 and UltraScale+ FPGAs, respectively. Compared to the single thread LOCO-ANS software implementation running in a 1.2 GHz Raspberry Pi 3B, each hardware lane achieves 6.5 times higher throughput, even when implemented in an older and cost-optimized chip like the Zynq 7020. Results are also presented for a lossless only version, which achieves a lower footprint and approximately 50% higher performance than the version that supports both lossless and near-lossless. Interestingly, these great results were obtained applying High-Level Synthesis, describing the coder with C++ code, which tends to establish a trade-off between design time and quality of results. These results show that the algorithm is very suitable for hardware implementation. Moreover, the implemented system is faster and achieves higher compression than the best previously available near-lossless JPEG-LS hardware implementationThis research was funded in part by the Spanish Research Agency under the project AgileMon (AEI PID2019-104451RB-C21

    Design and FPGA Implementation of High Speed DWT-IDWT Architecture with Pipelined SPIHT Architecture for Image Compression

    Get PDF
    Image compression demands high speed architectures for transformation and encoding process Medical image compression demands lossless compression schemes and faster architectures A trade-off between speed and area decides the complexity of image compression algorithms In this work a high speed DWT architecture and pipelined SPIHT architecture is designed modeled and implemented on FPGA platform DWT computation is performed using matrix multiplication operation and is implemented on Virtex-5 FPGA that consumes less than 1 of the hardware resource The SPIHT algorithm that is performed using pipelined architecture and hence achieves higher throughput and latency The SPIHT algorithm operates at a frequency of 260 MHz and occupies area less than 15 of the resources The architecture designed is suitable for high speed image compression application

    Gbit/second lossless data compression hardware

    Get PDF
    This thesis investigates how to improve the performance of lossless data compression hardware as a tool to reduce the cost per bit stored in a computer system or transmitted over a communication network. Lossless data compression allows the exact reconstruction of the original data after decompression. Its deployment in some high-bandwidth applications has been hampered due to performance limitations in the compressing hardware that needs to match the performance of the original system to avoid becoming a bottleneck. Advancing the area of lossless data compression hardware, hence, offers a valid motivation with the potential of doubling the performance of the system that incorporates it with minimum investment. This work starts by presenting an analysis of current compression methods with the objective of identifying the factors that limit performance and also the factors that increase it. [Continues.

    Zooplankton visualization system: design and real-time lossless image compression

    Get PDF
    In this thesis, I present a design of a small, self-contained, underwater plankton imaging system. I base the imaging system’s design on an embedded PC architecture based on PC/104-Plus standards to meet the compact size and low power requirements. I developed a simple graphical user interface to run on a real-time operating system to control the imaging system. I also address how a real-time image compression scheme implemented on an FPGA chip speeds up image transfer speeds of the imaging system. Since lossless compression of the image is required in order to retain all image details, I began with an established compression scheme like SPIHT, and latter proposed a new compression scheme that suits the imaging system’s requirements. I provide an estimate of the total amount of resources required and propose suitable FPGA chips to implement the compression scheme. Finally, I present various parallel designs by which the FPGA chip can be integrated into the imaging system

    CEAZ: Accelerating Parallel I/O via Hardware-Algorithm Co-Design of Efficient and Adaptive Lossy Compression

    Full text link
    As supercomputers continue to grow to exascale, the amount of data that needs to be saved or transmitted is exploding. To this end, many previous works have studied using error-bounded lossy compressors to reduce the data size and improve the I/O performance. However, little work has been done for effectively offloading lossy compression onto FPGA-based SmartNICs to reduce the compression overhead. In this paper, we propose a hardware-algorithm co-design of efficient and adaptive lossy compressor for scientific data on FPGAs (called CEAZ) to accelerate parallel I/O. Our contribution is fourfold: (1) We propose an efficient Huffman coding approach that can adaptively update Huffman codewords online based on codewords generated offline (from a variety of representative scientific datasets). (2) We derive a theoretical analysis to support a precise control of compression ratio under an error-bounded compression mode, enabling accurate offline Huffman codewords generation. This also helps us create a fixed-ratio compression mode for consistent throughput. (3) We develop an efficient compression pipeline by adopting cuSZ's dual-quantization algorithm to our hardware use case. (4) We evaluate CEAZ on five real-world datasets with both a single FPGA board and 128 nodes from Bridges-2 supercomputer. Experiments show that CEAZ outperforms the second-best FPGA-based lossy compressor by 2X of throughput and 9.6X of compression ratio. It also improves MPI_File_write and MPI_Gather throughputs by up to 25.8X and 24.8X, respectively.Comment: 14 pages, 17 figures, 8 table

    Extending the PCIe Interface with Parallel Compression/Decompression Hardware for Energy and Performance Optimization

    Get PDF
    PCIe is a high-performing interface used to move data from a central host PC to an accelerator such as Field Programmable Gate Arrays (FPGA). This interface allows a system to perform fast data transfers in High-Performance Computing (HPC) and provide a performance boost. However, HPC systems normally require large datasets, and in these situations PCIe can become a bottleneck. To address this issue, we propose an open-source hardware compression/decompression system that can be used to adapt with continuously-streamed data with low latency and high throughput. We implement a compressor and decompressor engines on FPGA, scale up with multiple engines working in parallel, and evaluate the energy reduction and performance with different numbers of multiple engines. To alleviate the performance bottleneck in the processor acting as a controller, we propose a hardware scheduler to fairly distribute the datasets among the engines. Our design reduces the transmission time in PCIe, and the results show an energy reduction of up to 48% in the PCIe transfers, thanks to the decrease in the number of bits that have to be transmitted. The overhead in terms of latency is maintained to a minimum and user selectable depending on the tolerances of the intended application

    Challenges and solutions in H.265/HEVC for integrating consumer electronics in professional video systems

    Get PDF
    • …
    corecore