23 research outputs found

    An FPGA-based LOCO-ANS implementation for lossless and near-lossless image compression using high-level synthesis

    In this work, we present and evaluate a hardware architecture for the LOCO-ANS (Low Complexity Lossless Compression with Asymmetric Numeral Systems) lossless and near-lossless image compressor, which is based on the JPEG-LS standard. The design is implemented in two FPGA generations, evaluating its performance for different codec configurations. The tests show that the design is capable of up to 40.5 MPixels/s and 124 MPixels/s per lane for the Zynq 7020 and UltraScale+ FPGAs, respectively. Compared to the single-thread LOCO-ANS software implementation running on a 1.2 GHz Raspberry Pi 3B, each hardware lane achieves 6.5 times higher throughput, even when implemented in an older and cost-optimized chip like the Zynq 7020. Results are also presented for a lossless-only version, which achieves a lower footprint and approximately 50% higher performance than the version that supports both lossless and near-lossless. Notably, these results were obtained by applying High-Level Synthesis, describing the coder in C++, an approach that usually trades quality of results for shorter design time. These results show that the algorithm is very suitable for hardware implementation. Moreover, the implemented system is faster and achieves higher compression than the best previously available near-lossless JPEG-LS hardware implementation. This research was funded in part by the Spanish Research Agency under the project AgileMon (AEI PID2019-104451RB-C21).
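
    As an illustration of what describing a coder in C++ for High-Level Synthesis can look like, here is a minimal sketch of a single coder lane (function, stream, and type names are hypothetical, and the loop body is a placeholder rather than the authors' design): a pipelined loop reads one pixel per iteration from an input stream, so the lane can approach one pixel per clock cycle.

        #include <hls_stream.h>
        #include <ap_int.h>

        // Hypothetical per-lane top function: one pixel in, coded words out.
        void loco_lane(hls::stream<ap_uint<8>>& pixels,
                       hls::stream<ap_uint<32>>& codewords,
                       int width, int height) {
            for (int i = 0; i < width * height; ++i) {
            #pragma HLS PIPELINE II=1
                ap_uint<8> x = pixels.read();
                // ... context modelling, prediction and tANS coding would go here ...
                codewords.write((ap_uint<32>)x);   // placeholder pass-through
            }
        }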

    LOCO-ANS: An Optimization of JPEG-LS Using an Efficient and Low-Complexity Coder Based on ANS

    Near-lossless compression is a generalization of lossless compression, where the codec user is able to set the maximum absolute difference (the error tolerance) between the values of an original pixel and the decoded one. This enables higher compression ratios, while still allowing control of the bounds of the quantization errors in the spatial domain. This feature makes near-lossless codecs attractive for applications where a high degree of certainty is required. The JPEG-LS lossless and near-lossless image compression standard combines a good compression ratio with low computational complexity, which makes it very suitable for scenarios with strong restrictions, common in embedded systems. However, our analysis shows considerable potential for coding efficiency improvement, especially for lower-entropy distributions, which are more common in near-lossless compression. In this work, we propose enhancements to the JPEG-LS standard, aimed at improving its coding efficiency at a low computational overhead, particularly for hardware implementations. The main contribution is a low-complexity and efficient coder, based on Tabled Asymmetric Numeral Systems (tANS), well suited to a wide range of entropy sources and with a simple hardware implementation. This coder enables further optimizations, resulting in significant compression ratio improvements. When targeting photographic images, the proposed system achieves, on average, 1.6%, 6%, and 37.6% better compression for error tolerances of 0, 1, and 10, respectively. Additional improvements are achieved by increasing the context size and using image tiling, obtaining 2.3% lower bpp for lossless compression. Our results also show that our proposal compares favorably against state-of-the-art codecs like JPEG-XL and WebP, particularly in near-lossless mode, where it achieves higher compression ratios with a faster coding speed. This work was supported in part by the Spanish Research Agency through the Project AgileMon under Grant AEI PID2019-104451RB-C2.
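
    The error-tolerance mechanism described above, inherited from JPEG-LS near-lossless coding, can be summarised with a small sketch (illustrative code, not from the paper; identifier names are invented): the prediction residual is quantised with step 2*NEAR+1, so the per-pixel reconstruction error never exceeds the tolerance NEAR.

        #include <algorithm>

        // JPEG-LS style near-lossless residual quantisation (sketch).
        // NEAR is the user-set error tolerance; NEAR == 0 means lossless.
        int quantize_residual(int err, int NEAR) {
            if (err >= 0) return  (err + NEAR) / (2 * NEAR + 1);
            else          return -((NEAR - err) / (2 * NEAR + 1));
        }

        // Encoder and decoder rebuild the same sample from the quantised
        // residual, which bounds the per-pixel error: |x - rec| <= NEAR.
        int reconstruct(int prediction, int qerr, int NEAR, int maxval) {
            int rec = prediction + qerr * (2 * NEAR + 1);
            return std::min(std::max(rec, 0), maxval);
        }

    Setting NEAR to 0 reduces this to ordinary lossless coding, which corresponds to the error tolerance of 0 quoted above.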

    Parallel hardware architecture for JPEG-LS based on domain decomposition using context sets

    This thesis investigates the scope of parallelism of the lossless JPEG-LS encoder. The input is no longer taken to be the entire image; instead, it is a stream of pixels arriving from an image sensor in every clock cycle. The data dependencies that already exist due to the context modelling process, and the effect of incomplete image data, are therefore analyzed thoroughly. Other approaches to parallelism in JPEG-LS (e.g., pipelined hardware or software implementations that modify the context update procedure) deviate from the standard defined by ISO/ITU, whereas the technique proposed here is fully compatible with the standard. In this work, a unique pixel loading mechanism was developed to deliver pixels from the stream in the form that the encoder expects. A further buffering mechanism was then developed to store pixels of the same context that are yet to be processed. However, the context distribution of the individual pixels determines the maximum achievable parallelism, so a fixed value is not guaranteed in any case. The thesis also presents a VHDL implementation of the proposed parallel JPEG-LS encoder. The target hardware for this design was an FPGA board (Virtex 5). The design was also compared with a sequential hardware implementation and other parallel implementations, mainly in terms of speed-up. However, some obstacles restricted the actual synthesis; possible reasons are discussed along with suggestions for future work.
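
    To make the context-modelling dependency concrete, the sketch below shows the standard JPEG-LS context formation from the three local gradients of the causal neighbours (illustrative code, not taken from the thesis; the default thresholds for 8-bit data are assumed). Pixels whose contexts fall into disjoint context sets carry no update dependency on each other and are therefore the candidates for parallel encoding.

        // JPEG-LS context formation (sketch, default 8-bit thresholds).
        int quantize_gradient(int g) {              // maps a gradient to [-4, 4]
            const int T1 = 3, T2 = 7, T3 = 21;
            if (g <= -T3) return -4;
            if (g <= -T2) return -3;
            if (g <= -T1) return -2;
            if (g <    0) return -1;
            if (g ==   0) return  0;
            if (g <   T1) return  1;
            if (g <   T2) return  2;
            if (g <   T3) return  3;
            return 4;
        }

        // a, b, c, d are the causal neighbours (left, above, above-left, above-right).
        int context_index(int a, int b, int c, int d) {
            int q1 = quantize_gradient(d - b);
            int q2 = quantize_gradient(b - c);
            int q3 = quantize_gradient(c - a);
            return (q1 * 9 + q2) * 9 + q3;          // 729 signed triples, folded to
                                                    // 365 contexts by the standard
        }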

    Bi-criteria Pipeline Mappings for Parallel Image Processing

    Mapping workflow applications onto parallel platforms is a challenging problem, even for simple application patterns such as pipeline graphs. Several antagonistic criteria should be optimized, such as throughput and latency (or a combination of the two). Typical applications include digital image processing, where images are processed in steady-state mode. In this paper, we study the mapping of a particular image processing application, JPEG encoding. Mapping pipelined JPEG encoding onto parallel platforms is useful, for instance, for encoding Motion JPEG images. As the bi-criteria mapping problem is NP-complete, we concentrate on the evaluation and performance of polynomial heuristics.
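
    The two antagonistic criteria can be stated concretely with a small sketch (illustrative only, not from the paper): for an interval mapping of consecutive pipeline stages onto processors, the period (the inverse of the throughput) is fixed by the most loaded processor, while the latency is bounded below by the total work a single image must traverse.

        #include <vector>
        #include <utility>
        #include <numeric>
        #include <algorithm>

        // stage_cost[i] is the time of pipeline stage i; each processor is
        // assigned one consecutive [first, last] range of stages.
        using Mapping = std::vector<std::pair<int, int>>;

        // Steady-state time per image: the most loaded processor dominates.
        double period(const std::vector<double>& stage_cost, const Mapping& intervals) {
            double p = 0.0;
            for (auto [first, last] : intervals) {
                double load = std::accumulate(stage_cost.begin() + first,
                                              stage_cost.begin() + last + 1, 0.0);
                p = std::max(p, load);
            }
            return p;                               // throughput = 1 / period
        }

        // Latency lower bound: one image must traverse every stage once
        // (communication costs, ignored here, only add to this).
        double latency(const std::vector<double>& stage_cost) {
            return std::accumulate(stage_cost.begin(), stage_cost.end(), 0.0);
        }

    Splitting the stages over more processors lowers the period but, once communication is accounted for, tends to increase the latency, which is the tension the bi-criteria heuristics have to navigate.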

    Image and Video Coding Techniques for Ultra-low Latency

    The next generation of wireless networks fosters the adoption of latency-critical applications such as XR, connected industry, or autonomous driving. This survey gathers implementation aspects of different image and video coding schemes and discusses their trade-offs. Standardized video coding technologies such as HEVC or VVC provide a high compression ratio, but their enormous complexity sets the scene for alternative approaches like still-image, mezzanine, or texture compression in scenarios with tight resource or latency constraints. Regardless of the coding scheme, we found inter-device memory transfers and the lack of sub-frame coding to be limitations of current full-system and software-programmable implementations.

    An overview of JPEG 2000

    JPEG-2000 is an emerging standard for still image compression. This paper provides a brief history of the JPEG-2000 standardization process, an overview of the standard, and some description of the capabilities it provides. Part I of the JPEG-2000 standard specifies the minimum compliant decoder, while Part II describes optional, value-added extensions. Although the standard specifies only the decoder and bitstream syntax, in this paper we describe JPEG-2000 from the point of view of encoding. We take this approach because we believe it is more amenable to a compact description that most readers will find easier to understand.

    Image Processing Using FPGAs

    This book presents a selection of papers representing current research on using field programmable gate arrays (FPGAs) to realise image processing algorithms. These are reprints of papers selected for a Special Issue of the Journal of Imaging on image processing using FPGAs. A diverse range of topics is covered, including parallel soft processors, memory management, image filters, segmentation, clustering, image analysis, and image compression. Applications include traffic sign recognition for autonomous driving, cell detection for histopathology, and video compression. Collectively, they represent the current state of the art in image processing using FPGAs.

    Efficient Encoding of Wireless Capsule Endoscopy Images Using Direct Compression of Colour Filter Array Images

    Since its invention in 2001, wireless capsule endoscopy (WCE) has played an important role in the endoscopic examination of the gastrointestinal tract. During this period, WCE has undergone tremendous advances in technology, making it the first-line modality for diseases from bleeding to cancer in the small bowel. Current research efforts are focused on evolving WCE to include functionality such as drug delivery, biopsy, and active locomotion. For the integration of these functionalities into WCE, two critical prerequisites are image quality enhancement and power consumption reduction. An efficient image compression solution is required to retain the highest image quality while reducing the transmission power. The issue is more challenging because image sensors in WCE capture images in the Bayer colour filter array (CFA) format, for which standard compression engines provide inferior compression performance. The focus of this thesis is to design an optimized image compression pipeline to encode capsule endoscopic (CE) images efficiently in CFA format. To this end, this thesis proposes two image compression schemes. First, a lossless image compression algorithm is proposed, consisting of an optimum reversible colour transformation, a low-complexity prediction model, a corner clipping mechanism, and a single-context adaptive Golomb-Rice entropy encoder. The derivation of the colour transformation that provides the best performance for a given prediction model is treated as an optimization problem. The low-complexity prediction model works in raster order and requires no buffer memory. The colour transformation yields lower inter-colour correlation and allows efficient independent encoding of the colour components. The second compression scheme is a lossy compression algorithm with an integer discrete cosine transform at its core. Using statistics obtained from a large dataset of CE images, an optimum colour transformation is derived using principal component analysis (PCA). The transformed coefficients are quantized using an optimized quantization table designed to discard medically irrelevant information. A fast demosaicking algorithm is developed to reconstruct the colour image from the lossy CFA image in the decoder. Extensive experiments and comparisons with state-of-the-art lossless image compression methods establish the proposed methods as simple and efficient image compression algorithms. The lossless algorithm can transmit the image losslessly within the available bandwidth, while the evaluation of the lossy algorithm indicates that it can deliver high-quality images at low transmission power and low computation cost.
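
    As an illustration of the kind of single-context adaptive Golomb-Rice entropy coder mentioned above (a sketch with invented names, not the thesis implementation), a Rice coder with parameter k writes the quotient of a non-negative mapped residual in unary, followed by its k low-order bits:

        #include <vector>
        #include <cstdint>

        // Minimal Golomb-Rice encoder sketch: 'value' is a non-negative mapped
        // prediction residual and 'k' the Rice parameter (divisor 2^k). Bits
        // are appended MSB-first to 'bits' for clarity rather than packed.
        void rice_encode(uint32_t value, unsigned k, std::vector<bool>& bits) {
            uint32_t quotient  = value >> k;
            uint32_t remainder = value & ((1u << k) - 1u);
            for (uint32_t i = 0; i < quotient; ++i) bits.push_back(true);   // unary part
            bits.push_back(false);                                          // terminator
            for (int b = static_cast<int>(k) - 1; b >= 0; --b)              // remainder
                bits.push_back(((remainder >> b) & 1u) != 0);
        }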

    High-performance hardware accelerators for image processing in space applications

    Mars is a hard place to reach. While there have been many notable success stories in getting probes to the Red Planet, the historical record is full of bad news. The success rate for actually landing on the Martian surface is even worse, roughly 30%. This low success rate must be mainly credited to the characteristics of the Martian environment. In the Martian atmosphere, strong winds blow frequently. This phenomenon usually modifies the lander's descending trajectory, diverting it from the target one. Moreover, the Martian surface is not an easy place to land safely. It is pitted with many closely spaced craters and huge stones, and characterized by huge mountains and hills (e.g., Olympus Mons is 648 km in diameter and 27 km tall). For these reasons, a mission failure due to landing in a large crater, on big stones, or on a part of the surface with a high slope is highly probable. In recent years, all space agencies have increased their research efforts in order to enhance the success rate of Mars missions. In particular, the two hottest research topics are active debris removal and guided landing on Mars. The former aims at finding new methods to remove space debris using unmanned spacecraft. These must be able to autonomously detect a piece of debris, analyse it in order to extract its characteristics in terms of weight, speed, and dimensions, and, eventually, rendezvous with it. In order to perform these tasks, the spacecraft must have strong vision capabilities. In other words, it must be able to take pictures and process them with very complex image processing algorithms in order to detect, track, and analyse the debris. The latter aims at increasing the landing point precision (i.e., shrinking the landing ellipse) on Mars. Future space missions will increasingly adopt video-based navigation systems to assist the entry, descent and landing (EDL) phase of space modules (e.g., spacecraft), enhancing the precision of automatic EDL navigation systems. For instance, recent space exploration missions, e.g., Spirit, Opportunity, and Curiosity, made use of an EDL procedure aiming at following a fixed and precomputed descending trajectory to reach a precise landing point. This approach guarantees a landing point precision of at best 20 km. Comparing this figure with the characteristics of the Martian environment, it is clear that the mission failure probability remains very high. A very challenging problem is to design an autonomously guided EDL system able to further reduce the landing ellipse, guaranteeing that the lander avoids dangerous areas of the Martian surface (e.g., large craters or big stones) that could lead to mission failure. The autonomous behaviour of the system is mandatory, since a manually driven approach is not feasible due to the distance between Earth and Mars. Since this distance varies from approximately 56 to 100 million km due to the orbital geometry, even with signals travelling at the speed of light, the round-trip communication delay would range from roughly 6 to 11 minutes, comparable to or longer than the entire EDL phase. In both applications, the algorithms must guarantee self-adaptability to the environmental conditions. Since the harsh conditions of Mars (and of space in general) are difficult to predict at design time, these algorithms must be able to automatically tune their internal parameters depending on the current conditions. Moreover, real-time performance is another key factor.
Since a software implementation of these computationally intensive tasks cannot reach the required performance, these algorithms must be accelerated in hardware. For these reasons, this thesis presents my research work on advanced image processing algorithms for space applications and the associated hardware accelerators. My research activity has focused on both the algorithms and their hardware implementations. Concerning the first aspect, I mainly focused my research effort on integrating self-adaptability features into existing algorithms. Concerning the second, I studied and validated a methodology to efficiently develop, verify, and validate hardware components aimed at accelerating video-based applications. This approach allowed me to develop and test high-performance hardware accelerators that strongly outperform the current state-of-the-art implementations. The thesis is organized in four main chapters. Chapter 2 starts with a brief introduction to the history of digital image processing. The main content of this chapter is the description of space missions in which digital image processing has a key role. A major effort has been spent on the missions in which my research activity has a substantial impact; for these missions, this chapter deeply analyzes and evaluates the state-of-the-art approaches and algorithms. Chapter 3 analyzes and compares the two technologies used to implement high-performance hardware accelerators, i.e., Application Specific Integrated Circuits (ASICs) and Field Programmable Gate Arrays (FPGAs). This information helps the reader understand the main reasons behind the decision of space agencies to exploit FPGAs instead of ASICs for high-performance hardware accelerators in space missions, even though FPGAs are more sensitive to Single Event Upsets (SEUs, i.e., transient errors induced in hardware components by alpha particles and solar radiation in space). Moreover, this chapter describes in depth the three available space-grade FPGA technologies (i.e., one-time programmable, flash-based, and SRAM-based) and the main fault-mitigation techniques against SEUs that are mandatory for employing space-grade FPGAs in actual missions. Chapter 4 describes one of the main contributions of my research work: a library of high-performance hardware accelerators for image processing in space applications. The basic idea behind this library is to offer designers a set of validated hardware components able to strongly speed up the basic image processing operations commonly used in an image processing chain. In other words, these components can be directly used as elementary building blocks to easily create a complex image processing system, without wasting time in the debug and validation phase. This library groups the proposed hardware accelerators into IP-core families. The components contained in the same family share the same provided functionality and input/output interface. This harmonization of the I/O interface makes it possible to substitute, inside a complex image processing system, components of the same family without requiring modifications to the system communication infrastructure. In addition to the analysis of the internal architecture of the proposed components, another important aspect of this chapter is the methodology used to develop, verify, and validate the proposed high-performance image processing hardware accelerators.
This methodology involves the use of different programming and hardware description languages in order to support the designer from algorithm modelling up to hardware implementation and validation. Chapter 5 presents the proposed complex image processing systems. In particular, it uses a set of actual case studies, associated with the most recent space agency needs, to show how the hardware accelerator components can be assembled to build a complex image processing system. In addition to the hardware accelerators contained in the library, the described complex systems embed innovative ad-hoc hardware components and software routines able to provide high-performance and self-adaptable image processing functionalities. To prove the benefits of the proposed methodology, each case study is concluded with a comparison against the current state-of-the-art implementations, highlighting the benefits in terms of performance and self-adaptability to the environmental conditions.