321 research outputs found

Implementation of JPEG compression and motion estimation on FPGA hardware

Author: Gopalakrishnan Ramakrishna
Publication venue: Digital Scholarship@UNLV
Publication date: 01/01/2008

A hardware implementation of JPEG allows for real-time compression in data-intensive applications, such as high-speed scanning, medical imaging and satellite image transmission. Implementation options include dedicated DSP or media processors, FPGA boards, and ASICs. Factors that affect platform selection include cost, speed, memory, size, power consumption, and ease of reconfiguration. The proposed hardware solution is based on a Very High Speed Integrated Circuit Hardware Description Language (VHDL) implementation of the codec, with the preferred realization using an FPGA board due to speed, cost and flexibility factors. VHDL is commonly used to model hardware implementations from a top-down perspective. The VHDL code may be simulated to correct mistakes and subsequently synthesized into hardware using a synthesis tool, such as the Xilinx ISE suite. The same VHDL code may be synthesized into a number of different hardware architectures based on the constraints given. For example, speed was the major constraint when synthesizing the pipeline of JPEG encoding and decoding, while chip area and power consumption were the primary constraints when synthesizing the on-die memory because of its large area. Thus, there is a trade-off between area and speed in logic synthesis

University of Nevada, Las Vegas Repository

LOCO-ANS: An Optimization of JPEG-LS Using an Efficient and Low-Complexity Coder Based on ANS

Author: Alonso Tobias
López de Vergara Méndez J. E.
Sutter Gustavo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 27/07/2021

Near-lossless compression is a generalization of lossless compression, where the codec user is able to set the maximum absolute difference (the error tolerance) between the values of an original pixel and the decoded one. This enables higher compression ratios, while still allowing the control of the bounds of the quantization errors in the space domain. This feature makes such codecs attractive for applications where a high degree of certainty is required. The JPEG-LS lossless and near-lossless image compression standard combines a good compression ratio with a low computational complexity, which makes it very suitable for scenarios with strong restrictions, common in embedded systems. However, our analysis shows great potential for improving coding efficiency, especially for lower entropy distributions, which are more common in near-lossless compression. In this work, we propose enhancements to the JPEG-LS standard, aimed at improving its coding efficiency at a low computational overhead, particularly for hardware implementations. The main contribution is a low complexity and efficient coder, based on Tabled Asymmetric Numeral Systems (tANS), well suited for a wide range of entropy sources and with simple hardware implementation. This coder enables further optimizations, resulting in great compression ratio improvements. When targeting photographic images, the proposed system is capable of achieving, on average, 1.6%, 6%, and 37.6% better compression for error tolerances of 0, 1, and 10, respectively. Additional improvements are achieved by increasing the context size and by image tiling, obtaining 2.3% lower bpp for lossless compression. Our results also show that our proposal compares favorably against state-of-the-art codecs like JPEG-XL and WebP, particularly in near-lossless, where it achieves higher compression ratios with a faster coding speed. This work was supported in part by the Spanish Research Agency through the Project AgileMon under Grant AEI PID2019-104451RB-C2
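
Note: the error-tolerance bound described in this abstract can be illustrated with the uniform residual quantizer used by JPEG-LS in near-lossless mode (step size 2*NEAR + 1). The following is only a minimal sketch of that quantizer under assumed names (`quantize_residual`, `dequantize_residual`, `max_reconstruction_error` are hypothetical helpers); it is not the LOCO-ANS coder from the paper.

```python
def quantize_residual(err: int, near: int) -> int:
    """Quantize a prediction residual with step 2*near + 1 (JPEG-LS style)."""
    step = 2 * near + 1
    if err > 0:
        return (near + err) // step
    return -((near - err) // step)


def dequantize_residual(q: int, near: int) -> int:
    """Map a quantized residual back to the centre of its bin."""
    return q * (2 * near + 1)


def max_reconstruction_error(near: int, lo: int = -255, hi: int = 255) -> int:
    """Largest |residual - reconstructed residual| over an assumed residual range."""
    return max(
        abs(e - dequantize_residual(quantize_residual(e, near), near))
        for e in range(lo, hi + 1)
    )


if __name__ == "__main__":
    for near in (0, 1, 10):
        # The reconstruction error never exceeds the chosen tolerance.
        print(near, max_reconstruction_error(near))
```

With a tolerance of 0 the quantizer is the identity, which corresponds to the lossless case; larger tolerances widen the bins and so lower the entropy of the quantized residuals.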

Biblos-e Archivo

Energy-efficient hardware design based on high-level synthesis

Author: Muslim FAHAD BIN
Publication venue: Politecnico di Torino
Publication date: 01/01/2017

This dissertation describes research activities broadly concerning the area of High-level synthesis (HLS), but more specifically, regarding the HLS-based design of energy-efficient hardware (HW) accelerators. HW accelerators, mostly implemented on FPGAs, are integral to the heterogeneous architectures employed in modern high performance computing (HPC) systems due to their ability to speed up the execution while dramatically reducing the energy consumption of computationally challenging portions of complex applications. Hence, the first activity concerned an HLS-based approach to directly execute OpenCL code on an FPGA instead of its traditional GPU-based counterpart. Modern FPGAs offer considerable computational capabilities while consuming significantly less power than high-end GPUs. Several different implementations of the K-Nearest Neighbor algorithm were considered on both FPGA- and GPU-based platforms and their performance was compared. FPGAs were generally more energy-efficient than the GPUs in all the test cases. Eventually, we were also able to get a faster (in terms of execution time) FPGA implementation by using an FPGA-specific OpenCL coding style and utilizing suitable HLS directives. The second activity was targeted towards the development of a methodology complementing HLS to automatically derive power optimization directives (also known as "power intent") from a system-level design description and use them to drive the design steps after HLS, by producing a directive file written using the common power format (CPF) to achieve power shut-off (PSO) in the case of an ASIC design. The proposed LP-HLS methodology reduces the design effort by enabling designers to infer low power information from the system-level description of a design rather than at the RTL. This methodology required a SystemC description of a generic power management module to describe the design context of a HW module also modeled in SystemC, along with the development of a tool to automatically produce the CPF file to accomplish PSO. Several test cases were considered to validate the proposed methodology and the results demonstrated its ability to correctly extract the low power information and apply it to achieve power optimization in the backend flow

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Adaptively Lossy Image Compression for Onboard Processing

Author: Goodwill Justin
Publication venue
Publication date: 29/07/2020

More efficient image-compression codecs are an emerging requirement for spacecraft because increasingly complex, onboard image sensors can rapidly saturate downlink bandwidth of communication transceivers. While these codecs reduce transmitted data volume, many are compute-intensive and require rapid processing to sustain sensor data rates. Emerging next-generation small satellite (SmallSat) computers provide compelling computational capability to enable more onboard processing and compression than previously considered. For this research, we apply two compression algorithms for deployment on modern flight hardware: (1) end-to-end, neural-network-based, image compression (CNN-JPEG); and (2) adaptive image compression through feature-point detection (FPD-JPEG). These algorithms rely on intelligent data-processing pipelines that adapt to sensor data to compress it more effectively, ensuring efficient use of limited downlink bandwidths. The first algorithm, CNN-JPEG, employs a hybrid approach adapted from literature combining convolutional neural networks (CNNs) and JPEG; however, we modify and tune the training scheme for satellite imagery to account for observed training instabilities. This hybrid CNN-JPEG approach shows 23.5% better average peak signal-to-noise ratio (PSNR) and 33.5% better average structural similarity index (SSIM) versus standard JPEG on a dataset collected on the Space Test Program – Houston 5 (STP-H5-CSP) mission onboard the International Space Station (ISS). For our second algorithm, we developed a novel adaptive image-compression pipeline based upon JPEG that leverages the Oriented FAST and Rotated BRIEF (ORB) feature-point detection algorithm to adaptively tune the compression ratio to allow for a tradeoff between PSNR/SSIM and combined file size over a batch of STP-H5-CSP images. We achieve a less than 1% drop in average PSNR and SSIM while reducing the combined file size by 29.6% compared to JPEG using a static quality factor (QF) of 90
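
Note: the PSNR figures quoted in this abstract follow the standard definition, 10·log10(MAX² / MSE). The sketch below shows that formula only; the function name `psnr`, the flat pixel lists, and the 8-bit default peak are assumptions for illustration, and the code is not taken from the cited work.

```python
import math


def psnr(original, decoded, max_value=255):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(original) != len(decoded) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    reference = [52, 55, 61, 66, 70, 61, 64, 73]
    lossy = [51, 55, 62, 66, 69, 61, 65, 73]
    print(round(psnr(reference, lossy), 2))
```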

D-Scholarship@Pitt

JPEG decoder implementation on FPGA using dynamic partial reconfiguration

Author: Rodrigues Tiago Augusto Nunes
Publication venue: Instituto Superior de Engenharia de Lisboa
Publication date: 01/06/2015

Final Master's project submitted for the degree of Master in Electronics and Telecommunications Engineering. This thesis describes a study conducted in Reconfigurable Computing using a Field-Programmable Gate Array (FPGA). Reconfigurable Computing is a concept almost as old as high-speed electronic computing itself. To explore the practical aspects of the concept, a Baseline JPEG image decoder was implemented on a Zynq™-7000 family FPGA. After the decoder was designed, implemented and debugged using traditional static implementation of the logic on the FPGA, it was adapted for implementation on the same FPGA using Dynamic Partial Reconfiguration. The main objective of this approach was to develop a working decoder that uses only a subset of the FPGA's logic resources compared with the static implementation of the decoder. Dynamic partial reconfiguration adds complexity to the system, so the two decoders differ from a macro perspective, but both rely on the same design considerations and share most of their internal modules. The steps taken to achieve the objective are described in order to clarify the dynamic partial reconfiguration process and to open new design possibilities that can be exploited in different application scenarios. The thesis also explores the development of auxiliary systems that enable direct decoding of .jpg files and their display on a VGA monitor

Repositório Científico do Instituto Politécnico de Lisboa

Implementation of soft processor based SOC for JPEG compression on FPGA

Author: Raju Y. David Solomon
Swarna K.S.V.
Publication venue: 'ICT Academy'
Publication date: 01/02/2015

With advances in semiconductor process technology and EDA tools, IC designers can integrate more functions. However, to reduce time-to-market and tackle the increasing complexity of SoCs, the need for fast prototyping and testing is growing. Taking advantage of deep submicron technology, modern FPGAs provide fast and low-cost prototyping with large logic resources and high performance. The hardware is therefore mapped onto an FPGA-based emulation platform that mimics the behaviour of the SoC. In this paper we use an FPGA as a system on chip for image compression based on the 2-D DCT, and propose an SoC for image compression using the MicroBlaze soft core. The JPEG standard defines compression techniques for image data. As a consequence, it allows image data to be stored and transferred with considerably reduced demand for storage space and bandwidth. Of the four processes provided in the JPEG standard, only one, the baseline process, is widely used. The proposed SoC for JPEG compression has been implemented on the Spartan-6 SP605 FPGA evaluation board using Xilinx Platform Studio, because field programmable gate arrays have a reconfigurable hardware architecture. Hence a JPEG image of reduced size can be obtained at high speed, at low risk and with a low power consumption of about 0.699 W. The proposed SoC for image compression is evaluated at 83.33 MHz on the Xilinx Spartan-6 FPGA
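
Note: the 2-D DCT mentioned in this abstract is, in baseline JPEG, applied to 8×8 blocks. The following is a direct, unoptimized software sketch of that transform for reference; the function name `dct_2d` is hypothetical, and the code says nothing about how the paper's hardware or MicroBlaze software actually computes it.

```python
import math

N = 8  # baseline JPEG operates on 8x8 blocks


def dct_2d(block):
    """Forward 2-D DCT-II of an 8x8 block, using the JPEG normalization."""

    def c(k):
        # Scale factor for the zero-frequency (DC) basis function.
        return math.sqrt(0.5) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (
                        block[x][y]
                        * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                        * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                    )
            out[u][v] = 0.25 * c(u) * c(v) * total
    return out


if __name__ == "__main__":
    flat = [[8] * N for _ in range(N)]
    coeffs = dct_2d(flat)
    # A constant block has all of its energy in the DC coefficient.
    print(round(coeffs[0][0], 6))
```

A constant block is a convenient sanity check because every coefficient other than the DC term should be zero up to rounding error.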

Deakin Research Online

A Methodology for Predicting Application-Specific Achievable Memory Bandwidth for HW/SW-Codesign

Author: Elhossini Ahmed
Göbel Matthias
Juurlink Ben
Publication venue
Publication date: 01/01/2017

The trend of using heterogeneous computing and HW/SW-Codesign approaches allows increasing performance significantly while reducing power consumption. One of the main challenges when combining multiple processing devices is the communication, as an inefficient communication configuration can pose a bottleneck to the overall system performance. To address this problem, we present a methodology that assists the designer in making good design decisions for systems using shared DDR memory for communication. Our methodology analyzes a software implementation of the application and subsequently predicts the memory accesses of a functionally equivalent hardware implementation of the selected function. We furthermore propose an IP core that can perform these predicted memory accesses to estimate the achievable memory bandwidth between a functionally equivalent hardware implementation and shared memory. The resulting achievable memory bandwidth estimations differ by less than 2% from the actual achievable memory bandwidth of a functionally equivalent hardware implementation, demonstrating the feasibility of the presented methodology

DepositOnce