310 research outputs found

High Performance Multi-Standard Architecture for DCT Computation in H.264/AVC High Profile and HEVC Codecs

Authors: Dias Tiago; Roma Nuno; Sousa Leonel
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/10/2013

A new high-performance architecture for computing all the DCT operations adopted in the H.264/AVC and HEVC standards is proposed in this paper. In contrast to other dedicated transform cores, the presented multi-standard transform architecture is built on a completely configurable, scalable and unified structure that is able to compute not only the forward and inverse 8×8 and 4×4 integer DCTs and the 4×4 and 2×2 Hadamard transforms defined in the H.264/AVC standard, but also the 4×4, 8×8, 16×16 and 32×32 integer transforms adopted in HEVC. Experimental results obtained using a Xilinx Virtex-7 FPGA demonstrated the superior performance and hardware efficiency provided by the proposed structure, which outperforms its most prominent related designs by at least 1.8 times. When integrated in a multi-core embedded system, this architecture allows the real-time computation of all the transforms mentioned above for resolutions as high as 8k Ultra High Definition Television (UHDTV) (7680×4320 @ 30 fps).

Repositório Científico do Instituto Politécnico de Lisboa

Exploring the design space of HEVC inverse transforms with dataflow programming

Authors: Rahman A. A. H. A.; Yion K. Z.
Publication venue: Institute of Advanced Engineering and Science
Publication date: 12/01/2016

This paper presents the design space exploration of the hardware-based inverse fixed-point integer transform for High Efficiency Video Coding (HEVC). The designs are specified at a high level using the CAL dataflow language and automatically synthesized to HDL for FPGA implementation. Several parallel design alternatives are proposed, with trade-offs between performance and resource usage. The HEVC transform consists of several independent components, from the 4x4 to 32x32 discrete cosine transforms and the 4x4 discrete sine transform. This work explores strategies to compute the transforms efficiently by applying data parallelism to the different components. Results show that an intermediate level of parallelism, whereby the 4x4 and 8x8 transforms are merged together and the 16x16 and 32x32 transforms are merged together, gives the best trade-off between performance and resource usage. The results presented in this work also give insight into how the HEVC transform can be designed efficiently in parallel for hardware implementation.

Universiti Teknologi Malaysia Institutional Repository

An Efficient Data-aided Synchronization in L-DACS1 for Aeronautical Communications

Authors: Madhukumar A. S.; Pham Thinh H.; Vinod A. P.
Publication date: 01/01/2017

L-band Digital Aeronautical Communication System type-1 (L-DACS1) is an emerging standard that aims to enhance air traffic management (ATM) by moving traditional analog aeronautical communication systems to the more efficient digital domain. L-DACS1 employs orthogonal frequency division multiplexing (OFDM) modulation to achieve higher efficiency and a higher data rate than existing aeronautical communication systems. However, the performance of OFDM systems is very sensitive to synchronization errors. L-DACS1 transmits in L-band aeronautical channels that suffer from large interference and large Doppler shifts, which makes synchronization for L-DACS1 more challenging. This paper proposes a novel, computationally efficient synchronization method for L-DACS1 systems that offers robust performance. Through simulation, the proposed method is shown to provide accurate symbol timing offset (STO) estimation as well as fractional carrier frequency offset (CFO) estimation in a range of aeronautical channels. In particular, it can yield excellent synchronization performance in the face of a large carrier frequency offset. Comment: In the proceedings of the International Conference on Data Mining, Communications and Information Technology (DMCIT

arXiv.org e-Print Archive

Crossref

Real-time embedded video denoiser prototype

Authors: Bouyer Manuel; Gaillard Boris; Lacassagne Lionel; Lemaitre Florian; Menard Patrice; Meunier Quentin; Petreto Andrea; Romera Thomas
Publication venue: HAL CCSD
Publication date: 28/01/2020

Low light or other poor visibility conditions often generate noise in any vision system. However, video denoising requires a lot of computational effort, and most state-of-the-art algorithms cannot run in real time at camera frame rate. Noisy video is thus a major issue, especially for embedded systems that provide low computational power. This article presents a new real-time video denoising algorithm for embedded platforms called RTE-VD [1]. We first compare its denoising capabilities with other online and offline algorithms. We show that RTE-VD can achieve real-time performance (25 frames per second) for qHD video (960x540 pixels) on embedded CPUs with an output image quality comparable to state-of-the-art algorithms. In order to reach real-time denoising, we applied several high-level transforms and optimizations. We study the relation between computation time and power consumption on several embedded CPUs and show that it is possible to determine frequency and core configurations that minimize either the computation time or the energy. Finally, we introduce VIRTANS, our embedded real-time video denoiser based on RTE-VD.

Data Cache-Energy and Throughput Models: Design Exploration for Embedded Processors

Authors: McDonald-Maier Klaus; Qadri Muhammad Yasir
Publication venue: Hindawi Limited
Publication date: 01/01/2009

Most modern 16-bit and 32-bit embedded processors contain cache memories to further increase the instruction throughput of the device. Embedded processors that contain cache memories open an opportunity for the low-power research community to model the impact of cache energy consumption and throughput gains. Mathematical models for optimal cache memory configuration have been proposed in the past. Most of these models are too complex to be adapted for modern applications like run-time cache reconfiguration. This paper improves and validates previously proposed energy and throughput models for a data cache, which could be used for overhead analysis of various cache types with a relatively small number of inputs. These models analyze the energy and throughput of a data cache on a per-application basis, thus providing the hardware and software designer with the feedback vital to tune the cache or application for a given energy budget. The models are suitable for use at design time in the cache optimization process for embedded processors, considering time and energy overhead, or could be employed at run time for reconfigurable architectures.

University of Essex Research Repository

Springer - Publisher Connector

Directory of Open Access Journals

Hardware Realization of an FPGA Processor – Operating System Call Offload and Experiences

Authors: Hindborg Andreas Erik; Jensen Nicklas Bo; Karlsson Sven; Schleuniger Pascal
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2014

Online Research Database In Technology

Accurate Events Synchronization in a System-on-Chip Navigation Receiver

Authors: Beaugendre Guillaume; Dion Arnaud; Kasaraneni Raghuveer; Priot Benoit
Publication venue: HAL CCSD
Publication date: 04/06/2019

The System-on-Chip design and synchronization details of a navigation receiver are presented. The architecture of the GNSS receiver is easily modifiable and offers accurate time management, thanks to the use of a co-design approach. The purpose of such a platform is to allow real-time validation of research algorithms. A secondary application is education, as this platform can be used to study signal demodulation and navigation. The receiver is fully functional, but further developments are still under way. Results demonstrate the accuracy, flexibility and ease of use of the system.

Design of an Embedded Low Complexity Image Coder using CAL language

Authors: Abid Mohamed; Déforges Olivier; Jerbi Khaled; Raulet Mickaël
Publication venue: HAL CCSD
Publication date: 22/09/2009

The increasing complexity of image codecs and time-to-market pressure require a high-level design approach. Caltrop Actor Language (CAL) is a domain-specific language that provides useful abstractions for dataflow programming with actors. It has been chosen by the ISO/IEC standardization organization for the new MPEG standard called Reconfigurable Video Coding. This framework is adopted to design a multitude of codecs by combining actors. We present in this paper the specification and synthesis of the image coder LAR (Locally Adaptive Resolution) using the CAL framework. An HDL description and generation tools are used. The results show that such a high-level design is possible. The quality of the resulting decoder implementation turns out to be better than that of a VHDL reference design. In the following, the main parts of the LAR coder are presented; we introduce the basic notions of the CAL language and its infrastructure (editing, simulation and HDL synthesis tools) and discuss the results.

HAL-CentraleSupelec

HAL-Rennes 1