310 research outputs found

    High Performance Multi-Standard Architecture for DCT Computation in H.264/AVC High Profile and HEVC Codecs

    A new high-performance architecture for computing all the DCT operations adopted in the H.264/AVC and HEVC standards is proposed in this paper. In contrast to other dedicated transform cores, the presented multi-standard transform architecture is built on a completely configurable, scalable and unified structure that can compute not only the forward and inverse 8×8 and 4×4 integer DCTs and the 4×4 and 2×2 Hadamard transforms defined in the H.264/AVC standard, but also the 4×4, 8×8, 16×16 and 32×32 integer transforms adopted in HEVC. Experimental results obtained using a Xilinx Virtex-7 FPGA demonstrate the superior performance and hardware efficiency of the proposed structure, which outperforms the most prominent related designs by at least 1.8 times. When integrated in a multi-core embedded system, this architecture allows real-time computation of all the transforms mentioned above for resolutions as high as 8k Ultra High Definition Television (UHDTV) (7680×4320 @ 30fps).
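
As a rough illustration of the smallest transform named in this abstract, the following sketch applies the well-known core matrix of the H.264/AVC 4×4 forward integer transform to one block, computing Y = Cf · X · Cfᵀ. It is not the paper's architecture: the function names are hypothetical, and the post-scaling and quantization stages of the standard are deliberately left out.

```python
# Core matrix of the H.264/AVC 4x4 forward integer transform.
# Scaling and quantization are intentionally omitted in this sketch.
CF = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]


def forward_core_4x4(block):
    """Hypothetical helper: unscaled 2-D transform Y = CF * X * CF^T."""
    return matmul(matmul(CF, block), transpose(CF))


# A flat block has energy only in the DC coefficient.
flat = [[1] * 4 for _ in range(4)]
coeffs = forward_core_4x4(flat)
```

For a flat input block, only the top-left (DC) coefficient is non-zero, which is a quick sanity check on the row and column passes.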

    Exploring the design space of HEVC inverse transforms with dataflow programming

    This paper presents the design space exploration of the hardware-based inverse fixed-point integer transform for High Efficiency Video Coding (HEVC). The designs are specified at a high level using the CAL dataflow language and automatically synthesized to HDL for FPGA implementation. Several parallel design alternatives are proposed, each with a different trade-off between performance and resource usage. The HEVC transform consists of several independent components, from the 4x4 to the 32x32 discrete cosine transform, plus the 4x4 discrete sine transform. This work explores strategies to compute the transforms efficiently by applying data parallelism to the different components. Results show that an intermediate level of parallelism, whereby the 4x4 and 8x8 transforms are merged together and the 16x16 and 32x32 transforms are merged together, gives the best trade-off between performance and resource usage. The results presented in this work also give insight into how the HEVC transform can be designed efficiently in parallel for hardware implementation.
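
To make the data-parallel structure mentioned in this abstract a little more concrete, here is a minimal sketch of a 4-point inverse transform using the HEVC 4×4 core matrix, written once as a plain transposed matrix product and once as an even/odd (partial butterfly) decomposition. The function names are hypothetical, the rounding and shift stages of the standard are omitted, and this is not the CAL design from the paper.

```python
# HEVC 4-point core transform matrix (rows are basis vectors).
M4 = [
    [64,  64,  64,  64],
    [83,  36, -36, -83],
    [64, -64, -64,  64],
    [36, -83,  83, -36],
]


def inverse_4pt_matrix(y):
    """Reference: x = M4^T * y, without the standard's rounding shifts."""
    return [sum(M4[k][n] * y[k] for k in range(4)) for n in range(4)]


def inverse_4pt_butterfly(y):
    """Even/odd decomposition of the same product.

    The even part uses y[0], y[2]; the odd part uses y[1], y[3].
    The two parts are independent, so hardware can compute them in parallel.
    """
    e0 = 64 * y[0] + 64 * y[2]
    e1 = 64 * y[0] - 64 * y[2]
    o0 = 83 * y[1] + 36 * y[3]
    o1 = 36 * y[1] - 83 * y[3]
    return [e0 + o0, e1 + o1, e1 - o1, e0 - o0]


sample = [10, -3, 7, 2]
ref = inverse_4pt_matrix(sample)
fast = inverse_4pt_butterfly(sample)
```

Both functions return the same four values; the butterfly form simply reuses the symmetric structure of the matrix to cut the multiplication count.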

    An Efficient Data-aided Synchronization in L-DACS1 for Aeronautical Communications

    L-band Digital Aeronautical Communication System type-1 (L-DACS1) is an emerging standard that aims to enhance air traffic management (ATM) by moving traditional analog aeronautical communication systems to the more efficient digital domain. L-DACS1 employs orthogonal frequency division multiplexing (OFDM) modulation to achieve higher efficiency and data rates than existing aeronautical communication systems. However, the performance of OFDM systems is very sensitive to synchronization errors. L-DACS1 transmits in L-band aeronautical channels that suffer from large interference and large Doppler shifts, which makes synchronization for L-DACS more challenging. This paper proposes a novel, computationally efficient synchronization method for L-DACS1 systems that offers robust performance. Through simulation, the proposed method is shown to provide accurate symbol timing offset (STO) estimation as well as fractional carrier frequency offset (CFO) estimation in a range of aeronautical channels. In particular, it can yield excellent synchronization performance in the face of a large carrier frequency offset. Comment: In the proceedings of the International Conference on Data Mining, Communications and Information Technology (DMCIT
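
The abstract does not describe the proposed estimator, so the sketch below shows only a generic, textbook-style fractional CFO estimate for context: it correlates the two identical halves of a repeated training symbol and reads the offset from the phase of the correlation. The function names, the training sequence and the noise-free channel are all assumptions made for illustration, and the CFO is normalized to the subcarrier spacing of a symbol that is 2 × `half_len` samples long.

```python
import cmath
import math


def estimate_fractional_cfo(rx, half_len):
    """Generic repeated-halves estimator (not the paper's method).

    With identical halves, conj(rx[n]) * rx[n + half_len] has phase pi * cfo,
    so the estimate is unambiguous for |cfo| < 1 subcarrier spacing.
    """
    p = sum(rx[n].conjugate() * rx[n + half_len] for n in range(half_len))
    return cmath.phase(p) / math.pi


def make_test_signal(half_len, cfo):
    """Hypothetical test signal: a repeated unit-modulus half plus a CFO rotation."""
    half = [cmath.exp(1j * math.pi * n * n / half_len) for n in range(half_len)]
    symbol = half + half
    n_total = 2 * half_len
    return [s * cmath.exp(2j * math.pi * cfo * n / n_total) for n, s in enumerate(symbol)]


rx = make_test_signal(32, 0.3)
cfo_hat = estimate_fractional_cfo(rx, 32)
```

In this noise-free setting the estimate matches the injected offset up to floating-point error; in a real aeronautical channel, noise, interference and Doppler would all degrade it.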

    Real-time embedded video denoiser prototype

    Low light or other poor visibility conditions often generate noise in any vision system. However, video denoising requires a lot of computational effort, and most state-of-the-art algorithms cannot run in real time at camera frame rate. Noisy video is thus a major issue, especially for embedded systems that provide low computational power. This article presents a new real-time video denoising algorithm for embedded platforms called RTE-VD [1]. We first compare its denoising capabilities with other online and offline algorithms. We show that RTE-VD can achieve real-time performance (25 frames per second) for qHD video (960x540 pixels) on embedded CPUs with an output image quality comparable to state-of-the-art algorithms. In order to reach real-time denoising, we applied several high-level transforms and optimizations. We study the relation between computation time and power consumption on several embedded CPUs and show that it is possible to determine frequency and core configurations that minimize either the computation time or the energy. Finally, we introduce VIRTANS, our embedded real-time video denoiser based on RTE-VD.

    Data Cache-Energy and Throughput Models: Design Exploration for Embedded Processors

    Most modern 16-bit and 32-bit embedded processors contain cache memories to further increase the instruction throughput of the device. Embedded processors that contain cache memories open an opportunity for the low-power research community to model the impact of cache energy consumption and throughput gains. Mathematical models for optimal cache memory configuration have been proposed in the past. Most of these models are too complex to be readily adapted for modern applications like run-time cache reconfiguration. This paper improves and validates previously proposed energy and throughput models for a data cache, which could be used for overhead analysis of various cache types with a relatively small number of inputs. These models analyze the energy and throughput of a data cache on a per-application basis, thus providing hardware and software designers with the feedback needed to tune the cache or application for a given energy budget. The models are suitable for use at design time in the cache optimization process for embedded processors, taking time and energy overhead into account, or could be employed at runtime for reconfigurable architectures.

    Accurate Events Synchronization in a System-on-Chip Navigation Receiver

    The System-on-Chip design and synchronization details of a navigation receiver are presented. The architecture of the GNSS receiver is easily modifiable and offers accurate time management, thanks to the use of a co-design approach. The purpose of such a platform is to allow real-time validation of research algorithms. A secondary application is education, as this platform can be used to study signal demodulation and navigation. The receiver is fully functional, but further development is still under way. Results demonstrate the accuracy, flexibility and ease of use of the system.

    Design of an Embedded Low Complexity Image Coder using CAL language

    The increasing complexity of image codecs, together with time-to-market pressure, requires a high-level design approach. Caltrop Actor Language (CAL) is a domain-specific language that provides useful abstractions for dataflow programming with actors. It has been chosen by the ISO/IEC standardization organization for the new MPEG standard called Reconfigurable Video Coding. This framework is used to design a multitude of codecs by combining actors. We present in this paper the specification and synthesis of the image coder LAR (Locally Adaptive Resolution) using the CAL framework. An HDL description and generation tools are used. The results show that such a high-level design is possible. The quality of the resulting decoder implementation turns out to be better than that of a VHDL reference design. In the following, the main parts of the LAR coder are presented, the basic notions of the CAL language and its infrastructure (editing, simulation and HDL synthesis tools) are introduced, and the results are discussed.