245 research outputs found

    Adaptive Multicore Scheduling for the LTE Uplink

    Get PDF
    International audienceThe next generation cellular system of 3GPP is named Long Term Evolution (LTE). Each millisecond, a LTE base station receives information from up to one hundred users. Multicore heterogeneous embedded systems with Digital Signal Processors (DSP) and coprocessors are power efficient solutions to decode the LTE uplink signals in base stations. The LTE uplink is a highly variable algorithm. Its multicore scheduling must be adapted every millisecond to the number of connected users and to the data rate they require. To solve the issue of the dynamic deployment while maintaining low latency, one approach would be to find efficient on-the-fly solutions using techniques such as graph generation and scheduling. This approach is opposed to a static scheduling of predefined cases. We show that the static approach is not suitable for the LTE uplink and that present DSP cores are powerful enough to recompute an efficient adaptive schedule for the LTE uplink most complex cases in real-time

    Software Code Generation for the RVC-CAL Language

    Get PDF
    International audienceThe MPEG Reconfigurable Video Coding (RVC) framework is a new standard under development by MPEG that aims at providing a unified high-level specification of current and future MPEG video coding technologies using dataflow models. In this framework, a decoder is built as a configuration of video coding modules taken from the standard MPEG toolbox library or proprietary libraries. The elements of the library are specified by a textual description that expresses the I/O behavior of each module and by a reference software written using a subset of the CAL Actor Language named RVC-CAL. A decoder configuration is written in an XML dialect by connecting a set of CAL modules. Code generators are fundamental supports that enable the direct transformation of a high level specification to efficient hardware and software implementations. This paper presents a synthesis tool that from a CAL dataflow program generates C code and an associated SystemC model. The generated code is validated against the original CAL description simulated using the Open Dataflow environment. Experimental results of the translation of two descriptions of an MPEG-4 Simple Profile decoder with different granularities are shown and discussed

    Design of a Processor Optimized for Syntax Parsing in Video Decoders

    No full text
    8International audienceHeterogeneous platforms aim to offer both performance and flexibility by providing designers processors and programmable logical units on a single platform. Processors implemented on these platforms are usually soft-cores (e.g. Altera NIOS) or ASIC (e.g. ARM Cortex-A8). However, these processors still face limitations in terms of performance compared to full hardware designs in particular for real-time video decoding applications. We present in this paper an innovative approach to improve performance using both a processor optimized for the syntax parsing (an Application-Specific Instruction-set Processor) and a FPGA. The case study has been synthesized on a Xilinx FPGA at a frequency of 100MHz and we estimate the performance that could be obtained with an ASIC

    HDS, a real-time multi-DSP motion estimator for MPEG-4 H.264 AVC high definition video encoding

    Get PDF
    International audienceH.264 AVC video compression standard achieves high compression rates at the cost of a high encoder complexity. The encoder performances are greatly linked to the motion estimation operation which requires high computation power and memory bandwidth. High definition context magnifies the difficulty of a real-time implementation. EPZS and HME are two well-known motion estimation algorithms. Both EPZS and HME are implemented in a DSP and their performances are compared in terms of both quality and complexity. Based on these results, a new algorithm called HDS for Hierarchical Diamond Search is proposed. HDS motion estimation is integrated in a AVC encoder to extract timings and resulting video qualities reached. A real-time DSP implementation of H.264 quarter-pixel accuracy motion estimation is proposed for SD and HD video format. Furthermore HDS characteristics make this algorithm well suited for H.264 SVC real-time encoding applications

    Optimization of automatically generated multi-core code for the LTE RACH-PD algorithm

    Get PDF
    Embedded real-time applications in communication systems require high processing power. Manual scheduling devel-oped for single-processor applications is not suited to multi-core architectures. The Algorithm Architecture Matching (AAM) methodology optimizes static application implementation on multi-core architectures. The Random Access Channel Preamble Detection (RACH-PD) is an algorithm for non-synchronized access of Long Term Evolu-tion (LTE) wireless networks. LTE aims to improve the spectral efficiency of the next generation cellular system. This paper de-scribes a complete methodology for implementing the RACH-PD. AAM prototyping is applied to the RACH-PD which is modelled as a Synchronous DataFlow graph (SDF). An efficient implemen-tation of the algorithm onto a multi-core DSP, the TI C6487, is then explained. Benchmarks for the solution are given

    Optimization of the motion estimation for parallel embedded systems in the context of new video standards

    Get PDF
    15 pagesInternational audienceThe effciency of video compression methods mainly depends on the motion compensation stage, and the design of effcient motion estimation techniques is still an important issue. An highly accurate motion estimation can significantly reduce the bit-rate, but involves a high computational complexity. This is particularly true for new generations of video compression standards, MPEG AVC and HEVC, which involves techniques such as different reference frames, sub-pixel estimation, variable block sizes. In this context, the design of fast motion estimation solutions is necessary, and can concerned two linked aspects: a high quality algorithm and its effcient implementation. This paper summarizes our main contributions in this domain. In particular, we first present the HME (Hierarchical Motion Estimation) technique. It is based on a multi-level refinement process where the motion estimation vectors are first estimated on a sub-sampled image. The multi-levels decomposition provides robust predictions and is particularly suited for variable block sizes motion estimations. The HME method has been integrated in a AVC encoder, and we propose a parallel implementation of this technique, with the motion estimation at pixel level performed by a DSP processor, and the sub-pixel refinement realized in an FPGA. The second technique that we present is called HDS for Hierarchical Diamond Search. It combines the multi-level refinement of HME, with a fast search at pixel-accuracy inspired by the EPZS method. This paper also presents its parallel implementation onto a multi-DSP platform and the its use in the HEVC context

    Memory Bounds for the Distributed Execution of a Hierarchical Synchronous Data-Flow Graph

    Get PDF
    International audienceThis paper presents an application analysis technique to define the boundary of shared memory requirements of Multiprocessor System-on-Chip (MPSoC) in early stages of development. This technique is part of a rapid prototyping process and is based on the analysis of a hierarchical Synchronous Data-Flow (SDF) graph description of the system application. The analysis does not require any knowledge of the system architecture, the mapping or the scheduling of the system application tasks. The initial step of the method consists of applying a set of transformations to the SDF graph so as to reveal its memory characteristics. These transformations produce a weighted graph that represents the different memory objects of the application as well as the memory allocation constraints due to their relationships. The memory boundaries are then derived from this weighted graph using analogous graph theory problems, in particular the Maximum-Weight Clique (MWC) problem. Stateof-the-art algorithms to solve these problems are presented and a heuristic approach is proposed to provide a near-optimal solution of the MWC problem. A performance evaluation of the heuristic approach is presented, and is based on hierarchical SDF graphs of realistic applications. This evaluation shows the efficiency of proposed heuristic approach in finding near optimal solutions

    IMPLEMENTATION OF MOTION ESTIMATION BASED ON HETEROGENEOUS PARALLEL COMPUTING SYSTEM WITH OPENC

    Get PDF
    International audienceHeterogeneous computing system increases the performance of parallel computing in many domain of general purpose computing with CPU, GPU and other accelerators. Open Computing Language (OpenCL) is the first open, royaltyfree standard for heterogenous computing on multi hardware platforms. In this paper, we propose a parallel Motion Estimation (ME) algorithm implemented using OpenCL and present several optimization strategies applied in our OpenCL implementation of the motion estimation. In the same time, we implement the proposed algorithm on our heterogeneous computing system which contains one CPU and one GPU, and propose one method to determine the balance to distribute the workload in heterogeneous computing system with OpenCL. According to experiments, our motion estimator with achieves 100 to 150 speed-up compared with its implementation with C code executed by single CPU core and our proposed method obtains obviously enhancement of performance in based on our heterogeneous computing system

    Optimized fixed point implementation of a local stereo matching algorithm onto C66x DSP

    Get PDF
    International audienceStereo matching techniques aim at reconstructing disparity maps from a pair of images. The use of stereo matching techniques in embedded systems is very challenging due to the complexity of the state-of-the-art algorithms. An efficient local stereo matching algorithm has been chosen from the literature and implemented on a c6678 DSP. Arithmetic simplifications such as approximation by piecewise linear functions and fixed point conversions are proposed. Thanks to factorisation and pre-computing, the memory footprint is reduced by a factor 13 to fit on the memory footprint available on embedded systems. A 14.5 fps speed (factor 60 speed-up) has been reached with a small quality loss on the final disparity map

    EXPLOITING PARTIALLY RECONFIGURABLE FPGA FOR PERFORMANCE ADJUSTMENT IN THE RVC FRAMEWORK

    Get PDF
    International audienceIn this paper, we present a method to implement a specific algorithm using the RVC framework and the dynamic partial reconfiguration (DPR). The DPR is a technique allowing to replace modules in a design at run-time. While, the RVC framework is based on the use a specific language for writing dataflow models called RVC-CAL. The studied algorithm is a Hadamard transform. Several dataflow models of the Hadamard transform can be used in the design process in order to favor speed or power consumption. We show the steps required to implement and switch between two dataflow models (a sequential model and a pipelined model) of the Hadamard transform. Our design allows to user to choose one of two architectures according her requirements of low power and high speed
    • …
    corecore