    Complex Library Mapping for Embedded Software Using Symbolic Algebra

    Embedded software designers often use libraries that have been pre-optimized for a given processor to achieve higher code quality. However, using such libraries in legacy code optimization is nontrivial and typically requires manual intervention. This paper presents a methodology that maps algorithmic constructs of the software specification to a library of complex software elements. This library-mapping step is automated by using symbolic algebra techniques. We illustrate the advantages of our methodology by optimizing an algorithmic-level description of an MPEG Layer III (MP3) audio decoder for the Badge4 [2] portable embedded system. During the optimization process we use commercially available libraries with complex elements ranging from simple mathematical functions such as exp to the IDCT routine. We implemented the MP3 decoder software on Badge4 running an embedded Linux operating system and measured its performance and energy consumption. The optimized MP3 audio decoder runs 300 times faster than the original code obtained from the standards body while consuming 400 times less energy. Since our optimized MP3 decoder runs 3.5 times faster than real time, additional energy can be saved by using processor frequency and voltage scaling.
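
The abstract does not describe the mapping algorithm itself, so the following is only a toy illustration of the general idea: a polynomial found in the source code is compared against the Taylor coefficients of candidate library functions. The `LIBRARY` table and the `match_library` helper are hypothetical names introduced here, not part of the paper or of any real library.

```python
from fractions import Fraction
from math import factorial

# Hypothetical library: each entry gives the k-th Taylor coefficient (around 0)
# of a function that a pre-optimized library is assumed to provide.
LIBRARY = {
    "exp": lambda k: Fraction(1, factorial(k)),
    "cos": lambda k: Fraction((-1) ** (k // 2), factorial(k)) if k % 2 == 0 else Fraction(0),
    "sin": lambda k: Fraction((-1) ** (k // 2), factorial(k)) if k % 2 == 1 else Fraction(0),
}


def match_library(coeffs):
    """Return the library functions whose Taylor series begins with `coeffs`.

    `coeffs[k]` is the exact coefficient of x**k in the polynomial extracted
    from the code (ints or Fractions, lowest degree first).
    """
    return [
        name
        for name, term in LIBRARY.items()
        if all(Fraction(c) == term(k) for k, c in enumerate(coeffs))
    ]


# 1 + x + x^2/2 + x^3/6 is the start of the series for exp(x).
print(match_library([1, 1, Fraction(1, 2), Fraction(1, 6)]))
```

Exact rational arithmetic is used so that coefficient comparison is not disturbed by floating-point rounding. A real mapping flow would also have to handle multivariate expressions and approximation error, which this sketch ignores.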

    Taylor Expansion Diagrams: A Canonical Representation for Verification of Data Flow Designs

    Adding Automatic Parallelization to Faust

    Faust 0.9.9.5 introduces new compilation options for automatic parallelization of code using OpenMP. This paper explains how the automatic parallelization is done and presents some benchmarks.

    Low Latency Prefix Accumulation Driven Compound MAC Unit for Efficient FIR Filter Implementation

    Pages 135–138. This article presents a hierarchical, single compound adder-based MAC with assertion-based error correction for speculation variations in the prefix addition, targeting FIR filter design. VLSI implementation results show that approximation in the prefix adder significantly reduces delay and complexity, but at the cost of added latency when speculation fails during carry propagation, which is the main reason speculation is not used in parallel-prefix adders for DSP applications. The speculative adder, which is based on the Han-Carlson parallel-prefix adder structure, achieves a better reduction in latency. The article introduces a structured and efficient shift-add technique and explores latency reduction by incorporating approximation in addition. The latency reduction and performance merits of the proposed MAC unit are shown through synthesis on FPGA hardware. Results show that the proposed method outpaces both previously proposed MAC designs that use multiplication methods for attaining high speed.
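
As background for the parallel-prefix addition mentioned above, here is a minimal bit-level sketch of how a prefix network computes all carries in a logarithmic number of stages. It uses the Kogge-Stone arrangement because it is the simplest to write down; it is not the Han-Carlson structure, and it models neither the speculation nor the error-correction scheme of the article. The function name and the default width are assumptions made for illustration.

```python
def kogge_stone_add(a, b, width=8):
    """Add two unsigned `width`-bit integers with a Kogge-Stone prefix network.

    Returns (sum modulo 2**width, carry_out).
    """
    # Per-bit generate (both inputs 1) and propagate (exactly one input 1).
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]

    # Prefix stages: after the stage with distance d, G[i] and P[i] describe
    # the group of bits [max(0, i - 2*d + 1) .. i].
    G, P = g[:], p[:]
    d = 1
    while d < width:
        new_G, new_P = G[:], P[:]
        for i in range(d, width):
            new_G[i] = G[i] | (P[i] & G[i - d])
            new_P[i] = P[i] & P[i - d]
        G, P = new_G, new_P
        d *= 2

    # G[i] is now the carry out of bit i (carry-in is 0), so the carry into
    # bit i is G[i - 1].
    total = 0
    for i in range(width):
        carry_in = G[i - 1] if i > 0 else 0
        total |= (p[i] ^ carry_in) << i
    return total, G[width - 1]


print(kogge_stone_add(200, 100))
```

A speculative variant would stop the `while` loop early and accept the partially resolved carries, which is where a detection and correction step becomes necessary.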

    Vesyla-II: An Algorithm Library Development Tool for Synchoros VLSI Design Style

    High-level synthesis (HLS) has been researched for decades and is still limited to fast FPGA prototyping and algorithmic RTL generation. A feasible end-to-end system-level synthesis solution has never been rigorously proven. Modularity and composability are the keys to enabling a system-level synthesis framework that bridges the huge gap between system-level specification and physical-level design. This implies that 1) modules in each abstraction level should be physically composable without any irregular glue logic involved and 2) the cost of each module in each abstraction level is accurately predictable. The ultimate reasons that limit how far conventional HLS can go are precisely that it cannot generate modular designs that are physically composable and cannot accurately predict the cost of its designs. In this paper, we propose Vesyla, not as yet another HLS tool, but as a synthesis tool that positions itself in a promising end-to-end synthesis framework and preserves its ability to generate physically composable modular designs and to accurately predict their cost metrics. We present how Vesyla is constructed, focusing on the novel platform it targets and the internal data structures that highlight the uniqueness of Vesyla. We also show how Vesyla will be positioned in the end-to-end synchoros synthesis framework called SiLago.

    Advances in Bit Width Selection Methodology

    We describe a method for the formal determination of signal bit width in fixed-point VLSI implementations of signal processing algorithms containing loop nests. The main advance of this paper lies in the fact that we use results of the (max,+) algebraic theory to find the integral bit width of algorithms containing loop nests whose bound parameters are not statically known. Combined with recent results on fractional bit width determination, the results of this paper can be used for 1-dimensional systolic-like arrays implementing linear signal processing algorithms. Although they are presented in the context of a specific high-level design methodology (based on systems of affine recurrence equations), the results of this work can be used in many high-level design environments.
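
To make the notion of integral bit width concrete, the sketch below computes the smallest two's-complement width that can hold the output of a small FIR-style sum. It uses plain interval arithmetic with statically known coefficients and input bounds, which is a much simpler setting than the (max,+) analysis of the paper (where loop bounds are not statically known). The function names are hypothetical.

```python
def signed_int_bits(lo, hi):
    """Smallest two's-complement width n such that
    -2**(n-1) <= lo and hi <= 2**(n-1) - 1."""
    if lo > hi:
        raise ValueError("empty range")
    n = 1
    while not (-(1 << (n - 1)) <= lo and hi <= (1 << (n - 1)) - 1):
        n += 1
    return n


def fir_output_range(coeffs, x_lo, x_hi):
    """Worst-case range of sum(c * x) when every input x lies in [x_lo, x_hi].

    A positive coefficient reaches its minimum at x_lo, a negative one at x_hi
    (and the other way round for the maximum).
    """
    lo = sum(c * (x_lo if c >= 0 else x_hi) for c in coeffs)
    hi = sum(c * (x_hi if c >= 0 else x_lo) for c in coeffs)
    return lo, hi


# Three integer taps applied to 8-bit signed samples.
lo, hi = fir_output_range([1, -2, 3], -128, 127)
print(lo, hi, signed_int_bits(lo, hi))
```

Interval bounds of this kind are safe but can be pessimistic, since they assume every input hits its worst-case value at the same time.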