784 research outputs found
Complex Library Mapping for Embedded Software Using Symbolic Algebra
Embedded software designers often use libraries that have been pre-optimized for a given processor to achieve higher code quality. However, using such libraries in legacy code optimization is nontrivial and typically requires manual intervention. This paper presents a methodology that maps algorithmic constructs of the software specification to a library of complex software elements. This library-mapping step is automated by using symbolic algebra techniques. We illustrate the advantages of our methodology by optimizing an algorithmic level description of MPEG Layer III (MP3) audio decoder for the Badge4 [2] portable embedded system. During the optimization process we use commercially available libraries with complex elements ranging from simple mathematical functions such as exp to the IDCT routine. We implemented and measured the performance and energy consumption of the MP3 decoder software on Badge4 running embedded Linux operating system. The optimized MP3 audio decoder runs 300 times faster than the original code obtained from the standards body while consuming 400 times less energy. Since our optimized MP3 decoder runs 3.5 times faster than real-time, additional energy can be saved by using processor frequency and voltage scaling
Adding Automatic Parallelization to Faust
International audienceFaust 0.9.9.5 introduces new compilation options to do automatic parallelization of code using OpenMP. This paper explains how the automatic parallelization is done and presents some benchmarks
Low Latency Prefix Accumulation Driven Compound MAC Unit for Efficient FIR Filter Implementation
135–138This article presents hierarchical single compound adder-based MAC with assertion based error correction for speculation variations in the prefix addition for FIR filter design. The VLSI implementation of approximation in prefix adder results show a significant delay and complexity reductions, all this at the cost of latency measures when speculation fails during carry propagation, which is the main reason preventing the use of speculation in parallel-prefix adders in DSP applications. The speculative adder which is based on Han Carlson parallel prefix adder structure accomplishes better reduction in latency. Introducing a structured and efficient shift-add technique and explore latency reduction by incorporating approximation in addition. The improvements made in terms of reduction in latency and merits in performance by the proposed MAC unit are showed through the synthesis done by FPGA hardware. Results show that proposed method outpaces both formerly projected MAC designs using multiplication methods for attaining high speed
Vesyla-II: An Algorithm Library Development Tool for Synchoros VLSI Design Style
High-level synthesis (HLS) has been researched for decades and is still
limited to fast FPGA prototyping and algorithmic RTL generation. A feasible
end-to-end system-level synthesis solution has never been rigorously proven.
Modularity and composability are the keys to enabling such a system-level
synthesis framework that bridges the huge gap between system-level
specification and physical level design. It implies that 1) modules in each
abstraction level should be physically composable without any irregular glue
logic involved and 2) the cost of each module in each abstraction level is
accurately predictable. The ultimate reasons that limit how far the
conventional HLS can go are precisely that it cannot generate modular designs
that are physically composable and cannot accurately predict the cost of its
design. In this paper, we propose Vesyla, not as yet another HLS tool, but as a
synthesis tool that positions itself in a promising end-to-end synthesis
framework and preserving its ability to generate physically composable modular
design and to accurately predict its cost metrics. We present in the paper how
Vesyla is constructed focusing on the novel platform it targets and the
internal data structures that highlights the uniqueness of Vesyla. We also show
how Vesyla will be positioned in the end-to-end synchoros synthesis framework
called SiLago
Advances in Bit Width Selection Methodology
We describe a method for the formal determination of signal bit width in fixed points VLSI implementations of signal processing algorithms containin- g loop nests. The main advance of this paper lies in the fact that we use results of the (max,+) algebraic theory to find the integral bit width of algorithms containing loop nests whose bound parameters are not statically known. Combined with recent results on fractional bit width determination, the results of this paper can be used for 1-dimensional systolic-like arrays implementing linear signal processing algorithms. Although they are presented in the context of a specific high level design methodology (based on systems of affine recurrence equations), the results of this work can be used in many high level design environments
- …