3,316 research outputs found
An Adaptive Design Methodology for Reduction of Product Development Risk
Embedded systems interaction with environment inherently complicates
understanding of requirements and their correct implementation. However,
product uncertainty is highest during early stages of development. Design
verification is an essential step in the development of any system, especially
for Embedded System. This paper introduces a novel adaptive design methodology,
which incorporates step-wise prototyping and verification. With each adaptive
step product-realization level is enhanced while decreasing the level of
product uncertainty, thereby reducing the overall costs. The back-bone of this
frame-work is the development of Domain Specific Operational (DOP) Model and
the associated Verification Instrumentation for Test and Evaluation, developed
based on the DOP model. Together they generate functionally valid test-sequence
for carrying out prototype evaluation. With the help of a case study 'Multimode
Detection Subsystem' the application of this method is sketched. The design
methodologies can be compared by defining and computing a generic performance
criterion like Average design-cycle Risk. For the case study, by computing
Average design-cycle Risk, it is shown that the adaptive method reduces the
product development risk for a small increase in the total design cycle time.Comment: 21 pages, 9 figure
A VLSI DSP DESIGN AND IMPLEMENTATION OF COMB FILTER USING UN-FOLDING METHODOLOGY
In signal processing, a comb filter adds a delayed version of a signal to itself, causing constructive and destructive interference. Comb filters are used in a variety of signal processing applications that is Cascaded Integrator-Comb filters, Audio effects, including echo, flanging, and digital waveguide synthesis and various other applications. Comb filter when implemented has lower through-put as the sample period can not be achieved equal to the iteration bound because node computation time of comb filter is larger than the iteration bound. Hence throughput remains less. This paper present the comb filter using one of the methodology needed to design custom or semi custom VLSI circuits named as Un-Folding which increases the throughput of the comb filter. Un-Folding is a transformation technique that can be applied to a DSP program to create a new program describing more than one iteration of the original program. It can unravel hidden con-currency in digital signal processing systems described by DFGs. Therefore, unfolding has been used for the sample period reduction of the comb filter for its higher throughput
Transformations of High-Level Synthesis Codes for High-Performance Computing
Specialized hardware architectures promise a major step in performance and
energy efficiency over the traditional load/store devices currently employed in
large scale computing systems. The adoption of high-level synthesis (HLS) from
languages such as C/C++ and OpenCL has greatly increased programmer
productivity when designing for such platforms. While this has enabled a wider
audience to target specialized hardware, the optimization principles known from
traditional software design are no longer sufficient to implement
high-performance codes. Fast and efficient codes for reconfigurable platforms
are thus still challenging to design. To alleviate this, we present a set of
optimizing transformations for HLS, targeting scalable and efficient
architectures for high-performance computing (HPC) applications. Our work
provides a toolbox for developers, where we systematically identify classes of
transformations, the characteristics of their effect on the HLS code and the
resulting hardware (e.g., increases data reuse or resource consumption), and
the objectives that each transformation can target (e.g., resolve interface
contention, or increase parallelism). We show how these can be used to
efficiently exploit pipelining, on-chip distributed fast memory, and on-chip
streaming dataflow, allowing for massively parallel architectures. To quantify
the effect of our transformations, we use them to optimize a set of
throughput-oriented FPGA kernels, demonstrating that our enhancements are
sufficient to scale up parallelism within the hardware constraints. With the
transformations covered, we hope to establish a common framework for
performance engineers, compiler developers, and hardware developers, to tap
into the performance potential offered by specialized hardware architectures
using HLS
Pipelining Saturated Accumulation
Aggressive pipelining and spatial parallelism allow integrated circuits (e.g., custom VLSI, ASICs, and FPGAs) to achieve high throughput on many Digital Signal Processing applications. However, cyclic data dependencies in the computation can limit parallelism and reduce the efficiency and speed of an implementation. Saturated accumulation is an important example where such a cycle limits the throughput of signal processing applications. We show how to reformulate saturated addition as an associative operation so that we can use a parallel-prefix calculation to perform saturated accumulation at any data rate supported by the device. This allows us, for example, to design a 16-bit saturated accumulator which can operate at 280 MHz on a Xilinx Spartan-3(XC3S-5000-4) FPGA, the maximum frequency supported by the component's DCM
High throughput spatial convolution filters on FPGAs
Digital signal processing (DSP) on field- programmable gate arrays (FPGAs) has long been appealing because of the inherent parallelism in these computations that can be easily exploited to accelerate such algorithms. FPGAs have evolved significantly to further enhance the mapping of these algorithms, included additional hard blocks, such as the DSP blocks found in modern FPGAs. Although these DSP blocks can offer more efficient mapping of DSP computations, they are primarily designed for 1-D filter structures. We present a study on spatial convolutional filter implementations on FPGAs, optimizing around the structure of the DSP blocks to offer high throughput while maintaining the coefficient flexibility that other published architectures usually sacrifice. We show that it is possible to implement large filters for large 4K resolution image frames at frame rates of 30–60 FPS, while maintaining functional flexibility
- …