188 research outputs found
Controller synthesis for application specific digital signal processors
A controller synthesizer, that is part of a design system by which algorithms unsuitable for standard processors can be implemented, is presented. A hierarchical controller architecture suitable for frame-based and multi-sample-rate algorithms is synthesized. Synthesis of a controller is based on micro instructions, specific for each architecture, and assumes no use of predefined functional blocks. The designer can affect complexity and partitioning of the controller by changing the micro program. Processors for speech scrambling and digital adjustment of quadrature modulators have been designed and fabricated
Architectures for Dynamic Data Scaling in 2/4/8K Pipeline FFT Cores
This paper presents architectures for supporting dynamic data scaling in pipeline fast Fourier transforms (FFTs), suitable when implementing large size FFTs in applications such as digital video broadcasting and digital holographic imaging. In a pipeline FFT, data is continuously streaming and must, hence, be scaled without stalling the dataflow. We propose a hybrid floating-point scheme with tailored exponent datapath, and a co-optimized architecture between hybrid floating point and block floating point (BFP) to reduce memory requirements for 2-D signal processing. The presented co-optimization generates a higher signal-to-quantization-noise ratio and requires less memory than for instance convergent BFP. A 2048-point pipeline FFT has been fabricated in a standard-CMOS process from AMI Semiconductor (Lenart and Öwall, 2003), and a field-programmable gate array prototype integrating a 2-D FFT core in a larger design shows that the architecture is suitable for image reconstruction in digital holographic imaging
Modeling and exploration of a reconfigurable architecture for digital holographic imaging
The use of coarse-grain reconfigurable architectures (CGRA) is a suitable alternative for hardware acceleration in many application areas, including digital holographic imaging. In this paper, we propose a CGRA-based system with an array of processing and memory cells, which communicate using a local and a global communication network, and a stream memory controller to manage data transfers to external memory. We present our SystemC-based exploration environment (SCENIC) and methodology used to construct and evaluate systems containing reconfigurable architectures. A case study illustrates the advantages with rapid system level exploration to find and solve bottlenecks in complex designs prior to RTL description
A parallel 2Gops/s image convolution processor with low I/O bandwidth
A customized image processor for real time convolution of an image has been developed. Image convolution requires an extensive amount of calculation capacity and I/O communication which is hard to sustain with standard processors in real time. Therefore, a customized processor has been designed with a tailored architecture. The processors have a total sustained calculation capacity of >2G arithmetic operations/s at 20 MHz clock frequency, surpassing that of TMS320C80 for this application due to the tailored architecture
A simplified computational kernel for trellis-based decoding
A simplified branch metric and add-compare-select (ACS) unit is presented for use in trellis-based decoding architectures. The simplification is based on a complementary property of best feedforward and some systematic feedback encoders. As a result, one adder is saved in every other ACS unit. Furthermore, only half the branch metrics have to be calculated. It is shown that this simplification becomes especially beneficial for rate 1/2 convolutional codes. Consequently, area and power consumption will be reduced in a hardware implementation
Implementation Issues for acoustic echo cancellers
The high computational complexity of acoustic echo cancellation algorithms requires application specific implementations to sustain real time signal processing with affordable power consumption. This is especially true for systems where a delayless approach is considered important, e.g. wireless communication systems. The proposed paper presents architectural considerations to reach a feasible hardware solution
A hybrid interconnect network-on-chip and a transaction level modeling approach for reconfigurable computing
This paper presents a hybrid interconnect network consisting of a local network with dedicated wires and a global hierarchical network. A distributed memory approach enables the possibility to use generic memory banks as routing buffers, simplifies the implementation and reduces the area requirements of routers. A SystemC simulation environment (SCENIC) has been developed to simulate and instrument models, and to setup different topologies and scenarios. Modules are designed as transaction level models to improve design time and simulation speed
Optimization and implementation of a Viterbi decoder under flexibility constraints
This paper discusses the impact of flexibility when designing a Viterbi decoder for both convolutional and TCM codes. Different trade-offs have to be considered in choosing the right architecture for the processing blocks and the resulting hardware penalty is evaluated. We study the impact of symbol quantization that degrades performance and affects the wordlength of the rate-flexible trellis datapath. A radix-2-based architecture for this datapath relaxes the hardware requirements on the branch metric and survivor path blocks substantially. The cost of flexibility in terms of cell area and power consumption is explored by an investigation of synthesized designs that provide different transmission rates. Two designs are fabricated in a digital 0.13- CMOS process. Based on post-layout simulations, a symbol baud rate of 168 Mbaud/s is achieved in TCM mode, equivalent to a maximum throughput of 840 Mbit/s using a 64-QAM constellation
Area and power efficient trellis computational blocks in 0.13μm CMOS
Improved add-compare-select and branch metric units are presented to reduce the complexity in the implementation of trellis-based decoding architectures. These units use a complementary property of the best rate 1/2 convolutional codes to reduce both area requirements and power consumption in a silicon implementation with no loss in decoding performance. For a 0.13μm CMOS process, synthesized computational blocks for decoders that can process codes from memory 2 up to 7 show up to 17% savings in both cell area and power consumptio
Generalized lock-in amplifier for precision measurement of high frequency signals
We herein formulate the concept of a generalized lock-in amplifier for the
precision measurement of high frequency signals based on digital cavities.
Accurate measurement of signals higher than 200 MHz using the generalized
lock-in is demonstrated. The technique is compared with a traditional lock-in
and its advantages and limitations are discussed. We also briefly point out how
the generalized lock-in can be used for precision measurement of giga-hertz
signals by using parallel processing of the digitized signals
- …