2,710 research outputs found
The Chameleon Architecture for Streaming DSP Applications
We focus on architectures for streaming DSP applications such as wireless baseband processing and image processing. We aim at a single generic architecture that is capable of dealing with different DSP applications. This architecture has to be energy efficient and fault tolerant. We introduce a heterogeneous tiled architecture and present the details of a domain-specific reconfigurable tile processor called Montium. This reconfigurable processor has a small footprint (1.8 mm in a 130 nm process), is power efficient and exploits the locality of reference principle. Reconfiguring the device is very fast, for example, loading the coefficients for a 200 tap FIR filter is done within 80 clock cycles. The tiles on the tiled architecture are connected to a Network-on-Chip (NoC) via a network interface (NI). Two NoCs have been developed: a packet-switched and a circuit-switched version. Both provide two types of services: guaranteed throughput (GT) and best effort (BE). For both NoCs estimates of power consumption are presented. The NI synchronizes data transfers, configures and starts/stops the tile processor. For dynamically mapping applications onto the tiled architecture, we introduce a run-time mapping tool
Programmable rate modem utilizing digital signal processing techniques
The engineering development study to follow was written to address the need for a Programmable Rate Digital Satellite Modem capable of supporting both burst and continuous transmission modes with either binary phase shift keying (BPSK) or quadrature phase shift keying (QPSK) modulation. The preferred implementation technique is an all digital one which utilizes as much digital signal processing (DSP) as possible. Here design tradeoffs in each portion of the modulator and demodulator subsystem are outlined, and viable circuit approaches which are easily repeatable, have low implementation losses and have low production costs are identified. The research involved for this study was divided into nine technical papers, each addressing a significant region of concern in a variable rate modem design. Trivial portions and basic support logic designs surrounding the nine major modem blocks were omitted. In brief, the nine topic areas were: (1) Transmit Data Filtering; (2) Transmit Clock Generation; (3) Carrier Synthesizer; (4) Receive AGC; (5) Receive Data Filtering; (6) RF Oscillator Phase Noise; (7) Receive Carrier Selectivity; (8) Carrier Recovery; and (9) Timing Recovery
The design and multiplier-less realization of software radio receivers with reduced system delay
This paper studies the design and multiplier-less realization of a new software radio receiver (SRR) with reduced system delay. It employs low-delay finite-impulse response (FIR) and digital allpass filters to effectively reduce the system delay of the multistage decimators in SRRs. The optimal least-square and minimax designs of these low-delay FIR and allpass-based filters are formulated as a semidefinite programming (SDP) problem, which allows zero magnitude constraint at ω = π to be incorporated readily as additional linear matrix inequalities (LMIs). By implementing the sampling rate converter (SRC) using a variable digital filter (VDF) immediately after the integer decimators, the needs for an expensive programmable FIR filter in the traditional SRR is avoided. A new method for the optimal minimax design of this VDF-based SRC using SDP is also proposed and compared with traditional weight least squares method. Other implementation issues including the multiplier-less and digital signal processor (DSP) realizations of the SRR and the generation of the clock signal in the SRC are also studied. Design results show that the system delay and implementation complexities (especially in terms of high-speed variable multipliers) of the proposed architecture are considerably reduced as compared with conventional approaches. © 2004 IEEE.published_or_final_versio
Throughput-Distortion Computation Of Generic Matrix Multiplication: Toward A Computation Channel For Digital Signal Processing Systems
The generic matrix multiply (GEMM) function is the core element of
high-performance linear algebra libraries used in many
computationally-demanding digital signal processing (DSP) systems. We propose
an acceleration technique for GEMM based on dynamically adjusting the
imprecision (distortion) of computation. Our technique employs adaptive scalar
companding and rounding to input matrix blocks followed by two forms of packing
in floating-point that allow for concurrent calculation of multiple results.
Since the adaptive companding process controls the increase of concurrency (via
packing), the increase in processing throughput (and the corresponding increase
in distortion) depends on the input data statistics. To demonstrate this, we
derive the optimal throughput-distortion control framework for GEMM for the
broad class of zero-mean, independent identically distributed, input sources.
Our approach converts matrix multiplication in programmable processors into a
computation channel: when increasing the processing throughput, the output
noise (error) increases due to (i) coarser quantization and (ii) computational
errors caused by exceeding the machine-precision limitations. We show that,
under certain distortion in the GEMM computation, the proposed framework can
significantly surpass 100% of the peak performance of a given processor. The
practical benefits of our proposal are shown in a face recognition system and a
multi-layer perceptron system trained for metadata learning from a large music
feature database.Comment: IEEE Transactions on Signal Processing (vol. 60, 2012
Significance of Logic Synthesis in FPGA-Based Design of Image and Signal Processing Systems
This chapter, taking FIR filters as an example, presents the discussion on efficiency of different implementation methodologies of DSP algorithms targeting modern FPGA architectures. Nowadays, programmable technology provides the possibility to implement digital systems with the use of specialized embedded DSP blocks. However, this technology gives the designer the possibility to increase efficiency of designed systems by exploitation of parallelisms of implemented algorithms. Moreover, it is possible to apply special techniques, such as distributed arithmetic (DA). Since in this approach, general-purpose multipliers are replaced by combinational LUT blocks, it is possible to construct digital filters of very high performance. Additionally, application of the functional decomposition-based method to LUT blocks optimization, and mapping has been investigated. The chapter presents results of the comparison of various design approaches in these areas
- …