thesis

Exploration of the scalability of SIMD processing for software defined radio

Abstract

The idea of software defined radio (SDR) describes a signal processing system for wireless communications that allows performing major parts of the physical layer processing in software. SDR systems are more flexible and have lower development costs than traditional systems based on application-specific integrated circuits (ASICs). Yet, SDR requires programmable processor architectures that can meet the throughput and energy efficiency requirements of current third generation (3G) and future fourth generation (4G) wireless standards for mobile devices. Single instruction, multiple data (SIMD) processors operate on long data vectors in parallel data lanes and can achieve a good ratio of computing power to energy consumption. Hence, SIMD processors could be the basis of future SDR systems. Yet, SIMD processors only achieve a high efficiency if all parallel data lanes can be utilized. This thesis investigates the scalability of SIMD processing for algorithms required in 4G wireless systems; i. e. the scaling of performance and energy consumption with increasing SIMD vector lengths is explored. The basis of the exploration is a scalable SIMD processor architecture, which also supports long instruction word (LIW) execution and can be configured with four different permutation networks for vector element permutations. Radix-2 and mixed-radix fast Fourier transform (FFT) algorithms, sphere decoding for multiple input, multiple output (MIMO) systems, and the decoding of quasi-cyclic lowdensity parity check (LDPC) codes have been examined, as these are key algorithms for 4G wireless systems. The results show that the performance of all algorithms scales with the SIMD vector length, yet there are different constraints on the ratios between algorithm and architecture parameters. The radix-2 FFT algorithm allows close to linear speedups if the FFT size is at least twice the SIMD vector length, the mixed-radix FFT algorithm requires the FFT size to be a multiple of the squared SIMD width. The performance of the implemented sphere decoding algorithm scales linearly with the SIMD vector length. The scalability of LDPC decoding is determined by the expansion factor of the quasicyclic code. Wider SIMD processors offer better performance and also require less energy than processors with a shorter vector length for all considered algorithms. The results for different permutations networks show that a simple permutation network is sufficient for most applications

    Similar works