
    High-Throughput Contention-Free Concurrent Interleaver Architecture for Multi-Standard Turbo Decoder

    To meet the higher data-rate requirements of emerging wireless communication technologies, numerous parallel turbo decoder architectures have been developed. However, the interleaver has become a major bottleneck that limits the achievable throughput of parallel decoders, owing to massive memory conflicts. In this paper, we propose a flexible Double-Buffer based Contention-Free (DBCF) interleaver architecture that efficiently solves the memory conflict problem for parallel turbo decoders with very high parallelism. The proposed DBCF architecture enables high-throughput concurrent interleaving for multi-standard turbo decoders supporting UMTS/HSPA+, LTE and WiMAX, with small datapath delays and low hardware cost. We implemented the DBCF interleaver in 65 nm CMOS technology; the implementation shows significant improvement in maximum throughput and occupied chip area compared to previous work. (Funding: Huawei; National Science Foundation.)
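
    As a rough illustration of the general buffering idea (not the authors' DBCF design; the address pattern, buffer behaviour and bank mapping below are invented placeholders), the following C++ sketch models parallel decoders queueing conflicting writes so that each single-port bank retires at most one write per cycle instead of stalling the whole datapath:

        #include <cstdio>
        #include <queue>
        #include <vector>

        struct Write { int addr; float value; };

        int main() {
            const int P = 8;         // parallel decoder outputs per cycle (assumed)
            const int BANKS = 8;     // single-port extrinsic-memory banks
            const int DEPTH = 128;   // words per bank
            std::vector<std::queue<Write>> buf(BANKS);   // per-bank conflict buffers
            std::vector<std::vector<float>> mem(BANKS, std::vector<float>(DEPTH, 0.0f));

            int cycles = 0;
            for (int step = 0; step < 16; ++step) {
                // Each decoder emits one interleaved extrinsic write this cycle; a
                // toy address pattern (not a real standard's permutation) is used
                // here purely to provoke bank conflicts.
                for (int p = 0; p < P; ++p) {
                    const int i = step * P + p;
                    const int addr = (7 * i * i + i) % (BANKS * DEPTH);
                    buf[addr % BANKS].push({addr, 0.5f});   // buffer instead of stalling
                }
                // Each single-port bank retires at most one buffered write per cycle.
                for (int b = 0; b < BANKS; ++b) {
                    if (!buf[b].empty()) {
                        const Write w = buf[b].front();
                        buf[b].pop();
                        mem[b][w.addr / BANKS] = w.value;
                    }
                }
                ++cycles;
            }
            // Drain remaining buffered conflicts, one per bank per cycle.
            for (bool pending = true; pending; ) {
                pending = false;
                for (int b = 0; b < BANKS; ++b)
                    if (!buf[b].empty()) { buf[b].pop(); pending = true; }
                if (pending) ++cycles;
            }
            std::printf("wrote %d values in %d cycles\n", 16 * P, cycles);
            return 0;
        }

    The extra cycles reported over the ideal 16 are the cost of the conflicts that the buffers absorb; the paper's contribution is keeping that overhead small at high parallelism.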

    Beyond Gbps Turbo Decoder on Multi-Core CPUs

    This paper presents a high-throughput implementation of a portable software turbo decoder. The code is optimized for traditional multi-core CPUs (such as x86) and is based on the Enhanced max-log-MAP turbo decoding variant; it follows the LTE-Advanced specification. The key to the high performance is an inter-frame SIMD strategy combined with a fixed-point representation. Our results show that the proposed multi-core CPU implementation of turbo decoders is a challenging alternative to GPU implementations in terms of throughput and energy efficiency. On a high-end processor, our software turbo decoder exceeds 1 Gbps information throughput for all rate-1/3 LTE codes with K < 4096.
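
    The inter-frame SIMD idea can be sketched as follows: one SIMD lane carries one frame, so the same trellis operation is applied to many frames in lockstep. This minimal C++ fragment (illustrative only; the paper's optimized code and lane count are not reproduced here) shows one add-compare-select step of the max-log-MAP forward recursion in 16-bit fixed point, with the frame loop as the vectorizable dimension:

        #include <algorithm>
        #include <cstdint>

        constexpr int FRAMES = 16;  // frames decoded in lockstep (assumed lane count)

        // One add-compare-select step for a single trellis state, applied to
        // all frames at once; saturation handling is omitted for brevity.
        void acs_step(const int16_t* alpha0, const int16_t* alpha1,
                      const int16_t* gamma0, const int16_t* gamma1,
                      int16_t* alpha_out) {
            for (int f = 0; f < FRAMES; ++f) {   // the SIMD dimension
                int16_t m0 = static_cast<int16_t>(alpha0[f] + gamma0[f]);  // path 0
                int16_t m1 = static_cast<int16_t>(alpha1[f] + gamma1[f]);  // path 1
                alpha_out[f] = std::max(m0, m1); // max-log approximation
            }
        }

    Because every lane executes an identical instruction stream, the frame loop maps directly onto SSE/AVX registers, which is where the reported throughput comes from.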

    Datacenter Design for Future Cloud Radio Access Network.

    Cloud radio access network (C-RAN), an emerging cloud service that combines the traditional radio access network (RAN) with cloud computing technology, has been proposed as a solution to the growing energy consumption and cost of the traditional RAN. By aggregating baseband units (BBUs) in a centralized cloud datacenter, C-RAN reduces energy and cost and improves wireless throughput and quality of service. However, how to design a datacenter for C-RAN has not yet been studied. In this dissertation, I investigate how a datacenter for C-RAN BBUs should be built on commodity servers.

    I first design WiBench, an open-source benchmark suite containing the key signal processing kernels of many mainstream wireless protocols, and study its characteristics. The characterization study shows that there is abundant data-level parallelism (DLP) and thread-level parallelism (TLP). Based on this result, I develop high-performance software implementations of C-RAN BBU kernels in C++ and CUDA for both CPUs and GPUs. In addition, I generalize the GPU parallelization techniques of the turbo decoder to trellis algorithms, an important family of algorithms widely used in data compression and channel coding. I then evaluate the performance of commodity CPU servers and GPU servers. The study shows that a datacenter with GPU servers can meet the LTE standard throughput with 4× to 16× fewer machines than with CPU servers. A further energy and cost analysis shows that GPU servers save, on average, 13× the energy and 6× the cost. I therefore propose that the C-RAN datacenter be built using GPUs as the server platform.

    Next, I study resource management techniques to handle the temporal and spatial traffic imbalance in a C-RAN datacenter. I propose a “hill-climbing” power management scheme that combines powering off GPUs with DVFS to match the temporal C-RAN traffic pattern. Under a practical traffic model, this technique saves 40% of the BBU energy in a GPU-based C-RAN datacenter. For spatial traffic imbalance, I propose three workload distribution techniques to improve load balance and throughput. Among the three, pipelining packets yields the largest throughput improvement: 10% and 16% for balanced and unbalanced loads, respectively.

    PhD dissertation, Computer Science and Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/120825/1/qizheng_1.pd
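
    A minimal sketch of the hill-climbing idea described above (the controller, capacity model and step sizes are invented for illustration and are not the dissertation's code): at each control interval, nudge the DVFS level or the number of powered-on GPUs one step up or down so that capacity tracks the current load, preferring the cheaper knob first:

        #include <algorithm>
        #include <cstdio>

        struct Config { int gpus; int dvfs; };   // dvfs: 0 = lowest frequency level

        double capacity(const Config& c) {       // toy capacity model (assumed)
            return c.gpus * (1.0 + 0.25 * c.dvfs);
        }

        Config step(Config c, double load, int max_gpus, int max_dvfs) {
            if (capacity(c) < load) {            // climb: add capacity
                if (c.dvfs < max_dvfs) ++c.dvfs; // raise frequency first
                else c.gpus = std::min(c.gpus + 1, max_gpus);  // then power on a GPU
            } else {                             // descend: try to save energy
                Config t = c;
                if (t.dvfs > 0) --t.dvfs;
                else t.gpus = std::max(t.gpus - 1, 1);
                if (capacity(t) >= load) c = t;  // keep only if load is still met
            }
            return c;
        }

        int main() {
            Config c{4, 0};
            const double trace[] = {3.0, 5.5, 8.0, 6.0, 2.0};  // diurnal-style samples
            for (double load : trace) {
                c = step(c, load, 16, 3);
                std::printf("load=%.1f -> gpus=%d dvfs=%d\n", load, c.gpus, c.dvfs);
            }
            return 0;
        }

    The one-step-per-interval structure is what makes this "hill climbing": the controller never jumps, it only probes neighbouring configurations and keeps those that still meet the load.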

    Implementation of a fully-parallel turbo decoder on a general-purpose graphics processing unit

    Turbo codes comprising a parallel concatenation of upper and lower convolutional codes are widely employed in state-of-the-art wireless communication standards, since they facilitate transmission throughputs that closely approach the channel capacity. However, this necessitates high processing throughputs in order for the turbo code to support real-time communications. In state-of-the-art turbo code implementations, the processing throughput is typically limited by the data dependencies that occur within the forward and backward recursions of the Log-BCJR algorithm, which is employed during turbo decoding. In contrast to the highly serial Log-BCJR turbo decoder, we have recently proposed a novel Fully Parallel Turbo Decoder (FPTD) algorithm, which eliminates these data dependencies and performs fully parallel processing. In this paper, we propose an optimized FPTD algorithm, which reformulates the operation of the FPTD so that the upper and lower decoders have identical operation, in order to support Single Instruction Multiple Data (SIMD) processing. This allows us to develop a novel General-Purpose Graphics Processing Unit (GPGPU) implementation of the FPTD, which has application in Software-Defined Radios (SDRs) and virtualized Cloud Radio Access Networks (C-RANs). Owing to its higher degree of parallelism, our FPTD improves the processing throughput of the Log-BCJR turbo decoder by a factor of 2.3 to 9.2 when employing a high-specification GPGPU. However, this is achieved at the cost of a moderate increase in overall complexity, by a factor of 1.7 to 3.3.
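
    The core reformulation can be sketched as follows (a simplified model, not the authors' GPGPU kernel; the trellis connectivity and branch-metric indexing below are placeholders): instead of sweeping forward and backward serially as in Log-BCJR, every stage k updates its state metrics from its neighbours' values of the previous iteration, so all K stages can be processed simultaneously, one per GPU thread or SIMD lane:

        #include <algorithm>
        #include <array>
        #include <vector>

        constexpr int K = 64;   // block length (toy value)
        constexpr int S = 8;    // trellis states

        using Stage = std::array<float, S>;

        // One fully-parallel iteration: alpha_next[k] depends only on
        // alpha_prev[k-1], and beta_next[k] only on beta_prev[k+1], so the
        // loops over k carry no dependency and can all run at once.
        void fptd_iteration(const std::vector<Stage>& alpha_prev,
                            const std::vector<Stage>& beta_prev,
                            const std::vector<Stage>& gamma,   // branch metrics
                            std::vector<Stage>& alpha_next,
                            std::vector<Stage>& beta_next) {
            for (int k = 1; k < K; ++k)            // parallel over k
                for (int s = 0; s < S; ++s)        // (s+1)%S is a placeholder edge
                    alpha_next[k][s] = std::max(
                        alpha_prev[k-1][s]         + gamma[k-1][s],
                        alpha_prev[k-1][(s+1) % S] + gamma[k-1][(s+1) % S]);
            for (int k = 0; k + 1 < K; ++k)
                for (int s = 0; s < S; ++s)
                    beta_next[k][s] = std::max(
                        beta_prev[k+1][s]         + gamma[k][s],
                        beta_prev[k+1][(s+1) % S] + gamma[k][(s+1) % S]);
        }

    Using stale neighbour metrics means more iterations are needed than in Log-BCJR, which is the complexity increase the abstract quantifies; the win is that each iteration is embarrassingly parallel.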

    State of the art baseband DSP platforms for Software Defined Radio: A survey

    Software Defined Radio (SDR) is an innovative approach that is becoming an increasingly promising technology for future mobile handsets. Several proposals in the field of embedded systems have been introduced by universities and industry to support SDR applications. This article presents an overview of current platforms and analyzes the related architectural choices, the current issues in SDR, and potential future trends.

    Energy Consumption Analysis of Software Polar Decoders on Low Power Processors

    This paper presents a new dynamic and fully generic implementation of a Successive Cancellation (SC) decoder, with multi-precision support and support for both intra- and inter-frame strategies. This fully generic SC decoder is used to compare the different configurations in terms of throughput, latency and energy consumption. Special emphasis is placed on energy consumption on low-power embedded processors for software-defined radio (SDR) systems. An N = 4096, rate-1/2 software SC decoder consumes only 14 nJ per bit on an ARM Cortex-A57 core while achieving 65 Mbps. Design guidelines are given for adapting the configuration to the application context.
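
    For reference, the SC recursion that such decoders implement can be written compactly. The following C++ sketch is the textbook min-sum formulation (the paper's generic decoder adds multi-precision and intra-/inter-frame SIMD on top of this same recursion; the function and variable names here are our own):

        #include <cmath>
        #include <cstdint>
        #include <vector>

        static float f(float a, float b) {             // check node: sign-min (min-sum)
            return std::copysign(std::fmin(std::fabs(a), std::fabs(b)), a * b);
        }
        static float g(float a, float b, uint8_t u) {  // variable node with partial sum u
            return u ? b - a : b + a;
        }

        // Writes hard decisions into u (length N); frozen[i] == 1 forces bit i to 0.
        // Returns the re-encoded (partial-sum) bits of this subtree for the parent's g.
        std::vector<uint8_t> sc_decode(const std::vector<float>& llr,
                                       const std::vector<uint8_t>& frozen,
                                       std::vector<uint8_t>& u, size_t off = 0) {
            const size_t n = llr.size();
            if (n == 1) {
                u[off] = frozen[off] ? 0 : (llr[0] < 0.0f);
                return {u[off]};
            }
            const size_t h = n / 2;
            std::vector<float> sub(h);
            for (size_t i = 0; i < h; ++i) sub[i] = f(llr[i], llr[i + h]);
            std::vector<uint8_t> left = sc_decode(sub, frozen, u, off);
            for (size_t i = 0; i < h; ++i) sub[i] = g(llr[i], llr[i + h], left[i]);
            std::vector<uint8_t> right = sc_decode(sub, frozen, u, off + h);
            std::vector<uint8_t> enc(n);
            for (size_t i = 0; i < h; ++i) {           // propagate partial sums upward
                enc[i]     = left[i] ^ right[i];
                enc[i + h] = right[i];
            }
            return enc;
        }

    The serial left-then-right dependency of this recursion is exactly what makes SC decoding latency-bound, and why the paper's fixed-point, SIMD and frame-batching choices matter for throughput and energy.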

    A Programmable, Scalable-Throughput Interleaver

    The interleaver stages of digital communication standards show a surprisingly large variation in throughput, state sizes, and permutation functions. Furthermore, data rates for 4G standards such as LTE-Advanced will exceed typical baseband clock frequencies of handheld devices. Multistream operation for Software Defined Radio and iterative decoding algorithms will call for ever higher interleave data rates. Our interleave machine is built around 8 single-port SRAM banks and can be programmed to generate up to 8 addresses every clock cycle. The scalable architecture combines SIMD and VLIW concepts with an efficient resolution of bank conflicts. A wide range of cellular, connectivity, and broadcast interleavers have been mapped onto this machine, with throughputs of more than 0.5 Gsymbol/s. Although it was designed for channel interleaving, the application domain of the interleaver also extends to turbo interleaving. The presented configuration of the architecture is designed as part of a programmable outer receiver on a prototype board. It offers (near-)universal programmability to enable the implementation of new interleavers. The interleaver measures 2.09 mm² in 65 nm CMOS (including memories) and has been proven functional on silicon.
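
    Why bank-conflict resolution dominates the cycle count can be shown with a small C++ model (invented for illustration; it is not the paper's VLIW machine): 8 addresses are generated per cycle, and addresses colliding on the same single-port bank must be serialized, adding cycles:

        #include <algorithm>
        #include <array>
        #include <cstdio>

        constexpr int LANES = 8;   // addresses generated per cycle
        constexpr int BANKS = 8;   // single-port SRAM banks

        int cycles_for_block(int rows, int cols) {
            const int n = rows * cols;
            int cycles = 0;
            for (int base = 0; base < n; base += LANES) {
                std::array<int, BANKS> hits{};   // accesses per bank this cycle
                int worst = 1;
                for (int lane = 0; lane < LANES && base + lane < n; ++lane) {
                    const int i = base + lane;
                    // Toy row/column block interleave; its bank pattern happens
                    // to collide badly with the simple modulo bank mapping.
                    const int addr = (i % rows) * cols + (i / rows);
                    worst = std::max(worst, ++hits[addr % BANKS]);
                }
                cycles += worst;                 // colliding accesses are serialized
            }
            return cycles;
        }

        int main() {
            std::printf("cycles: %d (conflict-free ideal: %d)\n",
                        cycles_for_block(32, 24), 32 * 24 / LANES);
            return 0;
        }

    A naive mapping like this one can degenerate to one useful access per cycle; the machine described above earns its throughput by resolving exactly these collisions efficiently.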

    System capacity enhancement for 5G network and beyond

    A thesis submitted to the University of Bedfordshire in fulfilment of the requirements for the degree of Doctor of Philosophy.

    The demand for wireless digital data is increasing dramatically year over year. Wireless devices such as laptops, smartphones, tablets, smart watches and virtual-reality headsets have become an important part of daily life. The number of mobile devices is growing rapidly, as are the requirements placed on them, such as super-high-resolution image and video, fast download speeds, very low latency and high reliability, all of which pose challenges to existing wireless communication networks. Unlike the previous four generations, the fifth-generation (5G) wireless communication network encompasses many technologies, including millimetre-wave communication, massive multiple-input multiple-output (MIMO), visible light communication (VLC) and heterogeneous networks (HetNets). Although 5G has not yet been standardised, these technologies have been studied in both academia and industry. The goal of this research is to enhance system capacity for 5G networks and beyond by addressing key problems in these technologies from the perspective of system implementation and hardware impairments. The key problems studied in this thesis are interference cancellation in HetNets, impairment calibration for massive MIMO, channel state estimation for VLC, and a low-latency parallel turbo decoding technique.

    Firstly, inter-cell interference in HetNets is studied and a cell-specific reference signal (CRS) interference cancellation method is proposed to mitigate the performance degradation in enhanced inter-cell interference coordination (eICIC). The method takes the carrier frequency offset (CFO) and timing offset (TO) of the user's received signal into account; by reconstructing the interfering signal and then cancelling it, the capacity of the HetNet is enhanced.

    Secondly, for massive MIMO systems, radio frequency (RF) hardware impairments degrade beamforming performance. When operated in time division duplex (TDD) mode, a massive MIMO system relies on channel reciprocity, which can be broken by transmitter and receiver RF impairments. A closed-loop reciprocity calibration method is proposed in this thesis: a test device (TD) estimates the transmitters' impairments over the air and feeds the results back to the base station via the Internet, while uplink pilots sent by the TD assist the estimation of the BS receivers' impairments. With both uplink and downlink impairment estimates, the reciprocity calibration coefficients can be obtained. The performance of the proposed method is evaluated by computer simulation and lab experiment.

    Channel coding is an essential part of a wireless communication system, combating noise to ensure correct information delivery. Turbo codes are among the most reliable codes and have been adopted in many standards, such as WiMAX and LTE. However, turbo decoding is time-consuming, and its latency must be reduced to meet the requirements of future networks. A reverse interleave address generator is proposed that reduces the decoding time, and a low-latency parallel turbo decoder has been implemented on an FPGA platform (a sketch of the kind of address recursion involved follows this abstract). The simulation and experiment results prove the effectiveness of the address generator and show that there is a trade-off between latency and throughput under limited hardware resources.

    Apart from the above contributions, this thesis also investigates multi-user precoding for MIMO VLC systems. As a green and secure technology, VLC is attracting increasing attention and could become part of the 5G network, especially for indoor communication. In indoor scenarios, the MIMO VLC channel can easily be ill-conditioned, so it is important to study the impact of the channel state on precoding performance. A channel state estimation method is proposed based on the signal-to-interference-plus-noise ratio (SINR) of the users' received signals. Simulation results show that it can enhance the capacity of the indoor MIMO VLC system.
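
    The thesis's reverse address generator is not reproduced here, but for reference, the LTE QPP interleaver pi(i) = (f1*i + f2*i^2) mod K, on which such turbo interleaver address generators operate, can be computed with a multiplier-free recursion; this is the standard second-difference form, sketched in C++:

        #include <cstdio>
        #include <vector>

        // pi(i) = (f1*i + f2*i*i) mod K, computed incrementally: the first
        // difference starts at (f1 + f2) and grows by 2*f2 each step.
        std::vector<int> qpp_addresses(int K, int f1, int f2) {
            std::vector<int> pi(K);
            int addr = 0;                     // pi(0) = 0
            int step = (f1 + f2) % K;         // pi(1) - pi(0)
            for (int i = 0; i < K; ++i) {
                pi[i] = addr;
                addr = (addr + step) % K;     // next address without any multiply
                step = (step + 2 * f2) % K;   // update the running difference
            }
            return pi;
        }

        int main() {
            // K = 40 uses f1 = 3, f2 = 10 in the LTE parameter table.
            const std::vector<int> pi = qpp_addresses(40, 3, 10);
            for (int i = 0; i < 8; ++i) std::printf("pi(%d) = %d\n", i, pi[i]);
            return 0;
        }

    Because the recursion needs only two additions per address, it is cheap to replicate (or run in reverse) in hardware, which is the property a low-latency parallel decoder's address generator exploits.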