44 research outputs found

    Advanced Wireless Digital Baseband Signal Processing Beyond 100 Gbit/s

    Get PDF
    International audienceThe continuing trend towards higher data rates in wireless communication systems will, in addition to a higher spectral efficiency and lowest signal processing latencies, lead to throughput requirements for the digital baseband signal processing beyond 100 Gbit/s, which is at least one order of magnitude higher than the tens of Gbit/s targeted in the 5G standardization. At the same time, advances in silicon technology due to shrinking feature sizes and increased performance parameters alone won't provide the necessary gain, especially in energy efficiency for wireless transceivers, which have tightly constrained power and energy budgets. In this paper, we highlight the challenges for wireless digital baseband signal processing beyond 100 Gbit/s and the limitations of today's architectures. Our focus lies on the channel decoding and MIMO detection, which are major sources of complexity in digital baseband signal processing. We discuss techniques on algorithmic and architectural level, which aim to close this gap. For the first time we show Turbo-Code decoding techniques towards 100 Gbit/s and a complete MIMO receiver beyond 100 Gbit/s in 28 nm technology

    Adaptive Baseband Pro cessing and Configurable Hardware for Wireless Communication

    Get PDF
    The world of information is literally at one’s fingertips, allowing access to previously unimaginable amounts of data, thanks to advances in wireless communication. The growing demand for high speed data has necessitated theuse of wider bandwidths, and wireless technologies such as Multiple-InputMultiple-Output (MIMO) have been adopted to increase spectral efficiency.These advanced communication technologies require sophisticated signal processing, often leading to higher power consumption and reduced battery life.Therefore, increasing energy efficiency of baseband hardware for MIMO signal processing has become extremely vital. High Quality of Service (QoS)requirements invariably lead to a larger number of computations and a higherpower dissipation. However, recognizing the dynamic nature of the wirelesscommunication medium in which only some channel scenarios require complexsignal processing, and that not all situations call for high data rates, allowsthe use of an adaptive channel aware signal processing strategy to provide adesired QoS. Information such as interference conditions, coherence bandwidthand Signal to Noise Ratio (SNR) can be used to reduce algorithmic computations in favorable channels. Hardware circuits which run these algorithmsneed flexibility and easy reconfigurability to switch between multiple designsfor different parameters. These parameters can be used to tune the operations of different components in a receiver based on feedback from the digitalbaseband. This dissertation focuses on the optimization of digital basebandcircuitry of receivers which use feedback to trade power and performance. Aco-optimization approach, where designs are optimized starting from the algorithmic stage through the hardware architectural stage to the final circuitimplementation is adopted to realize energy efficient digital baseband hardwarefor mobile 4G devices. These concepts are also extended to the next generation5G systems where the energy efficiency of the base station is improved.This work includes six papers that examine digital circuits in MIMO wireless receivers. Several key blocks in these receiver include analog circuits thathave residual non-linearities, leading to signal intermodulation and distortion.Paper-I introduces a digital technique to detect such non-linearities and calibrate analog circuits to improve signal quality. The concept of a digital nonlinearity tuning system developed in Paper-I is implemented and demonstratedin hardware. The performance of this implementation is tested with an analogchannel select filter, and results are presented in Paper-II. MIMO systems suchas the ones used in 4G, may employ QR Decomposition (QRD) processors tosimplify the implementation of tree search based signal detectors. However,the small form factor of the mobile device increases spatial correlation, whichis detrimental to signal multiplexing. Consequently, a QRD processor capableof handling high spatial correlation is presented in Paper-III. The algorithm and hardware implementation are optimized for carrier aggregation, which increases requirements on signal processing throughput, leading to higher powerdissipation. Paper-IV presents a method to perform channel-aware processingwith a simple interpolation strategy to adaptively reduce QRD computationcount. Channel properties such as coherence bandwidth and SNR are used toreduce multiplications by 40% to 80%. These concepts are extended to usetime domain correlation properties, and a full QRD processor for 4G systemsfabricated in 28 nm FD-SOI technology is presented in Paper-V. The designis implemented with a configurable architecture and measurements show thatcircuit tuning results in a highly energy efficient processor, requiring 0.2 nJ to1.3 nJ for each QRD. Finally, these adaptive channel-aware signal processingconcepts are examined in the scope of the next generation of communicationsystems. Massive MIMO systems increase spectral efficiency by using a largenumber of antennas at the base station. Consequently, the signal processingat the base station has a high computational count. Paper-VI presents a configurable detection scheme which reduces this complexity by using techniquessuch as selective user detection and interpolation based signal processing. Hardware is optimized for resource sharing, resulting in a highly reconfigurable andenergy efficient uplink signal detector

    Energy Efficient VLSI Circuits for MIMO-WLAN

    Get PDF
    Mobile communication - anytime, anywhere access to data and communication services - has been continuously increasing since the operation of the first wireless communication link by Guglielmo Marconi. The demand for higher data rates, despite the limited bandwidth, led to the development of multiple-input multiple-output (MIMO) communication which is often combined with orthogonal frequency division multiplexing (OFDM). Together, these two techniques achieve a high bandwidth efficiency. Unfortunately, techniques such as MIMO-OFDM significantly increase the signal processing complexity of transceivers. While fast improvements in the integrated circuit (IC) technology enabled to implement more signal processing complexity per chip, large efforts had and have to be done for novel algorithms as well as for efficient very large scaled integration (VLSI) architectures in order to meet today's and tomorrow's requirements for mobile wireless communication systems. In this thesis, we will present architectures and VLSI implementations of complete physical (PHY) layer application specific integrated circuits (ASICs) under the constraints imposed by an industrial wireless communication standard. Contrary to many other publications, we do not elaborate individual components of a MIMO-OFDM communication system stand-alone, but in the context of the complete PHY layer ASIC. We will investigate the performance of several MIMO detectors and the corresponding preprocessing circuits, being integrated into the entire PHY layer ASIC, in terms of achievable error-rate, power consumption, and area requirement. Finally, we will assemble the results from the proposed PHY layer implementations in order to enhance the energy efficiency of a transceiver. To this end, we propose a cross-layer optimization of PHY layer and medium access control (MAC) layer

    An efficient GPU implementation of fixed-complexity sphere decoders for MIMO wireless systems

    Full text link
    The use of many-core processors such as general purpose Graphic Processing Units (GPUs) has recently become attractive for the efficient implementation of signal processing algorithms for communication systems. This is due to the cost-effectiveness of GPUs together with their potential capability of parallel processing. This paper presents an implementation of the widely employed fixed-complexity sphere decoder on GPUs, which allows to considerably decrease the computational time required for the data detection stage in multiple-input multiple-output systems. Both, the hard-and soft-output versions of the method have been implemented. Speedup results show the proposed GPU implementation boosts the runtime of the parallel execution of the methods in a high performance multi-core CPU. In addition, the throughput of the algorithm is evaluated and is shown to outperform other recent implementations and to fulfill the real-time requirements of several LTE configurations. ©2012-IOS Press and the authors. All rights reserved.This work was partially funded by the TEC2009-13741 project of the Spanish Ministry of Science and by the PROMETEO/2009/013 project of the Generalitat Valenciana.Roger Varea, S.; Ramiro Sánchez, C.; González Salvador, A.; Almenar Terré, V.; Vidal Maciá, AM. (2012). An efficient GPU implementation of fixed-complexity sphere decoders for MIMO wireless systems. Integrated Computer-Aided Engineering. 19(4):341-350. https://doi.org/10.3233/ICA-2012-0410S34135019

    Datacenter Design for Future Cloud Radio Access Network.

    Full text link
    Cloud radio access network (C-RAN), an emerging cloud service that combines the traditional radio access network (RAN) with cloud computing technology, has been proposed as a solution to handle the growing energy consumption and cost of the traditional RAN. Through aggregating baseband units (BBUs) in a centralized cloud datacenter, C-RAN reduces energy and cost, and improves wireless throughput and quality of service. However, designing a datacenter for C-RAN has not yet been studied. In this dissertation, I investigate how a datacenter for C-RAN BBUs should be built on commodity servers. I first design WiBench, an open-source benchmark suite containing the key signal processing kernels of many mainstream wireless protocols, and study its characteristics. The characterization study shows that there is abundant data level parallelism (DLP) and thread level parallelism (TLP). Based on this result, I then develop high performance software implementations of C-RAN BBU kernels in C++ and CUDA for both CPUs and GPUs. In addition, I generalize the GPU parallelization techniques of the Turbo decoder to the trellis algorithms, an important family of algorithms that are widely used in data compression and channel coding. Then I evaluate the performance of commodity CPU servers and GPU servers. The study shows that the datacenter with GPU servers can meet the LTE standard throughput with 4× to 16× fewer machines than with CPU servers. A further energy and cost analysis show that GPU servers can save on average 13× more energy and 6× more cost. Thus, I propose the C-RAN datacenter be built using GPUs as a server platform. Next I study resource management techniques to handle the temporal and spatial traffic imbalance in a C-RAN datacenter. I propose a “hill-climbing” power management that combines powering-off GPUs and DVFS to match the temporal C-RAN traffic pattern. Under a practical traffic model, this technique saves 40% of the BBU energy in a GPU-based C-RAN datacenter. For spatial traffic imbalance, I propose three workload distribution techniques to improve load balance and throughput. Among all three techniques, pipelining packets has the most throughput improvement at 10% and 16% for balanced and unbalanced loads, respectively.PhDComputer Science and EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/120825/1/qizheng_1.pd

    Energy-Efficient Digital Signal Processing Hardware Design.

    Full text link
    As CMOS technology has developed considerably in the last few decades, many SoCs have been implemented across different application areas due to reduced area and power consumption. Digital signal processing (DSP) algorithms are frequently employed in these systems to achieve more accurate operation or faster computation. However, CMOS technology scaling started to slow down recently and relatively large systems consume too much power to rely only on the scaling effect while system power budget such as battery capacity improves slowly. In addition, there exist increasing needs for miniaturized computing systems including sensor nodes that can accomplish similar operations with significantly smaller power budget. Voltage scaling is one of the most promising power saving techniques due to quadratic switching power reduction effect, making it necessary feature for even high-end processors. However, in order to achieve maximum possible energy efficiency, systems should operate in near or sub-threshold regimes where leakage takes significant portion of power. In this dissertation, a few key energy-aware design approaches are described. Considering prominent leakage and larger PVT variability in low operating voltages, multi-level energy saving techniques to be described are applied to key building blocks in DSP applications: architecture study, algorithm-architecture co-optimization, and robust yet low-power memory design. Finally, described approaches are applied to design examples including a visual navigation accelerator, ultra-low power biomedical SoC and face detection/recognition processor, resulting in 2~100 times power savings than state-of-the-art.PhDElectrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/110496/1/djeon_1.pd

    Towards Hybrid Classical-Quantum Computation Structures in Wirelessly-Networked Systems

    Full text link
    With unprecedented increases in traffic load in today's wireless networks, design challenges shift from the wireless network itself to the computational support behind the wireless network. In this vein, there is new interest in quantum-compute approaches because of their potential to substantially speed up processing, and so improve network throughput. However, quantum hardware that actually exists today is much more susceptible to computational errors than silicon-based hardware, due to the physical phenomena of decoherence and noise. This paper explores the boundary between the two types of computation---classical-quantum hybrid processing for optimization problems in wireless systems---envisioning how wireless can simultaneously leverage the benefit of both approaches. We explore the feasibility of a hybrid system with a real hardware prototype using one of the most advanced experimentally available techniques today, reverse quantum annealing. Preliminary results on a low-latency, large MIMO system envisioned in the 5G New Radio roadmap are encouraging, showing approximately 2--10X better performance in terms of processing time than prior published results.Comment: HotNets 2020: Nineteenth ACM Workshop on Hot Topics in Networks (https://doi.org/10.1145/3422604.3425924

    The First 15 Years of SEFDM: A Brief Survey

    Get PDF
    Spectrally efficient frequency division multiplexing (SEFDM) is a multi-carrier signal waveform, which achieves higher spectral efficiency, relative to conventional orthogonal frequency division multiplexing (OFDM), by violating the orthogonality of its sub-carriers. This survey provides the history of SEFDM development since its inception in 2003, covering fundamentals and concepts, wireless and optical communications applications, circuit design and experimental testbeds. We focus on work done at UCL and outline work done other universities and research laboratories worldwide. We outline techniques to improve the performance of SEFDM and its practical utility with focus on signal generation, detection and channel estimation
    corecore