424 research outputs found

    Baseband Processing for 5G and Beyond: Algorithms, VLSI Architectures, and Co-design

    Get PDF
    In recent years the number of connected devices and the demand for high data-rates have been significantly increased. This enormous growth is more pronounced by the introduction of the Internet of things (IoT) in which several devices are interconnected to exchange data for various applications like smart homes and smart cities. Moreover, new applications such as eHealth, autonomous vehicles, and connected ambulances set new demands on the reliability, latency, and data-rate of wireless communication systems, pushing forward technology developments. Massive multiple-input multiple-output (MIMO) is a technology, which is employed in the 5G standard, offering the benefits to fulfill these requirements. In massive MIMO systems, base station (BS) is equipped with a very large number of antennas, serving several users equipments (UEs) simultaneously in the same time and frequency resource. The high spatial multiplexing in massive MIMO systems, improves the data rate, energy and spectral efficiencies as well as the link reliability of wireless communication systems. The link reliability can be further improved by employing channel coding technique. Spatially coupled serially concatenated codes (SC-SCCs) are promising channel coding schemes, which can meet the high-reliability demands of wireless communication systems beyond 5G (B5G). Given the close-to-capacity error correction performance and the potential to implement a high-throughput decoder, this class of code can be a good candidate for wireless systems B5G. In order to achieve the above-mentioned advantages, sophisticated algorithms are required, which impose challenges on the baseband signal processing. In case of massive MIMO systems, the processing is much more computationally intensive and the size of required memory to store channel data is increased significantly compared to conventional MIMO systems, which are due to the large size of the channel state information (CSI) matrix. In addition to the high computational complexity, meeting latency requirements is also crucial. Similarly, the decoding-performance gain of SC-SCCs also do come at the expense of increased implementation complexity. Moreover, selecting the proper choice of design parameters, decoding algorithm, and architecture will be challenging, since spatial coupling provides new degrees of freedom in code design, and therefore the design space becomes huge. The focus of this thesis is to perform co-optimization in different design levels to address the aforementioned challenges/requirements. To this end, we employ system-level characteristics to develop efficient algorithms and architectures for the following functional blocks of digital baseband processing. First, we present a fast Fourier transform (FFT), an inverse FFT (IFFT), and corresponding reordering scheme, which can significantly reduce the latency of orthogonal frequency-division multiplexing (OFDM) demodulation and modulation as well as the size of reordering memory. The corresponding VLSI architectures along with the application specific integrated circuit (ASIC) implementation results in a 28 nm CMOS technology are introduced. In case of a 2048-point FFT/IFFT, the proposed design leads to 42% reduction in the latency and size of reordering memory. Second, we propose a low-complexity massive MIMO detection scheme. The key idea is to exploit channel sparsity to reduce the size of CSI matrix and eventually perform linear detection followed by a non-linear post-processing in angular domain using the compressed CSI matrix. The VLSI architecture for a massive MIMO with 128 BS antennas and 16 UEs along with the synthesis results in a 28 nm technology are presented. As a result, the proposed scheme reduces the complexity and required memory by 35%–73% compared to traditional detectors while it has better detection performance. Finally, we perform a comprehensive design space exploration for the SC-SCCs to investigate the effect of different design parameters on decoding performance, latency, complexity, and hardware cost. Then, we develop different decoding algorithms for the SC-SCCs and discuss the associated decoding performance and complexity. Also, several high-level VLSI architectures along with the corresponding synthesis results in a 12 nm process are presented, and various design tradeoffs are provided for these decoding schemes

    Efficient DSP and Circuit Architectures for Massive MIMO: State-of-the-Art and Future Directions

    Full text link
    Massive MIMO is a compelling wireless access concept that relies on the use of an excess number of base-station antennas, relative to the number of active terminals. This technology is a main component of 5G New Radio (NR) and addresses all important requirements of future wireless standards: a great capacity increase, the support of many simultaneous users, and improvement in energy efficiency. Massive MIMO requires the simultaneous processing of signals from many antenna chains, and computational operations on large matrices. The complexity of the digital processing has been viewed as a fundamental obstacle to the feasibility of Massive MIMO in the past. Recent advances on system-algorithm-hardware co-design have led to extremely energy-efficient implementations. These exploit opportunities in deeply-scaled silicon technologies and perform partly distributed processing to cope with the bottlenecks encountered in the interconnection of many signals. For example, prototype ASIC implementations have demonstrated zero-forcing precoding in real time at a 55 mW power consumption (20 MHz bandwidth, 128 antennas, multiplexing of 8 terminals). Coarse and even error-prone digital processing in the antenna paths permits a reduction of consumption with a factor of 2 to 5. This article summarizes the fundamental technical contributions to efficient digital signal processing for Massive MIMO. The opportunities and constraints on operating on low-complexity RF and analog hardware chains are clarified. It illustrates how terminals can benefit from improved energy efficiency. The status of technology and real-life prototypes discussed. Open challenges and directions for future research are suggested.Comment: submitted to IEEE transactions on signal processin

    Analysis of Bandwidth and Latency Constraints on a Packetized Cloud Radio Access Network Fronthaul

    Get PDF
    Cloud radio access network (C-RAN) is a promising architecture for the next-generation RAN to meet the diverse and stringent requirements envisioned by fifth generation mobile communication systems (5G) and future generation mobile networks. C-RAN offers several advantages, such as reduced capital expenditure (CAPEX) and operational expenditure (OPEX), increased spectral efficiency (SE), higher capacity and improved cell-edge performance, and efficient hardware utilization through resource sharing and network function virtualization (NFV). However, these centralization gains come with the need for a fronthaul, which is the transport link connecting remote radio units (RRUs) to the base band unit (BBU) pool. In conventional C-RAN, legacy common public radio interface (CPRI) protocol is used on the fronthaul network to transport the raw, unprocessed baseband in-phase/quadrature-phase (I/Q) samples between the BBU and the RRUs, and it demands a huge fronthaul bandwidth, a strict low-latency, in the order of a few hundred microseconds, and a very high reliability. Hence, in order to relax the excessive fronthaul bandwidth and stringent low-latency requirements, as well as to enhance the flexibility of the fronthaul, it is utmost important to redesign the fronthaul, while still profiting from the acclaimed centralization benefits. Therefore, a flexibly centralized C-RAN with different functional splits has been introduced. In addition, 5G mobile fronthaul (often also termed as an evolved fronthaul ) is envisioned to be packet-based, utilizing the Ethernet as a transport technology. In this thesis, to circumvent the fronthaul bandwidth constraint, a packetized fronthaul considering an appropriate functional split such that the fronthaul data rate is coupled with actual user data rate, unlike the classical C-RAN where fronthaul data rate is always static and independent of the traffic load, is justifiably chosen. We adapt queuing and spatial traffic models to derive the mathematical expressions for statistical multiplexing gains that can be obtained from the randomness in the user traffic. Through this, we show that the required fronthaul bandwidth can be reduced significantly, depending on the overall traffic demand, correlation distance and outage probability. Furthermore, an iterative optimization algorithm is developed, showing the impacts of number of pilots on a bandwidth-constrained fronthaul. This algorithm achieves additional reduction in the required fronthaul bandwidth. Next, knowing the multiplexing gains and possible fronthaul bandwidth reduction, it is beneficial for the mobile network operators (MNOs) to deploy the optical transceiver (TRX) modules in C-RAN cost efficiently. For this, using the same framework, a cost model for fronthaul TRX cost optimization is presented. This is essential in C-RAN, because in a wavelength division multiplexing-passive optical network (WDM-PON) system, TRXs are generally deployed to serve at a peak load. But, because of variations in the traffic demands, owing to tidal effect, the fronthaul can be dimensioned requiring a lower capacity allowing a reasonable outage, thus giving rise to cost saving by deploying fewer TRXs, and energy saving by putting the unused TRXs in sleep mode. The second focus of the thesis is the fronthaul latency analysis, which is a critical performance metric, especially for ultra-reliable and low latency communication (URLLC). An analytical framework to calculate the latency in the uplink (UL) of C-RAN massive multiple-input multiple-output (MIMO) system is presented. For this, a continuous-time queuing model for the Ethernet switch in the fronthaul network, which aggregates the UL traffic from several massive MIMO-aided RRUs, is considered. The closed-form solutions for the moment generating function (MGF) of sojourn time, waiting time and queue length distributions are derived using Pollaczek–Khinchine formula for our M/HE/1 queuing model, and evaluated via numerical solutions. In addition, the packet loss rate – due to the inability of the packets to reach the destination in a certain time – is derived. Due to the slotted nature of the UL transmissions, the model is extended to a discrete-time queuing model. The impact of the packet arrival rate, average packet size, SE of users, and fronthaul capacity on the sojourn time, waiting time and queue length distributions are analyzed. While offloading more signal processing functionalities to the RRU reduces the required fronthaul bandwidth considerably, this increases the complexity at the RRU. Hence, considering the 5G New Radio (NR) flexible numerology and XRAN functional split with a detailed radio frequency (RF) chain at the RRU, the total RRU complexity is computed first, and later, a tradeoff between the required fronthaul bandwidth and RRU complexity is analyzed. We conclude that despite the numerous C-RAN benefits, the stringent fronthaul bandwidth and latency constraints must be carefully evaluated, and an optimal functional split is essential to meet diverse set of requirements imposed by new radio access technologies (RATs).Ein cloud-basiertes Mobilfunkzugangsnetz (cloud radio access network, C-RAN) stellt eine vielversprechende Architektur für das RAN der nächsten Generation dar, um die vielfältigen und strengen Anforderungen der fünften (5G) und zukünftigen Generationen von Mobilfunknetzen zu erfüllen. C-RAN bietet mehrere Vorteile, wie z.B. reduzierte Investitions- (CAPEX) und Betriebskosten (OPEX), erhöhte spektrale Effizienz (SE), höhere Kapazität und verbesserte Leistung am Zellrand sowie effiziente Hardwareauslastung durch Ressourcenteilung und Virtualisierung von Netzwerkfunktionen (network function virtualization, NFV). Diese Zentralisierungsvorteile erfordern jedoch eine Transportverbindung (Fronthaul), die die Antenneneinheiten (remote radio units, RRUs) mit dem Pool an Basisbandeinheiten (basisband unit, BBU) verbindet. Im konventionellen C-RAN wird das bestehende CPRI-Protokoll (common public radio interface) für das Fronthaul-Netzwerk verwendet, um die rohen, unverarbeitet n Abtastwerte der In-Phaseund Quadraturkomponente (I/Q) des Basisbands zwischen der BBU und den RRUs zu transportieren. Dies erfordert eine enorme Fronthaul-Bandbreite, eine strenge niedrige Latenz in der Größenordnung von einigen hundert Mikrosekunden und eine sehr hohe Zuverlässigkeit. Um die extrem große Fronthaul-Bandbreite und die strengen Anforderungen an die geringe Latenz zu lockern und die Flexibilität des Fronthauls zu erhöhen, ist es daher äußerst wichtig, das Fronthaul neu zu gestalten und dabei trotzdem von den erwarteten Vorteilen der Zentralisierung zu profitieren. Daher wurde ein flexibel zentralisiertes CRAN mit unterschiedlichen Funktionsaufteilungen eingeführt. Außerdem ist das mobile 5G-Fronthaul (oft auch als evolved Fronthaul bezeichnet) als paketbasiert konzipiert und nutzt Ethernet als Transporttechnologie. Um die Bandbreitenbeschränkung zu erfüllen, wird in dieser Arbeit ein paketbasiertes Fronthaul unter Berücksichtigung einer geeigneten funktionalen Aufteilung so gewählt, dass die Fronthaul-Datenrate mit der tatsächlichen Nutzdatenrate gekoppelt wird, im Gegensatz zum klassischen C-RAN, bei dem die Fronthaul-Datenrate immer statisch und unabhängig von der Verkehrsbelastung ist. Wir passen Warteschlangen- und räumliche Verkehrsmodelle an, um mathematische Ausdrücke für statistische Multiplexing- Gewinne herzuleiten, die aus der Zufälligkeit im Benutzerverkehr gewonnen werden können. Hierdurch zeigen wir, dass die erforderliche Fronthaul-Bandbreite abhängig von der Gesamtverkehrsnachfrage, der Korrelationsdistanz und der Ausfallwahrscheinlichkeit deutlich reduziert werden kann. Darüber hinaus wird ein iterativer Optimierungsalgorithmus entwickelt, der die Auswirkungen der Anzahl der Piloten auf das bandbreitenbeschränkte Fronthaul zeigt. Dieser Algorithmus erreicht eine zusätzliche Reduktion der benötigte Fronthaul-Bandbreite. Mit dem Wissen über die Multiplexing-Gewinne und die mögliche Reduktion der Fronthaul-Bandbreite ist es für die Mobilfunkbetreiber (mobile network operators, MNOs) von Vorteil, die Module des optischen Sendeempfängers (transceiver, TRX) kostengünstig im C-RAN einzusetzen. Dazu wird unter Verwendung des gleichen Rahmenwerks ein Kostenmodell zur Fronthaul-TRX-Kostenoptimierung vorgestellt. Dies ist im C-RAN unerlässlich, da in einem WDM-PON-System (wavelength division multiplexing-passive optical network) die TRX im Allgemeinen bei Spitzenlast eingesetzt werden. Aufgrund der Schwankungen in den Verkehrsanforderungen (Gezeiteneffekt) kann das Fronthaul jedoch mit einer geringeren Kapazität dimensioniert werden, die einen vertretbaren Ausfall in Kauf nimmt, was zu Kosteneinsparungen durch den Einsatz von weniger TRXn und Energieeinsparungen durch den Einsatz der ungenutzten TRX im Schlafmodus führt. Der zweite Schwerpunkt der Arbeit ist die Fronthaul-Latenzanalyse, die eine kritische Leistungskennzahl liefert, insbesondere für die hochzuverlässige und niedriglatente Kommunikation (ultra-reliable low latency communications, URLLC). Ein analytisches Modell zur Berechnung der Latenz im Uplink (UL) des C-RAN mit massivem MIMO (multiple input multiple output) wird vorgestellt. Dazu wird ein Warteschlangen-Modell mit kontinuierlicher Zeit für den Ethernet-Switch im Fronthaul-Netzwerk betrachtet, das den UL-Verkehr von mehreren RRUs mit massivem MIMO aggregiert. Die geschlossenen Lösungen für die momenterzeugende Funktion (moment generating function, MGF) von Verweildauer-, Wartezeit- und Warteschlangenlängenverteilungen werden mit Hilfe der Pollaczek-Khinchin-Formel für unser M/HE/1-Warteschlangenmodell hergeleitet und mittels numerischer Verfahren ausgewertet. Darüber hinaus wird die Paketverlustrate derjenigen Pakete, die das Ziel nicht in einer bestimmten Zeit erreichen, hergeleitet. Aufgrund der Organisation der UL-Übertragungen in Zeitschlitzen wird das Modell zu einem Warteschlangenmodell mit diskreter Zeit erweitert. Der Einfluss der Paketankunftsrate, der durchschnittlichen Paketgröße, der SE der Benutzer und der Fronthaul-Kapazität auf die Verweildauer-, dieWartezeit- und dieWarteschlangenlängenverteilung wird analysiert. Während das Verlagern weiterer Signalverarbeitungsfunktionalitäten an die RRU die erforderliche Fronthaul-Bandbreite erheblich reduziert, erhöht sich dadurch im Gegenzug die Komplexität der RRU. Daher wird unter Berücksichtigung der flexiblen Numerologie von 5G New Radio (NR) und der XRAN-Funktionenaufteilung mit einer detaillierten RF-Kette (radio frequency) am RRU zunächst die gesamte RRU-Komplexität berechnet und später ein Kompromiss zwischen der erforderlichen Fronthaul-Bandbreite und der RRU-Komplexität untersucht. Wir kommen zu dem Schluss, dass trotz der zahlreichen Vorteile von C-RAN die strengen Bandbreiten- und Latenzbedingungen an das Fronthaul sorgfältig geprüft werden müssen und eine optimale funktionale Aufteilung unerlässlich ist, um die vielfältigen Anforderungen der neuen Funkzugangstechnologien (radio access technologies, RATs) zu erfüllen

    Decentralized Massive MIMO Processing Exploring Daisy-chain Architecture and Recursive Algorithms

    Full text link
    Algorithms for Massive MIMO uplink detection and downlink precoding typically rely on a centralized approach, by which baseband data from all antenna modules are routed to a central node in order to be processed. In the case of Massive MIMO, where hundreds or thousands of antennas are expected in the base-station, said routing becomes a bottleneck since interconnection throughput is limited. This paper presents a fully decentralized architecture and an algorithm for Massive MIMO uplink detection and downlink precoding based on the Stochastic Gradient Descent (SGD) method, which does not require a central node for these tasks. Through a recursive approach and very low complexity operations, the proposed algorithm provides a good trade-off between performance, interconnection throughput and latency. Further, our proposed solution achieves significantly lower interconnection data-rate than other architectures, enabling future scalability.Comment: Manuscript accepted for publication in IEEE Transactions on Signal Processin

    Realizing In-Memory Baseband Processing for Ultra-Fast and Energy-Efficient 6G

    Full text link
    To support emerging applications ranging from holographic communications to extended reality, next-generation mobile wireless communication systems require ultra-fast and energy-efficient baseband processors. Traditional complementary metal-oxide-semiconductor (CMOS)-based baseband processors face two challenges in transistor scaling and the von Neumann bottleneck. To address these challenges, in-memory computing-based baseband processors using resistive random-access memory (RRAM) present an attractive solution. In this paper, we propose and demonstrate RRAM-implemented in-memory baseband processing for the widely adopted multiple-input-multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) air interface. Its key feature is to execute the key operations, including discrete Fourier transform (DFT) and MIMO detection using linear minimum mean square error (L-MMSE) and zero forcing (ZF), in one-step. In addition, RRAM-based channel estimation module is proposed and discussed. By prototyping and simulations, we demonstrate the feasibility of RRAM-based full-fledged communication system in hardware, and reveal it can outperform state-of-the-art baseband processors with a gain of 91.2×\times in latency and 671×\times in energy efficiency by large-scale simulations. Our results pave a potential pathway for RRAM-based in-memory computing to be implemented in the era of the sixth generation (6G) mobile communications.Comment: arXiv admin note: text overlap with arXiv:2205.0356

    Low-Power Embedded Design Solutions and Low-Latency On-Chip Interconnect Architecture for System-On-Chip Design

    Get PDF
    This dissertation presents three design solutions to support several key system-on-chip (SoC) issues to achieve low-power and high performance. These are: 1) joint source and channel decoding (JSCD) schemes for low-power SoCs used in portable multimedia systems, 2) efficient on-chip interconnect architecture for massive multimedia data streaming on multiprocessor SoCs (MPSoCs), and 3) data processing architecture for low-power SoCs in distributed sensor network (DSS) systems and its implementation. The first part includes a low-power embedded low density parity check code (LDPC) - H.264 joint decoding architecture to lower the baseband energy consumption of a channel decoder using joint source decoding and dynamic voltage and frequency scaling (DVFS). A low-power multiple-input multiple-output (MIMO) and H.264 video joint detector/decoder design that minimizes energy for portable, wireless embedded systems is also designed. In the second part, a link-level quality of service (QoS) scheme using unequal error protection (UEP) for low-power network-on-chip (NoC) and low latency on-chip network designs for MPSoCs is proposed. This part contains WaveSync, a low-latency focused network-on-chip architecture for globally-asynchronous locally-synchronous (GALS) designs and a simultaneous dual-path routing (SDPR) scheme utilizing path diversity present in typical mesh topology network-on-chips. SDPR is akin to having a higher link width but without the significant hardware overhead associated with simple bus width scaling. The last part shows data processing unit designs for embedded SoCs. We propose a data processing and control logic design for a new radiation detection sensor system generating data at or above Peta-bits-per-second level. Implementation results show that the intended clock rate is achieved within the power target of less than 200mW. We also present a digital signal processing (DSP) accelerator supporting configurable MAC, FFT, FIR, and 3-D cross product operations for embedded SoCs. It consumes 12.35mW along with 0.167mm2 area at 333MHz

    A Low-Latency FFT-IFFT Cascade Architecture

    Full text link
    This paper addresses the design of a partly-parallel cascaded FFT-IFFT architecture that does not require any intermediate buffer. Folding can be used to design partly-parallel architectures for FFT and IFFT. While many cascaded FFT-IFFT architectures can be designed using various folding sets for the FFT and the IFFT, for a specified folded FFT architecture, there exists a unique folding set to design the IFFT architecture that does not require an intermediate buffer. Such a folding set is designed by processing the output of the FFT as soon as possible (ASAP) in the folded IFFT. Elimination of the intermediate buffer reduces latency and saves area. The proposed approach is also extended to interleaved processing of multi-channel time-series. The proposed FFT-IFFT cascade architecture saves about N/2 memory elements and N/4 clock cycles of latency compared to a design with identical folding sets. For the 2-interleaved FFT-IFFT cascade, the memory and latency savings are, respectively, N/2 units and N/2 clock cycles, compared to a design with identical folding sets

    Adaptive Baseband Pro cessing and Configurable Hardware for Wireless Communication

    Get PDF
    The world of information is literally at one’s fingertips, allowing access to previously unimaginable amounts of data, thanks to advances in wireless communication. The growing demand for high speed data has necessitated theuse of wider bandwidths, and wireless technologies such as Multiple-InputMultiple-Output (MIMO) have been adopted to increase spectral efficiency.These advanced communication technologies require sophisticated signal processing, often leading to higher power consumption and reduced battery life.Therefore, increasing energy efficiency of baseband hardware for MIMO signal processing has become extremely vital. High Quality of Service (QoS)requirements invariably lead to a larger number of computations and a higherpower dissipation. However, recognizing the dynamic nature of the wirelesscommunication medium in which only some channel scenarios require complexsignal processing, and that not all situations call for high data rates, allowsthe use of an adaptive channel aware signal processing strategy to provide adesired QoS. Information such as interference conditions, coherence bandwidthand Signal to Noise Ratio (SNR) can be used to reduce algorithmic computations in favorable channels. Hardware circuits which run these algorithmsneed flexibility and easy reconfigurability to switch between multiple designsfor different parameters. These parameters can be used to tune the operations of different components in a receiver based on feedback from the digitalbaseband. This dissertation focuses on the optimization of digital basebandcircuitry of receivers which use feedback to trade power and performance. Aco-optimization approach, where designs are optimized starting from the algorithmic stage through the hardware architectural stage to the final circuitimplementation is adopted to realize energy efficient digital baseband hardwarefor mobile 4G devices. These concepts are also extended to the next generation5G systems where the energy efficiency of the base station is improved.This work includes six papers that examine digital circuits in MIMO wireless receivers. Several key blocks in these receiver include analog circuits thathave residual non-linearities, leading to signal intermodulation and distortion.Paper-I introduces a digital technique to detect such non-linearities and calibrate analog circuits to improve signal quality. The concept of a digital nonlinearity tuning system developed in Paper-I is implemented and demonstratedin hardware. The performance of this implementation is tested with an analogchannel select filter, and results are presented in Paper-II. MIMO systems suchas the ones used in 4G, may employ QR Decomposition (QRD) processors tosimplify the implementation of tree search based signal detectors. However,the small form factor of the mobile device increases spatial correlation, whichis detrimental to signal multiplexing. Consequently, a QRD processor capableof handling high spatial correlation is presented in Paper-III. The algorithm and hardware implementation are optimized for carrier aggregation, which increases requirements on signal processing throughput, leading to higher powerdissipation. Paper-IV presents a method to perform channel-aware processingwith a simple interpolation strategy to adaptively reduce QRD computationcount. Channel properties such as coherence bandwidth and SNR are used toreduce multiplications by 40% to 80%. These concepts are extended to usetime domain correlation properties, and a full QRD processor for 4G systemsfabricated in 28 nm FD-SOI technology is presented in Paper-V. The designis implemented with a configurable architecture and measurements show thatcircuit tuning results in a highly energy efficient processor, requiring 0.2 nJ to1.3 nJ for each QRD. Finally, these adaptive channel-aware signal processingconcepts are examined in the scope of the next generation of communicationsystems. Massive MIMO systems increase spectral efficiency by using a largenumber of antennas at the base station. Consequently, the signal processingat the base station has a high computational count. Paper-VI presents a configurable detection scheme which reduces this complexity by using techniquessuch as selective user detection and interpolation based signal processing. Hardware is optimized for resource sharing, resulting in a highly reconfigurable andenergy efficient uplink signal detector

    Algorithm Development and VLSI Implementation of Energy Efficient Decoders of Polar Codes

    Get PDF
    With its low error-floor performance, polar codes attract significant attention as the potential standard error correction code (ECC) for future communication and data storage. However, the VLSI implementation complexity of polar codes decoders is largely influenced by its nature of in-series decoding. This dissertation is dedicated to presenting optimal decoder architectures for polar codes. This dissertation addresses several structural properties of polar codes and key properties of decoding algorithms that are not dealt with in the prior researches. The underlying concept of the proposed architectures is a paradigm that simplifies and schedules the computations such that hardware is simplified, latency is minimized and bandwidth is maximized. In pursuit of the above, throughput centric successive cancellation (TCSC) and overlapping path list successive cancellation (OPLSC) VLSI architectures and express journey BP (XJBP) decoders for the polar codes are presented. An arbitrary polar code can be decomposed by a set of shorter polar codes with special characteristics, those shorter polar codes are referred to as constituent polar codes. By exploiting the homogeneousness between decoding processes of different constituent polar codes, TCSC reduces the decoding latency of the SC decoder by 60% for codes with length n = 1024. The error correction performance of SC decoding is inferior to that of list successive cancellation decoding. The LSC decoding algorithm delivers the most reliable decoding results; however, it consumes most hardware resources and decoding cycles. Instead of using multiple instances of decoding cores in the LSC decoders, a single SC decoder is used in the OPLSC architecture. The computations of each path in the LSC are arranged to occupy the decoder hardware stages serially in a streamlined fashion. This yields a significant reduction of hardware complexity. The OPLSC decoder has achieved about 1.4 times hardware efficiency improvement compared with traditional LSC decoders. The hardware efficient VLSI architectures for TCSC and OPLSC polar codes decoders are also introduced. Decoders based on SC or LSC algorithms suffer from high latency and limited throughput due to their serial decoding natures. An alternative approach to decode the polar codes is belief propagation (BP) based algorithm. In BP algorithm, a graph is set up to guide the beliefs propagated and refined, which is usually referred to as factor graph. BP decoding algorithm allows decoding in parallel to achieve much higher throughput. XJBP decoder facilitates belief propagation by utilizing the specific constituent codes that exist in the conventional factor graph, which results in an express journey (XJ) decoder. Compared with the conventional BP decoding algorithm for polar codes, the proposed decoder reduces the computational complexity by about 40.6%. This enables an energy-efficient hardware implementation. To further explore the hardware consumption of the proposed XJBP decoder, the computations scheduling is modeled and analyzed in this dissertation. With discussions on different hardware scenarios, the optimal scheduling plans are developed. A novel memory-distributed micro-architecture of the XJBP decoder is proposed and analyzed to solve the potential memory access problems of the proposed scheduling strategy. The register-transfer level (RTL) models of the XJBP decoder are set up for comparisons with other state-of-the-art BP decoders. The results show that the power efficiency of BP decoders is improved by about 3 times
    corecore