47 research outputs found

    A low-complexity turbo decoder architecture for energy-efficient wireless sensor networks

    No full text
    Turbo codes have recently been considered for energy-constrained wireless communication applications, since they facilitate a low transmission energy consumption. However, in order to reduce the overall energy consumption, Look-Up- Table-Log-BCJR (LUT-Log-BCJR) architectures having a low processing energy consumption are required. In this paper, we decompose the LUT-Log-BCJR architecture into its most fundamental Add Compare Select (ACS) operations and perform them using a novel low-complexity ACS unit. We demonstrate that our architecture employs an order of magnitude fewer gates than the most recent LUT-Log-BCJR architectures, facilitating a 71% energy consumption reduction. Compared to state-of- the-art Maximum Logarithmic Bahl-Cocke-Jelinek-Raviv (Max- Log-BCJR) implementations, our approach facilitates a 10% reduction in the overall energy consumption at ranges above 58 m

    Baseband Processing for 5G and Beyond: Algorithms, VLSI Architectures, and Co-design

    Get PDF
    In recent years the number of connected devices and the demand for high data-rates have been significantly increased. This enormous growth is more pronounced by the introduction of the Internet of things (IoT) in which several devices are interconnected to exchange data for various applications like smart homes and smart cities. Moreover, new applications such as eHealth, autonomous vehicles, and connected ambulances set new demands on the reliability, latency, and data-rate of wireless communication systems, pushing forward technology developments. Massive multiple-input multiple-output (MIMO) is a technology, which is employed in the 5G standard, offering the benefits to fulfill these requirements. In massive MIMO systems, base station (BS) is equipped with a very large number of antennas, serving several users equipments (UEs) simultaneously in the same time and frequency resource. The high spatial multiplexing in massive MIMO systems, improves the data rate, energy and spectral efficiencies as well as the link reliability of wireless communication systems. The link reliability can be further improved by employing channel coding technique. Spatially coupled serially concatenated codes (SC-SCCs) are promising channel coding schemes, which can meet the high-reliability demands of wireless communication systems beyond 5G (B5G). Given the close-to-capacity error correction performance and the potential to implement a high-throughput decoder, this class of code can be a good candidate for wireless systems B5G. In order to achieve the above-mentioned advantages, sophisticated algorithms are required, which impose challenges on the baseband signal processing. In case of massive MIMO systems, the processing is much more computationally intensive and the size of required memory to store channel data is increased significantly compared to conventional MIMO systems, which are due to the large size of the channel state information (CSI) matrix. In addition to the high computational complexity, meeting latency requirements is also crucial. Similarly, the decoding-performance gain of SC-SCCs also do come at the expense of increased implementation complexity. Moreover, selecting the proper choice of design parameters, decoding algorithm, and architecture will be challenging, since spatial coupling provides new degrees of freedom in code design, and therefore the design space becomes huge. The focus of this thesis is to perform co-optimization in different design levels to address the aforementioned challenges/requirements. To this end, we employ system-level characteristics to develop efficient algorithms and architectures for the following functional blocks of digital baseband processing. First, we present a fast Fourier transform (FFT), an inverse FFT (IFFT), and corresponding reordering scheme, which can significantly reduce the latency of orthogonal frequency-division multiplexing (OFDM) demodulation and modulation as well as the size of reordering memory. The corresponding VLSI architectures along with the application specific integrated circuit (ASIC) implementation results in a 28 nm CMOS technology are introduced. In case of a 2048-point FFT/IFFT, the proposed design leads to 42% reduction in the latency and size of reordering memory. Second, we propose a low-complexity massive MIMO detection scheme. The key idea is to exploit channel sparsity to reduce the size of CSI matrix and eventually perform linear detection followed by a non-linear post-processing in angular domain using the compressed CSI matrix. The VLSI architecture for a massive MIMO with 128 BS antennas and 16 UEs along with the synthesis results in a 28 nm technology are presented. As a result, the proposed scheme reduces the complexity and required memory by 35%–73% compared to traditional detectors while it has better detection performance. Finally, we perform a comprehensive design space exploration for the SC-SCCs to investigate the effect of different design parameters on decoding performance, latency, complexity, and hardware cost. Then, we develop different decoding algorithms for the SC-SCCs and discuss the associated decoding performance and complexity. Also, several high-level VLSI architectures along with the corresponding synthesis results in a 12 nm process are presented, and various design tradeoffs are provided for these decoding schemes

    Reconfigurable architectures for beyond 3G wireless communication systems

    Get PDF

    Further Specialization of Clustered VLIW Processors: A MAP Decoder for Software Defined Radio

    Get PDF
    Turbo codes are extensively used in current communications standards and have a promising outlook for future generations. The advantages of software defined radio, especially dynamic reconfiguration, make it very attractive in this multi-standard scenario. However, the complex and power consuming implementation of the maximum a posteriori (MAP) algorithm, employed by turbo decoders, sets hurdles to this goal. This work introduces an ASIP architecture for the MAP algorithm, based on a dual-clustered VLIW processor. It displays the good performance of application specific designs along with the versatility of processors, which makes it compliant with leading edge standards. The machine deals with multi-operand instructions in an innovative way, the fetching and assertion of data is serialized and the addressing is automatized and transparent for the programmer. The performance-area trade-off of the proposed architecture achieves a throughput of 8 cycles per symbol with very low power dissipation

    Turbo decoder VLSI implementations for multi-standards wireless communication systems

    Get PDF

    New VLSI design of a MAP/BCJR decoder.

    Get PDF
    Any communication channel suffers from different kinds of noises. By employing forward error correction (FEC) techniques, the reliability of the communication channel can be increased. One of the emerging FEC methods is turbo coding (iterative coding), which employs soft input soft output (SISO) decoding algorithms like maximum a posteriori (MAP) algorithm in its constituent decoders. In this thesis we introduce a design with lower complexity and less than 0.1dB performance loss compare to the best performance observed in Max-Log-MAP algorithm. A parallel and pipeline design of a MAP decoder suitable for ASIC (Application Specific Integrated Circuits) is used to increase the throughput of the chip. The branch metric calculation unit is studied in detail and a new design with lower complexity is proposed. The design is also flexible to communication block sizes, which makes it ideal for variable frame length communication systems. A new even-spaced quantization technique for the proposed MAP decoder is utilized. Normalization techniques are studied and a suitable technique for the Max-Log-MAP decoder is explained. The decoder chip is synthesized and implemented in a 0.18 mum six-layer metal CMOS technology. (Abstract shortened by UMI.)Dept. of Electrical and Computer Engineering. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2004 .S23. Source: Masters Abstracts International, Volume: 43-05, page: 1783. Adviser: Majid Ahmadi. Thesis (M.A.Sc.)--University of Windsor (Canada), 2004

    Realizing Software Defined Radio - A Study in Designing Mobile Supercomputers.

    Full text link
    The physical layer of most wireless protocols is traditionally implemented in custom hardware to satisfy the heavy computational requirements while keeping power consumption to a minimum. These implementations are time consuming to design and difficult to verify. A programmable hardware platform capable of supporting software implementations of the physical layer, or Software Defined Radio (SDR), has a number of advantages. These include support for multiple protocols, faster time-to-market, higher chip volumes, and support for late implementation changes. The challenge is to achieve this under the power budget of a mobile device. Wireless communications belong to an emerging class of applications with the processing requirements of a supercomputer but the power constraints of a mobile device -- mobile supercomputing. This thesis presents a set of design proposals for building a programmable wireless communication solution. In order to design a solution that can meet the lofty requirements of SDR, this thesis takes an application-centric design approach -- evaluate and optimize all aspects of the design based on the characteristics of wireless communication protocols. This includes a DSP processor architecture optimized for wireless baseband processing, wireless algorithm optimizations, and language and compilation tool support for the algorithm software and the processor hardware. This thesis first analyzes the software characteristics of SDR. Based on the analysis, this thesis proposes the Signal-Processing On-Demand Architecture (SODA), a fully programmable multi-core architecture that can support the computation requirements of third generation wireless protocols, while operating within the power budget of a mobile device. This thesis then presents wireless algorithm implementations and optimizations for the SODA processor architecture. A signal processing language extension (SPEX) is proposed to help the software development efforts of wireless communication protocols on SODA-like multi-core architecture. And finally, the SPIR compiler is proposed to automatically map SPEX code onto the multi-core processor hardware.Ph.D.Computer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/61760/1/linyz_1.pd

    Multicarrier Faster-than-Nyquist Signaling Transceivers: From Theory to Practice

    Get PDF
    The demand for spectrum resources in cellular systems worldwide has seen a tremendous escalation in the recent past. The mobile phones of today are capable of being cameras taking pictures and videos, able to browse the Internet, do video calling and much more than an yesteryear computer. Due to the variety and the amount of information that is being transmitted the demand for spectrum resources is continuously increasing. Efficient use of bandwidth resources has hence become a key parameter in the design and realization of wireless communication systems. Faster-than-Nyquist (FTN) signaling is one such technique that achieves bandwidth efficiency by making better use of the available spectrum resources at the expense of higher processing complexity in the transceiver. This thesis addresses the challenges and design trade offs arising during the hardware realization of Faster-than-Nyquist signaling transceivers. The FTN system has been evaluated for its achievable performance compared to the processing overhead in the transmitter and the receiver. Coexistence with OFDM systems, a more popular multicarrier scheme in existing and upcoming wireless standards, has been considered by designing FTN specific processing blocks as add-ons to the conventional transceiver chain. A multicarrier system capable of operating under both orthogonal and FTN signaling has been developed. The performance of the receiver was evaluated for AWGN and fading channels. The FTN system was able to achieve 2x improvement in bandwidth usage with similar performance as that of an OFDM system. The extra processing in the receiver was in terms of an iterative decoder for the decoding of FTN modulated signals. An efficient hardware architecture for the iterative decoder reusing the FTN specific processing blocks and realize different functionality has been designed. An ASIC implementation of this decoder was implemented in a 65nm CMOS technology and the implemented chip has been successfully verified for its functionality

    Parallel algorithms and architectures for low power video decoding

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010.Cataloged from PDF version of thesis.Includes bibliographical references (p. 197-204).Parallelism coupled with voltage scaling is an effective approach to achieve high processing performance with low power consumption. This thesis presents parallel architectures and algorithms designed to deliver the power and performance required for current and next generation video coding. Coding efficiency, area cost and scalability are also addressed. First, a low power video decoder is presented for the current state-of-the-art video coding standard H.264/AVC. Parallel architectures are used along with voltage scaling to deliver high definition (HD) decoding at low power levels. Additional architectural optimizations such as reducing memory accesses and multiple frequency/voltage domains are also described. An H.264/AVC Baseline decoder test chip was fabricated in 65-nm CMOS. It can operate at 0.7 V for HD (720p, 30 fps) video decoding and with a measured power of 1.8 mW. The highly scalable decoder can tradeoff power and performance across >100x range. Second, this thesis demonstrates how serial algorithms, such as Context-based Adaptive Binary Arithmetic Coding (CABAC), can be redesigned for parallel architectures to enable high throughput with low coding efficiency cost. A parallel algorithm called the Massively Parallel CABAC (MP-CABAC) is presented that uses syntax element partitions and interleaved entropy slices to achieve better throughput-coding efficiency and throughput-area tradeoffs than H.264/AVC. The parallel algorithm also improves scalability by providing a third dimension to tradeoff coding efficiency for power and performance. Finally, joint algorithm-architecture optimizations are used to increase performance and reduce area with almost no coding penalty. The MP-CABAC is mapped to a highly parallel architecture with 80 parallel engines, which together delivers >10x higher throughput than existing H.264/AVC CABAC implementations. A MP-CABAC test chip was fabricated in 65-nm CMOS to demonstrate the power-performance-coding efficiency tradeoff.by Vivienne. Sze.Ph.D

    Design and Implementation of Belief Propagation Symbol Detectors for Wireless Intersymbol Interference Channels

    Get PDF
    In modern wireless communication systems, intersymbol interference (ISI) introduced by frequency selective fading is one of the major impairments to reliable data communication. In ISI channels, the receiver observes the superposition of multiple delayed reflections of the transmitted signal, which will result errors in the decision device. As the data rate increases, the effect of ISI becomes severe. To combat ISI, equalization is usually required for symbol detectors. The optimal maximum-likelihood sequence estimation (MLSE) based on the Viterbi algorithm (VA) may be used to estimate the transmitted sequence in the presence of the ISI. However, the computational complexity of the MLSE increases exponentially with the length of the channel impulse response (CIR). Even in channels which do not exhibit significant time dispersion, the length of the CIR will effectively increase as the sampling rate goes higher. Thus the optimal MLSE is impractical to implement in the majority of practical wireless applications. This dissertation is devoted to exploring practically implementable symbol detectors with near-optimal performance in wireless ISI channels. Particularly, we focus on the design and implementation of an iterative detector based on the belief propagation (BP) algorithm. The advantage of the BP detector is that its complexity is solely dependent on the number of nonzero coefficients in the CIR, instead of the length of the CIR. We also extend the work of BP detector design for various wireless applications. Firstly, we present a partial response BP (PRBP) symbol detector with near-optimal performance for channels which have long spanning durations but sparse multipath structure. We implement the architecture by cascading an adaptive linear equalizer (LE) with a BP detector. The channel is first partially equalized by the LE to a target impulse response (TIR) with only a few nonzero coefficients remaining. The residual ISI is then canceled by a more sophisticated BP detector. With the cascaded LE-BP structure, the symbol detector is capable to achieve a near-optimal error rate performance with acceptable implementation complexity. Moreover, we present a pipeline high-throughput implementation of the detector for channel length 30 with quadrature phase-shift keying (QPSK) modulation. The detector can achieve a maximum throughput of 206 Mb/s with an estimated core area of 3.162 mm^{2} using 90-nm technology node. At a target frequency of 515 MHz, the dynamic power is about 1.096 W. Secondly, we investigate the performance of aforementioned PRBP detector under a more generic 3G channel rather than the sparse channel. Another suboptimal partial response maximum-likelihood (PRML) detector is considered for comparison. Similar to the PRBP detector, the PRML detector also employs a hybrid two-stage scheme, in order to allow a tradeoff between performance and complexity. In simulations, we consider a slow fading environment and use the ITU-R 3G channel models. From the numerical results, it is shown that in frequency-selective fading wireless channels, the PRBP detector provides superior performance over both the traditional minimum mean squared error linear equalizer (MMSE-LE) and the PRML detector. Due to the effect of colored noise, the PRML detector in fading wireless channels is not as effective as it is in magnetic recording applications. Thirdly, we extend our work to accommodate the application of Advanced Television Systems Committee (ATSC) digital television (DTV) systems. In order to reduce error propagation caused by the traditional decision feedback equalizer (DFE) in DTV receiver, we present an adaptive decision feedback sparsening filter BP (DFSF-BP) detector, which is another form of PRBP detector. Different from the aforementioned LE-BP structure, in the DFSF-BP scheme, the BP detector is followed by a nonlinear filter called DFSF as the partial response equalizer. In the first stage, the DFSF employs a modified feedback filter which leaves the strongest post-cursor ISI taps uncorrected. As a result, a long ISI channel is equalized to a sparse channel having only a small number of nonzero taps. In the second stage, the BP detector is applied to mitigate the residual ISI. Since the channel is typically time-varying and suffers from Doppler fading, the DFSF is adapted using the least mean square (LMS) algorithm, such that the amplitude and the locations of the nonzero taps of the equalized sparse channel appear to be fixed. As such, the channel appears to be static during the second stage of equalization which consists of the BP detector. Simulation results demonstrate that the proposed scheme outperforms the traditional DFE in symbol error rate, under both static channels and dynamic ATSC channels. Finally, we study the symbol detector design for cooperative communications, which have attracted a lot of attention recently for its ability to exploit increased spatial diversity available at distributed antennas on other nodes. A system framework employing non-orthogonal amplify-and-forward half-duplex relays through ISI channels is developed. Based on the system model, we first design and implement an optimal maximum-likelihood detector based on the Viterbi algorithm. As the relay period increases, the effective CIR between the source and the destination becomes long and sparse, which makes the optimal detector impractical to implement. In order to achieve a balance between the computational complexity and performance, several sub-optimal detectors are proposed. We first present a multitrellis Viterbi algorithm (MVA) based detector which decomposes the original trellis into multiple parallel irregular sub-trellises by investigating the dependencies between the received symbols. Although MVA provides near-optimal performance, it is not straightforward to decompose the trellis for arbitrary ISI channels. Next, the decision feedback sequence estimation (DFSE) based detector and BP-based detector are proposed for cooperative ISI channels. Traditionally these two detectors are used with fixed, static channels. In our model, however, the effective channel is periodically time-varying, even when the component channels themselves are static. Consequently, we modify these two detector to account for cooperative ISI channels. Through simulations in frequency selective fading channels, we demonstrate the uncoded performance of the DFSE detector and the BP detector when compared to the optimal MLSE detector. In addition to quantifying the performance of these detectors, we also include an analysis of the implementation complexity as well as a discussion on complexity/performance tradeoffs
    corecore