1,575 research outputs found
Exploring HLS Coding Techniques to Achieve Desired Turbo Decoder Architectures
Software defined radio (SDR) platforms implement many digital signal processing algorithms. These can be accelerated on an FPGA to meet performance requirements. Due to the flexibility of SDRs and continually evolving communications protocols, high level synthesis (HLS) is a promising alternative to standard handcrafted design flows. A crucial component in any SDR is its error correction code (ECC). Turbo codes are a common ECC that is often implemented on an FPGA because of its computational complexity. The goal of this thesis is to explore the HLS coding techniques required to produce a design that targets the desired hardware architecture and can reach handcrafted levels of performance.
This work implemented three existing turbo decoder architectures with HLS to produce quality hardware which reaches handcrafted performance. Each targeted design was analyzed to determine its functionality and algorithm so a C implementation could be developed. Then the C code was modified and HLS directives were added to refine the design through the HLS tools. The process of code modification and processing through the HLS tools continued until the desired architecture and performance were reached.
Each design was implemented, and the bottlenecks were identified and addressed through appropriate use of directives and C coding style. The use of pipelining to bypass bottlenecks added a small overhead from the ramp-up and ramp-down of the pipeline, reducing the performance by at most 1.24%. The impact of the clock constraint set within the HLS tools was also explored. It was found that the clock period and resource usage estimates generated by the HLS tools are not accurate and that all evaluations should occur after hardware synthesis.
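The ramp-up and ramp-down overhead mentioned above can be illustrated with the usual latency model for a pipelined loop. The sketch below is not taken from the thesis: the function names are hypothetical, and the pipeline depth, initiation interval, and trip count are made-up values chosen only to show how the relative overhead shrinks as the loop gets longer.

```python
def pipelined_cycles(trip_count, depth, ii=1):
    """Cycles for a pipelined loop: one fill of `depth` cycles for the first
    iteration, then one new result every `ii` cycles."""
    if trip_count <= 0:
        return 0
    return depth + ii * (trip_count - 1)


def fill_drain_overhead(trip_count, depth, ii=1):
    """Fraction of the total cycles spent beyond the ideal ii * trip_count."""
    total = pipelined_cycles(trip_count, depth, ii)
    ideal = ii * trip_count
    return (total - ideal) / total


# Hypothetical numbers: an 8-stage pipeline with an initiation interval of 1.
for n in (100, 1000, 10000):
    print(n, pipelined_cycles(n, depth=8), round(fill_drain_overhead(n, depth=8), 4))
```

Under this model the overhead depends only on the ratio of pipeline depth to loop length, which is why long decoding loops lose little throughput to pipelining.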
Multiple Parallel Concatenated Gallager Codes: High Throughput Architecture Design and Implementation
The design of advanced wireless communication systems has been one of the most important research areas in recent years. High performance error correction schemes and high speed data services are at the heart of these systems.
Due to the excellent performance of Low-Density Parity-Check (LDPC) codes, they are good candidates for many new wireless communication standards. However, complexity, latency, scalability and flexibility remain a challenge.
This thesis is concerned with investigating a new approach to coding and decoding LDPC codes based on Parallel Concatenated Gallager Codes (PCGCs) using multiple constituent codes. These are a class of concatenated codes built from the direct parallel concatenation of LDPC codes without interleavers. They are characterized by a competitive BER performance while still maintaining the low complexity and flexibility attributes. New methods for encoding and decoding are presented together with BER simulation results showing the performance of these codes. Analysis in terms of the number of constituent codes is also carried out.
Complexity analysis is performed and preliminary implementation results are also given based on a proposed high throughput architecture.
20 years of turbo coding and energy-aware design guidelines for energy-constrained wireless applications
During the last two decades, wireless communication has been revolutionized by near-capacity error-correcting codes (ECCs), such as turbo codes (TCs), which offer a lower bit error ratio (BER) than their predecessors, without requiring an increased transmission energy consumption (EC). Hence, TCs have found widespread employment in spectrum-constrained wireless communication applications, such as cellular telephony, wireless local area network, and broadcast systems. Recently, however, TCs have also been considered for energy-constrained wireless communication applications, such as wireless sensor networks and the "Internet of Things." In these applications, TCs may also be employed for reducing the required transmission EC, instead of improving the BER. However, TCs have relatively high computational complexities, and hence, the associated signal-processing-related ECs are not insignificant. Therefore, when parameterizing TCs for employment in energy-constrained applications, both the processing EC and the transmission EC must be jointly considered. In this tutorial, we investigate holistic design methodologies conceived for this purpose. We commence by introducing turbo coding in detail, highlighting the various parameters of TCs and characterizing their impact on the encoded bit rate, on the radio frequency bandwidth requirement, on the transmission EC and on the BER. Following this, energy-efficient TC decoder application-specific integrated circuit (ASIC) architecture designs are exemplified, and the processing EC is characterized as a function of the TC parameters. Finally, the TC parameters are selected in order to minimize the sum of the processing EC and the transmission EC.
Exploring Spin-transfer-torque devices and memristors for logic and memory applications
As scaling CMOS devices is approaching its physical limits, researchers have begun exploring newer devices and architectures to replace CMOS.
Due to their non-volatility and high density, Spin Transfer Torque (STT) devices are among the most prominent candidates for logic and memory applications. In this research, we first considered a new logic style called All Spin Logic (ASL). Despite its advantages, ASL consumes a large amount of static power; thus, several optimizations can be performed to address this issue. We developed a systematic methodology to perform the optimizations to ensure stable operation of ASL.
Second, we investigated reliable design of STT-MRAM bit-cells and addressed the conflicting read and write requirements, which results in overdesign of the bit-cells. Further, a Device/Circuit/Architecture co-design framework was developed to optimize the STT-MRAM devices by exploring the design space through jointly considering yield enhancement techniques at different levels of abstraction.
Recent advancements in the development of memristive devices have opened new opportunities for hardware implementation of non-Boolean computing. To this end, the suitability of memristive devices for swarm intelligence algorithms has enabled researchers to solve a maze in hardware. In this research, we utilized swarm intelligence of memristive networks to perform image edge detection. First, we proposed a hardware-friendly algorithm for image edge detection based on ant colony. Next, we designed the image edge detection algorithm using memristive networks.
Improve the Usability of Polar Codes: Code Construction, Performance Enhancement and Configurable Hardware
Error-correcting codes (ECC) have been widely used for forward error correction (FEC) in modern communication systems to dramatically reduce the signal-to-noise ratio (SNR) needed to achieve a given bit error rate (BER). Newly invented polar codes have attracted much interest because of their capacity-achieving potential, efficient encoder and decoder implementation, and flexible architecture design space. This dissertation is aimed at improving the usability of polar codes by providing a practical code design method, new approaches to improve the performance of polar codes, and a configurable hardware design that adapts to various specifications.
State-of-the-art polar codes are used to achieve extremely low error rates. In this work, high-performance FPGA is used in prototyping polar decoders to catch rare-case errors for error-correcting performance verification and error analysis. To discover the polarization characteristics and error patterns of polar codes, an FPGA emulation platform for belief-propagation (BP) decoding is built by a semi-automated construction flow. The FPGA-based emulation achieves significant speedup in large-scale experiments involving trillions of data frames. The platform is a key enabler of this work.
The frozen set selection of polar codes, known as bit selection, is critical to the error-correcting performance of polar codes. A simulation-based in-order bit selection method is developed to evaluate the error rate of each bit using Monte Carlo simulations. The frozen set is selected based on the bit reliability ranking. The resulting code construction exhibits up to 1 dB coding gain with respect to the conventional bit selection.
To further improve the coding gain of the BP decoder for low-error-rate applications, the decoding error mechanisms are studied and analyzed, and the errors are classified based on their distinct signatures. Error detection is enabled by low-cost CRC concatenation, and post-processing algorithms targeting each type of error are designed to mitigate the vast majority of the decoding errors. The post-processor incurs only a small implementation overhead, but it provides more than an order of magnitude improvement in error-correcting performance.
The regularity of the BP decoder structure offers many hardware architecture choices. Silicon area, power consumption, throughput and latency can be traded to reach the optimal design points for practical use cases. A comprehensive design space exploration reveals several practical architectures at different design points. The scalability of each architecture is also evaluated based on the implementation candidates.
For dynamic communication channels, such as wireless channels in the upcoming 5G applications, multiple codes of different lengths and code rates are needed to fit varying channel conditions. To minimize implementation cost, a universal decoder architecture is proposed to support multiple codes through hardware reuse. A 40nm length- and rate-configurable polar decoder ASIC is demonstrated to fit various communication environments and service requirements.
PhD, Electrical Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
https://deepblue.lib.umich.edu/bitstream/2027.42/140817/1/shuangsh_1.pd
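For context on the frozen-set (bit) selection discussed in this abstract, the sketch below shows one widely used textbook construction: ranking the synthesized channels by their Bhattacharyya parameters under a binary erasure channel (BEC) assumption. This is not the simulation-based in-order method developed in the dissertation, and whether it matches the "conventional bit selection" used there as a baseline is an assumption. The function names, the erasure probability of 0.5, and the natural (non-bit-reversed) index order are illustrative choices.

```python
def bhattacharyya_bec(n_log2, erasure_prob):
    """Bhattacharyya parameters of the 2**n_log2 synthesized channels of a BEC,
    in natural index order (channel i splits into channels 2i and 2i+1)."""
    z = [erasure_prob]
    for _ in range(n_log2):
        nxt = []
        for zi in z:
            nxt.append(2 * zi - zi * zi)  # degraded ("minus") channel
            nxt.append(zi * zi)           # upgraded ("plus") channel
        z = nxt
    return z


def select_frozen_set(n_log2, k, erasure_prob=0.5):
    """Indices of the N - k least reliable channels, which carry frozen bits."""
    z = bhattacharyya_bec(n_log2, erasure_prob)
    # Larger Bhattacharyya parameter means a less reliable channel.
    ranked = sorted(range(len(z)), key=lambda i: z[i], reverse=True)
    return sorted(ranked[: len(z) - k])


# Toy example: length-8 code with 4 information bits.
print(select_frozen_set(3, 4))
```

A Monte Carlo approach like the one described in the abstract would replace the closed-form recursion with per-bit error rates measured under the actual decoder and channel, then rank the bits in the same way.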
Magic-State Functional Units: Mapping and Scheduling Multi-Level Distillation Circuits for Fault-Tolerant Quantum Architectures
Quantum computers have recently made great strides and are on a long-term
path towards useful fault-tolerant computation. A dominant overhead in
fault-tolerant quantum computation is the production of high-fidelity encoded
qubits, called magic states, which enable reliable error-corrected computation.
We present the first detailed designs of hardware functional units that
implement space-time optimized magic-state factories for surface code
error-corrected machines. Interactions among distant qubits require surface
code braids (physical pathways on chip) which must be routed. Magic-state
factories are circuits comprised of a complex set of braids that is more
difficult to route than quantum circuits considered in previous work [1]. This
paper explores the impact of scheduling techniques, such as gate reordering and
qubit renaming, and we propose two novel mapping techniques: braid repulsion
and dipole moment braid rotation. We combine these techniques with graph
partitioning and community detection algorithms, and further introduce a
stitching algorithm for mapping subgraphs onto a physical machine. Our results
show a factor of 5.64 reduction in space-time volume compared to the best-known
previous designs for magic-state factories.
Comment: 13 pages, 10 figures
Deep Ensemble of Weighted Viterbi Decoders for Tail-Biting Convolutional Codes
Tail-biting convolutional codes extend the classical zero-termination
convolutional codes: Both encoding schemes force the equality of start and end
states, but under tail-biting each state is a valid termination. This paper
proposes a machine-learning approach to improve the state-of-the-art decoding
of tail-biting codes, focusing on the widely employed short length regime as in
the LTE standard. This standard also includes a CRC code.
First, we parameterize the circular Viterbi algorithm (CVA), a baseline decoder
that exploits the circular nature of the underlying trellis. An ensemble
combines multiple such weighted decoders; each decoder specializes in decoding
words from a specific region of the channel words' distribution. A region
corresponds to a subset of termination states; the ensemble covers the entire
state space. A non-learnable gating satisfies two goals: it filters easily
decoded words and mitigates the overhead of executing multiple weighted
decoders. The CRC criterion is employed to choose only a subset of experts for
decoding. Our method achieves an FER improvement of up to 0.75 dB over the
CVA in the waterfall region for multiple code lengths, adding negligible
computational complexity compared to the circular Viterbi algorithm at high
SNRs.
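The difference between zero termination and tail-biting described in this abstract can be seen in a small encoder sketch. The code below is illustrative only: it uses a textbook rate-1/2 code with constraint length 3 and generators (7, 5) in octal, which is an assumption made for brevity and not the LTE code, and the function name `conv_encode` is hypothetical. In tail-biting mode the shift register is preloaded with the last message bits, so the encoder ends in the state it started from without appending tail bits.

```python
def conv_encode(bits, tail_biting=True, generators=(0b111, 0b101), constraint_len=3):
    """Feedforward convolutional encoder.

    Returns (coded_bits, start_state, end_state). The state holds the last
    constraint_len - 1 input bits, most recent bit in the highest position.
    """
    memory = constraint_len - 1
    bits = list(bits)
    if tail_biting:
        if len(bits) < memory:
            raise ValueError("message shorter than encoder memory")
        # Preload the register with the last `memory` message bits.
        state = 0
        for b in bits[-memory:]:
            state = ((b << memory) | state) >> 1
        stream = bits
    else:
        # Zero termination: start in state 0 and flush with `memory` zeros.
        state = 0
        stream = bits + [0] * memory
    start_state = state
    coded = []
    for b in stream:
        reg = (b << memory) | state          # current bit plus stored bits
        for g in generators:
            coded.append(bin(reg & g).count("1") % 2)  # parity of tapped bits
        state = reg >> 1                     # shift out the oldest bit
    return coded, start_state, state


message = [1, 0, 1, 1, 0, 0, 1]
tb_code, tb_start, tb_end = conv_encode(message, tail_biting=True)
zt_code, zt_start, zt_end = conv_encode(message, tail_biting=False)
print(len(tb_code), tb_start, tb_end)  # start and end states match, any state allowed
print(len(zt_code), zt_start, zt_end)  # start and end states are both zero
```

Tail-biting avoids the rate loss of the flushing bits, which matters most in the short-length regime, at the cost of the decoder not knowing the start state in advance.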
Practical Variation-Aware Designs in Quantum Computing
Variations are prevalent in all aspects of quantum computing. On solid state quantum devices, fabrication errors lead to variations in device connectivity. Among the qubits that are available for use, there are still variations in multiple properties. Beyond hardware variations, different algorithms and operations impose different requirements on the devices and systems. In order to bridge the gap between the theory and implementation of quantum computing, we need practical designs that are aware of variations and system-level tradeoffs. This thesis includes three examples of adapting to variations: choosing two-qubit basis gates based on individual qubits’ properties, adapting error correction codes and using modular architecture to support fault-tolerant computation in the presence of fabrication defects, and adapting real time decoding protocols to support large patches of topological codes that arise during lattice surgery operations.