475 research outputs found

    Energy Consumption Analysis of Software Polar Decoders on Low Power Processors

    Get PDF
    International audienceThis paper presents a new dynamic and fully generic implementation of a Successive Cancellation (SC) decoder (multi-precision support and intra-/inter-frame strategy support). This fully generic SC decoder is used to perform comparisons of the different configurations in terms of throughput, latency and energy consumption. A special emphasis is given on the energy consumption on low power embedded processors for software defined radio (SDR) systems. A N=4096 code length, rate 1/2 software SC decoder consumes only 14 nJ per bit on an ARM Cortex-A57 core, while achieving 65 Mbps. Some design guidelines are given in order to adapt the configuration to the application context

    Algorithm Development and VLSI Implementation of Energy Efficient Decoders of Polar Codes

    Get PDF
    With its low error-floor performance, polar codes attract significant attention as the potential standard error correction code (ECC) for future communication and data storage. However, the VLSI implementation complexity of polar codes decoders is largely influenced by its nature of in-series decoding. This dissertation is dedicated to presenting optimal decoder architectures for polar codes. This dissertation addresses several structural properties of polar codes and key properties of decoding algorithms that are not dealt with in the prior researches. The underlying concept of the proposed architectures is a paradigm that simplifies and schedules the computations such that hardware is simplified, latency is minimized and bandwidth is maximized. In pursuit of the above, throughput centric successive cancellation (TCSC) and overlapping path list successive cancellation (OPLSC) VLSI architectures and express journey BP (XJBP) decoders for the polar codes are presented. An arbitrary polar code can be decomposed by a set of shorter polar codes with special characteristics, those shorter polar codes are referred to as constituent polar codes. By exploiting the homogeneousness between decoding processes of different constituent polar codes, TCSC reduces the decoding latency of the SC decoder by 60% for codes with length n = 1024. The error correction performance of SC decoding is inferior to that of list successive cancellation decoding. The LSC decoding algorithm delivers the most reliable decoding results; however, it consumes most hardware resources and decoding cycles. Instead of using multiple instances of decoding cores in the LSC decoders, a single SC decoder is used in the OPLSC architecture. The computations of each path in the LSC are arranged to occupy the decoder hardware stages serially in a streamlined fashion. This yields a significant reduction of hardware complexity. The OPLSC decoder has achieved about 1.4 times hardware efficiency improvement compared with traditional LSC decoders. The hardware efficient VLSI architectures for TCSC and OPLSC polar codes decoders are also introduced. Decoders based on SC or LSC algorithms suffer from high latency and limited throughput due to their serial decoding natures. An alternative approach to decode the polar codes is belief propagation (BP) based algorithm. In BP algorithm, a graph is set up to guide the beliefs propagated and refined, which is usually referred to as factor graph. BP decoding algorithm allows decoding in parallel to achieve much higher throughput. XJBP decoder facilitates belief propagation by utilizing the specific constituent codes that exist in the conventional factor graph, which results in an express journey (XJ) decoder. Compared with the conventional BP decoding algorithm for polar codes, the proposed decoder reduces the computational complexity by about 40.6%. This enables an energy-efficient hardware implementation. To further explore the hardware consumption of the proposed XJBP decoder, the computations scheduling is modeled and analyzed in this dissertation. With discussions on different hardware scenarios, the optimal scheduling plans are developed. A novel memory-distributed micro-architecture of the XJBP decoder is proposed and analyzed to solve the potential memory access problems of the proposed scheduling strategy. The register-transfer level (RTL) models of the XJBP decoder are set up for comparisons with other state-of-the-art BP decoders. The results show that the power efficiency of BP decoders is improved by about 3 times

    Algorithm Development and VLSI Implementation of Energy Efficient Decoders of Polar Codes

    Get PDF
    With its low error-floor performance, polar codes attract significant attention as the potential standard error correction code (ECC) for future communication and data storage. However, the VLSI implementation complexity of polar codes decoders is largely influenced by its nature of in-series decoding. This dissertation is dedicated to presenting optimal decoder architectures for polar codes. This dissertation addresses several structural properties of polar codes and key properties of decoding algorithms that are not dealt with in the prior researches. The underlying concept of the proposed architectures is a paradigm that simplifies and schedules the computations such that hardware is simplified, latency is minimized and bandwidth is maximized. In pursuit of the above, throughput centric successive cancellation (TCSC) and overlapping path list successive cancellation (OPLSC) VLSI architectures and express journey BP (XJBP) decoders for the polar codes are presented. An arbitrary polar code can be decomposed by a set of shorter polar codes with special characteristics, those shorter polar codes are referred to as constituent polar codes. By exploiting the homogeneousness between decoding processes of different constituent polar codes, TCSC reduces the decoding latency of the SC decoder by 60% for codes with length n = 1024. The error correction performance of SC decoding is inferior to that of list successive cancellation decoding. The LSC decoding algorithm delivers the most reliable decoding results; however, it consumes most hardware resources and decoding cycles. Instead of using multiple instances of decoding cores in the LSC decoders, a single SC decoder is used in the OPLSC architecture. The computations of each path in the LSC are arranged to occupy the decoder hardware stages serially in a streamlined fashion. This yields a significant reduction of hardware complexity. The OPLSC decoder has achieved about 1.4 times hardware efficiency improvement compared with traditional LSC decoders. The hardware efficient VLSI architectures for TCSC and OPLSC polar codes decoders are also introduced. Decoders based on SC or LSC algorithms suffer from high latency and limited throughput due to their serial decoding natures. An alternative approach to decode the polar codes is belief propagation (BP) based algorithm. In BP algorithm, a graph is set up to guide the beliefs propagated and refined, which is usually referred to as factor graph. BP decoding algorithm allows decoding in parallel to achieve much higher throughput. XJBP decoder facilitates belief propagation by utilizing the specific constituent codes that exist in the conventional factor graph, which results in an express journey (XJ) decoder. Compared with the conventional BP decoding algorithm for polar codes, the proposed decoder reduces the computational complexity by about 40.6%. This enables an energy-efficient hardware implementation. To further explore the hardware consumption of the proposed XJBP decoder, the computations scheduling is modeled and analyzed in this dissertation. With discussions on different hardware scenarios, the optimal scheduling plans are developed. A novel memory-distributed micro-architecture of the XJBP decoder is proposed and analyzed to solve the potential memory access problems of the proposed scheduling strategy. The register-transfer level (RTL) models of the XJBP decoder are set up for comparisons with other state-of-the-art BP decoders. The results show that the power efficiency of BP decoders is improved by about 3 times

    Toward High-Performance Implementation of 5G SCMA Algorithms

    Get PDF
    International audienceThe recent evolution of mobile communication systems toward a 5G network is associated with the search for new types of non-orthogonal modulations such as Sparse Code Multiple Access (SCMA). Such modulations are proposed in response to demands for increasing the number of connected users. SCMA is a non-orthogonal multiple access technique that offers improved Bit Error Rate (BER) performance and higher spectral efficiency than other comparable techniques, but these improvements come at the cost of complex decoders. There are many challenges in designing near-optimum high throughput SCMA decoders. This paper explores means to enhance the performance of SCMA decoders. To achieve this goal, various improvements to the MPA algorithms are proposed. They notably aim at adapting SCMA decoding to the Single Instruction Multiple Data (SIMD) paradigm. An approximate modeling of noise is performed to reduce the complexity of floating-point calculations. The effects of Forward Error Corrections (FEC) such as polar, turbo and LDPC codes, as well as different ways of accessing memory and improving power efficiency of modified MPAs are investigated. The results show that the throughput of a SCMA decoder can be increased by 3.1 to 21 times when compared to the original MPA on different computing platforms using the suggested improvements

    Design Solutions For Modular Satellite Architectures

    Get PDF
    The cost-effective access to space envisaged by ESA would open a wide range of new opportunities and markets, but is still many years ahead. There is still a lack of devices, circuits, systems which make possible to develop satellites, ground stations and related services at costs compatible with the budget of academic institutions and small and medium enterprises (SMEs). As soon as the development time and cost of small satellites will fall below a certain threshold (e.g. 100,000 to 500,000 €), appropriate business models will likely develop to ensure a cost-effective and pervasive access to space, and related infrastructures and services. These considerations spurred the activity described in this paper, which is aimed at: - proving the feasibility of low-cost satellites using COTS (Commercial Off The Shelf) devices. This is a new trend in the space industry, which is not yet fully exploited due to the belief that COTS devices are not reliable enough for this kind of applications; - developing a flight model of a flexible and reliable nano-satellite with less than 25,000€; - training students in the field of avionics space systems: the design here described is developed by a team including undergraduate students working towards their graduation work. The educational aspects include the development of specific new university courses; - developing expertise in the field of low-cost avionic systems, both internally (university staff) and externally (graduated students will bring their expertise in their future work activity); - gather and cluster expertise and resources available inside the university around a common high-tech project; - creating a working group composed of both University and SMEs devoted to the application of commercially available technology to space environment. The first step in this direction was the development of a small low cost nano-satellite, started in the year 2004: the name of this project was PiCPoT (Piccolo Cubo del Politecnico di Torino, Small Cube of Politecnico di Torino). The project was carried out by some departments of the Politecnico, in particular Electronics and Aerospace. The main goal of the project was to evaluate the feasibility of using COTS components in a space project in order to greatly reduce costs; the design exploited internal subsystems modularity to allow reuse and further cost reduction for future missions. Starting from the PiCPoT experience, in 2006 we began a new project called ARaMiS (Speretta et al., 2007) which is the Italian acronym for Modular Architecture for Satellites. This work describes how the architecture of the ARaMiS satellite has been obtained from the lesson learned from our former experience. Moreover we describe satellite operations, giving some details of the major subsystems. This work is composed of two parts. The first one describes the design methodology, solutions and techniques that we used to develop the PiCPoT satellite; it gives an overview of its operations, with some details of the major subsystems. Details on the specifications can also be found in (Del Corso et al., 2007; Passerone et al, 2008). The second part, indeed exploits the experience achieved during the PiCPoT development and describes a proposal for a low-cost modular architecture for satellite

    Design and FPGA Implementation of CORDIC-based 8-point 1D DCT Processor

    Get PDF
    CORDIC or CO-ordinate Rotation DIgital Computer is a fast, simple, efficient and powerful algorithm used for diverse Digital Signal Processing applications. Primarily developed for real-time airborne computations, it uses a unique computing technique which is especially suitable for solving the trigonometric relationships involved in plane co-ordinate rotation and conversion from rectangular to polar form. It comprises a special serial arithmetic unit having three shift registers, three adders/subtractors, Look-Up table and special interconnections. Using a prescribed sequence of conditional additions or subtractions the CORDIC arithmetic unit can be controlled to solve either of the following equations: Y’=K (Ycos λ+ Xsin λ) X’=K (Xcos λ - Ysin λ); where K is a constant In this project: • A CORDIC-based processor for sine/cosine calculation was designed using VHDL programming in Xilinx ISE 10.1. The CORDIC module was tested for its functionality and correctness by test-bench analysis. Subsequently, FPGA implementation of the CORDIC core followed by ChipScopePro analysis of the output logic waveforms was performed. • Using this CORDIC core a DCT processor was designed to calculate the 8-point 1D DCT. The functionality and operational correctness of this processor was tested, first on the test-bench and then via ChipScopePro analysis, post FPGA implementation. The output obtained in both the cases was compared with the actual values to test for consistency and the percentage of accuracy was established. Power consumption and FPGA resource utilization were observed. The results obtained were discussed

    VLSI Architecture for Polar Codes Using Fast Fourier Transform-Like Design

    Get PDF
    Polar code is a novel and high-performance communication algorithm with the ability to theoretically achieving the Shannon limit, which has attracted increasing attention recently due to its low encoding and decoding complexity. Hardware optimization further reduces the cost and achieves better timing performance enabling real-time applications on resource-constrained devices. This thesis presents an area-efficient architecture for a successive cancellation (SC) polar decoder. Our design applies high-level transformations to reduce the number of Processing Elements (PEs), i.e., only log2 N pre-computed PEs are required in our architecture for an N-bit code. We also propose a customized loop-based shifting register to reduce the consumption of the delay elements further. Our experimental results demonstrate that our architecture reduces 98.90% and 93.38% in the area and area-time product, respectively, compared to prior works

    An Efficient, Portable and Generic Library for Successive Cancellation Decoding of Polar Codes

    Get PDF
    International audienceError Correction Code decoding algorithms for consumer products such as Internet of Things (IoT) devices are usually implemented as dedicated hardware circuits. As processors are becoming increasingly powerful and energy efficient, there is now a strong desire to perform this processing in software to reduce production costs and time to market. The recently introduced family of Successive Cancellation decoders for Polar codes has been shown in several research works to efficiently leverage the ubiquitous SIMD units in modern CPUs, while offering strong potentials for a wide range of optimizations. The P-EDGE environment introduced in this paper, combines a specialized skeleton generator and a building blocks library routines to provide a generic, extensible Polar code exploration workbench. It enables ECC code designers to easily experiments with combinations of existing and new optimizations , while delivering performance close to state-of-art decoders
    corecore