20 research outputs found

    Conflict resolution for pipelined layered LDPC decoders

    No full text
    International audienceMany of the current LDPC implementations of DVB-S2, T2 or WiMAX standard use the so-called layered architecture combined with pipeline. However, the pipeline process may introduce memory access conflicts. The resolution of these conflicts requires careful scheduling combined with dedicated hardware and/or idle cycle insertion. In this paper, based on the DVB-T2 example, we explain explicitly how the scheduling can solve most of the pipeline conflicts. The two contributions of the paper are 1) how to split the matrix to relax the pipeline conflicts at a cost of a reduced maximum available parallelism 2) how to project the problem of the research of an efficient scheduling to the well-known "Travelling Salesman Problem" and use a genetic algorithm to solve it

    LDPC decoder architecture for DVB-S2 and DVB-S2X standards

    No full text
    International audienceA particular type of conflict due to multiple-diagonal sub-matrices in the DVB-S2 parity-check matrices is known to complicate the implementation of the layered decoder architecture. The new matrices proposed in DVB-S2X no longer use such sub-matrices. For implementing a decoder compliant both with DVB-S2 and DVB-S2X, we propose an elegant solution which overcomes this conflicts relying on an efficient write disable of the memories, allowing a straightforward implementation of layered LDPC decoders. The complexity and latency are further reduced by eliminating one barrel shifter. Compared with the existing solutions, complexity is reduced without performance degradation. Keywords—Low-Density Parity-Check (LDPC) code, memory conflict, layered decoder, DVB-S2, DVB-S2X

    HIGH-SPEED CONFLICT-FREE LAYERED LDPC DECODER FOR THE DVB-S2, -T2 AND -C2 STANDARDS

    No full text
    International audienceLayered decoding is known to provide efficient and highthroughput implementation of LDPC decoders. However, the implementation of layered architecture is not always straightforward because of memory update conflicts in the a posteriori information memory. In this paper, we focus our attention on a particular type of conflict that is due to multiple-diagonal sub-matrices in the DVB-S2, -T2 and -C2 parity-check matrices. We propose an original solution that combines repetition of the concerned layers and the write disable of the a posteriori information memory. The implementation of this solution on an FPGA-based LDPC decoder led to an average air throughput of 200 Mbit/s with a parallelism of 45 and a clock frequency of 300 MHz. Increasing the parallelism to 120 led to an average air throughput of 720 Mbit/s with a clock frequency of 400 MHz on CMOS technology

    Conflict Resolution by Matrix Reordering for DVB-T2 LDPC Decoders

    No full text
    International audienceLayered decoding is known to provide efficient and high-throughput implementation of LDPC decoders. However, the implementation of the layered architecture is not always straightforward because of the memory access conflicts in the a-posteriori information memory. In this paper, we focus our attention on a particular type of conflict introduced by the existence of multiple diagonal matrices in the DVB-T2 parity check matrix structure. We illustrate how the reordering of the matrix reduces the number of conflicts, at the cost of limiting the level of parallelism. We then propose a parity extending process to solve the remaining conflicts. Fixed point simulation results show coherent performance without modifying the layered architecture

    A Novel Conflict-Free Memory and Processor Architecture for DVB-T2 LDPC Decoding

    Get PDF
    In this paper, we present a flexible architecture for an LDPC decoder that fully exploits the structure of the codes defined in the DVB-T2 standard (Digital Video Broadcasting - Second Generation Terrestrial). We propose a processor and memory architecture which uses the flooding schedule and has no memory access conflicts, which are encountered in serial schedule decoders proposed in the literature. Thus, unlike previous works, we do not require any extra logic or ad hoc designs to resolve memory conflicts. Despite the typically slower convergence of flooding schedule compared to serial schedule decoders, our ar- chitecture meets the throughput and BER requirements specified in the DVB-T2 standard. Our design allows a trade-off between memory size and performance by the selection of the number of bits per message without affecting the general memory arrangement. Besides, our architecture is not algorithm specific: any check-node message processing algorithm can be used (Sum-Product, Min-Sum, etc.) without modifying the basic architecture. Furthermore, by simply adding relevant small ROM tables, we get a decoder that is fully compatible with all three second generation DVB standards (DVB-T2, DVB-S2 and DVB-C2). We present simulation results to demonstrate the viability of our solution both functionally and in terms of the bit-error rate performance. We also discuss the memory requirements and the throughput of the architecture, and present preliminary synthesis results in CMOS 130nm technology

    Flexible encoder and decoder of low density parity check codes

    Get PDF
    У дисертацији су предложена брза, флексибилна и хардверски ефикасна решења за кодовање и декодовање изузетно нерегуларних кодова са проверама парности мале густине (енгл. low-density parity-check, LDPC, codes) захтевана у савременим комуникационим стандардима. Један део доприноса дисертације је у новој делимично паралелној архитектури LDPC кодера за пету генерацију мобилних комуникација. Архитектура је заснована на флексибилној мрежи за кружни померај која омогућава паралелно процесирање више делова контролне матрице кратких кодова чиме се остварује сличан ниво паралелизма као и при кодовању дугачких кодова. Поред архитектуралног решења, предложена је оптимизација редоследа процесирања контролне матрице заснована на генетичком алгоритму, која омогућава постизање великих протока, малог кашњења и тренутно најбоље ефикасности искоришћења хардверских ресурса. У другом делу дисертације предложено је ново алгоритамско и архитектурално решење за декодовање структурираних LDPC кодова. Често коришћени приступ у LDPC декодерима је слојевито декодовање, код кога се услед проточне обраде јављају хазарди података који смањују проток. Декодер предложен у дисертацији у конфликтним ситуацијама на погодан начин комбинује слојевито и симултано декодовање чиме се избегавају циклуси паузе изазвани хазардима података. Овај приступ даје могућност за увођење великог броја степени проточне обраде чиме се постиже висока учестаност сигнала такта. Додатно, редослед процесирања контролне матрице је оптимизован коришћењем генетичког алгоритма за побољшане перформансе контроле грешака. Остварени резултати показују да, у поређењу са референтним решењима, предложени декодер остварује значајна побољшања у протоку и најбољу ефикасност за исте перформансе контроле грешака.The dissertation proposes high speed, flexible and hardware efficient solutions for coding and decoding of highly irregular low-density parity-check (LDPC) codes, required by many modern communication standards. The first part of the dissertation’s contributions is in the novel partially parallel LDPC encoder architecture for 5G. The architecture was built around the flexible shifting network that enables parallel processing of multiple parity check matrix elements for short to medium code lengths, thus providing almost the same level of parallelism as for long code encoding. In addition, the processing schedule was optimized for minimal encoding time using the genetic algorithm. The optimization procedure contributes to achieving high throughputs, low latency, and up to date the best hardware usage efficiency (HUE). The second part proposes a new algorithmic and architectural solution for structured LDPC code decoding. A widely used approach in LDPC decoders is a layered decoding schedule, which frequently suffers from pipeline data hazards that reduce the throughput. The decoder proposed in the dissertation conveniently incorporates both the layered and the flooding schedules in cases when hazards occur and thus facilitates LDPC decoding without stall cycles caused by pipeline hazards. Therefore, the proposed architecture enables insertion of many pipeline stages, which consequently provides a high operating clock frequency. Additionally, the decoding schedule was optimized for better signal-to-noise ratio (SNR) performance using genetic algorithm. The obtained results show that the proposed decoder achieves great throughput increase and the best HUE when compared with the state of the art for the same SNR performance

    Energy-Efficient Computing for Mobile Signal Processing

    Full text link
    Mobile devices have rapidly proliferated, and deployment of handheld devices continues to increase at a spectacular rate. As today's devices not only support advanced signal processing of wireless communication data but also provide rich sets of applications, contemporary mobile computing requires both demanding computation and efficiency. Most mobile processors combine general-purpose processors, digital signal processors, and hardwired application-specific integrated circuits to satisfy their high-performance and low-power requirements. However, such a heterogeneous platform is inefficient in area, power and programmability. Improving the efficiency of programmable mobile systems is a critical challenge and an active area of computer systems research. SIMD (single instruction multiple data) architectures are very effective for data-level-parallelism intense algorithms in mobile signal processing. However, new characteristics of advanced wireless/multimedia algorithms require architectural re-evaluation to achieve better energy efficiency. Therefore, fourth generation wireless protocol and high definition mobile video algorithms are analyzed to enhance a wide-SIMD architecture. The key enhancements include 1) programmable crossbar to support complex data alignment, 2) SIMD partitioning to support fine-grain SIMD computation, and 3) fused operation to support accelerating frequently used instruction pairs. Near-threshold computation has been attractive in low-power architecture research because it balances performance and power. To further improve energy efficiency in mobile computing, near-threshold computation is applied to a wide SIMD architecture. This proposed near-threshold wide SIMD architecture-Diet SODA-presents interesting architectural design decisions such as 1) very wide SIMD datapath to compensate for degraded performance induced by near-threshold computation and 2) scatter-gather data prefetcher to exploit large latency gap between memory and the SIMD datapath. Although near-threshold computation provides excellent energy efficiency, it suffers from increased delay variations. A systematic study of delay variations in near-threshold computing is performed and simple techniques-structural duplication and voltage/frequency margining-are explored to tolerate and mitigate the delay variations in near-threshold wide SIMD architectures. This dissertation analyzes representative wireless/multimedia mobile signal processing algorithms, proposes an energy-efficient programmable platform, and evaluates performance and power. A main theme of this dissertation is that the performance and efficiency of programmable embedded systems can be significantly improved with a combination of parallel SIMD and near-threshold computations.Ph.D.Electrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/86356/1/swseo_1.pd

    Hardware-Conscious Wireless Communication System Design

    Get PDF
    The work at hand is a selection of topics in efficient wireless communication system design, with topics logically divided into two groups.One group can be described as hardware designs conscious of their possibilities and limitations. In other words, it is about hardware that chooses its configuration and properties depending on the performance that needs to be delivered and the influence of external factors, with the goal of keeping the energy consumption as low as possible. Design parameters that trade off power with complexity are identified for analog, mixed signal and digital circuits, and implications of these tradeoffs are analyzed in detail. An analog front end and an LDPC channel decoder that adapt their parameters to the environment (e.g. fluctuating power level due to fading) are proposed, and it is analyzed how much power/energy these environment-adaptive structures save compared to non-adaptive designs made for the worst-case scenario. Additionally, the impact of ADC bit resolution on the energy efficiency of a massive MIMO system is examined in detail, with the goal of finding bit resolutions that maximize the energy efficiency under various system setups.In another group of themes, one can recognize systems where the system architect was conscious of fundamental limitations stemming from hardware.Put in another way, in these designs there is no attempt of tweaking or tuning the hardware. On the contrary, system design is performed so as to work around an existing and unchangeable hardware limitation. As a workaround for the problematic centralized topology, a massive MIMO base station based on the daisy chain topology is proposed and a method for signal processing tailored to the daisy chain setup is designed. In another example, a large group of cooperating relays is split into several smaller groups, each cooperatively performing relaying independently of the others. As cooperation consumes resources (such as bandwidth), splitting the system into smaller, independent cooperative parts helps save resources and is again an example of a workaround for an inherent limitation.From the analyses performed in this thesis, promising observations about hardware consciousness can be made. Adapting the structure of a hardware block to the environment can bring massive savings in energy, and simple workarounds prove to perform almost as good as the inherently limited designs, but with the limitation being successfully bypassed. As a general observation, it can be concluded that hardware consciousness pays off

    Architecture and Analysis for Next Generation Mobile Signal Processing.

    Full text link
    Mobile devices have proliferated at a spectacular rate, with more than 3.3 billion active cell phones in the world. With sales totaling hundreds of billions every year, the mobile phone has arguably become the dominant computing platform, replacing the personal computer. Soon, improvements to today’s smart phones, such as high-bandwidth internet access, high-definition video processing, and human-centric interfaces that integrate voice recognition and video-conferencing will be commonplace. Cost effective and power efficient support for these applications will be required. Looking forward to the next generation of mobile computing, computation requirements will increase by one to three orders of magnitude due to higher data rates, increased complexity algorithms, and greater computation diversity but the power requirements will be just as stringent to ensure reasonable battery lifetimes. The design of the next generation of mobile platforms must address three critical challenges: efficiency, programmability, and adaptivity. The computational efficiency of existing solutions is inadequate and straightforward scaling by increasing the number of cores or the amount of data-level parallelism will not suffice. Programmability provides the opportunity for a single platform to support multiple applications and even multiple standards within each application domain. Programmability also provides: faster time to market as hardware and software development can proceed in parallel; the ability to fix bugs and add features after manufacturing; and, higher chip volumes as a single platform can support a family of mobile devices. Lastly, hardware adaptivity is necessary to maintain efficiency as the computational characteristics of the applications change. Current solutions are tailored specifically for wireless signal processing algorithms, but lose their efficiency when other application domains like high definition video are processed. This thesis addresses these challenges by presenting analysis of next generation mobile signal processing applications and proposing an advanced signal processing architecture to deal with the stringent requirements. An application-centric design approach is taken to design our architecture. First, a next generation wireless protocol and high definition video is analyzed and algorithmic characterizations discussed. From these characterizations, key architectural implications are presented, which form the basis for the advanced signal processor architecture, AnySP.Ph.D.Electrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/86344/1/mwoh_1.pd
    corecore