26 research outputs found

    A Reconfigurable Outer Modem Platform for Future Communications Systems

    Get PDF
    Future mobile and wireless communications networks require flexible modem architectures with high performance. Efficient utilization of application specific flexibility is key to fulfill these requirements. For high throughput a single processor can not provide the necessary computational power. Hence multi-processor architectures become necessary. This paper presents a multi-processor platform based on a new dynamically reconfigurable application specific instruction set processor (dr-ASIP) for the application domain of channel decoding. Inherently parallel decoding tasks can be mapped onto individual processing nodes. The implied challenging inter-processor communication is efficiently handled by a Network-on-Chip (NoC) such that the throughput of each node is not degraded. The dr-ASIP features Viterbi and Log-MAP decoding for support of convolutional and turbo codes of more than 10 currently specified mobile and wireless standards. Furthermore, its flexibility allows for adaptation to future systems

    Domain specific high performance reconfigurable architecture for a communication platform

    Get PDF

    System-on-chip Computing and Interconnection Architectures for Telecommunications and Signal Processing

    Get PDF
    This dissertation proposes novel architectures and design techniques targeting SoC building blocks for telecommunications and signal processing applications. Hardware implementation of Low-Density Parity-Check decoders is approached at both the algorithmic and the architecture level. Low-Density Parity-Check codes are a promising coding scheme for future communication standards due to their outstanding error correction performance. This work proposes a methodology for analyzing effects of finite precision arithmetic on error correction performance and hardware complexity. The methodology is throughout employed for co-designing the decoder. First, a low-complexity check node based on the P-output decoding principle is designed and characterized on a CMOS standard-cells library. Results demonstrate implementation loss below 0.2 dB down to BER of 10^{-8} and a saving in complexity up to 59% with respect to other works in recent literature. High-throughput and low-latency issues are addressed with modified single-phase decoding schedules. A new "memory-aware" schedule is proposed requiring down to 20% of memory with respect to the traditional two-phase flooding decoding. Additionally, throughput is doubled and logic complexity reduced of 12%. These advantages are traded-off with error correction performance, thus making the solution attractive only for long codes, as those adopted in the DVB-S2 standard. The "layered decoding" principle is extended to those codes not specifically conceived for this technique. Proposed architectures exhibit complexity savings in the order of 40% for both area and power consumption figures, while implementation loss is smaller than 0.05 dB. Most modern communication standards employ Orthogonal Frequency Division Multiplexing as part of their physical layer. The core of OFDM is the Fast Fourier Transform and its inverse in charge of symbols (de)modulation. Requirements on throughput and energy efficiency call for FFT hardware implementation, while ubiquity of FFT suggests the design of parametric, re-configurable and re-usable IP hardware macrocells. In this context, this thesis describes an FFT/IFFT core compiler particularly suited for implementation of OFDM communication systems. The tool employs an accuracy-driven configuration engine which automatically profiles the internal arithmetic and generates a core with minimum operands bit-width and thus minimum circuit complexity. The engine performs a closed-loop optimization over three different internal arithmetic models (fixed-point, block floating-point and convergent block floating-point) using the numerical accuracy budget given by the user as a reference point. The flexibility and re-usability of the proposed macrocell are illustrated through several case studies which encompass all current state-of-the-art OFDM communications standards (WLAN, WMAN, xDSL, DVB-T/H, DAB and UWB). Implementations results are presented for two deep sub-micron standard-cells libraries (65 and 90 nm) and commercially available FPGA devices. Compared with other FFT core compilers, the proposed environment produces macrocells with lower circuit complexity and same system level performance (throughput, transform size and numerical accuracy). The final part of this dissertation focuses on the Network-on-Chip design paradigm whose goal is building scalable communication infrastructures connecting hundreds of core. A low-complexity link architecture for mesochronous on-chip communication is discussed. The link enables skew constraint looseness in the clock tree synthesis, frequency speed-up, power consumption reduction and faster back-end turnarounds. The proposed architecture reaches a maximum clock frequency of 1 GHz on 65 nm low-leakage CMOS standard-cells library. In a complex test case with a full-blown NoC infrastructure, the link overhead is only 3% of chip area and 0.5% of leakage power consumption. Finally, a new methodology, named metacoding, is proposed. Metacoding generates correct-by-construction technology independent RTL codebases for NoC building blocks. The RTL coding phase is abstracted and modeled with an Object Oriented framework, integrated within a commercial tool for IP packaging (Synopsys CoreTools suite). Compared with traditional coding styles based on pre-processor directives, metacoding produces 65% smaller codebases and reduces the configurations to verify up to three orders of magnitude

    The implementation of an LDPC decoder in a Network on Chip environment

    Get PDF
    The proposed project takes origin from a cooperation initiative named NEWCOM++ among research groups to develop 3G wireless mobile system. This work, in particular, tries to focuse on the communication errors arising on a message signal characterized by working under WiMAX 802.16e standard. It will be shown how this last wireless generation protocol needs a specific flexible instrumentation and why an LDPC error correction code suitable in order to respect the quality restrictions. A chapter will be dedicated to describe, not from a mathematical point of view, the LDPC algorithm theory and how it can be graphically represented to better organize the decodification process. The main objective of this work is to validate the PHAL-concept when addressing a complex and computationally intensive design like the LDPC encoder/decoder. The expected results should be both conceptual; identifying the lacks on the PHAL concept when addressing a real problem; and second to determine the overhead introduced by PHAL in the implementation of a LDPC decoder. The mission is to build a NoC (Network on Chip) able to perform the same task of a general purpose processor, but in less time and with better efficiency, in terms of component flexibility and throughput. The single element of the network is a basic processor element (PE) formed by the union of two separated components: a special purpose processor ASIP, the responsible of the input data LDPC decoding, and the router component PHAL, checking incoming data packets and scanning the temporization of tasks execution. Supported by a specific programming tool, the ASIP has been completely designed, from the architecture resources to the instruction set, through a language like C. Realized in this SystemC code and converted in VHDL language, it's been synthesized as to fit onto an FPGA of the Xilinx Virtex-5 family. Although the main purpose regards the making of an application as flexible as possible, a WiMAX-orientated LDPC implemented on a FPGA saves space and resources, choosing the one that best suits the project synthesis. This is because encoders and decoders will have to find room in the communication tools (e.g. modems) as best as possible. The whole network scenary has been mounted through a Linux application, acting as a master element. The entire environment will require the use of VPI libraries and components able to manage the communication protocols and interfacing mechanisms

    Reconfigurable architectures for beyond 3G wireless communication systems

    Get PDF

    The implementation of an LDPC decoder in a Network on Chip environment

    Get PDF
    The proposed project takes origin from a cooperation initiative named NEWCOM++ among research groups to develop 3G wireless mobile system. This work, in particular, tries to focuse on the communication errors arising on a message signal characterized by working under WiMAX 802.16e standard. It will be shown how this last wireless generation protocol needs a specific flexible instrumentation and why an LDPC error correction code suitable in order to respect the quality restrictions. A chapter will be dedicated to describe, not from a mathematical point of view, the LDPC algorithm theory and how it can be graphically represented to better organize the decodification process. The main objective of this work is to validate the PHAL-concept when addressing a complex and computationally intensive design like the LDPC encoder/decoder. The expected results should be both conceptual; identifying the lacks on the PHAL concept when addressing a real problem; and second to determine the overhead introduced by PHAL in the implementation of a LDPC decoder. The mission is to build a NoC (Network on Chip) able to perform the same task of a general purpose processor, but in less time and with better efficiency, in terms of component flexibility and throughput. The single element of the network is a basic processor element (PE) formed by the union of two separated components: a special purpose processor ASIP, the responsible of the input data LDPC decoding, and the router component PHAL, checking incoming data packets and scanning the temporization of tasks execution. Supported by a specific programming tool, the ASIP has been completely designed, from the architecture resources to the instruction set, through a language like C. Realized in this SystemC code and converted in VHDL language, it's been synthesized as to fit onto an FPGA of the Xilinx Virtex-5 family. Although the main purpose regards the making of an application as flexible as possible, a WiMAX-orientated LDPC implemented on a FPGA saves space and resources, choosing the one that best suits the project synthesis. This is because encoders and decoders will have to find room in the communication tools (e.g. modems) as best as possible. The whole network scenary has been mounted through a Linux application, acting as a master element. The entire environment will require the use of VPI libraries and components able to manage the communication protocols and interfacing mechanisms

    A Probabilistic Approach for the System-Level Design of Multi-ASIP Platforms

    Get PDF

    Reconfigurable Instruction Cell Architecture Reconfiguration and Interconnects

    Get PDF

    Design and development from single core reconfigurable accelerators to a heterogeneous accelerator-rich platform

    Get PDF
    The performance of a platform is evaluated based on its ability to deal with the processing of multiple applications of different nature. In this context, the platform under evaluation can be of homogeneous, heterogeneous or of hybrid architecture. The selection of an architecture type is generally based on the set of different target applications and performance parameters, where the applications can be of serial or parallel nature. The evaluation is normally based on different performance metrics, e.g., resource/area utilization, execution time, power and energy consumption. This process can also include high-level performance metrics, e.g., Operations Per Second (OPS), OPS/Watt, OPS/Hz, Watt/Area etc. An example of architecture selection can be related to a wireless communication system where the processing of computationally-intensive signal-processing algorithms has strict execution-time constraints and in this case, a platform with special-purpose accelerators is relatively more suitable than a typical homogeneous platform. A couple of decades ago, it was expensive to plant many special-purpose accelerators on a chip as the cost per unit area was relatively higher than today. The utilization wall is also becoming a limiting factor in homogeneous multicore scaling which means that all the cores on a platform cannot be operated at their maximum frequency due to a possible thermal meltdown. In this case, some of the processing cores have to be turned-off or to be operated at very low frequencies making most of the part of the chip to stay underutilized. A possible solution lies in the use of heterogeneous multicore platforms where many application-specific cores operate at lower frequencies, therefore reducing power dissipation density and increasing other performance parameters. However, to achieve maximum flexibility in processing, a general-purpose flavor can also be introduced by adding a few Reduced Instruction-Set Computing (RISC) cores. A power class of heterogeneous multicore platforms is an accelerator-rich platform where many application-specific accelerators are loosely connected with each other for work load distribution or to execute the tasks independently. This research work spans from the design and development of three different types of template-based Coarse-Grain Reconfigurable Arrays (CGRAs), i.e., CREMA, AVATAR and SCREMA to a Heterogeneous Accelerator-Rich Platform (HARP). The accelerators generated from the three CGRAs could perform different lengths and types of Fast Fourier Transform (FFT), real and complex Matrix-Vector Multiplication (MVM) algorithms. CREMA and AVATAR were fixed CGRAs with eight and sixteen number of Processing Element (PE) columns, respectively. SCREMA could flex between four, eight, sixteen and thirty two number of PE columns. Many case studies were conducted to evaluate the performance of the reconfigurable accelerators generated from these CGRA templates. All of these CGRAs work in a processor/coprocessor model tightly integrated with a Direct Memory Access (DMA) device. Apart from these platforms, a reconfigurable Application-Specific Instruction-set Processor (rASIP) is also designed, tested for FFT execution under IEEE-802.11n timing constraints and evaluated against a processor/coprocessor model. It was designed by integrating AVATAR generated radix-(2, 4) FFT accelerator into the datapath of a RISC processor. The instruction set of the RISC processor was extended to perform additional operations related to AVATAR. As mentioned earlier, the underutilized part of the chip, now-a-days called Dark Silicon is posing many challenges for the designers. Apart from software optimizations, clock gating, dynamic voltage/frequency scaling and other high-level techniques, one way of dealing with this problem is to use many application-specific cores. In an effort to maximize the number of reconfigurable processing resources on a platform, the accelerator-rich architecture HARP was designed and evaluated in terms of different performance metrics. HARP is constructed on a Network-on-Chip (NoC) of 3x3 nodes where with every node, a CGRA of application-specific size is integrated other than the central node which is attached to a RISC processor. The RISC establishes synchronization between the nodes for data transfer and also performs the supervisory control. While using the NoC as the backbone of communication between the cores, it becomes possible for all the cores to address each other and also perform execution simultaneously and independently of each other. The performance of accelerators generated from CREMA, AVATAR and SCREMA templates were evaluated individually and also when attached to HARP's NoC nodes. The individual CGRAs show promising results in their own capacity but when integrated all together in the framework of HARP, interesting comparisons were established in terms of overall execution times, resource utilization, operating frequencies, power and energy consumption. In evaluating HARP, estimates and measurements were also made in some advanced performance metrics, e.g., in MOPS/mW and MOPS/MHz. The overall research work promotes the idea of heterogeneous accelerator-rich platform as a solution to current problems and future needs of industry and academia

    Realizing Software Defined Radio - A Study in Designing Mobile Supercomputers.

    Full text link
    The physical layer of most wireless protocols is traditionally implemented in custom hardware to satisfy the heavy computational requirements while keeping power consumption to a minimum. These implementations are time consuming to design and difficult to verify. A programmable hardware platform capable of supporting software implementations of the physical layer, or Software Defined Radio (SDR), has a number of advantages. These include support for multiple protocols, faster time-to-market, higher chip volumes, and support for late implementation changes. The challenge is to achieve this under the power budget of a mobile device. Wireless communications belong to an emerging class of applications with the processing requirements of a supercomputer but the power constraints of a mobile device -- mobile supercomputing. This thesis presents a set of design proposals for building a programmable wireless communication solution. In order to design a solution that can meet the lofty requirements of SDR, this thesis takes an application-centric design approach -- evaluate and optimize all aspects of the design based on the characteristics of wireless communication protocols. This includes a DSP processor architecture optimized for wireless baseband processing, wireless algorithm optimizations, and language and compilation tool support for the algorithm software and the processor hardware. This thesis first analyzes the software characteristics of SDR. Based on the analysis, this thesis proposes the Signal-Processing On-Demand Architecture (SODA), a fully programmable multi-core architecture that can support the computation requirements of third generation wireless protocols, while operating within the power budget of a mobile device. This thesis then presents wireless algorithm implementations and optimizations for the SODA processor architecture. A signal processing language extension (SPEX) is proposed to help the software development efforts of wireless communication protocols on SODA-like multi-core architecture. And finally, the SPIR compiler is proposed to automatically map SPEX code onto the multi-core processor hardware.Ph.D.Computer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/61760/1/linyz_1.pd
    corecore