569 research outputs found

    Design and analysis of multiple read port techniques using bank division with XOR method for multi-ported-memory on FPGA platform

    Get PDF
    The multiple read and write operations are performed simultaneously by multi-ported memories and are used in advanced digital design applications on reprogrammable field-programmable gate arrays (FPGAs) to achieve higher bandwidth. The Memory modules are configured by block RAM (BRAMs), which utilizes more area and power on FPGA. In this manuscript, the techniques to increase the read ports for multi-ported memory modules are designed using the bank division with XOR (BDX) approach. The read port techniques like two read-one write (2R1W) memory, hybrid mode approach either 2R1W or 4R memory, and hierarchical BDX (HBDX) Approach using 2R1W/4R memory are designed on FPGA platform. The Proposed work utilizes only slices and look-up table (LUT's) rather than BRAMs while designing the memory modules on FPGA, which reduces the computational complexity and improves the system performance.  The experimental results are analyzed on Artix-7 FPGA. The performance parameters like slices, LUT utilization, maximum frequency (Fmax), and hardware efficiency are analyzed by concerning different memory depths. The 4R1W memory design using the HBDX approach utilizes 4% slices and works at 449.697 MHz operating frequency on Artix-7 FPGA. The proposed work provides a better platform to choose the proper read port technique to design an efficient modular multiport memory architecture

    Coarse-grained reconfigurable array architectures

    Get PDF
    Coarse-Grained Reconfigurable Array (CGRA) architectures accelerate the same inner loops that benefit from the high ILP support in VLIW architectures. By executing non-loop code on other cores, however, CGRAs can focus on such loops to execute them more efficiently. This chapter discusses the basic principles of CGRAs, and the wide range of design options available to a CGRA designer, covering a large number of existing CGRA designs. The impact of different options on flexibility, performance, and power-efficiency is discussed, as well as the need for compiler support. The ADRES CGRA design template is studied in more detail as a use case to illustrate the need for design space exploration, for compiler support and for the manual fine-tuning of source code

    CABAC accelerator architectures for video compression in future multimedida : a survey

    Get PDF
    The demands for high quality, real-time performance and multi-format video support in consumer multimedia products are ever increasing. In particular, the future multimedia systems require efficient video coding algorithms and corresponding adaptive high-performance computational platforms. The H.264/AVC video coding algorithms provide high enough compression efficiency to be utilized in these systems, and multimedia processors are able to provide the required adaptability, but the algorithms complexity demands for more efficient computing platforms. Heterogeneous (re-)configurable systems composed of multimedia processors and hardware accelerators constitute the main part of such platforms. In this paper, we survey the hardware accelerator architectures for Context-based Adaptive Binary Arithmetic Coding (CABAC) of Main and High profiles of H.264/AVC. The purpose of the survey is to deliver a critical insight in the proposed solutions, and this way facilitate further research on accelerator architectures, architecture development methods and supporting EDA tools. The architectures are analyzed, classified and compared based on the core hardware acceleration concepts, algorithmic characteristics, video resolution support and performance parameters, and some promising design directions are discussed. The comparative analysis shows that the parallel pipeline accelerator architecture seems to be the most promising

    A FLEXIBLE HIGH-BANDWIDTH LOW-LATENCY MULTI-PORT MEMORY CONTROLLER

    Get PDF
    Multi-port memory controllers (MPMCs) have become increasingly important in many modern applications due to the tremendous growth in bandwidth requirement. Many approaches so far have focused on improving either the memory access latency or the bandwidth utilization for specific applications. Moreover, the application systems are likely to require certain adjustments to connect with an MPMC, since the MPMC interface is limited to a single-clock and single-data-width domain. In this paper, we propose efficient techniques to improve the flexibility, latency, and bandwidth of an MPMC. Firstly, MPMC interfaces employ a pair of dual-clock dualport FIFOs at each port, so any multi-clock multi-data-width application system can connect to an MPMC without requiring extra resources. Secondly, memory access latency is significantly reduced because parallel FIFOs temporarily keep the data transfer between the application system and memory. Lastly, a proposed arbitration scheme, namely window-based first-come-first-serve, considerably enhances the bandwidth utilization. Depending on the applications, MPMC can be properly configured by updating several internal configuration registers. The experimental results in an Altera Cyclone V FPGA prove that MPMC is fully operational at 150 MHz and supports up to 32 concurrent connections at various clocks and data widths. More significantly, achieved bandwidth utilization is approximately 93.2% of the theoretical bandwidth, and the access latency is minimized as compared to previous designs

    Baseband analog front-end and digital back-end for reconfigurable multi-standard terminals

    Get PDF
    Multimedia applications are driving wireless network operators to add high-speed data services such as Edge (E-GPRS), WCDMA (UMTS) and WLAN (IEEE 802.11a,b,g) to the existing GSM network. This creates the need for multi-mode cellular handsets that support a wide range of communication standards, each with a different RF frequency, signal bandwidth, modulation scheme etc. This in turn generates several design challenges for the analog and digital building blocks of the physical layer. In addition to the above-mentioned protocols, mobile devices often include Bluetooth, GPS, FM-radio and TV services that can work concurrently with data and voice communication. Multi-mode, multi-band, and multi-standard mobile terminals must satisfy all these different requirements. Sharing and/or switching transceiver building blocks in these handsets is mandatory in order to extend battery life and/or reduce cost. Only adaptive circuits that are able to reconfigure themselves within the handover time can meet the design requirements of a single receiver or transmitter covering all the different standards while ensuring seamless inter-interoperability. This paper presents analog and digital base-band circuits that are able to support GSM (with Edge), WCDMA (UMTS), WLAN and Bluetooth using reconfigurable building blocks. The blocks can trade off power consumption for performance on the fly, depending on the standard to be supported and the required QoS (Quality of Service) leve

    Digital signal processor fundamentals and system design

    Get PDF
    Digital Signal Processors (DSPs) have been used in accelerator systems for more than fifteen years and have largely contributed to the evolution towards digital technology of many accelerator systems, such as machine protection, diagnostics and control of beams, power supply and motors. This paper aims at familiarising the reader with DSP fundamentals, namely DSP characteristics and processing development. Several DSP examples are given, in particular on Texas Instruments DSPs, as they are used in the DSP laboratory companion of the lectures this paper is based upon. The typical system design flow is described; common difficulties, problems and choices faced by DSP developers are outlined; and hints are given on the best solution

    Survey of advanced CABAC accelarator architectures for future multimedia.

    Get PDF
    The future high quality multimedia systems require efficient video coding algorithms and corresponding adaptive high-performance computational platforms. In this paper, we survey the hardware accelerator architectures for Context-based Adaptive Binary Arithmetic Coding (CABAC) of H.264/AVC. The purpose of the survey is to deliver a critical insight in the proposed solutions, and this way facilitate further research on accelerator architectures, architecture development methods and supporting EDA tools. The architectures are analyzed, classified and compared based on the core hardware acceleration concepts, algorithmic characteristics, video resolution support and performance parameters, and some promising design directions are discussed

    Efficient Neural Network Implementations on Parallel Embedded Platforms Applied to Real-Time Torque-Vectoring Optimization Using Predictions for Multi-Motor Electric Vehicles

    Get PDF
    The combination of machine learning and heterogeneous embedded platforms enables new potential for developing sophisticated control concepts which are applicable to the field of vehicle dynamics and ADAS. This interdisciplinary work provides enabler solutions -ultimately implementing fast predictions using neural networks (NNs) on field programmable gate arrays (FPGAs) and graphical processing units (GPUs)- while applying them to a challenging application: Torque Vectoring on a multi-electric-motor vehicle for enhanced vehicle dynamics. The foundation motivating this work is provided by discussing multiple domains of the technological context as well as the constraints related to the automotive field, which contrast with the attractiveness of exploiting the capabilities of new embedded platforms to apply advanced control algorithms for complex control problems. In this particular case we target enhanced vehicle dynamics on a multi-motor electric vehicle benefiting from the greater degrees of freedom and controllability offered by such powertrains. Considering the constraints of the application and the implications of the selected multivariable optimization challenge, we propose a NN to provide batch predictions for real-time optimization. This leads to the major contribution of this work: efficient NN implementations on two intrinsically parallel embedded platforms, a GPU and a FPGA, following an analysis of theoretical and practical implications of their different operating paradigms, in order to efficiently harness their computing potential while gaining insight into their peculiarities. The achieved results exceed the expectations and additionally provide a representative illustration of the strengths and weaknesses of each kind of platform. Consequently, having shown the applicability of the proposed solutions, this work contributes valuable enablers also for further developments following similar fundamental principles.Some of the results presented in this work are related to activities within the 3Ccar project, which has received funding from ECSEL Joint Undertaking under grant agreement No. 662192. This Joint Undertaking received support from the European Union’s Horizon 2020 research and innovation programme and Germany, Austria, Czech Republic, Romania, Belgium, United Kingdom, France, Netherlands, Latvia, Finland, Spain, Italy, Lithuania. This work was also partly supported by the project ENABLES3, which received funding from ECSEL Joint Undertaking under grant agreement No. 692455-2
    corecore