3,038 research outputs found

    Test exploration and validation using transaction level models

    The complexity of the test infrastructure and test strategies in systems-on-chip approaches the complexity of the functional design space. This paper presents test design space exploration and validation of test strategies and schedules using transaction level models (TLMs). Since many aspects of testing involve the transfer of a significant amount of test stimuli and responses, the communication-centric view of TLMs suits this purpose exceptionally well.

    Efficiency analysis methodology of FPGAs based on lost frequencies, area and cycles

    We propose a methodology to study and quantify efficiency and the impact of overheads on runtime performance. Most work on High-Performance Computing (HPC) for FPGAs studies only runtime performance or cost, while we are interested in how far we are from peak performance and, more importantly, why. The efficiency of runtime performance is defined with respect to the ideal computational runtime in the absence of inefficiencies. Analysing the difference between actual and ideal runtime reveals the overheads and bottlenecks. A formal approach is proposed to decompose the efficiency into three components: frequency, area and cycles. Once the efficiencies are quantified, a detailed analysis must reveal the reasons for the lost frequencies, lost area and lost cycles. We propose a taxonomy of possible causes and practical methods to identify and quantify the overheads. The methodology is applied to a number of use cases, which show the interaction between the three components of efficiency and how bottlenecks are revealed.
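    The three-way decomposition mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything here is an assumption: the `Measurement` fields, the ratio chosen for each component, and the idea that the overall efficiency is the product of the three components are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    """Hypothetical inputs; the field names are illustrative assumptions."""
    f_actual_mhz: float   # clock frequency the design actually achieves
    f_peak_mhz: float     # peak frequency the device could reach
    compute_area: float   # resources doing useful computation
    total_area: float     # resources available on the device
    ideal_cycles: int     # cycles needed with no stalls or overheads
    actual_cycles: int    # cycles actually measured


def efficiency_components(m: Measurement) -> dict:
    """Split efficiency into frequency, area and cycle ratios.

    Assumption: the overall efficiency is the product of the three
    ratios. The paper's formal definition may differ.
    """
    e_freq = m.f_actual_mhz / m.f_peak_mhz
    e_area = m.compute_area / m.total_area
    e_cycles = m.ideal_cycles / m.actual_cycles
    return {
        "frequency": e_freq,
        "area": e_area,
        "cycles": e_cycles,
        "overall": e_freq * e_area * e_cycles,
    }
```

    Under these assumptions, a design that runs at half the peak frequency, uses half the area for computation and needs twice the ideal cycle count would reach one eighth of peak performance, and the per-component ratios indicate which loss to investigate first.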

    A toolset for the analysis and optimization of motion estimation algorithms and processors


    Register-transfer-level power profiling for system-on-chip power distribution network design and signoff

    Abstract. This thesis studies how register-transfer-level (RTL) power profiling can help the design and signoff of the power distribution network in digital integrated circuits. RTL power profiling is a method that collects RTL power estimation results into a single power profile, which can then be analysed to find interesting time windows for specifying power distribution network design and signoff. The thesis starts with a theory part. Power dissipation in a complementary metal-oxide semiconductor (CMOS) inverter is studied first. Next, the power distribution network structure and voltage drop problems are introduced. Voltage drop is demonstrated using power distribution network impedance figures. The common on-chip power distribution network structure is introduced, and the power distribution network design flow is outlined. Finally, the function of decoupling capacitors and their impact on power distribution network impedance are explained in detail. The practical part of the thesis contains the details of the RTL power profiling flow and its results for one simulation case in one design block. Some methods of improving RTL power estimation accuracy are also discussed, and calibration with extracted parasitics is then used to obtain a new set of power profiling time windows. After the results are presented, the overall RTL power estimation accuracy is analysed and the resulting time windows are compared to reference gate-level time windows. The analysis shows that the resulting time windows match the theory and that RTL power profiling seems to be a promising method for finding time windows for power distribution network design and signoff.
    Finnish title: Rekisterisiirtotason tehoprofilointi järjestelmäpiirin tehonsiirtoverkon suunnittelussa ja verifioinnissa. Finnish abstract, translated: This work studies how register-transfer-level (RTL) power profiling can help in the design and verification of the power distribution network of digital integrated circuits. RTL power profiling is a method that analyses the power curve obtained from RTL power estimation to find useful time windows for power distribution network design and verification. The work begins with a theory part, which first explains how a CMOS inverter dissipates power. Next, the structure of the power distribution network and the worst causes of voltage drop in it are introduced. Voltage drop is also illustrated with circuit diagrams and impedance curves. In addition, the design flow and the most common structure of an on-chip power distribution network are presented. Finally, the theory part covers in detail the function of decoupling capacitors and their effect on the total impedance of the power distribution network. The experimental part first presents the power profiling flow and then the results of the flow for one example block in one simulation run. This part also discusses the accuracy of RTL power estimation and performs RTL power profiling on an RTL model calibrated with parasitic impedances. Finally, the RTL power estimation results and the resulting power profiling time windows are analysed and compared with the results of a gate-level model. The analysis shows that the resulting time windows match the theory and that RTL power profiling appears to be a promising method for finding time windows for power distribution network analysis and verification.
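    The idea of scanning a power profile for interesting time windows, as described in the abstract above, can be sketched as follows. This is only an illustration under stated assumptions: the thesis does not specify its selection criterion here, so the function name `peak_power_window`, the fixed window width, and the choice of "highest average power" as the criterion are all hypothetical.

```python
def peak_power_window(profile, width):
    """Return (start_index, average_power) of the window with the
    highest average power.

    profile: sequence of power samples, one per time step (assumed
             to be uniformly sampled).
    width:   window length in samples (an assumed parameter).
    Ties are resolved in favour of the earliest window.
    """
    if width <= 0 or width > len(profile):
        raise ValueError("width must be between 1 and len(profile)")

    # Sum of the first window, then slide one sample at a time.
    window_sum = sum(profile[:width])
    best_sum, best_start = window_sum, 0
    for start in range(1, len(profile) - width + 1):
        # Add the sample entering the window, drop the one leaving it.
        window_sum += profile[start + width - 1] - profile[start - 1]
        if window_sum > best_sum:
            best_sum, best_start = window_sum, start
    return best_start, best_sum / width
```

    A real flow might also look for the steepest change in power rather than the highest average, since sudden current steps stress a power distribution network differently from sustained load. That variant is not shown here.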

    Efficient Neural Network Implementations on Parallel Embedded Platforms Applied to Real-Time Torque-Vectoring Optimization Using Predictions for Multi-Motor Electric Vehicles

    The combination of machine learning and heterogeneous embedded platforms opens new potential for developing sophisticated control concepts applicable to vehicle dynamics and ADAS. This interdisciplinary work provides enabler solutions, ultimately implementing fast predictions using neural networks (NNs) on field programmable gate arrays (FPGAs) and graphical processing units (GPUs), while applying them to a challenging application: Torque Vectoring on a multi-electric-motor vehicle for enhanced vehicle dynamics. The work is motivated by a discussion of multiple domains of the technological context and of the constraints of the automotive field, which contrast with the attractiveness of exploiting new embedded platforms to apply advanced control algorithms to complex control problems. In this particular case we target enhanced vehicle dynamics on a multi-motor electric vehicle, benefiting from the greater degrees of freedom and controllability offered by such powertrains. Considering the constraints of the application and the implications of the selected multivariable optimization challenge, we propose an NN to provide batch predictions for real-time optimization. This leads to the major contribution of this work: efficient NN implementations on two intrinsically parallel embedded platforms, a GPU and an FPGA, following an analysis of the theoretical and practical implications of their different operating paradigms, in order to harness their computing potential efficiently while gaining insight into their peculiarities. The achieved results exceed expectations and additionally provide a representative illustration of the strengths and weaknesses of each kind of platform. Consequently, having shown the applicability of the proposed solutions, this work also contributes valuable enablers for further developments following similar fundamental principles.
    Some of the results presented in this work are related to activities within the 3Ccar project, which has received funding from ECSEL Joint Undertaking under grant agreement No. 662192. This Joint Undertaking received support from the European Union’s Horizon 2020 research and innovation programme and Germany, Austria, Czech Republic, Romania, Belgium, United Kingdom, France, Netherlands, Latvia, Finland, Spain, Italy, Lithuania. This work was also partly supported by the project ENABLES3, which received funding from ECSEL Joint Undertaking under grant agreement No. 692455-2.

    NullHop: A Flexible Convolutional Neural Network Accelerator Based on Sparse Representations of Feature Maps

    Convolutional neural networks (CNNs) have become the dominant neural network architecture for solving many state-of-the-art (SOA) visual processing tasks. Even though Graphical Processing Units (GPUs) are most often used in training and deploying CNNs, their power efficiency is less than 10 GOp/s/W for single-frame runtime inference. We propose a flexible and efficient CNN accelerator architecture called NullHop that implements SOA CNNs useful for low-power and low-latency application scenarios. NullHop exploits the sparsity of neuron activations in CNNs to accelerate the computation and reduce memory requirements. The flexible architecture allows high utilization of available computing resources across kernel sizes ranging from 1x1 to 7x7. NullHop can process up to 128 input and 128 output feature maps per layer in a single pass. We implemented the proposed architecture on a Xilinx Zynq FPGA platform and present results showing how our implementation reduces external memory transfers and compute time in five different CNNs ranging from small ones up to the widely known large VGG16 and VGG19 CNNs. Post-synthesis simulations using Mentor Modelsim in a 28 nm process with a clock frequency of 500 MHz show that the VGG19 network achieves over 450 GOp/s. By exploiting sparsity, NullHop achieves an efficiency of 368%, maintains over 98% utilization of the MAC units, and achieves a power efficiency of over 3 TOp/s/W in a core area of 6.3 mm². As further proof of NullHop's usability, we interfaced its FPGA implementation with a neuromorphic event camera for real-time interactive demonstrations.

    Low Power Design Methodology

    Due to the widespread application of portable electronic devices and the evolution of microelectronic technology, power dissipation has become a critical parameter in low-power VLSI circuit design. In emerging VLSI technology, circuit complexity and high speed imply a significant increase in power consumption. In low-power CMOS VLSI circuits, energy is dissipated by the charging and discharging of internal node capacitances due to transition activity, which is one of the major factors affecting dynamic power dissipation. Reducing power and area while improving speed requires optimization at all levels of the design procedure. Various design methodologies for achieving low-power designs are discussed.
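    The dependence of dynamic power on transition activity and node capacitance noted in the abstract above is commonly modelled with the first-order CMOS switching-power relation P = α · C · V² · f. The sketch below uses that textbook relation. The abstract itself states no formula, so the function and parameter names are illustrative assumptions, and short-circuit and leakage power are ignored.

```python
def dynamic_power(alpha, c_farads, vdd_volts, f_hz):
    """First-order CMOS switching power in watts.

    alpha:     switching activity factor (fraction of cycles in which
               the node toggles), between 0 and 1
    c_farads:  switched node capacitance
    vdd_volts: supply voltage
    f_hz:      clock frequency

    Textbook approximation only: short-circuit and leakage power
    are not modelled.
    """
    return alpha * c_farads * vdd_volts ** 2 * f_hz
```

    Because the supply voltage enters quadratically, halving it cuts the switching power to a quarter at the same frequency and activity, which is why voltage scaling features so prominently among low-power techniques.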