950 research outputs found
Multilevel Power Estimation Of VLSI Circuits Using Efficient Algorithms
New and complex systems are being implemented using highly advanced Electronic Design Automation (EDA) tools. As the complexity increases day by day, the dissipation of power has emerged as one of the very important design constraints. Now low power designs are not only used in small size applications like cell phones and handheld devices but also in high-performance computing applications. Embedded memories have been used extensively in modern SOC designs. In order to estimate the power consumption of the entire design correctly, an accurate memory power model is needed. However, the memory power model commonly used in commercial EDA tools is too simple to estimate the power consumption accurately. For complex digital circuits, building their power models is a popular approach to estimate their power consumption without detailed circuit information. In the literature, most of power models are built with lookup tables. However, building the power models with lookup tables may become infeasible for large circuits because the table size would increase exponentially to meet the accuracy requirement. This thesis involves two parts. In first part it uses the Synopsys power measurement tools together with the use of synthesis and extraction tools to determine power consumed by various macros at different levels of abstraction including the Register Transfer Level (RTL), the gate and the transistor level. In general, it can be concluded that as the level of abstraction goes down the accuracy of power measurement increases depending on the tool used. In second part a novel power modeling approach for complex circuits by using neural networks to learn the relationship between power dissipation and input/output characteristic vector during simulation has been developed. Our neural power model has very low complexity such that this power model can be used for complex circuits. Using such a simple structure, the neural power models can still have high accuracy because they can automatically consider the non-linear power distributions. Unlike the power characterization process in traditional approaches, our characterization process is very simple and straightforward. More importantly, using the neural power model for power estimation does not require any transistor-level or gate-level description of the circuits. The experimental results have shown that the estimations are accurate and efficient for different test sequences with wide range of input distributions
System-on-chip Computing and Interconnection Architectures for Telecommunications and Signal Processing
This dissertation proposes novel architectures and design techniques targeting SoC building blocks for telecommunications and signal processing applications.
Hardware implementation of Low-Density Parity-Check decoders is approached at both the algorithmic and the architecture level. Low-Density Parity-Check codes are a promising coding scheme for future communication standards due to their outstanding error correction performance.
This work proposes a methodology for analyzing effects of finite precision arithmetic on error correction performance and hardware complexity. The methodology is throughout employed for co-designing the decoder. First, a low-complexity check node based on the P-output decoding principle is designed and characterized on a CMOS standard-cells library. Results demonstrate implementation loss below 0.2 dB down to BER of 10^{-8} and a saving in complexity up to 59% with respect to other works in recent literature. High-throughput and low-latency issues are addressed with modified single-phase decoding schedules. A new "memory-aware" schedule is proposed requiring down to 20% of memory with respect to the traditional two-phase flooding decoding. Additionally, throughput is doubled and logic complexity reduced of 12%. These advantages are traded-off with error correction performance, thus making the solution attractive only for long codes, as those adopted in the DVB-S2 standard. The "layered decoding" principle is extended to those codes not specifically conceived for this technique. Proposed architectures exhibit complexity savings in the order of 40% for both area and power consumption figures, while implementation loss is smaller than 0.05 dB.
Most modern communication standards employ Orthogonal Frequency Division Multiplexing as part of their physical layer. The core of OFDM is the Fast Fourier Transform and its inverse in charge of symbols (de)modulation. Requirements on throughput and energy efficiency call for FFT hardware implementation, while ubiquity of FFT suggests the design of parametric, re-configurable and re-usable IP hardware macrocells. In this context, this thesis describes an FFT/IFFT core compiler particularly suited for implementation of OFDM communication systems. The tool employs an accuracy-driven configuration engine which automatically profiles the internal arithmetic and generates a core with minimum operands bit-width and thus minimum circuit complexity. The engine performs a closed-loop optimization over three different internal arithmetic models (fixed-point, block floating-point and convergent block floating-point) using the numerical accuracy budget given by the user as a reference point. The flexibility and re-usability of the proposed macrocell are illustrated through several case studies which encompass all current state-of-the-art OFDM communications standards (WLAN, WMAN, xDSL, DVB-T/H, DAB and UWB). Implementations results are presented for two deep sub-micron standard-cells libraries (65 and 90 nm) and commercially available FPGA devices. Compared with other FFT core compilers, the proposed environment produces macrocells with lower circuit complexity and same system level performance (throughput, transform size and numerical accuracy).
The final part of this dissertation focuses on the Network-on-Chip design paradigm whose goal is building scalable communication infrastructures connecting hundreds of core. A low-complexity link architecture for mesochronous on-chip communication is discussed. The link enables skew constraint looseness in the clock tree synthesis, frequency speed-up, power consumption reduction and faster back-end turnarounds. The proposed architecture reaches a maximum clock frequency of 1 GHz on 65 nm low-leakage CMOS standard-cells library. In a complex test case with a full-blown NoC infrastructure, the link overhead is only 3% of chip area and 0.5% of leakage power consumption.
Finally, a new methodology, named metacoding, is proposed. Metacoding generates correct-by-construction technology independent RTL codebases for NoC building blocks. The RTL coding phase is abstracted and modeled with an Object Oriented framework, integrated within a commercial tool for IP packaging (Synopsys CoreTools suite). Compared with traditional coding styles based on pre-processor directives, metacoding produces 65% smaller codebases and reduces the configurations to verify up to three orders of magnitude
HL-Pow: A Learning-Based Power Modeling Framework for High-Level Synthesis
High-level synthesis (HLS) enables designers to customize hardware designs
efficiently. However, it is still challenging to foresee the correlation
between power consumption and HLS-based applications at an early design stage.
To overcome this problem, we introduce HL-Pow, a power modeling framework for
FPGA HLS based on state-of-the-art machine learning techniques. HL-Pow
incorporates an automated feature construction flow to efficiently identify and
extract features that exert a major influence on power consumption, simply
based upon HLS results, and a modeling flow that can build an accurate and
generic power model applicable to a variety of designs with HLS. By using
HL-Pow, the power evaluation process for FPGA designs can be significantly
expedited because the power inference of HL-Pow is established on HLS instead
of the time-consuming register-transfer level (RTL) implementation flow.
Experimental results demonstrate that HL-Pow can achieve accurate power
modeling that is only 4.67% (24.02 mW) away from onboard power measurement. To
further facilitate power-oriented optimizations, we describe a novel design
space exploration (DSE) algorithm built on top of HL-Pow to trade off between
latency and power consumption. This algorithm can reach a close approximation
of the real Pareto frontier while only requiring running HLS flow for 20% of
design points in the entire design space.Comment: published as a conference paper in ASP-DAC 202
A Survey on Design Methodologies for Accelerating Deep Learning on Heterogeneous Architectures
In recent years, the field of Deep Learning has seen many disruptive and
impactful advancements. Given the increasing complexity of deep neural
networks, the need for efficient hardware accelerators has become more and more
pressing to design heterogeneous HPC platforms. The design of Deep Learning
accelerators requires a multidisciplinary approach, combining expertise from
several areas, spanning from computer architecture to approximate computing,
computational models, and machine learning algorithms. Several methodologies
and tools have been proposed to design accelerators for Deep Learning,
including hardware-software co-design approaches, high-level synthesis methods,
specific customized compilers, and methodologies for design space exploration,
modeling, and simulation. These methodologies aim to maximize the exploitable
parallelism and minimize data movement to achieve high performance and energy
efficiency. This survey provides a holistic review of the most influential
design methodologies and EDA tools proposed in recent years to implement Deep
Learning accelerators, offering the reader a wide perspective in this rapidly
evolving field. In particular, this work complements the previous survey
proposed by the same authors in [203], which focuses on Deep Learning hardware
accelerators for heterogeneous HPC platforms
Recommended from our members
Efficient architectures and power modelling of multiresolution analysis algorithms on FPGA
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.In the past two decades, there has been huge amount of interest in Multiresolution Analysis Algorithms (MAAs) and their applications. Processing some of their applications such as medical imaging are computationally intensive, power hungry and requires large amount of memory which cause a high demand for efficient algorithm implementation, low power architecture and acceleration. Recently, some MAAs such as Finite Ridgelet Transform (FRIT) Haar Wavelet Transform (HWT) are became very popular and they are suitable for a number of image processing applications such as detection of line singularities and contiguous edges, edge detection (useful for compression and feature detection), medical image denoising and segmentation. Efficient hardware implementation and acceleration of these algorithms particularly when addressing large problems are becoming very chal-lenging and consume lot of power which leads to a number of issues including mobility, reliability concerns. To overcome the computation problems, Field Programmable Gate Arrays (FPGAs) are the technology of choice for accelerating computationally intensive applications due to their high performance. Addressing the power issue requires optimi- sation and awareness at all level of abstractions in the design flow.
The most important achievements of the work presented in this thesis are summarised
here.
Two factorisation methodologies for HWT which are called HWT Factorisation Method1 and (HWTFM1) and HWT Factorasation Method2 (HWTFM2) have been explored to increase number of zeros and reduce hardware resources. In addition, two novel efficient and optimised architectures for proposed methodologies based on Distributed Arithmetic (DA) principles have been proposed. The evaluation of the architectural results have shown that the proposed architectures results have reduced the arithmetics calculation (additions/subtractions) by 33% and 25% respectively compared to direct implementa-tion of HWT and outperformed existing results in place. The proposed HWTFM2 is implemented on advanced and low power FPGA devices using Handel-C language. The FPGAs implementation results have outperformed other existing results in terms of area and maximum frequency. In addition, a novel efficient architecture for Finite Radon Trans-form (FRAT) has also been proposed. The proposed architecture is integrated with the developed HWT architecture to build an optimised architecture for FRIT. Strategies such as parallelism and pipelining have been deployed at the architectural level for efficient im-plementation on different FPGA devices. The proposed FRIT architecture performance has been evaluated and the results outperformed some other existing architecture in place. Both FRAT and FRIT architectures have been implemented on FPGAs using Handel-C language. The evaluation of both architectures have shown that the obtained results out-performed existing results in place by almost 10% in terms of frequency and area. The proposed architectures are also applied on image data (256 £ 256) and their Peak Signal to Noise Ratio (PSNR) is evaluated for quality purposes.
Two architectures for cyclic convolution based on systolic array using parallelism and pipelining which can be used as the main building block for the proposed FRIT architec-ture have been proposed. The first proposed architecture is a linear systolic array with pipelining process and the second architecture is a systolic array with parallel process. The second architecture reduces the number of registers by 42% compare to first architec-ture and both architectures outperformed other existing results in place. The proposed pipelined architecture has been implemented on different FPGA devices with vector size (N) 4,8,16,32 and word-length (W=8). The implementation results have shown a signifi-cant improvement and outperformed other existing results in place.
Ultimately, an in-depth evaluation of a high level power macromodelling technique for design space exploration and characterisation of custom IP cores for FPGAs, called func-tional level power modelling approach have been presented. The mathematical techniques that form the basis of the proposed power modeling has been validated by a range of custom IP cores. The proposed power modelling is scalable, platform independent and compares favorably with existing approaches. A hybrid, top-down design flow paradigm integrating functional level power modelling with commercially available design tools for systematic optimisation of IP cores has also been developed. The in-depth evaluation of this tool enables us to observe the behavior of different custom IP cores in terms of power consumption and accuracy using different design methodologies and arithmetic techniques on virous FPGA platforms. Based on the results achieved, the proposed model accuracy is almost 99% true for all IP core's Dynamic Power (DP) components.Thomas Gerald Gray Charitable Trus
Methoden und Beschreibungssprachen zur Modellierung und Verifikation vonSchaltungen und Systemen: MBMV 2015 - Tagungsband, Chemnitz, 03. - 04. März 2015
Der Workshop Methoden und Beschreibungssprachen zur Modellierung und Verifikation von Schaltungen und Systemen (MBMV 2015) findet nun schon zum 18. mal statt. Ausrichter sind in diesem Jahr die Professur Schaltkreis- und Systementwurf der Technischen Universität Chemnitz und das Steinbeis-Forschungszentrum Systementwurf und Test.
Der Workshop hat es sich zum Ziel gesetzt, neueste Trends, Ergebnisse und aktuelle Probleme auf dem Gebiet der Methoden zur Modellierung und Verifikation sowie der Beschreibungssprachen digitaler, analoger und Mixed-Signal-Schaltungen zu diskutieren. Er soll somit ein Forum zum Ideenaustausch sein.
Weiterhin bietet der Workshop eine Plattform für den Austausch zwischen Forschung und Industrie sowie zur Pflege bestehender und zur Knüpfung neuer Kontakte. Jungen Wissenschaftlern erlaubt er, ihre Ideen und Ansätze einem breiten Publikum aus Wissenschaft und Wirtschaft zu präsentieren und im Rahmen der Veranstaltung auch fundiert zu diskutieren. Sein langjähriges Bestehen hat ihn zu einer festen Größe in vielen Veranstaltungskalendern gemacht. Traditionell sind auch die Treffen der ITGFachgruppen an den Workshop angegliedert.
In diesem Jahr nutzen zwei im Rahmen der InnoProfile-Transfer-Initiative durch das Bundesministerium für Bildung und Forschung geförderte Projekte den Workshop, um in zwei eigenen Tracks ihre Forschungsergebnisse einem breiten Publikum zu präsentieren. Vertreter der Projekte Generische Plattform für Systemzuverlässigkeit und Verifikation (GPZV) und GINKO - Generische Infrastruktur zur nahtlosen energetischen Kopplung von Elektrofahrzeugen stellen Teile ihrer gegenwärtigen Arbeiten vor. Dies bereichert denWorkshop durch zusätzliche Themenschwerpunkte und bietet eine wertvolle Ergänzung zu den Beiträgen der Autoren. [... aus dem Vorwort
Algorithmic techniques for physical design : macro placement and under-the-cell routing
With the increase of chip component density and new manufacturability constraints imposed by modern technology nodes, the role of algorithms for electronic design automation is key to the successful implementation of integrated circuits. Two of the critical steps in the physical design flows are macro placement and ensuring all design rules are honored after timing closure.
This thesis proposes contributions to help in these stages, easing time-consuming manual steps and helping physical design engineers to obtain better layouts in reduced turnaround time.
The first contribution is under-the-cell routing, a proposal to systematically connect standard cell components via lateral pins in the lower metal layers. The aim is to reduce congestion in the upper metal layers caused by extra metal and vias, decreasing the number of design rule violations. To allow cells to connect by abutment, a standard cell library is enriched with instances containing lateral pins in a pre-selected sharing track. Algorithms are proposed to maximize the numbers of connections via lateral connection by mapping placed cell instances to layouts with lateral pins, and proposing local placement modifications to increase the opportunities for such connections. Experimental results show a significant decrease in the number of pins, vias, and in number of design rule violations, with negligible impact on wirelength and timing.
The second contribution, done in collaboration with eSilicon (a leading ASIC design company), is the creation of HiDaP, a macro placement tool for modern industrial designs. The proposed approach follows a multilevel scheme to floorplan hierarchical blocks, composed of macros and standard cells. By exploiting RTL information available in the netlist, the dataflow affinity between these blocks is modeled and minimized to find a macro placement with good wirelength and timing properties. The approach is further extended to allow additional engineer input, such as preferred macro locations, and also spectral and force methods to guide the floorplanning search.
Experimental results show that the layouts generated by HiDaP outperforms those obtained by a state-of-the-art EDA physical design software, with similar wirelength and better timing when compared to manually designed tape-out ready macro placements. Layouts obtained by HiDaP have successfully been brought to near timing closure with one to two rounds of small modifications by physical design engineers. HiDaP has been fully integrated in the design flows of the company and its development remains an ongoing effort.A causa de l'increment de la densitat de components en els xip i les noves restriccions de disseny imposades pels últims nodes de fabricació, el rol de l'algorísmia en l'automatització del disseny electrònic ha esdevingut clau per poder implementar circuits integrats. Dos dels passos crucials en el procés de disseny físic és el placement de macros i assegurar la correcció de les regles de disseny un cop les restriccions de timing del circuit són satisfetes. Aquesta tesi proposa contribucions per ajudar en aquests dos reptes, facilitant laboriosos passos manuals en el procés i ajudant als enginyers de disseny físic a obtenir millors resultats en menys temps. La primera contribució és el routing "under-the-cell", una proposta per connectar cel·les estàndard usant pins laterals en les capes de metall inferior de manera sistemàtica. L'objectiu és reduir la congestió en les capes de metall superior causades per l'ús de metall i vies, i així disminuir el nombre de violacions de regles de disseny. Per permetre la connexió lateral de cel·les, estenem una llibreria de cel·les estàndard amb dissenys que incorporen connexions laterals. També proposem modificacions locals al placement per permetre explotar aquest tipus de connexions més sovint. Els resultats experimentals mostren una reducció significativa en el nombre de pins, vies i nombre de violacions de regles de disseny, amb un impacte negligible en wirelength i timing. La segona contribució, desenvolupada en col·laboració amb eSilicon (una empresa capdavantera en disseny ASIC), és el desenvolupament de HiDaP, una eina de macro placement per a dissenys industrials actuals. La proposta segueix un procés multinivell per fer el floorplan de blocks jeràrquics, formats per macros i cel·les estàndard. Mitjançant la informació RTL disponible en la netlist, l'afinitat de dataflow entre els mòduls es modela i minimitza per trobar macro placements amb bones propietats de wirelength i timing. La proposta també incorpora la possibilitat de rebre input addicional de l'enginyer, com ara suggeriments de les posicions de les macros. Finalment, també usa mètodes espectrals i de forçes per guiar la cerca de floorplans. Els resultats experimentals mostren que els dissenys generats amb HiDaP són millors que els obtinguts per eines comercials capdavanteres de EDA. Els resultats també mostren que els dissenys presentats poden obtenir un wirelength similar i millor timing que macro placements obtinguts manualment, usats per fabricació. Alguns dissenys obtinguts per HiDaP s'han dut fins a timing-closure en una o dues rondes de modificacions incrementals per part d'enginyers de disseny físic. L'eina s'ha integrat en el procés de disseny de eSilicon i el seu desenvolupament continua més enllà de les aportacions a aquesta tesi.Postprint (published version
Algorithmic techniques for physical design : macro placement and under-the-cell routing
With the increase of chip component density and new manufacturability constraints imposed by modern technology nodes, the role of algorithms for electronic design automation is key to the successful implementation of integrated circuits. Two of the critical steps in the physical design flows are macro placement and ensuring all design rules are honored after timing closure.
This thesis proposes contributions to help in these stages, easing time-consuming manual steps and helping physical design engineers to obtain better layouts in reduced turnaround time.
The first contribution is under-the-cell routing, a proposal to systematically connect standard cell components via lateral pins in the lower metal layers. The aim is to reduce congestion in the upper metal layers caused by extra metal and vias, decreasing the number of design rule violations. To allow cells to connect by abutment, a standard cell library is enriched with instances containing lateral pins in a pre-selected sharing track. Algorithms are proposed to maximize the numbers of connections via lateral connection by mapping placed cell instances to layouts with lateral pins, and proposing local placement modifications to increase the opportunities for such connections. Experimental results show a significant decrease in the number of pins, vias, and in number of design rule violations, with negligible impact on wirelength and timing.
The second contribution, done in collaboration with eSilicon (a leading ASIC design company), is the creation of HiDaP, a macro placement tool for modern industrial designs. The proposed approach follows a multilevel scheme to floorplan hierarchical blocks, composed of macros and standard cells. By exploiting RTL information available in the netlist, the dataflow affinity between these blocks is modeled and minimized to find a macro placement with good wirelength and timing properties. The approach is further extended to allow additional engineer input, such as preferred macro locations, and also spectral and force methods to guide the floorplanning search.
Experimental results show that the layouts generated by HiDaP outperforms those obtained by a state-of-the-art EDA physical design software, with similar wirelength and better timing when compared to manually designed tape-out ready macro placements. Layouts obtained by HiDaP have successfully been brought to near timing closure with one to two rounds of small modifications by physical design engineers. HiDaP has been fully integrated in the design flows of the company and its development remains an ongoing effort.A causa de l'increment de la densitat de components en els xip i les noves restriccions de disseny imposades pels últims nodes de fabricació, el rol de l'algorísmia en l'automatització del disseny electrònic ha esdevingut clau per poder implementar circuits integrats. Dos dels passos crucials en el procés de disseny físic és el placement de macros i assegurar la correcció de les regles de disseny un cop les restriccions de timing del circuit són satisfetes. Aquesta tesi proposa contribucions per ajudar en aquests dos reptes, facilitant laboriosos passos manuals en el procés i ajudant als enginyers de disseny físic a obtenir millors resultats en menys temps. La primera contribució és el routing "under-the-cell", una proposta per connectar cel·les estàndard usant pins laterals en les capes de metall inferior de manera sistemàtica. L'objectiu és reduir la congestió en les capes de metall superior causades per l'ús de metall i vies, i així disminuir el nombre de violacions de regles de disseny. Per permetre la connexió lateral de cel·les, estenem una llibreria de cel·les estàndard amb dissenys que incorporen connexions laterals. També proposem modificacions locals al placement per permetre explotar aquest tipus de connexions més sovint. Els resultats experimentals mostren una reducció significativa en el nombre de pins, vies i nombre de violacions de regles de disseny, amb un impacte negligible en wirelength i timing. La segona contribució, desenvolupada en col·laboració amb eSilicon (una empresa capdavantera en disseny ASIC), és el desenvolupament de HiDaP, una eina de macro placement per a dissenys industrials actuals. La proposta segueix un procés multinivell per fer el floorplan de blocks jeràrquics, formats per macros i cel·les estàndard. Mitjançant la informació RTL disponible en la netlist, l'afinitat de dataflow entre els mòduls es modela i minimitza per trobar macro placements amb bones propietats de wirelength i timing. La proposta també incorpora la possibilitat de rebre input addicional de l'enginyer, com ara suggeriments de les posicions de les macros. Finalment, també usa mètodes espectrals i de forçes per guiar la cerca de floorplans. Els resultats experimentals mostren que els dissenys generats amb HiDaP són millors que els obtinguts per eines comercials capdavanteres de EDA. Els resultats també mostren que els dissenys presentats poden obtenir un wirelength similar i millor timing que macro placements obtinguts manualment, usats per fabricació. Alguns dissenys obtinguts per HiDaP s'han dut fins a timing-closure en una o dues rondes de modificacions incrementals per part d'enginyers de disseny físic. L'eina s'ha integrat en el procés de disseny de eSilicon i el seu desenvolupament continua més enllà de les aportacions a aquesta tesi
- …