
    Architectural level delay and leakage power modelling of manufacturing process variation

    The effect of manufacturing process variations has become a major issue in the estimation of circuit delay and power dissipation, and will gain further importance as device scaling continues to satisfy marketplace demands for circuits with greater performance and functionality per unit area. Statistical modelling and analysis approaches have been widely used to capture the effects of a variety of variational process parameters on system performance factors, which are described as probability density functions (PDFs). At present, most investigations into statistical models have been limited to small circuits such as a single logic gate. However, the massive size of present-day electronic systems precludes the use of design techniques that treat a system as a composition of these basic gates, as this level of design is inefficient and error prone. This thesis proposes a methodology to bring the effects of process variation from the transistor level up to the architectural level in terms of circuit delay and leakage power dissipation. Using a first-order canonical model and a statistical analysis approach, a statistical cell library has been built which comprises not only basic gate cell models but also more complex functional blocks such as registers, FIFOs, counters, and ALUs. Furthermore, other factors to which overall system performance is sensitive, such as input signal slope, output load capacitance, and different signal switching cases and transition types, are also taken into account for each cell in the library, which makes it adaptable to incremental circuit design. The proposed methodology enables an efficient analysis of process variation effects on system performance with significantly reduced computation time compared to the Monte Carlo simulation approach. As a demonstration vehicle for this technique, the delay and leakage power distributions of a 2-stage asynchronous micropipeline circuit have been simulated using this cell library. The experimental results show that the proposed method can predict the delay and leakage power distributions with less than 5% error and at least 50,000 times faster computation compared to a 5000-sample SPICE-based Monte Carlo simulation. The methodology presented here for modelling process variability plays a significant role in Design for Manufacturability (DFM) by quantifying the direct impact of process variations on system performance. The advantages of being able to undertake this analysis at a high level of abstraction, and thus early in the design cycle, are twofold. First, if the predicted effects of process variation render the circuit performance outwith specification, design modifications can readily be incorporated to rectify the situation. Second, knowing the acceptable limits of process variation for maintaining design performance within its specification, informed choices can be made regarding the implementation technology and the manufacturer selected to fabricate the design.
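
    As a minimal sketch of the first-order canonical idea described above (with illustrative sensitivity values, not the thesis's actual cell library), each cell's delay can be written as a nominal value plus linear sensitivities to independent Gaussian process parameters, so a path-delay PDF composes analytically instead of requiring thousands of SPICE runs:

        import numpy as np

        # First-order canonical delay model: d = d0 + sum_i(a_i * dX_i),
        # with dX_i ~ N(0, 1) independent process parameters. The numbers
        # below are illustrative placeholders, not library data.
        cells = [
            (120.0, np.array([8.0, 3.0])),   # (nominal ps, sensitivities)
            (95.0,  np.array([5.0, 4.0])),
        ]

        # For cells in series, means and sensitivity vectors simply add,
        # so the path delay stays Gaussian and its PDF is known in closed form.
        mu = sum(d0 for d0, _ in cells)
        sens = sum(a for _, a in cells)
        sigma = float(np.linalg.norm(sens))
        print(f"canonical:   mean={mu:.1f} ps, sigma={sigma:.2f} ps")

        # Monte Carlo reference (the costly approach the thesis avoids).
        rng = np.random.default_rng(0)
        samples = mu + rng.standard_normal((5000, 2)) @ sens
        print(f"monte carlo: mean={samples.mean():.1f} ps, sigma={samples.std():.2f} ps")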

    Algorithm and Hardware Co-design for Learning On-a-chip

    Machine learning technology has achieved many remarkable results in recent years, rivalling or exceeding human performance in intellectual tasks including image recognition, face detection, and the game of Go. Many machine learning algorithms require huge amounts of computation, such as the multiplication of large matrices. As silicon technology has scaled into the sub-14 nm regime, simply scaling down the device can no longer provide sufficient speed-up. New device technologies and system architectures are needed to improve computing capacity, and designing hardware specifically for machine learning is in high demand; this effort requires joint design and optimization of both hardware and algorithm. For machine learning acceleration, traditional SRAM- and DRAM-based systems suffer from low capacity, high latency, and high standby power. Emerging memories, such as Phase Change Random Access Memory (PRAM), Spin-Transfer Torque Magnetic Random Access Memory (STT-MRAM), and Resistive Random Access Memory (RRAM), are promising alternatives providing low standby power, high data density, fast access, and excellent scalability. This dissertation proposes a hierarchical memory modeling framework and models PRAM and STT-MRAM at four different levels of abstraction. With the proposed models, various simulations are conducted to investigate performance, optimization, variability, reliability, and scalability. Emerging memory devices such as RRAM can be organized as a 2-D crosspoint array to speed up the multiplication and accumulation in machine learning algorithms. This dissertation proposes a new parallel programming scheme to achieve in-memory learning with an RRAM crosspoint array. The programming circuitry is designed and simulated in TSMC 65 nm technology, showing a 900X speedup for the dictionary learning task compared to CPU performance. From the algorithm perspective, inspired by the high accuracy and low power of the brain, this dissertation proposes a bio-plausible feedforward inhibition spiking neural network with a Spike-Rate-Dependent-Plasticity (SRDP) learning rule. It achieves more than 95% accuracy on the MNIST dataset, which is comparable to the sparse coding algorithm but requires far fewer computations. The role of inhibition in this network is systematically studied and shown to improve hardware efficiency in learning.
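
    As a rough illustration of why a crosspoint array accelerates multiply-and-accumulate (an idealized device model with hypothetical conductances, ignoring nonlinearity, wire resistance, and variability), driving the rows with an input voltage vector yields column currents that equal a matrix-vector product in a single analog step:

        import numpy as np

        # Idealized RRAM crosspoint: G[i, j] is the programmed conductance
        # (siemens) of the device at row i, column j.
        rng = np.random.default_rng(1)
        G = rng.uniform(1e-6, 1e-4, size=(4, 3))   # stored weights
        v = np.array([0.2, 0.0, 0.1, 0.3])         # row input voltages

        # Ohm's law gives each device's current; Kirchhoff's current law
        # sums them along each column, so the array computes I = v @ G
        # in one parallel analog operation.
        I = v @ G
        print("column currents (A):", I)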

    System level performance and yield optimisation for analogue integrated circuits

    Advances in silicon technology over the last decade have led to increased integration of analogue and digital functional blocks onto the same single chip. In such a mixed-signal environment, the analogue circuits must use the same process technology as their digital neighbours. With reducing transistor sizes, the impact of process variations on analogue design has become prominent and can lead to circuit performance falling below specification, reducing the yield. This thesis explores the methodology and algorithms for an analogue integrated circuit automation tool that optimises performance and yield. The trade-offs between performance and yield are analysed using a combination of an evolutionary algorithm and Monte Carlo simulation. By integrating yield as a parameter in the optimisation process, the trade-offs between the performance functions can be treated more effectively, producing a higher yield. The results obtained from the performance and variation exploration are modelled behaviourally using the Verilog-A language. The model has been verified with transistor-level simulation and a silicon prototype. For a large analogue system, the circuit is commonly broken down into its constituent sub-blocks, a process known as hierarchical design. The use of hierarchical design and optimisation simplifies the design task and accelerates the design flow by encouraging design reuse. A new approach for system-level yield optimisation using hierarchical design is proposed and developed. The approach combines the Multi-Objective Bottom Up (MUBU) modelling technique, which models circuit performance and variation, with the Top Down Constraint Design (TDCD) technique for complete system-level design. The proposed method has been used to design a 7th-order low-pass filter and a charge-pump phase-locked loop system. The results have been verified with transistor-level simulations and suggest that accurate system-level performance and yield prediction can be achieved with the proposed methodology.
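
    A compact sketch of the yield-aware optimisation loop described above (with a toy performance function and illustrative specification limits, not the thesis's circuits): each candidate design is perturbed with Monte Carlo samples of process variation, and the fraction of samples meeting the specification serves as the yield term in the evolutionary objective:

        import numpy as np

        rng = np.random.default_rng(2)

        def performance(design, dp):
            # Toy stand-in for a circuit simulation: a "gain" that depends
            # on two design variables plus a process perturbation dp.
            return 10 * design[0] - 2 * design[1] ** 2 + dp

        def yield_estimate(design, spec_min=8.0, n=1000, sigma=1.5):
            # Monte Carlo over process variation: in-spec fraction.
            dp = rng.normal(0.0, sigma, n)
            return float(np.mean(performance(design, dp) >= spec_min))

        # A bare-bones evolutionary step: mutate the population and keep
        # the fittest under a combined performance/yield objective
        # (the weighting is illustrative).
        pop = rng.uniform(0.0, 2.0, size=(20, 2))
        for _ in range(30):
            children = pop + rng.normal(0.0, 0.1, pop.shape)
            both = np.vstack([pop, children])
            scores = np.array([performance(d, 0.0) + 20 * yield_estimate(d)
                               for d in both])
            pop = both[np.argsort(scores)[-20:]]

        best = pop[-1]
        print("best design:", best, "estimated yield:", yield_estimate(best))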

    Memristor-Based Digital Systems Design and Architectures

    The memristor is considered a suitable alternative for resolving the scaling limitations of CMOS technology. In recent years, the use of memristors in circuit design has increased rapidly and attracted researchers' interest, and advances have been made in both the size and complexity of memristor designs. The development of CMOS transistors faces major concerns, such as increased leakage power, reduced reliability, and high fabrication cost, which have severely affected the chip manufacturing process and functionality; the demand for new devices is therefore increasing. The memristor is considered a key element in memory and information processing design due to its small size, long-term data storage, low power, and CMOS compatibility. The main objective of this research is to design memristor-based arithmetic circuits and to overcome some of the issues of memristor-based logic design. In this thesis, fast, low-area, and low-power hybrid CMOS-memristor digital circuit designs are implemented. Small- and large-scale memristor-based digital circuits are implemented, and solutions are provided for overcoming the memristor degradation and fan-out challenges. As an example, a 4-bit LFSR has been implemented using the Memristor Ratioed Logic (MRL) scheme with 64 CMOS devices and 64 memristors. The proposed design is more efficient in terms of area when compared with CMOS-based LFSR circuits, and the simulation results prove the functionality of the design. This approach offers acceptable speed in comparison with the CMOS-based design and is faster than an IMPLY-based memristive LFSR; the proposed LFSR has an 841 ps delay. Furthermore, the proposed design achieves a significant power reduction of over 66% compared to the CMOS-based approach. This thesis also proposes the implementation of a memristive 2-D median filter, extending previously published work on memristive filter design to include this emerging technology's characteristics in image processing. The proposed circuit was designed based on a Pt/TaOx/Ta redox-based device and MRL. The filter is designed in Cadence, and the verified memristive median circuit is translated to Verilog-XL as a behavioural model. Different 512 × 512-pixel input images containing salt-and-pepper noise with various noise density ratios are applied to the proposed median filter, and the design substantially removes the noise. In comparison with conventional filters, the implementation gives better Peak Signal to Noise Ratio (PSNR) and Mean Absolute Error (MAE) for different images with different noise density ratios, while saving more area than a CMOS-based design. Finally, this dissertation proposes a comprehensive framework for the design, mapping, and synthesis of large-scale memristor-CMOS circuits. This framework provides a synthesis approach that can be applied to all memristor-based digital logic designs; in particular, it proposes a characterization methodology for memristor-based logic cells to generate a standard cell library for large-scale simulation. The proposed framework is implemented in the Cadence Virtuoso schematic-level environment and was verified with Verilog-XL, MATLAB, and the Synopsys Electronic Design Automation (EDA) compiler after being translated to the behavioural level. The proposed method can be applied to implement any digital logic design. The framework is deployed for the design of a memristor-based parallel 8-bit adder/subtractor and a 2-D memristive median filter.
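
    For context on what the median-filter hardware computes (a plain software reference under standard image-processing assumptions, not the MRL circuit itself), a 3-by-3 median filter replaces each pixel with the median of its neighbourhood, which is why it suppresses salt-and-pepper noise so effectively:

        import numpy as np

        def median_filter_3x3(img):
            # Software reference for the 2-D median operation realized by
            # the memristive circuit; image edges are handled by padding.
            padded = np.pad(img, 1, mode="edge")
            out = np.empty_like(img)
            for y in range(img.shape[0]):
                for x in range(img.shape[1]):
                    out[y, x] = np.median(padded[y:y + 3, x:x + 3])
            return out

        # Inject salt-and-pepper noise into a flat test image and filter it.
        rng = np.random.default_rng(3)
        img = np.full((64, 64), 128, dtype=np.uint8)
        noisy = img.copy()
        mask = rng.random(img.shape)
        noisy[mask < 0.05] = 0      # pepper
        noisy[mask > 0.95] = 255    # salt
        print("mean abs error, noisy:   ",
              np.abs(noisy.astype(int) - img).mean())
        print("mean abs error, filtered:",
              np.abs(median_filter_3x3(noisy).astype(int) - img).mean())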

    Degradation Models and Optimizations for CMOS Circuits

    Ensuring the reliability of CMOS circuits is currently one of the greatest challenges in chip and circuit design. With the end of Dennard scaling, each new generation of semiconductor technology increases the electric fields within the transistors. These stronger electric fields stimulate degradation phenomena (transistor aging, self-heating, noise, etc.), leading to ever greater degradation of the transistors. As a result, with each new technology generation the transistors suffer increasingly severe deterioration of their electrical parameters. To preserve the functionality and reliability of a circuit, it therefore becomes essential to determine precisely the impact of the weakened transistors on the circuit. The two most important effects of this degradation are slower switching and increased power consumption. If these effects are left unaccounted for, the reduced switching speed can cause timing violations (i.e. the circuit cannot complete its computation before the next operation begins) and impair the circuit's functionality (erroneous outputs, corrupted data, etc.). To accommodate this deterioration of transistor parameters over time, safety margins are introduced; for example, the clock period of the circuit is artificially extended to tolerate slower switching and thereby avoid errors. This, however, comes at the cost of performance, since a longer clock period means a lower clock frequency. Determining the right safety margin is crucial: a margin that is too small leads to errors in the circuit, while one that is too large leads to unnecessary performance losses. At present, industry relies on worst-case assumptions for reliability estimation (a maximally aged circuit, maximum operating temperature at minimum voltage, worst-case fabrication, etc.). This worst-case assumption guarantees that the chip (or integrated circuit) remains functional under all operating conditions that may occur. Considering the worst case also permits many simplifications; for example, the actual operating temperature need not be determined, as the worst possible (very high) operating temperature can simply be assumed. Unfortunately, this established worst-case practice (whether experimental or simulation-based) can no longer be sustained. It implies such harsh operating conditions (maximum temperature, etc.) and requirements (e.g. 25 years of operation) that the transistors suffer enormous degradation under the ever stronger electric fields, because the combination of high temperature, high voltage, and the electric fields that increase with each generation causes the degradation phenomena to grow steadily. This means that the safety margin determined under worst-case assumptions is enormously pessimistic and therefore far too large, and this degree of pessimism leads to considerable performance losses that are unnecessary and hence avoidable. While military circuits, for example, must operate for 25 years under harsh conditions, consumer electronics operate at lower temperatures and need to maintain their functionality only for the duration of a two-year warranty. For the latter, the safety margins can therefore be much smaller, recovering performance that was previously given up in the name of reliability. This work aims to provide safety margins tailored to the individual application scenarios of a circuit. For demanding environments such as space applications (where repair is impossible), the worst case remains relevant, but most applications face less harsh operating conditions (for example, cooling systems ensure lower temperatures). Here, safety margins can be determined in a tailored, application-specific manner, so that degradation is tolerated exactly and reliability is maintained at minimal cost (in performance, etc.). Unfortunately, current standard design tools are poorly equipped for this application-specific determination of safety margins. This work aims to enable standard design tools to meet this need for reliability estimation of arbitrary circuits under arbitrary operating conditions. To this end, we present our research contributions as four steps on the path to application-specific safety margins. Step 1 improves the modeling of the degradation phenomena (transistor aging, self-heating, noise, etc.), with the goal of creating a comprehensive, unified model. Using defect modeling from materials science, the underlying physical processes of the degradation phenomena are modeled in order to capture their interactions (e.g. phenomenon A can accelerate phenomenon B) and to produce a unified model for the simultaneous treatment of different phenomena; recently discovered phenomena are modeled and taken into account as well. In sum, this permits accurate degradation modeling of transistors while considering all essential phenomena at once. Step 2 accelerates these degradation models from several minutes per transistor (physicists' models aim at accuracy rather than performance) to a few milliseconds per transistor. The research contributions of this dissertation speed up the models many times over, first by simplifying the computations as far as possible (e.g. only the peak values of the degradation are required, not every value over time) and then by exploiting the parallelism of today's computer hardware. Both approaches increase evaluation speed without affecting the accuracy of the computation. In Step 3, these accelerated degradation models are integrated into the standard tools. The standard tools currently consider only best-case, typical, and worst-case standard cells (digital) or transistors (analog); these three types of cells/transistors are determined by the foundry (semiconductor manufacturer) through elaborate experiments. Because only these three types are characterized, the tools perform no reliability estimation for a specific application (temperature, voltage, activity). Simulations with degradation models make such application-specific estimation possible, but this capability must first be integrated, and this integration is one of the contributions of this dissertation. Step 4 accelerates the standard tools themselves. Digital circuit designs that are not based on standard cells, as well as complex analog circuits, currently cannot be evaluated with analog circuit simulators, whose performance is insufficient for such extensive simulations. This dissertation presents techniques to accelerate these tools and thus make such large circuits simulatable. Together, these research contributions, each spanning several publications, enable standard tools to determine the safety margin for custom application scenarios. For a given circuit lifetime, temperature, voltage, and activity (switching behavior induced by software applications), the effects of transistor degradation can be evaluated and the required (neither under- nor overestimated) safety margin determined. This application-specific safety margin guarantees the reliability and functionality of the circuit for exactly that application, at minimal performance cost.
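
    To make the margin trade-off concrete (with an entirely illustrative aging model and placeholder numbers, not the dissertation's calibrated physics), the required clock guard band follows from the degradation-induced delay increase over the intended lifetime, so an application-specific estimate can be far smaller than the worst-case one:

        # Toy guard-band calculation; aged_delay is a placeholder aging
        # model, not the dissertation's defect-based physical model.
        def aged_delay(nominal_ps, years, temp_c, factor=0.004):
            # Illustrative: delay grows with lifetime and temperature.
            return nominal_ps * (1.0 + factor * (temp_c / 25.0) * years ** 0.3)

        nominal = 1000.0  # ps, critical-path delay of the fresh circuit

        worst_case = aged_delay(nominal, years=25, temp_c=125)
        consumer = aged_delay(nominal, years=2, temp_c=45)

        print(f"worst-case guard band:       {worst_case - nominal:.0f} ps")
        print(f"application-specific margin: {consumer - nominal:.0f} ps")
        # The difference is clock period handed back to the design once
        # the margin is tailored to the actual use case.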

    Multi-level simulation of nano-electronic digital circuits on GPUs

    Simulation of circuits and faults is an essential part of design and test validation tasks for contemporary nano-electronic digital integrated CMOS circuits. Shrinking technology processes with smaller feature sizes and strict performance and reliability requirements demand not only detailed validation of the functional properties of a design, but also accurate validation of non-functional aspects, including timing behavior. However, due to the rising complexity of circuit behavior and the steady growth of designs with respect to transistor count, timing-accurate simulation of current designs requires a great deal of computational effort, which can only be handled by proper abstraction and a high degree of parallelization. This work presents a simulation model for scalable and accurate timing simulation of digital circuits on data-parallel graphics processing unit (GPU) accelerators. By providing compact modeling and data structures, as well as by exploiting multiple dimensions of parallelism, the simulation model enables not only fast and timing-accurate simulation at the logic level, but also massively parallel simulation with switch-level accuracy. The model facilitates extensions for fast and efficient fault simulation of small delay faults at the logic level, as well as first-order parametric and parasitic faults at the switch level. With the parallelization on GPUs, detailed and scalable simulation is enabled that is applicable even to multi-million-gate designs. In this way, comprehensive analyses of realistic timing-related faults in the presence of process and parameter variations are enabled for the first time. Additional simulation efficiency is achieved by merging the presented methods into a unified simulation model that combines the unique advantages of the different levels of abstraction in a mixed-abstraction, multi-level simulation flow to reach even higher speedups. Experimental results show that the implemented parallel approach achieves unprecedented simulation throughput as well as high speedup compared to conventional timing simulators. The underlying model scales to multi-million-gate designs and gives detailed insights into the timing behavior of digital CMOS circuits, thereby enabling large-scale applications that aid even highly complex design and test validation tasks.
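
    A highly simplified sketch of the data-parallel evaluation pattern (hypothetical netlist encoding; an actual GPU simulator would use CUDA kernels with timing annotation rather than NumPy): gates in a levelized netlist are independent within a level, so each gate can be evaluated across thousands of stimuli or variation instances in one vectorized operation:

        import numpy as np

        # Hypothetical levelized netlist of two-input gates: each entry is
        # (op, input_a, input_b), where inputs index earlier signals.
        OPS = {"AND": np.logical_and, "OR": np.logical_or, "XOR": np.logical_xor}
        netlist = [("AND", 0, 1), ("XOR", 1, 2), ("OR", 3, 4)]
        n_primary, n_instances = 3, 10000

        rng = np.random.default_rng(4)
        signals = np.empty((n_primary + len(netlist), n_instances), dtype=bool)
        signals[:n_primary] = rng.random((n_primary, n_instances)) < 0.5

        for i, (op, a, b) in enumerate(netlist):
            # One vectorized call evaluates this gate for every instance,
            # which is the role the GPU threads play in the presented model.
            signals[n_primary + i] = OPS[op](signals[a], signals[b])

        print("fraction of instances with output high:", signals[-1].mean())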

    CYBER 200 Applications Seminar

    Applications suited to the CYBER 200 digital computer are discussed. Various areas of application, including meteorology, algorithms, fluid dynamics, Monte Carlo methods, petroleum, electronic circuit simulation, biochemistry, lattice gauge theory, economics, and ray tracing, are covered.

    Design of Neuromemristive Systems for Visual Information Processing

    Neuromemristive systems (NMSs) are brain-inspired, adaptive computer architectures based on emerging resistive memory technology (memristors). NMSs adopt a mixed-signal design approach with closely coupled memory and processing, resulting in high area and energy efficiency. Previous work suggests that NMSs could even supplant conventional architectures in niche application domains such as visual information processing. However, given the infancy of the field, several obstacles still impede the transition of these systems from theory to practice. This dissertation advances the state of NMS research by addressing open design problems spanning the circuit, architecture, and system levels. Novel synapse, neuron, and plasticity circuits are designed to reduce NMSs' area and power consumption by using current-mode design techniques and exploiting device variability. Circuits are designed in a 45 nm CMOS process with memristor models based on multilevel (W/Ag-chalcogenide/W) and bistable (Ag/GeS2/W) device data. Higher-level behavioral, power, area, and variability models are ported into MATLAB to accelerate the overall simulation time. The circuits designed in this work are integrated into neural network architectures for visual information processing tasks, including feature detection, clustering, and classification. Networks in the NMSs are trained with novel stochastic learning algorithms that achieve a 3.5× reduction in circuit area and reduced design complexity while exhibiting convergence properties similar to those of the least-mean-squares algorithm. This work also examines the effects of device-level variations on NMS performance, which have received limited attention in previous work. The impact of device variations is reduced with a partial on-chip training methodology that enables NMSs to be configured with relatively sophisticated algorithms (e.g. resilient backpropagation) while maximizing their area-accuracy tradeoff.
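
    To illustrate why a sign-based stochastic update can shrink circuit area relative to least-mean-squares (a toy linear regression with a sign-sign update standing in for the dissertation's stochastic learning rules, not its actual algorithm), the LMS step needs a full multiplier per weight, whereas the sign-sign variant needs only comparators, at the cost of a fixed step size:

        import numpy as np

        rng = np.random.default_rng(5)
        w_true = np.array([0.7, -0.3, 0.5])
        X = rng.standard_normal((2000, 3))
        y = X @ w_true

        w_lms = np.zeros(3)   # classic LMS: multiplier per weight update
        w_ss = np.zeros(3)    # sign-sign: comparators and a fixed step
        for x, t in zip(X, y):
            w_lms += 0.05 * (t - x @ w_lms) * x
            w_ss += 0.01 * np.sign(t - x @ w_ss) * np.sign(x)

        print("LMS max weight error:      ", np.abs(w_lms - w_true).max())
        print("sign-sign max weight error:", np.abs(w_ss - w_true).max())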