42 research outputs found

    Design and modelling of variability tolerant on-chip communication structures for future high performance system on chip designs

    Get PDF
    The incessant technology scaling has enabled the integration of functionally complex System-on-Chip (SoC) designs with a large number of heterogeneous systems on a single chip. The processing elements on these chips are integrated through on-chip communication structures which provide the infrastructure necessary for the exchange of data and control signals, while meeting the strenuous physical and design constraints. The use of vast amounts of on chip communications will be central to future designs where variability is an inherent characteristic. For this reason, in this thesis we investigate the performance and variability tolerance of typical on-chip communication structures. Understanding of the relationship between variability and communication is paramount for the designers; i.e. to devise new methods and techniques for designing performance and power efficient communication circuits in the forefront of challenges presented by deep sub-micron (DSM) technologies. The initial part of this work investigates the impact of device variability due to Random Dopant Fluctuations (RDF) on the timing characteristics of basic communication elements. The characterization data so obtained can be used to estimate the performance and failure probability of simple links through the methodology proposed in this work. For the Statistical Static Timing Analysis (SSTA) of larger circuits, a method for accurate estimation of the probability density functions of different circuit parameters is proposed. Moreover, its significance on pipelined circuits is highlighted. Power and area are one of the most important design metrics for any integrated circuit (IC) design. This thesis emphasises the consideration of communication reliability while optimizing for power and area. A methodology has been proposed for the simultaneous optimization of performance, area, power and delay variability for a repeater inserted interconnect. Similarly for multi-bit parallel links, bandwidth driven optimizations have also been performed. Power and area efficient semi-serial links, less vulnerable to delay variations than the corresponding fully parallel links are introduced. Furthermore, due to technology scaling, the coupling noise between the link lines has become an important issue. With ever decreasing supply voltages, and the corresponding reduction in noise margins, severe challenges are introduced for performing timing verification in the presence of variability. For this reason an accurate model for crosstalk noise in an interconnection as a function of time and skew is introduced in this work. This model can be used for the identification of skew condition that gives maximum delay noise, and also for efficient design verification

    Modelling and Test Generation for Crosstalk Faults in DSM Chips

    Get PDF
    In the era of deep submicron technology (DSM), many System-on-Chip (SoC) applications require the components to be operating at high clock speeds. With the shrinking feature size and ever increasing clock frequencies, the DSM technology has led to a well-known problem of Signal Integrity (SI) more especially in the connecting layout design. The increasing aspect ratios of metal wires and also the ratio of coupling capacitance over substrate capacitance result in electrical coupling of interconnects which leads to crosstalk problems. In this thesis, first the work carried out to model the crosstalk behaviour between aggressor and victim by considering the distributed RLGC parameters of interconnect and the coupling capacitance and mutual conductance between the two nets is presented. The proposed model also considers the RC linear models of the CMOS drivers and receivers. The behaviour of crosstalk in case of under etching problem has been studied and modelled by distributing and approximating the defect behaviour throughout the nets. Next, the proposed model has also been extended to model the behaviour of crosstalk in case of one victim is influenced by several aggressors by considering all aggressors have similar effect (worst-case) on victim. In all the above cases simulation experiments were also carried out and compared with well-known circuit simulation tool PSPICE. It has been proved that the generated crosstalk model is faster and the results generated are within 10% of error margin compared to latter simulation tool. Because of the accuracy and speed of the proposed model, the model is very useful for both SoC designers and test engineers to analyse the crosstalk behaviour. Each manufactured device needs to be tested thoroughly to ensure the functionality before its delivery. The test pattern generation for crosstalk faults is also necessary to test the corresponding crosstalk faults. In this thesis, the well-known PODEM algorithm for stuck-at faults is extended to generate the test patterns for crosstalk faults between single aggressor and single victim. To apply modified PODEM for crosstalk faults, the transition behaviour has been divided into two logic parts as before transition and after transition. After finding individually required test patterns for before transition and after transition, the generated logic vectors are appended to create transition test patterns for crosstalk faults. The developed algorithm is also applied for a few ISCAS 85 benchmark circuits and the fault coverage is found excellent in most circuits. With the incorporation of proposed algorithm into the ATPG tools, the efficiency of testing will be improved by generating the test patterns for crosstalk faults besides for the conventional stuck-at faults. In generating test patterns for crosstalk faults on single victim due to multiple aggressors, the modified PODEM algorithm is found to be more time consuming. The search capability of Genetic Algorithms in finding the required combination of several input factors for any optimized problem fascinated to apply GA for generating test patterns as generating the test pattern is also similar to finding the required vector out of several input transitions. Initially the GA is applied for generating test patterns for stuck-at faults and compared the results with PODEM algorithm. As the fault coverage is almost similar to the deterministic algorithm PODEM, the GA developed for stuck-at faults is extended to find test patterns for crosstalk faults between single aggressor and single victim. The elitist GA is also applied for a few ISCAS 85 benchmark circuits. Later the algorithm is extended to generate test patterns for worst-case crosstalk faults. It has been proved that elitist GA developed in this thesis is also very useful in generating test patterns for crosstalk faults especially for multiple aggressor and single victim crosstalk faults

    SIGNAL PROCESSING TECHNIQUES AND APPLICATIONS

    Get PDF
    As the technologies scaling down, more transistors can be fabricated into the same area, which enables the integration of many components into the same substrate, referred to as system-on-chip (SoC). The components on SoC are connected by on-chip global interconnects. It has been shown in the recent International Technology Roadmap of Semiconductors (ITRS) that when scaling down, gate delay decreases, but global interconnect delay increases due to crosstalk. The interconnect delay has become a bottleneck of the overall system performance. Many techniques have been proposed to address crosstalk, such as shielding, buffer insertion, and crosstalk avoidance codes (CACs). The CAC is a promising technique due to its good crosstalk reduction, less power consumption and lower area. In this dissertation, I will present analytical delay models for on-chip interconnects with improved accuracy. This enables us to have a more accurate control of delays for transition patterns and lead to a more efficient CAC, whose worst-case delay is 30-40% smaller than the best of previously proposed CACs. As the clock frequency approaches multi-gigahertz, the parasitic inductance of on-chip interconnects has become significant and its detrimental effects, including increased delay, voltage overshoots and undershoots, and increased crosstalk noise, cannot be ignored. We introduce new CACs to address both capacitive and inductive couplings simultaneously.Quantum computers are more powerful in solving some NP problems than the classical computers. However, quantum computers suffer greatly from unwanted interactions with environment. Quantum error correction codes (QECCs) are needed to protect quantum information against noise and decoherence. Given their good error-correcting performance, it is desirable to adapt existing iterative decoding algorithms of LDPC codes to obtain LDPC-based QECCs. Several QECCs based on nonbinary LDPC codes have been proposed with a much better error-correcting performance than existing quantum codes over a qubit channel. In this dissertation, I will present stabilizer codes based on nonbinary QC-LDPC codes for qubit channels. The results will confirm the observation that QECCs based on nonbinary LDPC codes appear to achieve better performance than QECCs based on binary LDPC codes.As the technologies scaling down further to nanoscale, CMOS devices suffer greatly from the quantum mechanical effects. Some emerging nano devices, such as resonant tunneling diodes (RTDs), quantum cellular automata (QCA), and single electron transistors (SETs), have no such issues and are promising candidates to replace the traditional CMOS devices. Threshold gate, which can implement complex Boolean functions within a single gate, can be easily realized with these devices. Several applications dealing with real-valued signals have already been realized using nanotechnology based threshold gates. Unfortunately, the applications using finite fields, such as error correcting coding and cryptography, have not been realized using nanotechnology. The main obstacle is that they require a great number of exclusive-ORs (XORs), which cannot be realized in a single threshold gate. Besides, the fan-in of a threshold gate in RTD nanotechnology needs to be bounded for both reliability and performance purpose. In this dissertation, I will present a majority-class threshold architecture of XORs with bounded fan-in, and compare it with a Boolean-class architecture. I will show an application of the proposed XORs for the finite field multiplications. The analysis results will show that the majority class outperforms the Boolean class architectures in terms of hardware complexity and latency. I will also introduce a sort-and-search algorithm, which can be used for implementations of any symmetric functions. Since XOR is a special symmetric function, it can be implemented via the sort-and-search algorithm. To leverage the power of multi-input threshold functions, I generalize the previously proposed sort-and-search algorithm from a fan-in of two to arbitrary fan-ins, and propose an architecture of multi-input XORs with bounded fan-ins

    Data integrity for on-chip interconnects

    Get PDF
    With shrinking feature size and growing integration density in the Deep Sub- Micron (DSM) technologies, the global buses are fast becoming the "weakest-links" in VLSI design. They have large delays and are error-prone. Especially, in system-onchip (SoC) designs, where parallel interconnects run over large distances, they pose difficult research and design problems. This work presents an approach for evaluating the data carrying capacity of such wires. The method treats the delay and reliability in interconnects from an information theoretic perspective. The results point to an optimal frequency of operation for a given bus dimension for maximum data transfer rate. Moreover, this optimal frequency is higher than that achieved by present day designs which accommodate the worst case delays. This work also proposes several novel ways to approach this optimal data transfer rate in practical designs.From the analysis of signal propagation delay in long wires, it is seen that the signal delay distribution has a long tail, meaning that most signals arrive at the output much faster than the worst case delay. Using communication theory, these "good" signals arriving early can be used to predict/correct the "few" signals that arrive late. In addition to this correction based on prediction, the approaches use coding techniques to eliminate high delay cases to generate a higher transmission rate. The work also extends communication theoretic approaches to other areas of VLSI design. Parity groups are generated based on low output delay correlation to add redundancy in combinatorial circuits. This redundancy is used to increase the frequency of operation and/or reduce the energy consumption while improving the overall reliability of the circuit

    Network-on-Chip

    Get PDF
    Addresses the Challenges Associated with System-on-Chip Integration Network-on-Chip: The Next Generation of System-on-Chip Integration examines the current issues restricting chip-on-chip communication efficiency, and explores Network-on-chip (NoC), a promising alternative that equips designers with the capability to produce a scalable, reusable, and high-performance communication backbone by allowing for the integration of a large number of cores on a single system-on-chip (SoC). This book provides a basic overview of topics associated with NoC-based design: communication infrastructure design, communication methodology, evaluation framework, and mapping of applications onto NoC. It details the design and evaluation of different proposed NoC structures, low-power techniques, signal integrity and reliability issues, application mapping, testing, and future trends. Utilizing examples of chips that have been implemented in industry and academia, this text presents the full architectural design of components verified through implementation in industrial CAD tools. It describes NoC research and developments, incorporates theoretical proofs strengthening the analysis procedures, and includes algorithms used in NoC design and synthesis. In addition, it considers other upcoming NoC issues, such as low-power NoC design, signal integrity issues, NoC testing, reconfiguration, synthesis, and 3-D NoC design. This text comprises 12 chapters and covers: The evolution of NoC from SoC—its research and developmental challenges NoC protocols, elaborating flow control, available network topologies, routing mechanisms, fault tolerance, quality-of-service support, and the design of network interfaces The router design strategies followed in NoCs The evaluation mechanism of NoC architectures The application mapping strategies followed in NoCs Low-power design techniques specifically followed in NoCs The signal integrity and reliability issues of NoC The details of NoC testing strategies reported so far The problem of synthesizing application-specific NoCs Reconfigurable NoC design issues Direction of future research and development in the field of NoC Network-on-Chip: The Next Generation of System-on-Chip Integration covers the basic topics, technology, and future trends relevant to NoC-based design, and can be used by engineers, students, and researchers and other industry professionals interested in computer architecture, embedded systems, and parallel/distributed systems

    Design and Implementation of Area and Power Efficient Low Power VLSI Circuits through Simple Byte Compression with Encoding Technique

    Get PDF
    Transition activity is one of the major factors for power dissipation in Low power VLSI circuits due to charging and discharging of internal node capacitances. Power dissipation is reduced through minimizing the transition activity by using proper coding techniques. In this paper Multi coding technique is implemented to reduce the transition activity up to 58.26%. Speed of data transmission basically depends on the number of bits transmitted through bus. When handling data for large applications huge storage space is required for processing, storing and transferring information. Data compression is an algorithm to reduce the number of bits required to represent information in a compact form. Here simple byte compression technique is implemented to achieve a lossless data compression. This compression algorithm also reduces the encoder computational complexity when handling huge bits of information. Simple byte compression technique improves the compression ratio up to 62.5%. As a cumulative effort of Simple byte compression with Multi coding techniques minimize area and power dissipation in low power VLSI circuits

    Layout optimization in ultra deep submicron VLSI design

    Get PDF
    As fabrication technology keeps advancing, many deep submicron (DSM) effects have become increasingly evident and can no longer be ignored in Very Large Scale Integration (VLSI) design. In this dissertation, we study several deep submicron problems (eg. coupling capacitance, antenna effect and delay variation) and propose optimization techniques to mitigate these DSM effects in the place-and-route stage of VLSI physical design. The place-and-route stage of physical design can be further divided into several steps: (1) Placement, (2) Global routing, (3) Layer assignment, (4) Track assignment, and (5) Detailed routing. Among them, layer/track assignment assigns major trunks of wire segments to specific layers/tracks in order to guide the underlying detailed router. In this dissertation, we have proposed techniques to handle coupling capacitance at the layer/track assignment stage, antenna effect at the layer assignment, and delay variation at the ECO (Engineering Change Order) placement stage, respectively. More specifically, at layer assignment, we have proposed an improved probabilistic model to quickly estimate the amount of coupling capacitance for timing optimization. Antenna effects are also handled at layer assignment through a linear-time tree partitioning algorithm. At the track assignment stage, timing is further optimized using a graph based technique. In addition, we have proposed a novel gate splitting methodology to reduce delay variation in the ECO placement considering spatial correlations. Experimental results on benchmark circuits showed the effectiveness of our approaches

    A novel low-swing voltage driver design and the analysis of its robustness to the effects of process variation and external disturbances

    Get PDF
    arket forces are continually demanding devices with increased functionality/unit area; these demands have been satisfied through aggressive technology scaling which, unfortunately, has impacted adversely on the global interconnect delay subsequently reducing system performance. Line drivers have been used to mitigate the problems with delay; however, these have a large power consumption. A solution to reducing the power dissipation of the drivers is to use lower supply voltages. However, by adopting a lower power supply voltage, the performance of the line drivers for global interconnects is impaired unless low-swing signalling techniques are implemented. Low-swing signalling techniques can provide high speed signalling with low power consumption and hence can be used to drive global on-chip interconnect. Most of the proposed low-swing signalling schemes are immune to noise as they have a good SNR. However, they tend to have a large penalty in area and complexity as they require additional circuitry such as voltage generators and low-Vth devices. Most of the schemes also incorporate multiple Vdd and reference voltages which increase the overall circuit complexity. A diode-connected driver circuit has the best attributes over other low-swing signalling techniques in terms of low power, low delay, good SNR and low area overhead. By incorporating a diode-connected configuration at the output, it can provide high speed signalling due to its high driving capability. However, this configuration also has its limitations as it has issues with its adaptability to process variations, as well as an issue with leakage currents. To address these limitations, two novel driver schemes have been designed, namely, nLVSD and mLVSD, which, additionally, have improvements in performance and power consumption. Comparisons between the proposed schemes with the existing diode-connected driver circuits (MJ and DDC) showed that the nLVSD and mLVSD drivers have approximately 46% and 50% less delay. The name MJ originates from the driver’s designer called Juan A. Montiel-Nelson, while DDC stands for dynamic diode-connected. In terms of power consumption, the nLVSD and mLVSD drivers also produce 43% and 7% improvement. Additionally, the mLVSD driver scheme is the most robust as its SNR is 14 to 44% higher compared to other diode-connected driver circuits. On the other hand, the nLVSD driver has 6% lower SNR compared to the MJ driver, even though it is 19% more robust than the DDC driver. However, since its SNR is still above 1, its improved performance and reduced power consumption, as well other advantages it has over other diode-connected driver circuits can compensate for this limitation. Regarding the robustness to external disturbances, the proposedmdriver circuits are more robust to crosstalk effects as the nLVSD and mLVSD drivers are approximately 35% and 7% more robust than other diode-connected drivers. Furthermore, the mLVSD driver is 5%, 33% and 47% more tolerant to SEUs compared to the nLVSD, MJ and DDC driver circuits respectively, whilst the MJ and DDC drivers are 26% and 40% less tolerant to SEUs iii compared to the nLVSD circuit. A comparison between the four schemes was also undertaken in the presence of ±3σ process and voltage (PV) variations. The analysis indicated that both proposed driver schemes are more robust than other diode-connected driver schemes, namely, the MJ and DDC driver circuits. The MJ driver scheme deviates approximately 18% and 35% more in delay and power consumption compared to the proposed schemes. The DDC driver has approximately 20% and 57% more variations in delay and power consumption in comparison to the proposed schemes. In order to further improve the robustness of the proposed driver circuits against process variation and environmental disturbances, they were further analysed to identify which process variables had the most impact on circuit delay and power consumption, as well as identifying several design techniques to mitigate problems with environmental disturbances. The most significant process parameters to have impact on circuit delay and power consumption were identified to be Vdd, tox, Vth, s, w and t. The impact of SEUs on the circuit can be reduced by increasing the bias currents whilst design methods such as increasing the interconnect spacing can help improve the circuit robustness against crosstalk. Overall it is considered that the proposed nLVSD and mLVSD circuits advance the state of the art in driver design for on-chip signalling applications.EThOS - Electronic Theses Online ServiceGBUnited Kingdo

    The MANGO clockless network-on-chip: Concepts and implementation

    Get PDF

    Inter-module Interfacing techniques for SoCs with multiple clock domains to address challenges in modern deep sub-micron technologies

    Get PDF
    Miniaturization of integrated circuits (ICs) due to the improvement in lithographic techniques in modem deep sub-micron (DSM) technologies allows several complex processing elements to coexist in one IC, which are called System-on-Chip. As a first contribution, this thesis quantitatively analyzes the severity of timing constraints associated with Clock Distribution Network (CDN) in modem DSM technologies and shows that different processing elements may work in different dock domains to alleviate these constraints. Such systems are known as Globally Asynchronous Locally Synchronous (GALS) systems. It is imperative that different processing elements of a GALS system need to communicate with each other through some interfacing technique, and these interfaces can be asynchronous or synchronous. Conventionally, the asynchronous interfaces are described at the Register Transfer Logic (RTL) or system level. Such designs are susceptible to certain design constraints that cannot be addressed at higher abstraction levels; crosstalk glitch is one such constraint. This thesis initially identifies, using an analytical model, the possibility of asynchronous interface malfunction due to crosstalk glitch propagation. Next, we characterize crosstalk glitch propagation under normal operating conditions for two different classes of asynchronous protocols, namely bundled data protocol based and delay insensitive asynchronous designs. Subsequently, we propose a logic abstraction level modeling technique, which provides a framework to the designer to verify the asynchronous protocols against crosstalk glitches. The utility of this modeling technique is demonstrated experimentally on a Xilinx Virtex-II Pro FPGA. Furthermore, a novel methodology is proposed to quench such crosstalk glitch propagation through gating the asynchronous interface from sending the signal during potential glitch vulnerable instances. This methodology is termed as crosstalk glitch gating. This technique is successfully applied to obtain crosstalk glitch quenching in the representative interfaces. This thesis also addresses the dock skew challenges faced by high-performance synchronous interfacing methodologies in modem DSM technologies. The proposed methodology allows communicating modules to run at a frequency that is independent of the dock skew. Leveraging a novel clock-scheduling algorithm, our technique permits a faster module to communicate safely with a slower module without slowing down. Safe data communications for mesochronous schemes and for the cases when communicating modules have dock frequency ratios of integer or coprime numbers are theoretically explained and experimentally demonstrated. A clock-scheduling technique to dynamically accommodate phase variations is also proposed. These methods are implemented to the Xilinx Virtex II Pro technology. Experiments prove that the proposed interfacing scheme allows modules to communicate data safely, for mesochronous schemes, at 350 MHz, which is the limit of the technology used, under a dock skew of more than twice the time period (i.e. a dock skew of 12 ns
    corecore