83 research outputs found

    Advanced Timing and Synchronization Methodologies for Digital VLSI Integrated Circuits

    Get PDF
    This dissertation addresses timing and synchronization methodologies that are critical to the design, analysis and optimization of high-performance, integrated digital VLSI systems. As process sizes shrink and design complexities increase, achieving timing closure for digital VLSI circuits becomes a significant bottleneck in the integrated circuit design flow. Circuit designers are motivated to investigate and employ alternative methods to satisfy the timing and physical design performance targets. Such novel methods for the timing and synchronization of complex circuitry are developed in this dissertation and analyzed for performance and applicability.Mainstream integrated circuit design flow is normally tuned for zero clock skew, edge-triggered circuit design. Non-zero clock skew or multi-phase clock synchronization is seldom used because the lack of design automation tools increases the length and cost of the design cycle. For similar reasons, level-sensitive registers have not become an industry standard despite their superior size, speed and power consumption characteristics compared to conventional edge-triggered flip-flops.In this dissertation, novel design and analysis techniques that fully automate the design and analysis of non-zero clock skew circuits are presented. Clock skew scheduling of both edge-triggered and level-sensitive circuits are investigated in order to exploit maximum circuit performances. The effects of multi-phase clocking on non-zero clock skew, level-sensitive circuits are investigated leading to advanced synchronization methodologies. Improvements in the scalability of the computational timing analysis process with clock skew scheduling are explored through partitioning and parallelization.The integration of the proposed design and analysis methods to the physical design flow of integrated circuits synchronized with a next-generation clocking technology-resonant rotary clocking technology-is also presented. Based on the design and analysis methods presented in this dissertation, a computer-aided design tool for the design of rotary clock synchronized integrated circuits is developed

    Variation and power issues in VLSI clock networks

    Get PDF
    Clock Distribution Network (CDN) is an important component of any synchronous logic circuit. The function of CDN is to deliver the clock signal to the clock sinks. Clock skew is defined as the difference in the arrival time of the clock signal at the clock sinks. Higher uncertainty in skew (due to PVT variations) degrades circuit performance by decreasing the maximum possible delay between any two sequential elements. Aggressive frequency scaling has also led to high power consumption especially in CDN. This dissertation addresses variation and power issues in the design of current and potential future CDN. The research detailed in this work presents algorithmic techniques for the following problems: (1) Variation tolerance in useful skew design, (2) Link insertion for buffered clock nets, (3) Methodology and algorithms for rotary clocking and (4) Clock mesh optimization for skew-power trade off. For clock trees this dissertation presents techniques to integrate the different aspects of clock tree synthesis (skew scheduling, abstract topology and layout embedding) into one framework- tolerance to variations. This research addresses the issues involved in inserting cross-links in a buffered clock tree and proposes design criteria to avoid the risk of short-circuit current. Rotary clocking is a promising new clocking scheme that consists of unterminated rings formed by differential transmission lines. Rotary clocking achieves reduction in power dissipation clock skew. This dissertation addresses the issues in adopting current CAD methodology to rotary clocks. Alternative methodology and corresponding algorithmic techniques are detailed. Clock mesh is a popular form of CDN used in high performance systems. The problem of simultaneous sizing and placement of mesh buffers in a clock mesh is addressed. The algorithms presented remove the edges from the clock mesh to trade off skew tolerance for low power. For clock trees as well as link insertion, our experiments indicate significant reduction in clock skew due to variations. For clock mesh, experimental results indicate 18.5% reduction in power with 1.3% delay penalty on a average. In summary, this dissertation details methodologies/algorithms that address two critical issues- variation and power dissipation in current and potential future CDN

    Low Power Resonant Rotary Global Clock Distribution Network Design

    Get PDF
    Along with the increasing complexity of the modern very large scale integrated (VLSI) circuit design, the power consumption of the clock distribution network in digital integrated circuits is continuously increasing. In terms of power and clock skew, the resonant clock distribution network has been studied as a promising alternative to the conventional clock distribution network. Resonant clock distribution network, which works based on adiabatic switching principles, provides a complete solution for on-chip clock generation and distribution for low-power and low-skew clock network designs for high-performance synchronous VLSI circuits.This dissertation work aims to develop the global clock distribution network for one kind of resonant clocking technologies: The resonant rotary clocking technology. The following critical aspects are addressed in this work: (1) A novel rotary oscillator array (ROA) topology is proposed to solve the signal rotation direction uniformity problem, in order to support the design of resonant rotary clocking based low-skew clock distribution network; (2) A synchronization scheme is proposed to solve the large scale rotary clocking generation circuit synchronization problem; (3) A low-skew rotary clock distribution network design methodology is proposed with frequency, power and skew optimizations; (4) A resonant rotary clocking based physical design flow is proposed, which can be integrated in the current mainstream IC design flow; (5) A dynamic rotary frequency divider is proposed for dynamic frequency scaling applications. Experimental and theoretical results show: (1) The efficiency of the proposed methodology in the construction of low-skew, low-power resonant rotary clock distribution network. (2) The effectiveness of the dynamic rotary frequency divider in extending the operating frequency range of the low-power resonant rotary based applications.Ph.D., Electrical Engineering -- Drexel University, 201

    Timing-driven physical design for VLSI circuits using resonant rotary clocking

    Get PDF
    Paper presented at the Midwest Symposium on Circuits and Systems, San Juan, Puerto Rico.Resonant clocking technologies are next-generation clocking technologies that provide low or controllable-skew, low-jitter and multi-gigahertz frequency clock signals with low power consumption. This paper describes a collection of circuit partitioning, placement and synchronization methodologies that enables the implementation of high speed, low power circuits synchronized with the resonant rotary clocking technology. Resonant rotary clocking technology inherently supports (and requires) non-zero clock skew operation, which permits further improved circuit performances. The proposed physical design flow entails integrated circuit partitioning and placement methodologies that permit the hierarchical application of non-zero clock skew system timing. This design flow is shown to be a computationally efficient implementation method

    High performance IC clock networks with grid and tree topologies

    Get PDF
    In this dissertation, an essential step in the integrated circuit (IC) physical design flow—the clock network design—is investigated. Clock network design entailsa series of computationally intensive, large-scale design and optimization tasks for the generation and distribution of the clock signal through different topologies. The lack or inefficacy of the automation for implementing high performance clock networks, especially for low-power, high speed and variation-aware implementations, is the main driver for this research. The synthesis and optimization methods for the two most commonly used clock topologies in IC design—the grid topology and the tree topology—are primarily investigated.The clock mesh network, which uses the grid topology, has very low skew variation at the cost of high power dissipation. Two novel clock mesh network designmethodologies are proposed in this dissertation in order to reduce the power dissipation. These are the first methods known in literature that combine clock meshsynthesis with incremental register placement and clock gating for power saving purposes. The application of the proposed automation methods on the emerging resonant rotary clocking technology, which also has the grid topology, is investigated in this dissertation as well.The clock tree topology has the advantage of lower power dissipation compared to other traditional clock topologies (e.g. clock mesh, clock spine, clock tree with cross links) at the cost of increased performance degradation due to on-chip variations. A novel clock tree buffer polarity assignment flow is proposed in this dissertation in order to reduce these effects of on-chip variations on the clock tree topology. The proposed polarity assignment flow is the first work that introduces post-silicon, dynamic reconfigurability for polarity assignment, enabling clock gating for low power operation of the variation-tolerant clock tree networks.Ph.D., Electrical Engineering -- Drexel University, 201

    Green on-chip inductors in three-dimensional integrated circuits

    Get PDF
    This thesis focuses on the technique for the improvement of quality factor and inductance of the TSV inductors and then on the utilization of TSV inductors in various on-chip applications such as DC-DC converter and resonant clocking. Through-silicon-vias (TSVs) are the enabling technique for three-dimensional integrated circuits (3D ICs). However, their large area significantly reduces the benefits that can be obtained by 3D ICs. On the other hand, a major limiting factor for the implementation of many on-chip circuits such as DC-DC converters and resonant clocking is the large area overhead induced by spiral inductors. Several works have been proposed in the literature to make inductors out of idle TSVs. In this thesis, the technique to improve the quality factor and inductance is proposed and then discusses about two applications utilizing TSV inductors i.e., inductive DC-DC converters and LC resonant clocking. The TSV inductor performs inferior to spiral inductors due to its increases losses. Hence to improve the performance of the TSV inductor, the losses should be reduced. Inductive DC-DC converters become prominent for on-chip voltage conversion because of their high efficiency compared with other types of converters (e.g. linear and capacitive converters). On the other hand, to reduce on-chip power, LC resonant clocking has become an attractive option due to its same amplitude and phases compared to other resonant clocking methods such as standing wave and rotary wave. A major challenge for both applications is associated with the required inductor area. In this thesis, the effectiveness of such TSV inductors in addressing both challenges are demonstrated --Abstract, page iv

    Ring-Based Resonant Standing Wave Oscillators for 3D Clocking Applications

    Get PDF
    Ring-based resonant standing wave oscillators have been shown to be a useful clocking tech-nique that can distribute and generate a high frequency, low skew, low power, and stable clock signal. By using through-silicon-vias, this type of standing wave oscillator can be used to gener-ate the clocking scheme for 3D integrated circuits. In this thesis, we propose the use of such 3D standing wave oscillators and show how independent 3D oscillators in different stacks can syn-chronize through the use of a redistribution layer stub. Inter-chip clock synchronization is then accomplished without the need for a PLL. In addition, we propose the first 3D ring-based resonant standing wave oscillator bootstrap and reset circuit to initialize and stop oscillation. Using a 3D ring-based resonant standing wave oscillator, we propose a ring-based data fabric for 3D stacked DRAM and compare the results with existing approaches such as High Bandwidth Memory (HBM) or Wide I/O memory. We show that our Memory Architecture using a Ring-based Scheme (MARS) can provide the increases in speed necessary to overcome current memory bottlenecks, and can scale effectively as future 3D stacks become larger. Our MARS can trade off power, throughput, and latency to match different application requirements. By using a narrow bus, and connecting it to all channels, the MARS8 can provide an alternative memory configuration with ∼ 6.9× lower power consumption than HBM, and ∼ 2.7× faster speeds than Wide I/O. Using multiple ring topologies in the same stack, the channel count can double from 8 to 16, and then to 32. This is possible since MARS uses about 4× fewer TSVs per channel than HBM or Wide I/O. This provides speeds up to ∼ 4.2× faster than traditional HBM. This scalable architecture allows higher throughput and faster system performance for next-generation DRAM. The MARS topology proposed in this thesis can be used in a variety of computing systems, from lightweight IoT to large-scale data centers

    Sincronização em sistemas integrados a alta velocidade

    Get PDF
    Doutoramento em Engenharia ElectrotécnicaA distribui ção de um sinal relógio, com elevada precisão espacial (baixo skew) e temporal (baixo jitter ), em sistemas sí ncronos de alta velocidade tem-se revelado uma tarefa cada vez mais demorada e complexa devido ao escalonamento da tecnologia. Com a diminuição das dimensões dos dispositivos e a integração crescente de mais funcionalidades nos Circuitos Integrados (CIs), a precisão associada as transições do sinal de relógio tem sido cada vez mais afectada por varia ções de processo, tensão e temperatura. Esta tese aborda o problema da incerteza de rel ogio em CIs de alta velocidade, com o objetivo de determinar os limites do paradigma de desenho sí ncrono. Na prossecu ção deste objectivo principal, esta tese propõe quatro novos modelos de incerteza com âmbitos de aplicação diferentes. O primeiro modelo permite estimar a incerteza introduzida por um inversor est atico CMOS, com base em parâmetros simples e su cientemente gen éricos para que possa ser usado na previsão das limitações temporais de circuitos mais complexos, mesmo na fase inicial do projeto. O segundo modelo, permite estimar a incerteza em repetidores com liga ções RC e assim otimizar o dimensionamento da rede de distribui ção de relógio, com baixo esfor ço computacional. O terceiro modelo permite estimar a acumula ção de incerteza em cascatas de repetidores. Uma vez que este modelo tem em considera ção a correla ção entre fontes de ruí do, e especialmente util para promover t ecnicas de distribui ção de rel ogio e de alimentação que possam minimizar a acumulação de incerteza. O quarto modelo permite estimar a incerteza temporal em sistemas com m ultiplos dom ínios de sincronismo. Este modelo pode ser facilmente incorporado numa ferramenta autom atica para determinar a melhor topologia para uma determinada aplicação ou para avaliar a tolerância do sistema ao ru ído de alimentação. Finalmente, usando os modelos propostos, são discutidas as tendências da precisão de rel ogio. Conclui-se que os limites da precisão do rel ogio são, em ultima an alise, impostos por fontes de varia ção dinâmica que se preveem crescentes na actual l ogica de escalonamento dos dispositivos. Assim sendo, esta tese defende a procura de solu ções em outros ní veis de abstração, que não apenas o ní vel f sico, que possam contribuir para o aumento de desempenho dos CIs e que tenham um menor impacto nos pressupostos do paradigma de desenho sí ncrono.Distributing a the clock simultaneously everywhere (low skew) and periodically everywhere (low jitter) in high-performance Integrated Circuits (ICs) has become an increasingly di cult and time-consuming task, due to technology scaling. As transistor dimensions shrink and more functionality is packed into an IC, clock precision becomes increasingly a ected by Process, Voltage and Temperature (PVT) variations. This thesis addresses the problem of clock uncertainty in high-performance ICs, in order to determine the limits of the synchronous design paradigm. In pursuit of this main goal, this thesis proposes four new uncertainty models, with di erent underlying principles and scopes. The rst model targets uncertainty in static CMOS inverters. The main advantage of this model is that it depends only on parameters that can easily be obtained. Thus, it can provide information on upcoming constraints very early in the design stage. The second model addresses uncertainty in repeaters with RC interconnects, allowing the designer to optimise the repeater's size and spacing, for a given uncertainty budget, with low computational e ort. The third model, can be used to predict jitter accumulation in cascaded repeaters, like clock trees or delay lines. Because it takes into consideration correlations among variability sources, it can also be useful to promote oorplan-based power and clock distribution design in order to minimise jitter accumulation. A fourth model is proposed to analyse uncertainty in systems with multiple synchronous domains. It can be easily incorporated in an automatic tool to determine the best topology for a given application or to evaluate the system's tolerance to power-supply noise. Finally, using the proposed models, this thesis discusses clock precision trends. Results show that limits in clock precision are ultimately imposed by dynamic uncertainty, which is expected to continue increasing with technology scaling. Therefore, it advocates the search for solutions at other abstraction levels, and not only at the physical level, that may increase system performance with a smaller impact on the assumptions behind the synchronous design paradigm

    CROSS-LAYER DESIGN, OPTIMIZATION AND PROTOTYPING OF NoCs FOR THE NEXT GENERATION OF HOMOGENEOUS MANY-CORE SYSTEMS

    Get PDF
    This thesis provides a whole set of design methods to enable and manage the runtime heterogeneity of features-rich industry-ready Tile-Based Networkon- Chips at different abstraction layers (Architecture Design, Network Assembling, Testing of NoC, Runtime Operation). The key idea is to maintain the functionalities of the original layers, and to improve the performance of architectures by allowing, joint optimization and layer coordinations. In general purpose systems, we address the microarchitectural challenges by codesigning and co-optimizing feature-rich architectures. In application-specific NoCs, we emphasize the event notification, so that the platform is continuously under control. At the network assembly level, this thesis proposes a Hold Time Robustness technique, to tackle the hold time issue in synchronous NoCs. At the network architectural level, the choice of a suitable synchronization paradigm requires a boost of synthesis flow as well as the coexistence with the DVFS. On one hand this implies the coexistence of mesochronous synchronizers in the network with dual-clock FIFOs at network boundaries. On the other hand, dual-clock FIFOs may be placed across inter-switch links hence removing the need for mesochronous synchronizers. This thesis will study the implications of the above approaches both on the design flow and on the performance and power quality metrics of the network. Once the manycore system is composed together, the issue of testing it arises. This thesis takes on this challenge and engineers various testing infrastructures. At the upper abstraction layer, the thesis addresses the issue of managing the fully operational system and proposes a congestion management technique named HACS. Moreover, some of the ideas of this thesis will undergo an FPGA prototyping. Finally, we provide some features for emerging technology by characterizing the power consumption of Optical NoC Interfaces

    Advanced information processing system: The Army fault tolerant architecture conceptual study. Volume 1: Army fault tolerant architecture overview

    Get PDF
    Digital computing systems needed for Army programs such as the Computer-Aided Low Altitude Helicopter Flight Program and the Armored Systems Modernization (ASM) vehicles may be characterized by high computational throughput and input/output bandwidth, hard real-time response, high reliability and availability, and maintainability, testability, and producibility requirements. In addition, such a system should be affordable to produce, procure, maintain, and upgrade. To address these needs, the Army Fault Tolerant Architecture (AFTA) is being designed and constructed under a three-year program comprised of a conceptual study, detailed design and fabrication, and demonstration and validation phases. Described here are the results of the conceptual study phase of the AFTA development. Given here is an introduction to the AFTA program, its objectives, and key elements of its technical approach. A format is designed for representing mission requirements in a manner suitable for first order AFTA sizing and analysis, followed by a discussion of the current state of mission requirements acquisition for the targeted Army missions. An overview is given of AFTA's architectural theory of operation
    corecore