Search CORE

2,497 research outputs found

Physical Design and Clock Tree Synthesis Methods For A 8-Bit Processor

Author: Daxiniray Debaprasad
Publication venue
Publication date: 01/01/2016
Field of study

Now days a number of processors are available with a lot kind of feature from different industries. A processor with similar kind of architecture of the current processors only missing the memory stuffs like the RAM and ROM has been designed here with the help of Verilog style of coding. This processor contains architecturally the program counter, instruction register, ALU, ALU latch, General Purpose Registers, control state module, flag registers and the core module containing all the modules. And a test module is designed for testing the processor. After the design of the processor with successful functionality, the processor is synthesized with 180nm technology. The synthesis is performed with the data path optimization like the selection of proper adders and multipliers for timing optimization in the data path while the ALU operations are performed. During synthesis how to take care of the worst negative slack (WNS), how to include the clock gating cells, how to define the cost and path groups etc. have been covered. After the proper synthesis we get the proper net list and the synthesized constraint file for carrying out the physical design. In physical design the steps like floor-planning, partitioning, placement, legalization of the placement, clock tree synthesis, and routing etc. have been performed. At all the stages the static timing analysis is performed for the timing meet of the design for better performance in terms of timing or frequency. Each steps of physical design are discussed with special effort towards the concepts behind the step. Out of all the steps of physical design the clock tree synthesis is performed with some improvement in the performance of the clock tree by creating a symmetrical clock tree and maintaining more common clock paths. A special algorithm has been framed for creating a symmetrical clock tree and thereby making the power consumption of the clock tree low

ethesis@nitr

Power Reductions with Energy Recovery Using Resonant Topologies

Author: Bezzam Ignatius S.A.
Publication venue: Scholar Commons
Publication date: 05/05/2015
Field of study

The problem of power densities in system-on-chips (SoCs) and processors has become more exacerbated recently, resulting in high cooling costs and reliability issues. One of the largest components of power consumption is the low skew clock distribution network (CDN), driving large load capacitance. This can consume as much as 70% of the total dynamic power that is lost as heat, needing elaborate sensing and cooling mechanisms. To mitigate this, resonant clocking has been utilized in several applications over the past decade. An improved energy recovering reconfigurable generalized series resonance (GSR) solution with all the critical support circuitry is developed in this work. This LC resonant clock driver is shown to save about 50% driver power (\u3e40% overall), on a 22nm process node and has 50% less skew than a non-resonant driver at 2GHz. It can operate down to 0.2GHz to support other energy savings techniques like dynamic voltage and frequency scaling (DVFS). As an example, GSR can be configured for the simpler pulse series resonance (PSR) operation to enable further power saving for double data rate (DDR) applications, by using de-skewing latches instead of flip-flop banks. A PSR based subsystem for 40% savings in clocking power with 40% driver active area reduction xii is demonstrated. This new resonant driver generates tracking pulses at each transition of clock for dual edge operation across DVFS. PSR clocking is designed to drive explicit-pulsed latches with negative setup time. Simulations using 45nm IBM/PTM device and interconnect technology models, clocking 1024 flip-flops show the reductions, compared to non-resonant clocking. DVFS range from 2GHz/1.3V to 200MHz/0.5V is obtained. The PSR frequency is set \u3e3× the clock rate, needing only 1/10th the inductance of prior-art LC resonance schemes. The skew reductions are achieved without needing to increase the interconnect widths owing to negative set-up times. Applications in data circuits are shown as well with a 90nm example. Parallel resonant and split-driver non-resonant configurations as well are derived from GSR. Tradeoffs in timing performance versus power, based on theoretical analysis, are compared for the first time and verified. This enables synthesis of an optimal topology for a given application from the GSR

Scholar Commons - Santa Clara University

Timing Measurement Platform for Arbitrary Black-Box Circuits Based on Transition Probability

Author: Cheung PYK
Wong JSJ
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/12/2013
Field of study

Crossref

Spiral - Imperial College Digital Repository

Desynchronization: Synthesis of asynchronous circuits from synchronous specifications

Author: Alex Kondratyev
Christos Sotiriou
Jordi Cortadella
Lavagno Luciano
Publication venue
Publication date: 01/01/2006
Field of study

Asynchronous implementation techniques, which measure logic delays at run time and activate registers accordingly, are inherently more robust than their synchronous counterparts, which estimate worst-case delays at design time, and constrain the clock cycle accordingly. De-synchronization is a new paradigm to automate the design of asynchronous circuits from synchronous specifications, thus permitting widespread adoption of asynchronicity, without requiring special design skills or tools. In this paper, we first of all study different protocols for de-synchronization and formally prove their correctness, using techniques originally developed for distributed deployment of synchronous language specifications. We also provide a taxonomy of existing protocols for asynchronous latch controllers, covering in particular the four-phase handshake protocols devised in the literature for micro-pipelines. We then propose a new controller which exhibits provably maximal concurrency, and analyze the performance of desynchronized circuits with respect to the original synchronous optimized implementation. We finally prove the feasibility and effectiveness of our approach, by showing its application to a set of real designs, including a complete implementation of the DLX microprocessor architectur

CiteSeerX

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

High-performance and Low-power Clock Network Synthesis in the Presence of Variation.

Author: Lee Dong Jin
Publication venue
Publication date: 01/01/2011
Field of study

Semiconductor technology scaling requires continuous evolution of all aspects of physical design of integrated circuits. Among the major design steps, clock-network synthesis has been greatly affected by technology scaling, rendering existing methodologies inadequate. Clock routing was previously sufficient for smaller ICs, but design difficulty and structural complexity have greatly increased as interconnect delay and clock frequency increased in the 1990s. Since a clock network directly influences IC performance and often consumes a substantial portion of total power, both academia and industry developed synthesis methodologies to achieve low skew, low power and robustness from PVT variations. Nevertheless, clock network synthesis under tight constraints is currently the least automated step in physical design and requires significant manual intervention, undermining turn-around-time. The need for multi-objective optimization over a large parameter space and the increasing impact of process variation make clock network synthesis particularly challenging. Our work identifies new objectives, constraints and concerns in the clock-network synthesis for systems-on-chips and microprocessors. To address them, we generate novel clock-network structures and propose changes in traditional physical-design flows. We develop new modeling techniques and algorithms for clock power optimization subject to tight skew constraints in the presence of process variations. In particular, we offer SPICE-accurate optimizations of clock networks, coordinated to reduce nominal skew below 5 ps, satisfy slew constraints and trade-off skew, insertion delay and power, while tolerating variations. To broaden the scope of clock-network-synthesis optimizations, we propose new techniques and a methodology to reduce dynamic power consumption by 6.8%-11.6% for large IC designs with macro blocks by integrating clock network synthesis within global placement. We also present a novel non-tree topology that is 2.3x more power-efficient than mesh structures. We fuse several clock trees to create large-scale redundancy in a clock network to bridge the gap between tree-like and mesh-like topologies. Integrated optimization techniques for high-quality clock networks described in this dissertation strong empirical results in experiments with recent industry-released benchmarks in the presence of process variation. Our software implementations were recognized with the first-place awards at the ISPD 2009 and ISPD 2010 Clock-Network Synthesis Contests organized by IBM Research and Intel Research.Ph.D.Electrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/89711/1/ejdjsy_1.pd

CiteSeerX

Deep Blue Documents at the University of Michigan

Elastic bundles :modelling and architecting asynchronous circuits with granular rigidity

Author: Fernandes Johnson
Publication venue: Newcastle University
Publication date: 01/01/2017
Field of study

PhD ThesisIntegrated Circuit (IC) designs these days are predominantly System-on-Chips (SoCs). The complexity of designing a SoC has increased rapidly over the years due to growing process and environmental variations coupled with global clock distribution di culty. Moreover, traditional synchronous design is not apt to handle the heterogeneous timing nature of modern SoCs. As a countermeasure, the semiconductor industry witnessed a strong revival of asynchronous design principles. A new paradigm of digital circuits emerged, as a result, namely mixed synchronous-asynchronous circuits. With a wave of recent innovations in synchronous-asynchronous CAD integration, this paradigm is showing signs of commercial adoption in future SoCs mainly due to the scope for reuse of synchronous functional blocks and IP cores, and the co-existence of synchronous and asynchronous design styles in a common EDA framework. However, there is a lack of formal methods and tools to facilitate mixed synchronousasynchronous design. In this thesis, we propose a formal model based on Petri nets with step semantics to describe these circuits behaviourally. Implication of this model in the veri cation and synthesis of mixed synchronous-asynchronous circuits is studied. Till date, this paradigm has been mainly explored on the basis of Globally Asynchronous Locally Synchronous (GALS) systems. Despite decades of research, GALS design has failed to gain traction commercially. To understand its drawbacks, a simulation framework characterising the physical and functional aspects of GALS SoCs is presented. A novel method for synthesising mixed synchronous-asynchronous circuits with varying levels of rigidity is proposed. Starting with a high-level data ow model of a system which is intrinsically asynchronous, the key idea is to introduce rigidity of chosen granularity levels in the model without changing functional behaviour. The system is then partitioned into functional blocks of synchronous and asynchronous elements before being transformed into an equivalent circuit which can be synthesised using standard EDA tools

Newcastle University eTheses

Course grained low power design flow using UPF

Author: Varanasi Archana
Publication venue: RIT Scholar Works
Publication date: 01/08/2009
Field of study

Increased system complexity has led to the substitution of the traditional bottom-up design flow by systematic hierarchical design flow. The main motivation behind the evolution of such an approach is the increasing difficulty in hardware realization of complex systems. With decreasing channel lengths, few key problems such as timing closure, design sign-off, routing complexity, signal integrity, and power dissipation arise in the design flows. Specifically, minimizing power dissipation is critical in several high-end processors. In high-end processors, the design complexity contributes to the overall dynamic power while the decreasing transistor size results in static power dissipation. This research aims at optimizing the design flow for power and timing using the unified power format (UPF). UPF provides a strategic format to specify power-aware design information at every stage in the flow. The low power reduction techniques enforced in this research are multi-voltage, multi-threshold voltage (Vth), and power gating with state retention. An inherent design challenge addressed in this research is the choice of power optimization techniques as the flow advances from synthesis to physical design. A top-down digital design flow for a 32 bit MIPS RISC processor has been implemented with and without UPF synthesis flow for 65nm technology. The UPF synthesis is implemented with two voltages, 1.08V and 0.864V (Multi-VDD). Area, power and timing metrics are analyzed for the flows developed. Power savings of about 20 % are achieved in the design flow with \u27multi-threshold\u27 power technique compared to that of the design flow with no low power techniques employed. Similarly, 30 % power savings are achieved in the design flow with the UPF implemented when compared to that of the design flow with \u27multi-threshold\u27 power technique employed. Thus, a cumulative power savings of 42% has been achieved in a complete power efficient design flow (UPF) compared to that of the generic top-down standard flow with no power saving techniques employed. This is substantiated by the low voltage operation of modules in the design, reduction in clock switching power by gating clocks in the design and extensive use of HVT and LVT standard cells for implementation. The UPF synthesis flow saw the worst timing slack and more area when compared to those of the `multi-threshold\u27 or the generic flow. Percentage increase in the area with UPF is approximately 15%; a significant source for this increase being the additional power controlling logic added

RIT Scholar Works

저전력 고성능 디지털 시스템을 위한 고신뢰도의 클럭 네트워크 설계 방법론

Author: Hyungjung Seo
Publication venue: 서울대학교 대학원
Publication date: 01/08/2015
Field of study

학위논문 (박사)-- 서울대학교 대학원 : 전기·컴퓨터공학부, 2015. 8. 김태환.오늘날의 회로 설계에서 공정변이가 회로 클럭의 타이밍의 변이에 미치는 영향은 매우 커짐에 따라, 전통적으로 사용되던 클럭 트리 구조를 기반으로 한 클럭 네트워크를 사용하는 것은 한계에 부딪히게 되었고, 이를 극복하기 위한 여러가지 기술들이 제안되었다. 본 논문에서는 변이에 강한 클럭 네트워크를 설계하기 위해, 연구 및 사용되고 있는 세 가지 기술에 대해 소개하고, 이들을 개선한 연구들을 제안한다. 첫째로, 이 논문에서는 클럭의 타이밍 문제를 회로 제작 이후 단계에서 조정할 수 있는 포스트 실리콘 조정 클럭 버퍼를 배치하는 문제에 대해 서술한다. 포스트 실리콘 조정 버퍼는 클럭의 지연시간을 회로가 제작된 이후의 단계에서 조정하 여 클럭의 타이밍 문제를 해결할 수 있지만, 버퍼 자체의 크기 때문에 최소한의 개수만 가장 효율적인 위치에 배치해야 하는 문제가 있다. 본 논문에서는 이전의 연구가 회로의 수율을 계산할 때 시간이 많이 걸리는 몬테-카를로 시뮬레이션을 사용하기 때문에 탐색 가능한 포스트 실리콘 조정 버퍼의 배치가 제한되는 문제가 있음을 지적한 후, 기존에 제안되었던 그래프 기반 회로 수율 계산 기법을 사용하여 효율적인 포스트 실리콘 조정 버퍼 배치를 찾을 수 있는 점진적이고 체계적인 방법을 제시한다. 다음은 클럭 시차 스케쥴링 방법에 대한 연구를 서술한다. 최근의 연구에서 제안되었던, 플립-플롭의 클럭에서 출력까지의 딜레이가 클럭의 준비시간과 유지시간에 의존한다는 유연한 플립-플롭 타이밍 모델 연구는 기존의 플립-플롭의 타이밍 특성들이 고정된 값이라는 가정에 기반한 정적 타이밍 분석의 정확성 문제를 해결할 수 있는 중요한 연구이다. 본 논문에서는 새로운 모델을 고려하여, 이전에 고전적인 플립-플롭 타이밍 특성 모델을 기반으로 진행되었던 클럭 시차 스케쥴링의 최적화 문제를 유연한 플립-플롭 타이밍 모델을 고려하여 해결하였다. 본 연구에서는 주어진 회로의 준비시간과 유지시간의 여유시간을 반복적이고 체계적으로 최대화하여 문제를 해결하였다. 마지막으로 클럭 스파인 네트워크의 합성을 자동화하는 문제에 대해 서술한다. 전통적인 클럭 트리 구조가 공정변이 문제를 해결하지 못했기 때문에 클럭 메쉬를 포함하는 다양한 대안적 구조가 제안되었다. 클럭 메쉬의 경우 공정변이에 의한 클럭 시차를 줄일 수 있었지만 이를 위해 와이어나 버퍼 등의 자원을 많이 소모하는 문제를 가지고 있다. 두 구조의 중간적 구조에는 클럭 트리의 노드를 연결하는 크로스 링크를 삽입하는 구조와 클럭 스파인 구조가 있다. 클럭 트리에 점진적인 수정을 가하여 만드는 크로스 링크와 달리, 클럭 스파인 구조는 트리나 이후에 제안된 메쉬와는 완전히 별개의 구조로, 이를 합성하는 방법도 매우 다르다. 그렇기 때문에 클럭 스파인을 합성하는 알고리즘은 필수적이라고 할 수 있으나, 합성 방법론이나 이를 자동화하는 방법에 관한 연구는 아직 없다. 본 논문에서는 우선, 클럭-게이팅을 지원하는 클럭 스파인을 주어진 클럭 시차 및 클럭 슬루 조건을 만족하면서 자원 및 전력 소모량을 최소화하는 문제에 대해 서술한다. 그리고, 회로에서 주어진 플립-플롭들을 클럭-게이팅 조건에서의 연관성을 고려하고 조직화하여 클럭 스파인을 삽입한 후, 클럭 시차 및 슬루 조건을 고려하여 버퍼를 삽입하는 알고리즘을 제안한다. 요약하면, 본 논문에서는 클럭의 타이밍 문제를 해결하기 위해 포스트-실리콘 조정 클럭 버퍼를 사용하는 테크닉과 클럭 시차 스케쥴링을 유연한 플립-플롭 타이밍 모델에서 적용하는 테크닉을 제시하고, 클럭의 타이밍 문제와 전력 소모 문제를 한번에 해결하기 위한 새로운 클럭 스파인 네트워크를 합성하는 자동화 알고리즘을 제시한다.As the process variation is dominating to cause the clock timing variation among chips to be much large, conventional clock tree based clock network is not able to guarantee the timing constraint of a digital system. To overcome the limitations of traditional clock design techniques, various techniques have been studied. This dissertation addresses three techniques that have been widely used for designing robust clock network and proposes developed methods. First, it is widely accepted that post-silicon tunable (PST) clock buffers can effectively resolve the clock timing violation. Since PST buffers, which can reset the clock delay to flip-flops after the chip is manufactured, impose a non-trivial implementation area and control circuitry, it is very important to minimally allocate PST buffers while satisfying the chip yield constraint. In this dissertation, we (1) develop a graph-based chip yield computation technique which can update yields very efficiently and accurately for incremental PST buffer allocation, based on which we (2) propose a systematic (bottom-up and top-down with refinement) PST buffer allocation algorithm that is able to fully explore the design space of PST buffer allocation. Second, clock skew scheduling is one of the essential steps that must be carefully performed during the design process. This dissertation addresses the clock skew optimization problem integrated with the consideration of the interdependent relation between the setup and hold skews, and clk-to-Q delay of flip-flops, so that the time margin is more accurately and reliably set aside over that of the previous methods, which have never taken the integrated problem into account. Precisely, based on an accurate flexible model of setup skew, hold skew, and clk-to-Q delay, we propose a stepwise clock skew scheduling technique in which at each iteration, the worst slack of setup and hold skews is systematically and incrementally relaxed to maximally extend the time margin. Lastly, clock tree with cross links and clock spine have an intermediate characteristics for skew tolerance and power consumption, compared to clock tree and clock mesh which are two extreme structures of clock network. Unlike the clock tree with links between clock nodes, which is a sort of an incremental modification of the structure of clock tree, clock spine network is a completely separated structure from the structures of tree and mesh. Consequently, it is necessary and essential to develop a synthesis algorithm for clock spines, which will be compatible to the existing synthesis algorithms of clock trees and clock meshes. To this end, this dissertation first addresses the problem of automating the synthesis of clock-gated clock spines with the objective of minimizing total clock power while meeting the clock skew and slew constraints. The key idea of our proposed synthesis algorithm is to identify and group the flip-flops with tight correlation of clock-gating operations together to form a spine while accurately predicting and maintaining clock skew and slew variations through the buffer insertion and stub allocation. In summary, this dissertation presents clock tuning techniques with consideration of post-silicon tuning, flexible flip-flop timing model, and clock-gated clock spine synthesis algorithm.Abstract i Chapter 1 INTRODUCTION 1 1.1 Clock Distribution Network . . . . . . . . . . . . . . . . . . . . . 1 1.2 Process Variation . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.3 Flexible Flip-flop Timing Model . . . . . . . . . . . . . . . . . . . 3 1.4 Clock Spine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.5 Contributions of This Dissertation . . . . . . . . . . . . . . . . . 6 Chapter 2 POST-SILICON TUNABLE CLOCK BUFFER ALLOCATION BASED ON FAST CHIP YIELD COMPUTATION 8 2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 2.2 Systematic Exploration of PST Buffer Allocation . . . . . . . . . 10 2.2.1 Observations . . . . . . . . . . . . . . . . . . . . . . . . . 10 2.2.2 Problem Definition . . . . . . . . . . . . . . . . . . . . . . 15 2.2.3 Allocation Algorithm . . . . . . . . . . . . . . . . . . . . . 16 2.3 Fast Timing Yield Computation . . . . . . . . . . . . . . . . . . 17 2.3.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . 17 2.3.2 Incremental Yield Computation . . . . . . . . . . . . . . . 22 2.4 Experimental Result . . . . . . . . . . . . . . . . . . . . . . . . . 24 2.5 PST Buffer Configuration Techniques . . . . . . . . . . . . . . . 31 2.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 Chapter 3 POST-SILICON TUNING BASED ON FLEXIBLE FLIP-FLOP TIMING 34 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 3.2 Preliminary and Definitions . . . . . . . . . . . . . . . . . . . . . 40 3.2.1 Flexible Flip-Flop Timing Model . . . . . . . . . . . . . . 40 3.2.2 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . 40 3.3 Motivational Examples . . . . . . . . . . . . . . . . . . . . . . . . 42 3.4 Clock Skew Scheduling for Slack Relaxation Based on Flexible Flip-Flop Timing . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 3.4.1 Overall Flow . . . . . . . . . . . . . . . . . . . . . . . . . 46 3.4.2 Finding Local Clock Skew Schedule . . . . . . . . . . . . 48 3.5 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . 51 3.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 Chapter 4 SYNTHESIS FOR POWER-AWARE CLOCK SPINES 61 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 4.2 Preliminaries and Motivation . . . . . . . . . . . . . . . . . . . . 64 4.2.1 Clock Spine . . . . . . . . . . . . . . . . . . . . . . . . . . 64 4.2.2 Activity Patterns . . . . . . . . . . . . . . . . . . . . . . . 67 4.2.3 Power Computation . . . . . . . . . . . . . . . . . . . . . 67 4.3 Algorithm for Clock Spine Synthesis . . . . . . . . . . . . . . . . 68 4.3.1 Problem Definition . . . . . . . . . . . . . . . . . . . . . . 68 4.3.2 Power-Aware Sink Clustering . . . . . . . . . . . . . . . . 70 4.3.3 Spine Relaxation . . . . . . . . . . . . . . . . . . . . . . . 77 4.3.4 Spine Buffer Allocation . . . . . . . . . . . . . . . . . . . 80 4.3.5 Top-Level Tree Construction . . . . . . . . . . . . . . . . 86 4.4 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . 86 4.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91 Chapter 5 CONCLUSION 95 5.1 Chapter 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95 5.2 Chapter 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95 5.3 Chapter 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96 Bibliography 97 초록 106Docto

SNU Open Repository and Archive

Physical design of USB1.1

Author: Koti Boddu Siva
Publication venue
Publication date: 01/01/2016
Field of study

In earlier days, interfacing peripheral devices to host computer has a big problematic. There existed so many different kinds’ ports like serial port, parallel port, PS/2 etc. And their use restricts many situations, Such as no hot-pluggability and involuntary configuration. There are very less number of methods to connect the peripheral devices to host computer. The main reason that Universal Serial Bus was implemented to provide an additional benefits compared to earlier interfacing ports. USB is designed to allow many peripheral be connecting using single standardize interface. It provides an expandable fast, cost effective, hot-pluggable plug and play serial hardware interface that makes life of computer user easier allowing them to plug different devices to into USB port and have them configured automatically. In this thesis demonstrated the USB v1.1 architecture part in briefly and generated gate level net list form RTL code by applying the different constraints like timing, area and power. By applying the various types design constraints so that the performance was improved by 30%. And then it implemented in physically by using SoC encounter EDI system, estimation of chip size, power analysis and routing the clock signal to all flip-flops presented in the design. To reduce the clock switching power implemented register clustering algorithm (DBSCAN). In this design implementation TSMC 180nm technology library is used

ethesis@nitr