1,466 research outputs found

    Delay-Bounded Routing for Shadow Registers

    No full text
    The on-chip timing behaviour of synchronous circuits can be quantified at run-time by adding shadow registers, which allow designers to sample the most critical paths of a circuit at a different point in time than the user register would normally. In order to sample these paths precisely, the path skew between the user and the shadow register must be tightly controlled and consistent across all paths that are shadowed. Unlike a custom IC, FPGAs contain prefabricated resources from which composing an arbitrary routing delay is not trivial. This paper presents a method for inserting shadow registers with a minimum skew bound, whilst also reducing the maximum skew. To preserve circuit timing, we apply this to FPGA circuits post place-and-route, using only the spare resources left behind. We find that our techniques can achieve an average STA reported delay bound of ±200ps on a Xilinx device despite incomplete timing information, and achieve <1ps accuracy against our own delay model

    Desynchronization: Synthesis of asynchronous circuits from synchronous specifications

    Get PDF
    Asynchronous implementation techniques, which measure logic delays at run time and activate registers accordingly, are inherently more robust than their synchronous counterparts, which estimate worst-case delays at design time, and constrain the clock cycle accordingly. De-synchronization is a new paradigm to automate the design of asynchronous circuits from synchronous specifications, thus permitting widespread adoption of asynchronicity, without requiring special design skills or tools. In this paper, we first of all study different protocols for de-synchronization and formally prove their correctness, using techniques originally developed for distributed deployment of synchronous language specifications. We also provide a taxonomy of existing protocols for asynchronous latch controllers, covering in particular the four-phase handshake protocols devised in the literature for micro-pipelines. We then propose a new controller which exhibits provably maximal concurrency, and analyze the performance of desynchronized circuits with respect to the original synchronous optimized implementation. We finally prove the feasibility and effectiveness of our approach, by showing its application to a set of real designs, including a complete implementation of the DLX microprocessor architectur

    Synthesis of Clock Trees with Useful Skew based on Sparse-Graph Algorithms

    Get PDF
    Computer-aided design (CAD) for very large scale integration (VLSI) involve

    NoC Design Flow for TDMA and QoS Management in a GALS Context

    No full text
    International audienceThis paper proposes a new approach dealing with the tedious problem of NoC guaranteed traffics according to GALS constraints impelled by the upcoming large System-on-Chips with multiclock domains. Our solution has been designed to adjust a tradeoff between synchronous and clockless asynchronous techniques. By means of smart interfaces between synchronous sub-NoCs, Quality-of-Service (QoS) for guaranteed traffic is assured over the entire chip despite clock heterogeneity. This methodology can be easily integrated in the usual NoC design flow as an extension to traditional NoC synchronous design flows. We present real implementation obtained with our tool for a 4G telecom scheme

    Process-induced skew reduction in nominal zero-skew clock trees

    Full text link
    Abstract — This work develops an analytic framework for clock tree analysis considering process variations that is shown to correspond well with Monte Carlo results. The analysis frame-work is used in a new algorithm that constructs deterministic nominal zero-skew clock trees that have reduced sensitivity to process variation. The new algorithm uses a sampling approach to perform route embedding during a bottom-up merging phase, but does not select the best embedding until the top-down phase. This results in clock trees that exhibit a mean skew reduction of 32.4 % on average and a standard deviation reduction of 40.7 % as verified by Monte Carlo. The average increase in total clock tree capacitance is less than 0.02%. I

    Timing Closure in Chip Design

    Get PDF
    Achieving timing closure is a major challenge to the physical design of a computer chip. Its task is to find a physical realization fulfilling the speed specifications. In this thesis, we propose new algorithms for the key tasks of performance optimization, namely repeater tree construction; circuit sizing; clock skew scheduling; threshold voltage optimization and plane assignment. Furthermore, a new program flow for timing closure is developed that integrates these algorithms with placement and clocktree construction. For repeater tree construction a new algorithm for computing topologies, which are later filled with repeaters, is presented. To this end, we propose a new delay model for topologies that not only accounts for the path lengths, as existing approaches do, but also for the number of bifurcations on a path, which introduce extra capacitance and thereby delay. In the extreme cases of pure power optimization and pure delay optimization the optimum topologies regarding our delay model are minimum Steiner trees and alphabetic code trees with the shortest possible path lengths. We presented a new, extremely fast algorithm that scales seamlessly between the two opposite objectives. For special cases, we prove the optimality of our algorithm. The efficiency and effectiveness in practice is demonstrated by comprehensive experimental results. The task of circuit sizing is to assign millions of small elementary logic circuits to elements from a discrete set of logically equivalent, predefined physical layouts such that power consumption is minimized and all signal paths are sufficiently fast. In this thesis we develop a fast heuristic approach for global circuit sizing, followed by a local search into a local optimum. Our algorithms use, in contrast to existing approaches, the available discrete layout choices and accurate delay models with slew propagation. The global approach iteratively assigns slew targets to all source pins of the chip and chooses a discrete layout of minimum size preserving the slew targets. In comprehensive experiments on real instances, we demonstrate that the worst path delay is within 7% of its lower bound on average after a few iterations. The subsequent local search reduces this gap to 2% on average. Combining global and local sizing we are able to size more than 5.7 million circuits within 3 hours. For the clock skew scheduling problem we develop the first algorithm with a strongly polynomial running time for the cycle time minimization in the presence of different cycle times and multi-cycle paths. In practice, an iterative local search method is much more efficient. We prove that this iterative method maximizes the worst slack, even when restricting the feasible schedule to certain time intervals. Furthermore, we enhance the iterative local approach to determine a lexicographically optimum slack distribution. The clock skew scheduling problem is then generalized to allow for simultaneous data path optimization. In fact, this is a time-cost tradeoff problem. We developed the first combinatorial algorithm for computing time-cost tradeoff curves in graphs that may contain cycles. Starting from the lowest-cost solution, the algorithm iteratively computes a descent direction by a minimum cost flow computation. The maximum feasible step length is then determined by a minimum ratio cycle computation. This approach can be used in chip design for several optimization tasks, e.g. threshold voltage optimization or plane assignment. Finally, the optimization routines are combined into a timing closure flow. Here, the global placement is alternated with global performance optimization. Netweights are used to penalize the length of critical nets during placement. After the global phase, the performance is improved further by applying more comprehensive optimization routines on the most critical paths. In the end, the clock schedule is optimized and clocktrees are inserted. Computational results of the design flow are obtained on real-world computer chips

    Throughput-driven floorplanning with wire pipelining

    Get PDF
    The size of future high-performance SoC is such that the time-of-flight of wires connecting distant pins in the layout can be much higher than the clock period. In order to keep the frequency as high as possible, the wires may be pipelined. However, the insertion of flip-flops may alter the throughput of the system due to the presence of loops in the logic netlist. In this paper, we address the problem of floorplanning a large design where long interconnects are pipelined by inserting the throughput in the cost function of a tool based on simulated annealing. The results obtained on a series of benchmarks are then validated using a simple router that breaks long interconnects by suitably placing flip-flops along the wires

    Modeling of thermally induced skew variations in clock distribution network

    Get PDF
    Clock distribution network is sensitive to large thermal gradients on the die as the performance of both clock buffers and interconnects are affected by temperature. A robust clock network design relies on the accurate analysis of clock skew subject to temperature variations. In this work, we address the problem of thermally induced clock skew modeling in nanometer CMOS technologies. The complex thermal behavior of both buffers and interconnects are taken into account. In addition, our characterization of the temperature effect on buffers and interconnects provides valuable insight to designers about the potential impact of thermal variations on clock networks. The use of industrial standard data format in the interface allows our tool to be easily integrated into existing design flow

    Boosting the Performance of PC-based Software Routers with FPGA-enhanced Network Interface Cards

    Get PDF
    The research community is devoting increasing attention to software routers based on off-the-shelf hardware and open-source operating systems running on the personalcomputer (PC) architecture. Today's high-end PCs are equipped with peripheral component interconnect (PCI) shared buses enabling them to easily fit into the multi-gigabit-per-second routing segment, for a price much lower than that of commercial routers. However, commercially-available PC network interface cards (NICs) lack programmability, and require not only packets to cross the PCI bus twice, but also to be processed in software by the operating system, strongly reducing the achievable forwarding rate. It is therefore interesting to explore the performance of customizable NICs based on field-programmable gate array (FPGA) logic devices we developed and assess how well they can overcome the limitations of today's commercially-available NIC
    corecore