1,853 research outputs found

    Synthesis of Clock Trees with Useful Skew based on Sparse-Graph Algorithms

    Get PDF
    Computer-aided design (CAD) for very large scale integration (VLSI) involve

    System-on-chip Computing and Interconnection Architectures for Telecommunications and Signal Processing

    Get PDF
    This dissertation proposes novel architectures and design techniques targeting SoC building blocks for telecommunications and signal processing applications. Hardware implementation of Low-Density Parity-Check decoders is approached at both the algorithmic and the architecture level. Low-Density Parity-Check codes are a promising coding scheme for future communication standards due to their outstanding error correction performance. This work proposes a methodology for analyzing effects of finite precision arithmetic on error correction performance and hardware complexity. The methodology is throughout employed for co-designing the decoder. First, a low-complexity check node based on the P-output decoding principle is designed and characterized on a CMOS standard-cells library. Results demonstrate implementation loss below 0.2 dB down to BER of 10^{-8} and a saving in complexity up to 59% with respect to other works in recent literature. High-throughput and low-latency issues are addressed with modified single-phase decoding schedules. A new "memory-aware" schedule is proposed requiring down to 20% of memory with respect to the traditional two-phase flooding decoding. Additionally, throughput is doubled and logic complexity reduced of 12%. These advantages are traded-off with error correction performance, thus making the solution attractive only for long codes, as those adopted in the DVB-S2 standard. The "layered decoding" principle is extended to those codes not specifically conceived for this technique. Proposed architectures exhibit complexity savings in the order of 40% for both area and power consumption figures, while implementation loss is smaller than 0.05 dB. Most modern communication standards employ Orthogonal Frequency Division Multiplexing as part of their physical layer. The core of OFDM is the Fast Fourier Transform and its inverse in charge of symbols (de)modulation. Requirements on throughput and energy efficiency call for FFT hardware implementation, while ubiquity of FFT suggests the design of parametric, re-configurable and re-usable IP hardware macrocells. In this context, this thesis describes an FFT/IFFT core compiler particularly suited for implementation of OFDM communication systems. The tool employs an accuracy-driven configuration engine which automatically profiles the internal arithmetic and generates a core with minimum operands bit-width and thus minimum circuit complexity. The engine performs a closed-loop optimization over three different internal arithmetic models (fixed-point, block floating-point and convergent block floating-point) using the numerical accuracy budget given by the user as a reference point. The flexibility and re-usability of the proposed macrocell are illustrated through several case studies which encompass all current state-of-the-art OFDM communications standards (WLAN, WMAN, xDSL, DVB-T/H, DAB and UWB). Implementations results are presented for two deep sub-micron standard-cells libraries (65 and 90 nm) and commercially available FPGA devices. Compared with other FFT core compilers, the proposed environment produces macrocells with lower circuit complexity and same system level performance (throughput, transform size and numerical accuracy). The final part of this dissertation focuses on the Network-on-Chip design paradigm whose goal is building scalable communication infrastructures connecting hundreds of core. A low-complexity link architecture for mesochronous on-chip communication is discussed. The link enables skew constraint looseness in the clock tree synthesis, frequency speed-up, power consumption reduction and faster back-end turnarounds. The proposed architecture reaches a maximum clock frequency of 1 GHz on 65 nm low-leakage CMOS standard-cells library. In a complex test case with a full-blown NoC infrastructure, the link overhead is only 3% of chip area and 0.5% of leakage power consumption. Finally, a new methodology, named metacoding, is proposed. Metacoding generates correct-by-construction technology independent RTL codebases for NoC building blocks. The RTL coding phase is abstracted and modeled with an Object Oriented framework, integrated within a commercial tool for IP packaging (Synopsys CoreTools suite). Compared with traditional coding styles based on pre-processor directives, metacoding produces 65% smaller codebases and reduces the configurations to verify up to three orders of magnitude

    Desynchronization: Synthesis of asynchronous circuits from synchronous specifications

    Get PDF
    Asynchronous implementation techniques, which measure logic delays at run time and activate registers accordingly, are inherently more robust than their synchronous counterparts, which estimate worst-case delays at design time, and constrain the clock cycle accordingly. De-synchronization is a new paradigm to automate the design of asynchronous circuits from synchronous specifications, thus permitting widespread adoption of asynchronicity, without requiring special design skills or tools. In this paper, we first of all study different protocols for de-synchronization and formally prove their correctness, using techniques originally developed for distributed deployment of synchronous language specifications. We also provide a taxonomy of existing protocols for asynchronous latch controllers, covering in particular the four-phase handshake protocols devised in the literature for micro-pipelines. We then propose a new controller which exhibits provably maximal concurrency, and analyze the performance of desynchronized circuits with respect to the original synchronous optimized implementation. We finally prove the feasibility and effectiveness of our approach, by showing its application to a set of real designs, including a complete implementation of the DLX microprocessor architectur

    Energy-Efficient Digital Circuit Design using Threshold Logic Gates

    Get PDF
    abstract: Improving energy efficiency has always been the prime objective of the custom and automated digital circuit design techniques. As a result, a multitude of methods to reduce power without sacrificing performance have been proposed. However, as the field of design automation has matured over the last few decades, there have been no new automated design techniques, that can provide considerable improvements in circuit power, leakage and area. Although emerging nano-devices are expected to replace the existing MOSFET devices, they are far from being as mature as semiconductor devices and their full potential and promises are many years away from being practical. The research described in this dissertation consists of four main parts. First is a new circuit architecture of a differential threshold logic flipflop called PNAND. The PNAND gate is an edge-triggered multi-input sequential cell whose next state function is a threshold function of its inputs. Second a new approach, called hybridization, that replaces flipflops and parts of their logic cones with PNAND cells is described. The resulting \hybrid circuit, which consists of conventional logic cells and PNANDs, is shown to have significantly less power consumption, smaller area, less standby power and less power variation. Third, a new architecture of a field programmable array, called field programmable threshold logic array (FPTLA), in which the standard lookup table (LUT) is replaced by a PNAND is described. The FPTLA is shown to have as much as 50% lower energy-delay product compared to conventional FPGA using well known FPGA modeling tool called VPR. Fourth, a novel clock skewing technique that makes use of the completion detection feature of the differential mode flipflops is described. This clock skewing method improves the area and power of the ASIC circuits by increasing slack on timing paths. An additional advantage of this method is the elimination of hold time violation on given short paths. Several circuit design methodologies such as retiming and asynchronous circuit design can use the proposed threshold logic gate effectively. Therefore, the use of threshold logic flipflops in conventional design methodologies opens new avenues of research towards more energy-efficient circuits.Dissertation/ThesisDoctoral Dissertation Computer Science 201

    Synthesis Methodologies for Robust and Reconfigurable Clock Networks

    Get PDF
    In today\u27s aggressively scaled technology nodes, billions of transistors are packaged into a single integrated circuit. Electronic Design Automation (EDA) tools are needed to automatically assemble the transistors into a functioning system. One of the most important design steps in the physical synthesis is the design of the clock network. The clock network delivers a synchronizing clock signal to each sequential element. The clock signal is required to be delivered meeting timing constraints under variations and in multiple operating modes. Synthesizing such clock networks is becoming increasingly difficult with the complex power management methodologies and severe manufacturing variations. Clock network synthesis is an important problem because it has a direct impact on the functional correctness, the maximum operating frequency, and the overall power consumption of each synchronous integrated circuit. In this dissertation, we proposed synthesis methodologies for robust and reconfigurable clock networks. We have made three contributions to this topic. First, we have proposed a clock network optimization framework that can achieve better timing quality than previous frameworks. Our proposed framework improves timing quality by reducing the propagation delay on critical paths in a clock network using buffer sizing and layer assignment. Second, we have proposed a clock tree synthesis methodology that integrates the clock tree synthesis with the clock tree optimization. The methodology improves timing quality by avoiding to synthesize clock trees with topologies that are sensitive to variations. Third, we have proposed a clock network that can reconfigure the topology based on the active mode of operation. Lastly, we conclude the dissertation with future research directions

    Clock Polarity Assignment Methodologies for Designing High-Performance and Robust Clock Trees

    Get PDF
    학위논문 (박사)-- 서울대학교 대학원 : 전기·컴퓨터공학부, 2016. 8. 김태환.In modern synchronous circuits, the system relies on one single signal, namely, the clock signal. All data sampling of flip-flops rely on the timing of the clock signal. This makes clock trees, which deliver the clock signal to every clock sink in the whole system, one of the most active components on a chip, as it must switch without halting. Naturally, this makes clock trees a primary target of optimization for low power/high performance designs. First, bounded skew clock polarity assignment is explored. Buffers in the clock tree switch simultaneously as the clock signal switch, which causes power/ground supply voltage fluctuation. This phenomenon is referred to as clock noise and brings adverse effects on circuit robustness. Clock polarity assignment technique replaces some of the buffers in the clock trees with inverters. Since buffers draw larger current at the rising edge of the clock while inverters draw larger current at the falling edge, this technique can mitigate peak noise problem at the power/ground supply rails. Second, useful skew clock polarity assignment method is developed. Useful clock skew methodology allows consideration of individual clock skew restraints between each clock sinks, allowing further noise reduction by exploiting more time slack. Through experiments with ISPD 2010 clock network synthesis contest benchmark circuits, the results show that the proposed clock polarity algorithm is able to reduce the peak noise caused by clock buffers by 10.9% further over that of the global skew bound constrained polarity assignment while satisfying all setup and hold time constraints. Lastly, as multi-corner multi-mode (MCMM) design methodologies, process variations and clock gating techniques are becoming common place in advanced technology nodes, clock polarity assignment methods that mitigate these problems are devised. Experimental results indicate that the proposed methods successfully satisfy required design constraints imposed by such variations. In summary, this dissertation presents clock polarity assignments that considers useful clock skew, delay variations, MCMM design methodologies and clock gating techniques.Chapter 1 Introduction 1 1.1 Clock Trees 1 1.2 Simultaneous Switching Noise 3 1.3 Clock Polarity Assignment Technique 4 1.4 Contributions of this Dissertation 5 Chapter 2 Clock Polarity Assignment Under Bounded Skew 7 2.1 Introduction 7 2.2 Motivational Example 9 2.3 Problem Formulation 13 2.4 Proposed Algorithm 17 2.4.1 Independence Assumption 17 2.4.2 Characterization of Noise 18 2.4.3 Overview of the Proposed Algorithm 19 2.4.4 Mapping WaveMin Problem to MOSP problem 22 2.4.5 A Fast Algorithm 26 2.4.6 Zone Sizing/Partitioning Method 27 2.5 Experimental Results 28 2.5.1 Experimental Setup 28 2.5.2 Noise Reduction 28 2.5.3 Simulation on Full Circuit 29 2.6 Effects of Clock Polarity Assignment on Simultaneous Switching Noise 34 2.6.1 Model of Power Delivery Network 34 2.6.2 Peak-to-Peak Voltage Swing 35 2.7 Effects of Decoupling Capacitors 36 2.8 Effects of Clock Polarity Assignment on Clock Jitter 40 2.8.1 Noise in Frequency Domain 40 2.9 Summary 43 Chapter 3 Clock Polarity Assignment Under Useful Skew 44 3.1 Introduction 44 3.2 Motivational Example 45 3.3 Problem Formulation 47 3.4 Proposed Algorithm 49 3.4.1 Integer Linear Programming Formulation and Linear Programming Relaxation 49 3.4.2 Formulating into Maximum Clique Problem 49 3.4.3 Scalable Algorithm for Clique Exploration 51 3.5 Experimental Results 54 3.5.1 Experimental Setup 54 3.5.2 Assessing the Performance of UsefulMin over Wavemin 56 3.6 Summary 57 Chapter 4 Extensions of Clock Polarity Assignment Methods 60 4.1 Coping With Thermal Variations 60 4.1.1 Introduction 60 4.1.2 Proposed Method 61 4.1.3 Experimental Results 66 4.2 Coping with Delay Variations 70 4.2.1 Introduction 70 4.2.2 The Impact of Process Variations on Polarity Assignment 71 4.2.3 Proposed Method for Variation Resiliency 72 4.2.4 Experimental Results 73 4.3 Coping With Multi-Mode Designs 75 4.3.1 Introduction 75 4.3.2 Proposed Method 76 4.3.3 Experimental Results 84 4.4 Orthogonality with Other Design Techniques ? Clock Gating 87 4.4.1 Introduction 87 4.4.2 Proposed Partitioning Method 87 4.4.3 Experimental Results 88 4.5 Summary 90 Chapter 5 Conclusion 92 5.1 Clock Polarity Assignment Under Bounded Skew 92 5.2 Clock Polarity Assignment Under Useful Skew 93 5.3 Extensions of Clock Polarity Assignment 93 Appendices 94 Chapter A Power Spectral Densities of ISCAS89 Circuits 95 Chapter B The Effect of Decoupling Capacitors 99 초록 109Docto

    Efficient in-situ delay monitoring for chip health tracking

    Get PDF
    corecore