23 research outputs found

    Placement driven retiming with a coupled edge timing model

    Retiming is a widely investigated technique for performance optimization. It performs powerful modifications on a circuit netlist. However, it is often unclear whether the predicted performance improvement will still hold after placement has been performed. This paper presents a new retiming algorithm that uses a highly accurate timing model, taking into account the effect of retiming on the capacitive loads of single wires as well as fanout systems. We propose integrating retiming into a timing-driven standard cell placement environment based on simulated annealing. Retiming is used as an optimization technique throughout the whole placement process. The experimental results show the benefit of the proposed approach: compared with the conventional design flow based on standard FEAS, our approach achieved a cycle-time improvement of up to 34%, and 17% on average.

    Fast algorithms for retiming large digital circuits

    The increasing complexity of VLSI systems and shrinking time-to-market requirements demand good optimization tools capable of handling large circuits. Retiming is a powerful transformation that preserves functionality and can be used to optimize sequential circuits for a wide range of objective functions by judiciously relocating the memory elements. Leiserson and Saxe, who introduced the concept, presented algorithms for period optimization (minperiod retiming) and area optimization (minarea retiming). The ASTRA algorithm proposed an alternative view of retiming using the equivalence between retiming and clock skew optimization. The first part of this thesis defines the relationship between the Leiserson-Saxe and the ASTRA approaches and utilizes it for efficient minarea retiming of large circuits. The new algorithm, Minaret, uses the same linear program formulation as the Leiserson-Saxe approach. The underlying philosophy of the ASTRA approach is incorporated to reduce the number of variables and constraints in this linear program. This allows minarea retiming of circuits with over 56,000 gates in under fifteen minutes. The movement of flip-flops in control logic changes the state encoding of finite state machines, requiring the preservation of initial (reset) states. In the next part of this work, the problem of minimizing the number of flip-flops in control logic, subject to a specified clock period and with the guarantee of an equivalent initial state, is formulated as a mixed integer linear program. Bounds on the retiming variables are used to guarantee an equivalent initial state in the retimed circuit. These bounds lead to a simple method for calculating an equivalent initial state for the retimed circuit. The transparent nature of level-sensitive latches enables level-clocked circuits to operate faster and require less area. However, this transparency makes the operation of level-clocked circuits very complex, and optimization of level-clocked circuits is a difficult task. This thesis also presents efficient algorithms for retiming large level-clocked circuits. The relationship between retiming and clock skew optimization for level-clocked circuits is defined and utilized to develop efficient retiming algorithms for period and area optimization. Using these algorithms, a circuit with 56,000 gates could be retimed for minimum period in under twenty seconds and for minimum area in under 1.5 hours.
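
    To make "relocating the memory elements" concrete, here is a minimal Python sketch of the standard Leiserson-Saxe circuit model: each edge carries a register count w, each gate a delay d, and a retiming r changes an edge u -> v to w + r(v) - r(u). It is only an illustration of that definition, not Minaret or ASTRA; the function names and the three-gate ring are assumptions made up for this example.

```python
from collections import defaultdict


def retime(edges, r):
    """Apply retiming r: each edge (u, v) gets weight w + r[v] - r[u]."""
    return {(u, v): w + r.get(v, 0) - r.get(u, 0) for (u, v), w in edges.items()}


def is_legal(edges):
    """A retiming is legal when no edge ends up with a negative register count."""
    return all(w >= 0 for w in edges.values())


def clock_period(delay, edges):
    """Longest gate-delay path that crosses only register-free (weight 0) edges."""
    succ = defaultdict(list)
    indeg = {v: 0 for v in delay}
    for (u, v), w in edges.items():
        if w == 0:
            succ[u].append(v)
            indeg[v] += 1
    arrival = dict(delay)  # arrival[v]: longest combinational delay ending at v
    ready = [v for v in delay if indeg[v] == 0]
    visited = 0
    while ready:  # topological sweep over the register-free subgraph
        u = ready.pop()
        visited += 1
        for v in succ[u]:
            arrival[v] = max(arrival[v], arrival[u] + delay[v])
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if visited != len(delay):
        raise ValueError("register-free cycle: not a valid synchronous circuit")
    return max(arrival.values())


# Hypothetical ring a -> b -> c -> a with both registers on the edge c -> a.
delay = {"a": 1, "b": 2, "c": 3}
edges = {("a", "b"): 0, ("b", "c"): 0, ("c", "a"): 2}

# r(c) = 1 moves one register from the output of c to its input.
retimed = retime(edges, {"c": 1})

print(clock_period(delay, edges))    # 6: path a, b, c has no register on it
print(clock_period(delay, retimed))  # 3: the path is now cut between b and c
```

    The register count around the ring stays at two in both versions, which is why retiming preserves functionality while changing the clock period.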

    A Fast Sequential Learning Technique for Real Circuits with Application to Enhancing ATPG Performance

    This paper presents an efficient and novel method for sequential learning of implications, invalid states, and tied gates. It can handle real industrial circuits with multiple clock domains and partial set/reset. Applying the method to sequential ATPG is also shown to improve efficiency, yielding higher fault coverage and lower test generation times.

    Rewired retiming for flip-flop reduction and low power without delay penalty.

    Jiang, Mingqi. Thesis (M.Phil.)--Chinese University of Hong Kong, 2009. Includes bibliographical references (leaves [49]-51). Abstract also in Chinese.

    Contents:
    Abstract
    Acknowledgement
    Chapter 1: Introduction
    Chapter 2: Rewiring Background
        2.1 REWIRE
        2.2 GBAW
    Chapter 3: Retiming
        3.1 Min-Clock Period Retiming
        3.2 Min-Area Retiming
        3.3 Retiming for Low Power
        3.4 Retiming with Interconnect Delay
    Chapter 4: Rewired Retiming for Flip-flop Reduction
        4.1 Motivation and Problem Formulation
        4.2 Retiming Indication
        4.3 Target Wire Selection
        4.4 Incremental Placement Update
        4.5 Optimization Flow
        4.6 Experimental Results
    Chapter 5: Power Analysis for Rewired Retiming
        5.1 Power Model
        5.2 Experimental Results
    Chapter 6: Conclusion
    Bibliography

    Retiming with wire delay and post-retiming register placement.

    Tong Ka Yau Dennis. Thesis (M.Phil.)--Chinese University of Hong Kong, 2004. Includes bibliographical references (leaves 77-81). Abstracts in English and Chinese.

    Contents:
    Chapter 1: Introduction
        1.1 Motivations
        1.2 Progress on the Problem
        1.3 Our Contributions
        1.4 Thesis Organization
    Chapter 2: Background on Retiming
        2.1 Introduction
        2.2 Preliminaries
        2.3 Retiming Problem
    Chapter 3: Literature Review on Retiming
        3.1 Introduction
        3.2 The First Retiming Paper
            3.2.1 "Retiming Synchronous Circuitry"
        3.3 Important Extensions of the Basic Retiming Algorithm
            3.3.1 "A Fresh Look at Retiming via Clock Skew Optimization"
            3.3.2 "An Improved Algorithm for Minimum-Area Retiming"
            3.3.3 "Efficient Implementation of Retiming"
        3.4 Retiming in Physical Design Stages
            3.4.1 "Physical Planning with Retiming"
            3.4.2 "Simultaneous Circuit Partitioning/Clustering with Retiming for Performance Optimization"
            3.4.3 "Performance Driven Multi-level and Multiway Partitioning with Retiming"
        3.5 Retiming with More Sophisticated Timing Models
            3.5.1 "Retiming with Non-zero Clock Skew, Variable Register, and Interconnect Delay"
            3.5.2 "Placement Driven Retiming with a Coupled Edge Timing Model"
        3.6 Post-Retiming Register Placement
            3.6.1 "Layout Driven Retiming Using the Coupled Edge Timing Model"
            3.6.2 "Integrating Logic Retiming and Register Placement"
    Chapter 4: Retiming with Gate and Wire Delay [2]
        4.1 Introduction
        4.2 Problem Formulation
        4.3 Optimal Approach [2]
            4.3.1 Original Mathematical Framework for Retiming
            4.3.2 A Modified Optimal Approach
        4.4 Near-Optimal Fast Approach [2]
            4.4.1 Considering Wire Delay Only
            4.4.2 Considering Both Gate and Wire Delay
            4.4.3 Computational Complexity
            4.4.4 Experimental Results
        4.5 Lin's Optimal Approach [23]
            4.5.1 Theoretical Results
            4.5.2 Algorithm Description
            4.5.3 Computational Complexity
            4.5.4 Experimental Results
        4.6 Summary
    Chapter 5: Register Insertion in Placement [36]
        5.1 Introduction
        5.2 Problem Formulation
        5.3 Placement of Registers After Retiming
            5.3.1 Topology Finding
            5.3.2 Register Placement
        5.4 Experimental Results
        5.5 Summary
    Chapter 6: Conclusion
    Bibliography

    Synchronous Digital Circuits as Functional Programs

    Functional programming techniques have been used to describe synchronous digital circuits since the early 1980s and have proven successful at describing certain types of designs. Here we survey the systems and formal underpinnings that constitute this tradition. We situate these techniques with respect to other formal methods for hardware design and discuss the work yet to be done.

    System data communication structures for active-control transport aircraft, volume 2

    The application of communication structures to advanced transport aircraft is addressed. First, a set of avionic functional requirements is established, and a baseline set of avionics equipment that meets the requirements is defined. Three alternative configurations for this equipment are then identified that represent the evolution toward more dispersed systems. Candidate communication structures are proposed for each system configuration and compared using trade-off analyses, which emphasize reliability but also address complexity. Multiplex buses are recognized as the likely near-term choice, with mesh networks being desirable for advanced, highly dispersed systems.

    Spiking CMOS-NVM mixed-signal neuromorphic ConvNet with circuit- and training-optimized temporal subsampling

    We increasingly rely on deep learning algorithms to process colossal amounts of unstructured visual data. Commonly, these deep learning algorithms are deployed as software models on digital hardware, predominantly in data centers. The intrinsically high energy consumption of cloud-based deployment of deep neural networks (DNNs) has inspired researchers to look for alternatives, resulting in a high interest in Spiking Neural Networks (SNNs) and dedicated mixed-signal neuromorphic hardware. As a result, there is an emerging challenge to transfer DNN architecture functionality to energy-efficient spiking non-volatile memory (NVM)-based hardware with minimal loss in the accuracy of visual data processing. The Convolutional Neural Network (CNN) is the staple choice of DNN for visual data processing. However, the lack of analog-friendly spiking implementations and alternatives for some core CNN functions, such as MaxPool, hinders the conversion of CNNs into the spike domain, thus hampering neuromorphic hardware development. To address this gap, we propose MaxPool with temporal multiplexing for Spiking CNNs (SCNNs), which is amenable to implementation in mixed-signal circuits. We leverage the temporal dynamics of the internal membrane potential of Integrate & Fire neurons to enable MaxPool decision-making in the spiking domain. The proposed MaxPool models are implemented and tested within the SCNN architecture using a modified version of the aihwkit framework, a PyTorch-based toolkit for modeling and simulating hardware-based neural networks. The proposed spiking MaxPool scheme can decide even before the complete spatiotemporal input is applied, thus selectively trading off latency with accuracy. When just 10% of the spatiotemporal input window is allocated for a pooling decision, the proposed spiking MaxPool achieves up to 61.74% accuracy on the CIFAR10 classification task after training with backpropagation, only about 1% below the 62.78% accuracy of the 100% spatiotemporal window case. Both results use a 2-bit weight resolution, chosen to reflect foundry-integrated ReRAM limitations. In addition, we propose the realization of one of the proposed spiking MaxPool techniques in an NVM crossbar array along with periphery circuits designed in a 130nm CMOS technology. The energy-efficiency estimation results show competitive performance compared to recent neuromorphic chip designs.
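
    The idea of making a pooling decision from only part of the input window can be sketched in a few lines of Python. This is a loose illustration under stated assumptions, not the paper's circuit or its aihwkit models: each input in a pooling window is assumed to be a 0/1 spike train, the membrane potential is modeled as a plain running sum without leak or reset, and the function name and `decision_fraction` parameter are hypothetical.

```python
def early_decision_maxpool(spike_trains, decision_fraction=0.1):
    """Pick one input of a pooling window from the first part of its spike trains.

    spike_trains: list of equal-length lists of 0/1 spikes, one per input.
    decision_fraction: share of the time window observed before deciding.
    Returns (winner_index, winner_spike_train).
    """
    if not spike_trains:
        raise ValueError("need at least one spike train")
    if not 0 < decision_fraction <= 1:
        raise ValueError("decision_fraction must be in (0, 1]")
    n_steps = len(spike_trains[0])
    if any(len(train) != n_steps for train in spike_trains):
        raise ValueError("all spike trains must have the same length")

    # Observe at least one time step, even for very small fractions.
    decision_steps = max(1, int(n_steps * decision_fraction))

    # Assumed integrate-and-fire stand-in: potential = spikes accumulated so far.
    potentials = [sum(train[:decision_steps]) for train in spike_trains]

    # The input with the highest potential wins; ties go to the lowest index.
    winner = max(range(len(spike_trains)), key=potentials.__getitem__)
    return winner, list(spike_trains[winner])


# Hypothetical two-input window: input 0 fires early, input 1 fires more overall.
trains = [[1, 1, 0, 0], [0, 1, 1, 1]]
print(early_decision_maxpool(trains, decision_fraction=0.5)[0])  # 0
print(early_decision_maxpool(trains, decision_fraction=1.0)[0])  # 1
```

    In this toy case the early decision picks a different input than the full-window decision, which is the latency-versus-accuracy trade-off the abstract describes.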