328 research outputs found

    SCALABLE INTEGRATED CIRCUIT SIMULATION ALGORITHMS FOR ENERGY-EFFICIENT TERAFLOP HETEROGENEOUS PARALLEL COMPUTING PLATFORMS

    Get PDF
    Integrated circuit technology has gone through several decades of aggressive scaling.It is increasingly challenging to analyze growing design complexity. Post-layout SPICE simulation can be computationally prohibitive due to the huge amount of parasitic elements, which can easily boost the computation and memory cost. As the decrease in device size, the circuits become more vulnerable to process variations. Designers need to statistically simulate the probability that a circuit does not meet the performance metric, which requires millions times of simulations to capture rare failure events. Recent, multiprocessors with heterogeneous architecture have emerged as mainstream computing platforms. The heterogeneous computing platform can achieve highthroughput energy efficient computing. However, the application of such platform is not trivial and needs to reinvent existing algorithms to fully utilize the computing resources. This dissertation presents several new algorithms to address those aforementioned two significant and challenging issues on the heterogeneous platform. Harmonic Balance (HB) analysis is essential for efficient verification of large postlayout RF and microwave integrated circuits (ICs). However, existing methods either suffer from excessively long simulation time and prohibitively large memory consumption or exhibit poor stability. This dissertation introduces a novel transient-simulation guided graph sparsification technique, as well as an efficient runtime performance modeling approach tailored for heterogeneous manycore CPU-GPU computing system to build nearly-optimal subgraph preconditioners that can lead to minimum HB simulation runtime. Additionally, we propose a novel heterogeneous parallel sparse block matrix algorithm by taking advantages of the structure of HB Jacobian matrices as well as GPU’s streaming multiprocessors to achieve optimal workload balancing during the preconditioning phase of HB analysis. We also show how the proposed preconditioned iterative algorithm can efficiently adapt to heterogeneous computing systems with different CPU and GPU computing capabilities. Extensive experimental results show that our HB solver can achieve up to 20X speedups and 5X memory reduction when compared with the state-of-the-art direct solver highly optimized for twelve-core CPUs. In nowadays variation-aware IC designs, cell characterizations and SRAM memory yield analysis require many thousands or even millions of repeated SPICE simulations for relatively small nonlinear circuits. In this dissertation, for the first time, we present a massively parallel SPICE simulator on GPU, TinySPICE, for efficiently analyzing small nonlinear circuits. TinySPICE integrates a highly-optimized shared-memory based matrix solver and fast parametric three-dimensional (3D) LUTs based device evaluation method. A novel circuit clustering method is also proposed to improve the stability and efficiency of the matrix solver. Compared with CPU-based SPICE simulator, TinySPICE achieves up to 264X speedups for parametric SRAM yield analysis without loss of accuracy

    Efficient and Robust Simulation, Modeling and Characterization of IC Power Delivery Circuits

    Get PDF
    As the Moore’s Law continues to drive IC technology, power delivery has become one of the most difficult design challenges. Two of the major components in power delivery are DC-DC converters and power distribution networks, both of which are time-consuming to simulate and characterize using traditional approaches. In this dissertation, we propose a complete set of solutions to efficiently analyze DC-DC converters and power distribution networks by finding a perfect balance between efficiency and accuracy. To tackle the problem, we first present a novel envelope following method based on a numerically robust time-delayed phase condition to track the envelopes of circuit states under a varying switching frequency. By adopting three fast simulation techniques, our proposed method achieves higher speedup without comprising the accuracy of the results. The robustness and efficiency of the proposed method are demonstrated using several DCDC converter and oscillator circuits modeled using the industrial standard BSIM4 transistor models. A significant runtime speedup of up to 30X with respect to the conventional transient analysis is achieved for several DC-DC converters with strong nonlinear switching characteristics. We then take another approach, average modeling, to enhance the efficiency of analyzing DC-DC converters. We proposed a multi-harmonic model that not only predicts the DC response but also captures the harmonics of arbitrary degrees. The proposed full-order model retains the inductor current as a state variable and accurately captures the circuit dynamics even in the transient state. Furthermore, by continuously monitoring state variables, our model seamlessly transitions between continuous conduction mode and discontinuous conduction mode. The proposed model, when tested with a system decoupling technique, obtains up to 10X runtime speedups over transistor-level simulations with a maximum output voltage error that never exceeds 4%. Based on the multi-harmonic averaged model, we further developed the small-signal model that provides a complete characterization of both DC averages and higher-order harmonic responses. The proposed model captures important high-frequency overshoots and undershoots of the converter response, which are otherwise unaccounted for by the existing techniques. In two converter examples, the proposed model corrects the misleading results of the existing models by providing the truthful characterization of the overall converter AC response and offers important guidance for converter design and closed-loop control. To address the problem of time-consuming simulation of power distribution networks, we present a partition-based iterative method by integrating block-Jacobi method with support graph method. The former enjoys the ease of parallelization, however, lacks a direct control of the numerical properties of the produced partitions. In contrast, the latter operates on the maximum spanning tree of the circuit graph, which is optimized for fast numerical convergence, but is bottlenecked by its difficulty of parallelization. In our proposed method, the circuit partitioning is guided by the maximum spanning tree of the underlying circuit graph, offering essential guidance for achieving fast convergence. The resulting block-Jacobi-like preconditioner maximizes the numerical benefit inherited from support graph theory while lending itself to straightforward parallelization as a partitionbased method. The experimental results on IBM power grid suite and synthetic power grid benchmarks show that our proposed method speeds up the DC simulation by up to 11.5X over a state-of-the-art direct solver

    Efficient and Robust Simulation, Modeling and Characterization of IC Power Delivery Circuits

    Get PDF
    As the Moore’s Law continues to drive IC technology, power delivery has become one of the most difficult design challenges. Two of the major components in power delivery are DC-DC converters and power distribution networks, both of which are time-consuming to simulate and characterize using traditional approaches. In this dissertation, we propose a complete set of solutions to efficiently analyze DC-DC converters and power distribution networks by finding a perfect balance between efficiency and accuracy. To tackle the problem, we first present a novel envelope following method based on a numerically robust time-delayed phase condition to track the envelopes of circuit states under a varying switching frequency. By adopting three fast simulation techniques, our proposed method achieves higher speedup without comprising the accuracy of the results. The robustness and efficiency of the proposed method are demonstrated using several DCDC converter and oscillator circuits modeled using the industrial standard BSIM4 transistor models. A significant runtime speedup of up to 30X with respect to the conventional transient analysis is achieved for several DC-DC converters with strong nonlinear switching characteristics. We then take another approach, average modeling, to enhance the efficiency of analyzing DC-DC converters. We proposed a multi-harmonic model that not only predicts the DC response but also captures the harmonics of arbitrary degrees. The proposed full-order model retains the inductor current as a state variable and accurately captures the circuit dynamics even in the transient state. Furthermore, by continuously monitoring state variables, our model seamlessly transitions between continuous conduction mode and discontinuous conduction mode. The proposed model, when tested with a system decoupling technique, obtains up to 10X runtime speedups over transistor-level simulations with a maximum output voltage error that never exceeds 4%. Based on the multi-harmonic averaged model, we further developed the small-signal model that provides a complete characterization of both DC averages and higher-order harmonic responses. The proposed model captures important high-frequency overshoots and undershoots of the converter response, which are otherwise unaccounted for by the existing techniques. In two converter examples, the proposed model corrects the misleading results of the existing models by providing the truthful characterization of the overall converter AC response and offers important guidance for converter design and closed-loop control. To address the problem of time-consuming simulation of power distribution networks, we present a partition-based iterative method by integrating block-Jacobi method with support graph method. The former enjoys the ease of parallelization, however, lacks a direct control of the numerical properties of the produced partitions. In contrast, the latter operates on the maximum spanning tree of the circuit graph, which is optimized for fast numerical convergence, but is bottlenecked by its difficulty of parallelization. In our proposed method, the circuit partitioning is guided by the maximum spanning tree of the underlying circuit graph, offering essential guidance for achieving fast convergence. The resulting block-Jacobi-like preconditioner maximizes the numerical benefit inherited from support graph theory while lending itself to straightforward parallelization as a partitionbased method. The experimental results on IBM power grid suite and synthetic power grid benchmarks show that our proposed method speeds up the DC simulation by up to 11.5X over a state-of-the-art direct solver

    Parallel Algorithms for Time and Frequency Domain Circuit Simulation

    Get PDF
    As a most critical form of pre-silicon verification, transistor-level circuit simulation is an indispensable step before committing to an expensive manufacturing process. However, considering the nature of circuit simulation, it can be computationally expensive, especially for ever-larger transistor circuits with more complex device models. Therefore, it is becoming increasingly desirable to accelerate circuit simulation. On the other hand, the emergence of multi-core machines offers a promising solution to circuit simulation besides the known application of distributed-memory clustered computing platforms, which provides abundant hardware computing resources. This research addresses the limitations of traditional serial circuit simulations and proposes new techniques for both time-domain and frequency-domain parallel circuit simulations. For time-domain simulation, this dissertation presents a parallel transient simulation methodology. This new approach, called WavePipe, exploits coarse-grained application-level parallelism by simultaneously computing circuit solutions at multiple adjacent time points in a way resembling hardware pipelining. There are two embodiments in WavePipe: backward and forward pipelining schemes. While the former creates independent computing tasks that contribute to a larger future time step, the latter performs predictive computing along the forward direction. Unlike existing relaxation methods, WavePipe facilitates parallel circuit simulation without jeopardizing convergence and accuracy. As a coarse-grained parallel approach, it requires low parallel programming effort, furthermore it creates new avenues to have a full utilization of increasingly parallel hardware by going beyond conventional finer grained parallel device model evaluation and matrix solutions. This dissertation also exploits the recently developed explicit telescopic projective integration method for efficient parallel transient circuit simulation by addressing the stability limitation of explicit numerical integration. The new method allows the effective time step controlled by accuracy requirement instead of stability limitation. Therefore, it not only leads to noticeable efficiency improvement, but also lends itself to straightforward parallelization due to its explicit nature. For frequency-domain simulation, this dissertation presents a parallel harmonic balance approach, applicable to the steady-state and envelope-following analyses of both driven and autonomous circuits. The new approach is centered on a naturally-parallelizable preconditioning technique that speeds up the core computation in harmonic balance based analysis. The proposed method facilitates parallel computing via the use of domain knowledge and simplifies parallel programming compared with fine-grained strategies. As a result, favorable runtime speedups are achieved

    Custom optimization algorithms for efficient hardware implementation

    No full text
    The focus is on real-time optimal decision making with application in advanced control systems. These computationally intensive schemes, which involve the repeated solution of (convex) optimization problems within a sampling interval, require more efficient computational methods than currently available for extending their application to highly dynamical systems and setups with resource-constrained embedded computing platforms. A range of techniques are proposed to exploit synergies between digital hardware, numerical analysis and algorithm design. These techniques build on top of parameterisable hardware code generation tools that generate VHDL code describing custom computing architectures for interior-point methods and a range of first-order constrained optimization methods. Since memory limitations are often important in embedded implementations we develop a custom storage scheme for KKT matrices arising in interior-point methods for control, which reduces memory requirements significantly and prevents I/O bandwidth limitations from affecting the performance in our implementations. To take advantage of the trend towards parallel computing architectures and to exploit the special characteristics of our custom architectures we propose several high-level parallel optimal control schemes that can reduce computation time. A novel optimization formulation was devised for reducing the computational effort in solving certain problems independent of the computing platform used. In order to be able to solve optimization problems in fixed-point arithmetic, which is significantly more resource-efficient than floating-point, tailored linear algebra algorithms were developed for solving the linear systems that form the computational bottleneck in many optimization methods. These methods come with guarantees for reliable operation. We also provide finite-precision error analysis for fixed-point implementations of first-order methods that can be used to minimize the use of resources while meeting accuracy specifications. The suggested techniques are demonstrated on several practical examples, including a hardware-in-the-loop setup for optimization-based control of a large airliner.Open Acces

    A Preconditioned Waveform Relaxation Solver for Signal Integrity Analysis of High-Speed Channels

    Get PDF
    This work presents a fast transient solver for Signal Integrity analysis of high-speed channels. We consider general chip-to-chip coupled interconnect structures, including arbitrary discontinuities at chip, package and board level. An external characterization of the interconnect in terms of tabulated scattering frequency samples is first converted to a closed-form macromodel, whose transient effects on input signals can be computed very efficiently through recursive convolutions. When combined with suitable models for drivers and receivers, a large-scale but very sparse system of equations is obtained. The latter is solved by an iterative scheme based on the Generalized Minimal RESidual (GMRES) method, further enhanced by a preconditioner based on Waveform-Relaxation. Contrary to previous formulations, the proposed scheme is guaranteed to converge in few iterations. Numerical examples show that the proposed solver outperforms standard SPICE in terms of runtime, with no loss of accuracy

    A User Programmable Battery Charging System

    Get PDF
    Rechargeable batteries are found in almost every battery powered application. Be it portable, stationary or motive applications, these batteries go hand in hand with battery charging systems. With energy harvesting being targeted in this day and age, high energy density and longer lasting batteries with efficient charging systems are being developed by companies and original equipment manufacturers. Whatever the application may be, rechargeable batteries, which deliver power to a load or system, have to be replenished or recharged once their energy is depleted. Battery charging systems must perform this replenishment by using very fast and efficient methods to extend battery life and to increase periods between charges. In this regard, they have to be versatile, efficient and user programmable to increase their applications in numerous battery powered systems. This is to reduce the cost of using different battery chargers for different types of battery powered applications and also to provide the convenience of rare battery replacement and extend the periods between charges. This thesis proposes a user programmable charging system that can charge a Lithium ion battery from three different input sources, i.e. a wall outlet, a universal serial bus (USB) and an energy harvesting system. The proposed charging system consists of three main building blocks, i.e. a pulse charger, a step down DC to DC converter and a switching network system, to extend the number of applications it can be used for. The switching network system is to allow charging of a battery via an energy harvesting system, while the step down converter is used to provide an initial supply voltage to kick start the energy harvesting system. The pulse charger enables the battery to be charged from a wall outlet or a USB network. It can also be reconfigured to charge a Nickel Metal Hydride battery. The final design is implemented on an IBM 0.18µm process. Experimental results verify the concept of the proposed charging system. The pulse charger is able to be reconfigured as a trickle charger and a constant current charger to charge a Li-ion battery and a Nickel Metal Hydride battery, respectively. The step down converter has a maximum efficiency of 90% at an input voltage of 3V and the charging of the battery via an energy harvesting system is also verified
    • …
    corecore