Search CORE

121,169 research outputs found

Parallelization of cycle-based logic simulation

Author: Mancini Toni
Massini Annalisa
Tronci Enrico
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date: 01/01/2017
Field of study

Verification of digital circuits by Cycle-based simulation can be performed in parallel. The parallel implementation requires two phases: the compilation phase, that sets up the data needed for the execution of the simulation, and the simulation phase, that consists in executing the parallel simulation of the considered circuit for a certain number of cycles. During the early phase of design, compilation phase has to be repeated each time a bug is found. Thus, if the time of the compilation phase is too high, the advantages stemming from the parallel approach may be lost. In this work we propose an effective version of the compilation phase and compute the corresponding execution time. We also analyze the percentage of execution time required by the different steps of the compilation phase for a set of literature benchmarks. Further, we implemented the simulation phase exploiting the GPU architecture, and we computed the execution times for a set of benchmarks obtaining values comparable with literature ones. Finally, we implemented the sequential version of the Cycle-based simulation in such a way that the execution time is optimized. We used the sequential values to compute the speedup of the parallel version for the considered set of benchmarks

Archivio della ricerca- Università di Roma La Sapienza

Weakly nonlinear circuit analysis based on fast multidimensional inverse Laplace transform

Author: Liu H
Wang T
Wang Y
Wong N
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2012
Field of study

There have been continuing thrusts in developing efficient modeling techniques for circuit simulation. However, most circuit simulation methods are time-domain solvers. In this paper we propose a frequency-domain simulation method based on Laguerre function expansion. The proposed method handles both linear and nonlinear circuits. The Laguerre method can invert multidimensional Laplace transform efficiently with a high accuracy, which is a key step of the proposed method. Besides, an adaptive mesh refinement (AMR) technique is developed and its parallel implementation is introduced to speed up the computation. Numerical examples show that our proposed method can accurately simulate large circuits while enjoying low computation complexity. © 2012 IEEE.published_or_final_versio

HKU Scholars Hub

Parallel and Distributed Multi-Algorithm Circuit Simulation

Author: Dai Ruicheng
Publication venue
Publication date
Field of study

With the proliferation of parallel computing, parallel computer-aided design (CAD) has received significant research interests. Transient transistor-level circuit simulation plays an important role in digital/analog circuit design and verification. Increased VLSI design complexity has made circuit simulation an ever growing bottleneck, making parallel processing an appealing solution for addressing this challenge. In this thesis, we propose and develop a parallel and distributed multi-algorithm approach to leverage the power of multi-core computer clusters for speeding up transistor-level circuit simulation. The targeted multi-algorithm approach provides a natural paradigm for exploiting parallelism for circuit simulation. Parallel circuit simulation is facilitated through the exploration of algorithm diversity where multiple simulation algorithms collaboratively work on a single simulation task. To utilize computer clusters comprising of multi-core processors, each algorithm is executed on a separate node with sufficient system resource such as processing power, memory and I/O bandwidth. We propose two communication schemes, namely master-slave and peer-to-peer schemes, to allow for inter-algorithm communication. Compared with the shared-memory based multi-algorithm implementation, the proposed simulation approach alleviates cache/memory contention as a result of multi-algorithm execution and provides further runtime speedups

Texas A&M Repository

Relaxed Half-Stochastic Belief Propagation

Author: Gross Warren J.
Hemati Saied
Leduc-Primeau François
Mannor Shie
Publication venue
Publication date: 11/05/2012
Field of study

Low-density parity-check codes are attractive for high throughput applications because of their low decoding complexity per bit, but also because all the codeword bits can be decoded in parallel. However, achieving this in a circuit implementation is complicated by the number of wires required to exchange messages between processing nodes. Decoding algorithms that exchange binary messages are interesting for fully-parallel implementations because they can reduce the number and the length of the wires, and increase logic density. This paper introduces the Relaxed Half-Stochastic (RHS) decoding algorithm, a binary message belief propagation (BP) algorithm that achieves a coding gain comparable to the best known BP algorithms that use real-valued messages. We derive the RHS algorithm by starting from the well-known Sum-Product algorithm, and then derive a low-complexity version suitable for circuit implementation. We present extensive simulation results on two standardized codes having different rates and constructions, including low bit error rate results. These simulations show that RHS can be an advantageous replacement for the existing state-of-the-art decoding algorithms when targeting fully-parallel implementations

arXiv.org e-Print Archive

PolyPublie

Circuit simulation via matrix exponential method for stiffness handling and parallel processing

Author: Chen Q
Cheng CK
Weng SH
Wong N
Publication venue: 'United States Sports Academy'
Publication date: 01/01/2012
Field of study

We propose an advanced matrix exponential method (MEXP) to handle the transient simulation of stiff circuits and enable parallel simulation. We analyze the rapid decaying of fast transition elements in Krylov subspace approximation of matrix exponential and leverage such scaling effect to leap larger steps in the later stage of time marching. Moreover, matrix-vector multiplication and restarting scheme in our method provide better scalability and parallelizability than implicit methods. The performance of ordinary MEXP can be improved up to 4.8 times for stiff cases, and the parallel implementation leads to another 11 times speedup. Our approach is demonstrated to be a viable tool for ultra-large circuit simulations (with 1.6M ∼ 12M nodes) that are not feasible with existing implicit methods. © 2012 ACM.published_or_final_versio

HKU Scholars Hub

Circuit simulation using distributed waveform relaxation techniques

Author: Jalnapurkar Anant Dattatraya
Publication venue: 'University of Saskatchewan Library'
Publication date
Field of study

Simulation plays an important role in the design of integrated circuits. Due to high costs and large delays involved in their fabrication, simulation is commonly used to verify functionality and to predict performance before fabrication. This thesis describes analysis, implementation and performance evaluation of a distributed memory parallel waveform relaxation technique for the electrical circuit simulation of MOS VLSI circuits. The waveform relaxation technique exhibits inherent parallelism due to the partitioning of a circuit into a number of sub-circuits. These subcircuits can be concurrently simulated on parallel processors. Different forms of parallelism in the direct method and the waveform relaxation technique are studied. An analysis of single queue and distributed queue approaches to implement parallel waveform relaxation on distributed memory machines is performed and their performance implications are studied. The distributed queue approach selected for exploiting the coarse grain parallelism across sub-circuits is described. Parallel waveform relaxation programs based on Gauss-Seidel and Gauss-Jacobi techniques are implemented using a network of eight Transputers. Static and dynamic load balancing strategies are studied. A dynamic load balancing algorithm is developed and implemented. Results of parallel implementation are analyzed to identify sources of bottlenecks. This thesis has demonstrated the applicability of a low cost distributed memory multi-computer system for simulation of MOS VLSI circuits. Speed-up measurements prove that a five times improvement in the speed of calculations can be achieved using a full window parallel Gauss-Jacobi waveform relaxation algorithm. Analysis of overheads shows that load imbalance is the major source of overhead and that the fraction of the computation which must be performed sequentially is very low. Communication overhead depends on the nature of the parallel architecture and the design of communication mechanisms. The run-time environment (parallel processing framework) developed in this research exploits features of the Transputer architecture to reduce the effect of the communication overhead by effectively overlapping computation with communications, and running communications processes at a higher priority. This research will contribute to the development of low cost, high performance workstations for computer-aided design and analysis of VLSI circuits

eCommons@USASK

University of Saskatchewan Research Archive

Performance comparison of single-precision SPICE Model-Evaluation on FPGA, GPU, Cell, and multi-core processors

Author: DeHon André
Kapre Nachiket
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2009
Field of study

Automated code generation and performance tuning techniques for concurrent architectures such as GPUs, Cell and FPGAs can provide integer factor speedups over multi-core processor organizations for data-parallel, floating-point computation in SPICE model-evaluation. Our Verilog AMS compiler produces code for parallel evaluation of non-linear circuit models suitable for use in SPICE simulations where the same model is evaluated several times for all the devices in the circuit. Our compiler uses architecture specific parallelization strategies (OpenMP for multi-core, PThreads for Cell, CUDA for GPU, statically scheduled VLIW for FPGA) when producing code for these different architectures. We automatically explore different implementation configurations (e.g. unroll factor, vector length) using our performance-tuner to identify the best possible configuration for each architecture. We demonstrate speedups of 3- 182times for a Xilinx Virtex5 LX 330T, 1.3-33times for an IBM Cell, and 3-131times for an NVIDIA 9600 GT GPU over a 3 GHz Intel Xeon 5160 implementation for a variety of single-precision device models

Crossref

Caltech Authors

DR-NTU (Digital Repository of NTU)

Ravel-XL: a hardware accelerator for assigned-delay compiled-code logic gate simulation

Author: Brown R. B.
Marques-Silva J. P.
Riepe M. A.
Sakallah K. A.
Publication venue
Publication date: 01/03/1996
Field of study

Southampton (e-Prints Soton)