4 research outputs found

    Parallelization of cycle-based logic simulation

    Verification of digital circuits by cycle-based simulation can be performed in parallel. The parallel implementation requires two phases: the compilation phase, which sets up the data needed to execute the simulation, and the simulation phase, which runs the parallel simulation of the circuit for a given number of cycles. During the early phase of design, the compilation phase has to be repeated each time a bug is found. Thus, if the compilation phase takes too long, the advantages of the parallel approach may be lost. In this work we propose an effective version of the compilation phase and compute the corresponding execution time. We also analyze the percentage of execution time required by the different steps of the compilation phase for a set of benchmarks from the literature. Further, we implemented the simulation phase on the GPU architecture and measured the execution times for a set of benchmarks, obtaining values comparable with those reported in the literature. Finally, we implemented an optimized sequential version of the cycle-based simulation and used its execution times to compute the speedup of the parallel version on the same set of benchmarks.
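
    The two phases described in this abstract can be illustrated with a minimal sequential sketch. It is not the authors' implementation: the netlist format, the names `compile_netlist`, `simulate`, and `GATE_OPS`, and the use of levelization as the compilation step are all assumptions made for illustration.

```python
# Hypothetical netlist format: gate name -> (operation, list of input names).
GATE_OPS = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
    "NOT": lambda a: 1 - a,
}


def compile_netlist(gates, sources):
    """Compilation phase (assumed here to be levelization).

    Groups gates into levels so that every gate reads only primary inputs,
    flip-flop outputs, or gates placed in an earlier level.
    """
    known = set(sources)
    pending = dict(gates)
    levels = []
    while pending:
        ready = [g for g, (_, ins) in pending.items()
                 if all(i in known for i in ins)]
        if not ready:
            raise ValueError("combinational loop or undriven signal")
        for g in ready:
            del pending[g]
        known.update(ready)
        levels.append(ready)
    return levels


def simulate(gates, levels, state, next_state, input_vectors):
    """Simulation phase: one pass over all levels per clock cycle."""
    trace = []
    for inputs in input_vectors:
        values = {**state, **inputs}
        for level in levels:
            # Gates within one level are independent of each other, so a
            # parallel version could evaluate them concurrently.
            for g in level:
                op, ins = gates[g]
                values[g] = GATE_OPS[op](*(values[i] for i in ins))
        # Clock edge: each flip-flop latches the signal driving it.
        state = {ff: values[d] for ff, d in next_state.items()}
        trace.append(values)
    return trace


# Toy circuit: flip-flop q toggles whenever en is 1; nd is the inverse of d.
gates = {"d": ("XOR", ["q", "en"]), "nd": ("NOT", ["d"])}
levels = compile_netlist(gates, sources=["q", "en"])
trace = simulate(gates, levels, {"q": 0}, {"q": "d"},
                 [{"en": 1}, {"en": 1}, {"en": 0}])
print(levels)
print([v["d"] for v in trace])
```

    In this sketch `compile_netlist` must be rerun whenever the netlist changes, which mirrors the abstract's point that a slow compilation phase can cancel the gains of a fast simulation phase.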

    Parallel Discrete Event Simulation on Many Core Platforms Using Parallel Heap Event Queues

    Discrete event simulation on GPUs using a parallel heap data structure is the focus of this thesis. Two traditional algorithms for parallel discrete event simulation, one conservative and the other optimistic, have been implemented on GPUs using CUDA. The first is the conservative safe-window algorithm, which produced the expected performance when compared with sequential simulation. The second, known as SyncSim, is an optimistic simulation algorithm previously designed to be space efficient and to reduce rollbacks. It is re-implemented on the GPU platform with the necessary changes to the logic simulator and the parallel heap implementation. The performance of the parallel heap when working with a logic simulator has also been validated against the results reported in a previous research paper on the parallel heap without the logic simulator.
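
    The conservative idea mentioned in this abstract can be sketched sequentially as follows. This is one common window-based formulation, not necessarily the thesis's exact algorithm: the function `safe_window_run`, the `(timestamp, target)` event format, and the fixed `lookahead` parameter are assumptions, and Python's `heapq` stands in for the parallel heap.

```python
import heapq


def safe_window_run(initial_events, handler, lookahead, end_time):
    """Process events in windows whose members cannot affect each other.

    Assumption: every event created by `handler` is scheduled at least
    `lookahead` time units after the event that caused it, so no event in
    [t_min, t_min + lookahead) can create another event inside that window.
    """
    queue = list(initial_events)
    heapq.heapify(queue)
    windows = []
    while queue and queue[0][0] < end_time:
        bound = min(queue[0][0] + lookahead, end_time)
        # Batch removal of all events below the bound; a parallel heap
        # would delete several minimum items in one operation.
        batch = []
        while queue and queue[0][0] < bound:
            batch.append(heapq.heappop(queue))
        new_events = []
        for ts, target in batch:  # events for different targets could run in parallel
            for new_ts, new_target in handler(ts, target):
                if new_ts < ts + lookahead:
                    raise ValueError("handler violated the lookahead assumption")
                new_events.append((new_ts, new_target))
        for event in new_events:
            heapq.heappush(queue, event)
        windows.append(batch)
    return windows


# Toy model: three nodes in a ring, each forwarding to the next after a delay of 2.
def forward(ts, node):
    return [(ts + 2, (node + 1) % 3)]


windows = safe_window_run([(0, 0), (1, 1)], forward, lookahead=2, end_time=5)
print(windows)
```

    Events in the same window that target the same node would still need to be handled in timestamp order at that node; only events for different targets are independent.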

    Distributed time, conservative parallel logic simulation on GPUs

    Logic simulation is the primary method for verifying the correctness of IC designs. However, today's complex VLSI designs place ever higher demands on the throughput of logic simulators. In this work, a parallel logic simulator was developed by leveraging the computing power of modern graphics processing units (GPUs). To expose more parallelism, we implemented a conservative parallel simulation approach, the CMB algorithm, on NVIDIA GPUs. The simulation processing is mapped to GPU hardware at the finest granularity. With carefully designed data structures and data flow organization, our GPU-based simulator could overcome many problems that hindered efficient implementations of the CMB algorithm on traditional parallel computers. To use the relatively limited capacity of GPU memory efficiently, a novel memory management mechanism was proposed to dynamically allocate and recycle GPU memory during simulation. We also introduced a CPU/GPU co-processing strategy to make the best use of computing resources. Experimental results showed that our GPU-based simulator could outperform a CPU baseline event-driven simulator by a factor of 29.2.

    Accelerating Mixed-Abstraction SystemC Models on Multi-Core CPUs and GPUs

    Functional verification is a critical part of the hardware design cycle, and it accounts for nearly two-thirds of the overall development time. With the increasing complexity of hardware designs and shrinking time-to-market constraints, the time and resources spent on functional verification have increased considerably. To mitigate this increasing cost, researchers have proposed techniques for improving the simulation of hardware designs, which is a key technique used in the functional verification process. However, the proposed techniques for accelerating simulation do not leverage the performance benefits offered by the multiprocessor, multi-core, and heterogeneous processors available today. With the growing ubiquity of powerful heterogeneous computing systems, which integrate multi-processor/multi-core systems with heterogeneous processors such as GPUs, it is important to use these systems to address the functional verification bottleneck. In this thesis, I propose a technique for accelerating SystemC simulations across multi-core CPUs and GPUs. In particular, I focus on accelerating the simulation of SystemC models that are described at both the Register-Transfer Level (RTL) and Transaction Level (TL) abstractions. The main contributions of this thesis are: (1) a methodology for accelerating the simulation of mixed-abstraction SystemC models defined at the RTL and TL abstractions on multi-core CPUs and GPUs, and (2) an open-source static framework for parsing, analyzing, and performing source-to-source translation of identified portions of a SystemC model for execution on multi-core CPUs and GPUs.