Search CORE

25,526 research outputs found

Optimising Simulation Data Structures for the Xeon Phi

Author: Chimeh Mozhgan K.
Cockshott Paul
Publication venue
Publication date: 01/01/2016
Field of study

In this paper, we propose a lock-free architecture to accelerate logic gate circuit simulation using SIMD multi-core machines. We evaluate its performance on different test circuits simulated on the Intel Xeon Phi and 2 other machines. Comparisons are presented of this software/hardware combination with reported performances of GPU and other multi-core simulation platforms. Comparisons are also given between the lock free architecture and a leading commercial simulator running on the same Intel hardware

Enlighten

Parallelization of cycle-based logic simulation

Author: Mancini Toni
Massini Annalisa
Tronci Enrico
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date: 01/01/2017
Field of study

Verification of digital circuits by Cycle-based simulation can be performed in parallel. The parallel implementation requires two phases: the compilation phase, that sets up the data needed for the execution of the simulation, and the simulation phase, that consists in executing the parallel simulation of the considered circuit for a certain number of cycles. During the early phase of design, compilation phase has to be repeated each time a bug is found. Thus, if the time of the compilation phase is too high, the advantages stemming from the parallel approach may be lost. In this work we propose an effective version of the compilation phase and compute the corresponding execution time. We also analyze the percentage of execution time required by the different steps of the compilation phase for a set of literature benchmarks. Further, we implemented the simulation phase exploiting the GPU architecture, and we computed the execution times for a set of benchmarks obtaining values comparable with literature ones. Finally, we implemented the sequential version of the Cycle-based simulation in such a way that the execution time is optimized. We used the sequential values to compute the speedup of the parallel version for the considered set of benchmarks

Archivio della ricerca- Università di Roma La Sapienza

Optimisation and parallelism in synchronous digital circuit simulators

Author: Chimeh Mozhgan Kabiri
Hall C.
O'Donnell J.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2012
Field of study

Digital circuit simulation often requires a large amount of computation, resulting in long run times. We consider several techniques for optimising a brute force synchronous circuit simulator: an algorithm using an event queue that avoids recalculating quiescent parts of the circuit, a marking algorithm that is similar to the event queue but that avoids a central data structure, and a lazy algorithm that avoids calculating signals whose values are not needed. Two target architectures for the simulator are used: a sequential CPU, and a parallel GPGPU. The interactions between the different optimisations are discussed, and the performance is measured while the algorithms are simulating a simple but realistic scalable circuit

Crossref

Enlighten

Empowering parallel computing with field programmable gate arrays

Author: D'Hollander Erik
Publication venue: 'IOS Press'
Publication date: 01/01/2020
Field of study

After more than 30 years, reconﬁgurable computing has grown from a concept to a mature ﬁeld of science and technology. The cornerstone of this evolution is the ﬁeld programmable gate array, a building block enabling the conﬁguration of a custom hardware architecture. The departure from static von Neumannlike architectures opens the way to eliminate the instruction overhead and to optimize the execution speed and power consumption. FPGAs now live in a growing ecosystem of development tools, enabling software programmers to map algorithms directly onto hardware. Applications abound in many directions, including data centers, IoT, AI, image processing and space exploration. The increasing success of FPGAs is largely due to an improved toolchain with solid high-level synthesis support as well as a better integration with processor and memory systems. On the other hand, long compile times and complex design exploration remain areas for improvement. In this paper we address the evolution of FPGAs towards advanced multi-functional accelerators, discuss different programming models and their HLS language implementations, as well as high-performance tuning of FPGAs integrated into a heterogeneous platform. We pinpoint fallacies and pitfalls, and identify opportunities for language enhancements and architectural reﬁnements

Ghent University Academic Bibliography

A review of High Performance Computing foundations for scientists

Author: Cramer C. J.
Dongarra J.
Dror R. O.
Goldberg D.
Hager G.
Haoqiang J.
Hennessy J. L.
Marx D.
Moore G.
PABLO E. IBÁÑEZ
PABLO GARCÍA-RISUEÑO
Wilkinson B.
Woo D. H.
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date: 23/05/2012
Field of study

The increase of existing computational capabilities has made simulation emerge as a third discipline of Science, lying midway between experimental and purely theoretical branches [1, 2]. Simulation enables the evaluation of quantities which otherwise would not be accessible, helps to improve experiments and provides new insights on systems which are analysed [3-6]. Knowing the fundamentals of computation can be very useful for scientists, for it can help them to improve the performance of their theoretical models and simulations. This review includes some technical essentials that can be useful to this end, and it is devised as a complement for researchers whose education is focused on scientific issues and not on technological respects. In this document we attempt to discuss the fundamentals of High Performance Computing (HPC) [7] in a way which is easy to understand without much previous background. We sketch the way standard computers and supercomputers work, as well as discuss distributed computing and discuss essential aspects to take into account when running scientific calculations in computers.Comment: 33 page

arXiv.org e-Print Archive

Crossref

A case study for NoC based homogeneous MPSoC architectures

Author: Casu Mario Roberto
Macchiarulo Luca
Ruo Roch Massimo
Tota Sergio Vincenzo
Zamboni Maurizio
Publication venue: IEEE
Publication date: 01/01/2009
Field of study

The many-core design paradigm requires flexible and modular hardware and software components to provide the required scalability to next-generation on-chip multiprocessor architectures. A multidisciplinary approach is necessary to consider all the interactions between the different components of the design. In this paper, a complete design methodology that tackles at once the aspects of system level modeling, hardware architecture, and programming model has been successfully used for the implementation of a multiprocessor network-on-chip (NoC)-based system, the NoCRay graphic accelerator. The design, based on 16 processors, after prototyping with field-programmable gate array (FPGA), has been laid out in 90-nm technology. Post-layout results show very low power, area, as well as 500 MHz of clock frequency. Results show that an array of small and simple processors outperform a single high-end general purpose processo

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

A sparse octree gravitational N-body code that runs entirely on the GPU processor

Author: Barnes
Barnes
Barnes
Belleman
Billeter
Buck
Burtscher
de Berg
Dehnen
Dubinski
Evghenii Gaburov
Fukushige
Gaburov
Gaburov
Hamada
Hamada
Harfst
Hut
Jeroen Bédorf
Knuth
Lauterbach
Makino
Makino
McMillan
Nyland
Plummer
Portegies Zwart
Portegies Zwart
Raman
Salmon
Satish
Simon Portegies Zwart
Springel
Warren
Yokota
Publication venue: 'Elsevier BV'
Publication date: 01/04/2012
Field of study

We present parallel algorithms for constructing and traversing sparse octrees on graphics processing units (GPUs). The algorithms are based on parallel-scan and sort methods. To test the performance and feasibility, we implemented them in CUDA in the form of a gravitational tree-code which completely runs on the GPU.(The code is publicly available at: http://castle.strw.leidenuniv.nl/software.html) The tree construction and traverse algorithms are portable to many-core devices which have support for CUDA or OpenCL programming languages. The gravitational tree-code outperforms tuned CPU code during the tree-construction and shows a performance improvement of more than a factor 20 overall, resulting in a processing rate of more than 2.8 million particles per second.Comment: Accepted version. Published in Journal of Computational Physics. 35 pages, 12 figures, single colum

arXiv.org e-Print Archive

Crossref

Leiden University Scholary Publications