75,631 research outputs found

    OpenCAL++: An object-oriented architecture for transparent parallel execution of cellular automata models

    Get PDF
    Cellular Automata (CA) models, initially studied by John von Neumann, have been developed by numerous researchers and applied in both academic and scientific fields. Thanks to their local and independent rules, simulations of complex systems can be easily implemented based on CA modelling on parallel machines. However, due to the heterogeneity of the components - from the hardware to the software perspective-the various possible scenarios running parallelism in today’s architectures can pose a challenge in such implementations, making it difficult to exploit. This paper presents OpenCAL++, a transparent and efficient object-oriented platform for the parallel execution of cellular automata models. The architecture of OpenCAL++ ensures the modeller a fully transparent parallel execution and a strong ”separation of concerns” between the execution parallelism issues and the model implementation. The code implementing the Cellular Automata model remains the same whether the execution performs in a shared-, distributed-memory or a GPGPU context, irrespective of the optimizations adopted. To this aim, the object-oriented paradigm has been intensely exploited. As well as the OpenCAL++ architecture, we present the description of a simple Cellular Automata model implementation for illustrative purposes.This research was funded by the Italian “ICSC National Center for HPC, Big Data and Quantum Computing” Project, CN00000013 (approved under the Call M42C –Investment 1.4 – Avvisto “Centri Nazionali” – D.D. n. 3138 of 16.12.2021, admitted to financing with MUR Decree n. 1031 of 06.17.2022)Peer ReviewedPostprint (author's final draft

    Developing Efficient Discrete Simulations on Multicore and GPU Architectures

    Get PDF
    In this paper we show how to efficiently implement parallel discrete simulations on multicoreandGPUarchitecturesthrougharealexampleofanapplication: acellularautomatamodel of laser dynamics. We describe the techniques employed to build and optimize the implementations using OpenMP and CUDA frameworks. We have evaluated the performance on two different hardware platforms that represent different target market segments: high-end platforms for scientific computing, using an Intel Xeon Platinum 8259CL server with 48 cores, and also an NVIDIA Tesla V100GPU,bothrunningonAmazonWebServer(AWS)Cloud;and on a consumer-oriented platform, using an Intel Core i9 9900k CPU and an NVIDIA GeForce GTX 1050 TI GPU. Performance results were compared and analyzed in detail. We show that excellent performance and scalability can be obtained in both platforms, and we extract some important issues that imply a performance degradation for them. We also found that current multicore CPUs with large core numbers can bring a performance very near to that of GPUs, and even identical in some cases.Ministerio de Economía, Industria y Competitividad, Gobierno de España (MINECO), and the Agencia Estatal de Investigación (AEI) of Spain, cofinanced by FEDER funds (EU) TIN2017-89842

    Nature-Inspired Interconnects for Self-Assembled Large-Scale Network-on-Chip Designs

    Get PDF
    Future nano-scale electronics built up from an Avogadro number of components needs efficient, highly scalable, and robust means of communication in order to be competitive with traditional silicon approaches. In recent years, the Networks-on-Chip (NoC) paradigm emerged as a promising solution to interconnect challenges in silicon-based electronics. Current NoC architectures are either highly regular or fully customized, both of which represent implausible assumptions for emerging bottom-up self-assembled molecular electronics that are generally assumed to have a high degree of irregularity and imperfection. Here, we pragmatically and experimentally investigate important design trade-offs and properties of an irregular, abstract, yet physically plausible 3D small-world interconnect fabric that is inspired by modern network-on-chip paradigms. We vary the framework's key parameters, such as the connectivity, the number of switch nodes, the distribution of long- versus short-range connections, and measure the network's relevant communication characteristics. We further explore the robustness against link failures and the ability and efficiency to solve a simple toy problem, the synchronization task. The results confirm that (1) computation in irregular assemblies is a promising and disruptive computing paradigm for self-assembled nano-scale electronics and (2) that 3D small-world interconnect fabrics with a power-law decaying distribution of shortcut lengths are physically plausible and have major advantages over local 2D and 3D regular topologies
    corecore