    Future scaling of processor-memory interfaces

    Roughening of the (1+1) interfaces in two-component surface growth with an admixture of random deposition

    We simulate competitive two-component growth on a one-dimensional substrate of L sites. One component is a Poisson-type deposition that generates Kardar-Parisi-Zhang (KPZ) correlations. The other is random deposition (RD). We derive the universal scaling function of the interface width for this model and show that the RD admixture acts as a dilatation mechanism to the fundamental time and height scales, but leaves the KPZ correlations intact. This observation is generalized to other growth models. It is shown that the flat-substrate initial condition is responsible for the existence of an early non-scaling phase in the interface evolution. The length of this initial phase is a non-universal parameter, but its presence is universal. In application to parallel and distributed computations, the important consequence of the derived scaling is the existence of an upper bound for the desynchronization in a conservative update algorithm for parallel discrete-event simulations. It is shown that such algorithms are generally scalable in a ring communication topology.
    Comment: 16 pages, 16 figures, 77 references
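
    The kind of growth model this abstract describes can be illustrated with a minimal sketch. It is not the authors' code: the per-site mixing probability `p_kpz`, the synchronous update, the unit-rate exponential increments, and the function name `simulate_width` are all assumptions made for illustration. A site chosen for the KPZ-type component advances only if it is not ahead of either ring neighbour (the conservative rule), while a site chosen for RD always advances.

```python
import random
import statistics


def simulate_width(num_sites, num_steps, p_kpz, seed=0):
    """Return the interface width (std. dev. of heights) after each step.

    Hypothetical toy model: p_kpz is the assumed probability that a site
    follows the conservative (KPZ-type) rule; otherwise it does random
    deposition.
    """
    rng = random.Random(seed)
    heights = [0.0] * num_sites  # flat-substrate initial condition
    widths = []
    for _ in range(num_steps):
        previous = heights[:]  # synchronous step: every site reads the old surface
        for i in range(num_sites):
            left = previous[i - 1]                 # index -1 wraps around the ring
            right = previous[(i + 1) % num_sites]  # periodic boundary
            if rng.random() < p_kpz:
                # Conservative rule: advance only at a local minimum.
                if previous[i] <= min(left, right):
                    heights[i] = previous[i] + rng.expovariate(1.0)
            else:
                # Random deposition: advance unconditionally.
                heights[i] = previous[i] + rng.expovariate(1.0)
        widths.append(statistics.pstdev(heights))
    return widths


if __name__ == "__main__":
    for p in (1.0, 0.5, 0.0):
        w = simulate_width(num_sites=200, num_steps=100, p_kpz=p)
        print(f"p_kpz={p:.1f}  width after 100 steps: {w[-1]:.2f}")
```

    Under these assumptions, setting `p_kpz=0.0` gives pure random deposition, whose width grows without bound, and `p_kpz=1.0` gives the purely conservative rule, whose width on a finite ring stays much smaller. Intermediate values mix the two components.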

    Overview of Swallow --- A Scalable 480-core System for Investigating the Performance and Energy Efficiency of Many-core Applications and Operating Systems

    We present Swallow, a scalable many-core architecture, with a current configuration of 480 x 32-bit processors. Swallow is an open-source architecture, designed from the ground up to deliver scalable increases in usable computational power to allow experimentation with many-core applications and the operating systems that support them. Scalability is enabled by the creation of a tile-able system with a low-latency interconnect, featuring an attractive communication-to-computation ratio and the use of a distributed memory configuration. We analyse the energy, computational, and communication performance of Swallow. The system provides 240 GIPS with each core consuming 71-193 mW, dependent on workload. Power consumption per instruction is lower than almost all systems of comparable scale. We also show how the use of a distributed operating system (nOS) allows the easy creation of scalable software to exploit Swallow's potential. Finally, we show two use case studies: modelling neurons and the overlay of shared memory on a distributed memory system.
    Comment: An open source release of the Swallow system design and code will follow and references to these will be added at a later date

    QPACE 2 and Domain Decomposition on the Intel Xeon Phi

    We give an overview of QPACE 2, which is a custom-designed supercomputer based on Intel Xeon Phi processors, developed in a collaboration of Regensburg University and Eurotech. We give some general recommendations for how to write high-performance code for the Xeon Phi and then discuss our implementation of a domain-decomposition-based solver and present a number of benchmarks.
    Comment: plenary talk at Lattice 2014, to appear in the conference proceedings PoS(LATTICE2014), 15 pages, 9 figures