16 research outputs found
Extending and Implementing the Self-adaptive Virtual Processor for Distributed Memory Architectures
Many-core architectures of the future are likely to have distributed memory
organizations and need fine grained concurrency management to be used
effectively. The Self-adaptive Virtual Processor (SVP) is an abstract
concurrent programming model which can provide this, but the model and its
current implementations assume a single address space shared memory. We
investigate and extend SVP to handle distributed environments, and discuss a
prototype SVP implementation which transparently supports execution on
heterogeneous distributed memory clusters over TCP/IP connections, while
retaining the original SVP programming model
Experiences in porting the SVP concurrency model to the 48-core Intel SCC using dedicated copy cores
A Characterization of the SPARC T3-4 System
This technical report covers a set of experiments on the 64-core SPARC T3-4
system, comparing it to two similar AMD and Intel systems. Key characteristics
as maximum integer and floating point arithmetic throughput are measured as
well as memory throughput, showing the scalability of the SPARC T3-4 system.
The performance of POSIX threads primitives is characterized and compared in
detail, such as thread creation and mutex synchronization. Scalability tests
with a fine grained multithreaded runtime are performed, showing problems with
atomic CAS operations on such physically highly parallel systems
Efficient memory copy operations on the 48-core intel scc processor,”
Abstract-The Single-chip Cloud Computer (SCC) is a 48-core experimental processor created by Intel Labs targeting the many-core research community. It has hardware support for sending short messages between cores, while large messages have to go through off-chip shared memory. However, memory copy operations on this chip are expensive and inefficient. In this paper we provide insight in the SCC's memory architecture and describe and evaluate a few memory copy methods. We propose a novel method, unique to the SCC, which we believe achieves the maximum possible throughput for a single core on this chip. In order to efficiently implement this approach we introduce dedicated cores that run a memory copy service which can be used asynchronously by other cores