9,431 research outputs found

    The "MIND" Scalable PIM Architecture

    Get PDF
    MIND (Memory, Intelligence, and Network Device) is an advanced parallel computer architecture for high performance computing and scalable embedded processing. It is a Processor-in-Memory (PIM) architecture integrating both DRAM bit cells and CMOS logic devices on the same silicon die. MIND is multicore with multiple memory/processor nodes on each chip and supports global shared memory across systems of MIND components. MIND is distinguished from other PIM architectures in that it incorporates mechanisms for efficient support of a global parallel execution model based on the semantics of message-driven multithreaded split-transaction processing. MIND is designed to operate either in conjunction with other conventional microprocessors or in standalone arrays of like devices. It also incorporates mechanisms for fault tolerance, real time execution, and active power management. This paper describes the major elements and operational methods of the MIND architecture

    Implementing Fine/Medium Grained TLP Support in a Many-Core Architecture

    Get PDF
    Abstract. We believe that future many-core architectures should support a simple and scalable way to execute many threads that are generated by parallel programs. A good candidate to implement an efficient and scalable execution of threads is the DTA (Decoupled Threaded Architecture), which is designed to exploit fine/medium grained Thread Level Parallelism (TLP) by using a hardware scheduling unit and relying on existing simple cores. In this paper, we present an initial implementation of DTA concept in a many-core architecture where it interacts with other architectural components designed from scratch in order to address the problem of scalability. We present initial results that show the scalability of the solution that were obtained using a many-core simulator written in SARCSim (a variant of UNISIM) with DTA support

    OpenCL + OpenSHMEM Hybrid Programming Model for the Adapteva Epiphany Architecture

    Full text link
    There is interest in exploring hybrid OpenSHMEM + X programming models to extend the applicability of the OpenSHMEM interface to more hardware architectures. We present a hybrid OpenCL + OpenSHMEM programming model for device-level programming for architectures like the Adapteva Epiphany many-core RISC array processor. The Epiphany architecture comprises a 2D array of low-power RISC cores with minimal uncore functionality connected by a 2D mesh Network-on-Chip (NoC). The Epiphany architecture offers high computational energy efficiency for integer and floating point calculations as well as parallel scalability. The Epiphany-III is available as a coprocessor in platforms that also utilize an ARM CPU host. OpenCL provides good functionality for supporting a co-design programming model in which the host CPU offloads parallel work to a coprocessor. However, the OpenCL memory model is inconsistent with the Epiphany memory architecture and lacks support for inter-core communication. We propose a hybrid programming model in which OpenSHMEM provides a better solution by replacing the non-standard OpenCL extensions introduced to achieve high performance with the Epiphany architecture. We demonstrate the proposed programming model for matrix-matrix multiplication based on Cannon's algorithm showing that the hybrid model addresses the deficiencies of using OpenCL alone to achieve good benchmark performance.Comment: 12 pages, 5 figures, OpenSHMEM 2016: Third workshop on OpenSHMEM and Related Technologie

    Scalable Synchronization with Mindicators

    Get PDF
    The Mindicator is a shared object that stores one value for each thread in a system, and can return the minimum of all thread’s values in constant time. In this paper, we explore applications of the Mindicator in synchronization algorithms. We introduce three new algorithms, designed for scalable Read-Copy-Update (RCU), fair Readers-Writer locking, and Group Mutual Exclusion. Experimental evaluation shows these algorithms to perform well while avoiding contention
    • …
    corecore