1,540 research outputs found

    Rhymes: a shared virtual memory system for non-coherent tiled many-core architectures

    The rising core count per processor is pushing chip complexity to a level at which hardware-based cache coherency protocols become too hard and costly to scale. New many-core hardware and software designs, beyond traditional technologies, are needed to keep up with ever-increasing scalability demands. The Intel Single-chip Cloud Computer (SCC) is a recent research processor exemplifying a new cluster-on-chip architecture that promotes a software-oriented approach, rather than hardware support, to implementing shared memory coherence. This paper presents a shared virtual memory (SVM) system, dubbed Rhymes, tailored to this new kind of processor with a non-coherent, hybrid memory architecture. Rhymes features a two-way cache coherence protocol that enforces release consistency for pages allocated in shared physical memory (SPM) and scope consistency for pages in per-core private memory. It also supports page remapping on a per-core basis to boost data locality. We implement Rhymes on the SCC port of the Barrelfish OS. Experimental results show that our SVM outperforms the pure SPM approach used by Intel's software managed coherence (SMC) library by up to 12 times, with superlinear speedups (due to the L2 cache effect) noted for applications with strong data reuse patterns.

    Multithreaded self-scheduling: application of multithreading on loop scheduling for distributed shared memory multiprocessor

    The 1st International Conference on Algorithms and Architectures for Parallel, Brisbane, Australia, 19-21 April 1995. A new loop scheduling scheme called multithreaded self-scheduling (MSS) is proposed for distributed shared memory multiprocessors. Based on the principles of multithreading, MSS attempts to hide remote memory access latencies by switching between multiple thread contexts. Consequently, loops scheduled with MSS can achieve better performance than single-thread approaches. This paper presents a series of simulation results for various parameter changes, which provide a measure of the effectiveness of MSS under different boundary conditions and suggest ways for further improvement.
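    The latency-hiding idea in this abstract can be illustrated with a toy model. The sketch below is not the MSS scheme from the paper: it assumes a single processor, a fixed remote-fetch latency, a fixed compute time per iteration, and zero context-switch cost, and the function name and parameters are hypothetical. Each context claims the next loop iteration from a shared counter, waits for its remote fetch, and then needs the CPU exclusively.

```python
import heapq


def simulate_self_scheduling(n_iters, contexts, latency, work):
    """Return the finish time for one processor running n_iters iterations.

    Assumptions (illustrative only): every iteration issues one remote fetch
    that takes `latency` time units, then needs the CPU for `work` units.
    While a context waits on its fetch, the CPU may run another context.
    Context-switch overhead is ignored.
    """
    next_iter = 0          # shared loop counter (self-scheduling)
    cpu_free_at = 0        # time at which the CPU becomes idle
    ready = []             # heap of (time the fetched data arrives, context id)

    # At time 0 every context claims one iteration and issues its fetch.
    for ctx in range(contexts):
        if next_iter < n_iters:
            heapq.heappush(ready, (latency, ctx))
            next_iter += 1

    finish = 0
    while ready:
        arrival, ctx = heapq.heappop(ready)
        start = max(arrival, cpu_free_at)   # wait for both data and CPU
        cpu_free_at = start + work
        finish = cpu_free_at
        # The context claims the next iteration, if any remain.
        if next_iter < n_iters:
            heapq.heappush(ready, (cpu_free_at + latency, ctx))
            next_iter += 1
    return finish


single = simulate_self_scheduling(n_iters=8, contexts=1, latency=4, work=1)
multi = simulate_self_scheduling(n_iters=8, contexts=4, latency=4, work=1)
print(single, multi)
```

    Under these assumptions, one context pays the full fetch latency on every iteration, while four contexts overlap most of the fetches with computation.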

    Reducing consistency traffic and cache misses in the avalanche multiprocessor

    Journal Article. For a parallel architecture to scale effectively, communication latency between processors must be avoided. We have found that the source of a large number of avoidable cache misses is the use of hardwired write-invalidate coherency protocols, which often exhibit high cache miss rates due to excessive invalidations and subsequent reloading of shared data. In the Avalanche project at the University of Utah, we are building a 64-node multiprocessor designed to reduce the end-to-end communication latency of both shared memory and message passing programs. As part of our design efforts, we are evaluating the potential performance benefits and implementation complexity of providing hardware support for multiple coherency protocols. Using a detailed architecture simulation of Avalanche, we have found that support for multiple consistency protocols can reduce the time parallel applications spend stalled on memory operations by up to 66% and overall execution time by up to 31%. Most of this reduction in memory stall time is due to a novel release-consistent multiple-writer write-update protocol implemented using a write state buffer.
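    The claim that write-invalidate protocols cause avoidable misses on shared data can be seen in a small counting model. This is a sketch under strong assumptions, not the Avalanche protocol or its write state buffer: caches are unbounded, one address is one cache line, and the cost of update traffic is ignored. The function name and the trace format are hypothetical.

```python
def count_misses(trace, n_procs, protocol):
    """Count cache misses for a trace of (proc, op, addr) tuples.

    op is 'r' (read) or 'w' (write); protocol is 'invalidate' or 'update'.
    Illustrative model only: infinite caches, no capacity or conflict misses.
    """
    cached = [set() for _ in range(n_procs)]  # valid addresses per processor
    misses = 0
    for proc, op, addr in trace:
        if addr not in cached[proc]:
            misses += 1                # line must be (re)loaded
            cached[proc].add(addr)
        if op == 'w' and protocol == 'invalidate':
            # A write removes every other processor's copy of the line.
            for other in range(n_procs):
                if other != proc:
                    cached[other].discard(addr)
        # Under 'update', other copies are refreshed in place and stay valid.
    return misses


# Producer-consumer sharing: processor 0 writes, processor 1 reads, 4 rounds.
trace = [(0, 'w', 0), (1, 'r', 0)] * 4
print(count_misses(trace, 2, 'invalidate'), count_misses(trace, 2, 'update'))
```

    In this producer-consumer pattern the consumer reloads the line after every write under invalidation, whereas under update only the two initial cold misses remain. A real comparison would also have to charge for the update messages.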

    A novel smart energy management as a service over a cloud computing platform for nanogrid appliances

    The world faces a future shortage of electrical energy because demand is rising exponentially with a rapidly growing population. With the development of the Internet of Things (IoT), more smart appliances will be integrated into homes in smart cities, and these homes will actively participate in the electricity market through demand response programs that manage energy efficiently to meet the increasing demand. With this motivation, an energy management strategy using a price-based demand response program is developed for IoT-enabled residential buildings. We propose a new energy management system (EMS) for smart homes that schedules the smart devices of IoT-enabled residential buildings to minimize the cost of electricity, reduce the peak-to-average ratio, correct the power factor, protect appliances automatically, and maximize user comfort. In this method, every home appliance is interfaced with an IoT entity (a data acquisition module) with a specific IP address, which results in a wide wireless system of devices. The proposed system has two components: hardware and software. The hardware is composed of a base station unit (BSU) and many terminal units (TUs). The software comprises Wi-Fi network programming as well as the system protocol. In this study, a Message Queuing Telemetry Transport (MQTT) broker was installed on the BSU and TU boards. We present a low-cost platform that supports monitoring and decision making across different areas of a neighboring community, for efficient management and maintenance, using information and communication technologies. The experimental findings demonstrated the feasibility and viability of the proposed method for energy management in various modes. The proposed method increases effective energy utilization, which in turn increases the sustainability of IoT-enabled homes in smart cities.
    The proposed strategy automatically handles power factor correction, appliance protection, and price-based demand response, addressing the major problem of demand response programs: consumers' limited knowledge of how to respond when they receive demand response signals. The schedule controller proposed in this paper achieved savings of 6.347 kWh of real power per day and 7.282 kWh of apparent power per day, and the proposed algorithm saved $2.3228388 per day.
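    The core of price-based demand response, shifting a deferrable load to the cheapest hours, can be sketched in a few lines. This is not the paper's scheduling algorithm, which also addresses peak-to-average ratio, power factor, and user comfort. The function name, the hourly price vector, and the assumption of a single appliance that must run for a contiguous block of hours are all hypothetical.

```python
def cheapest_start(prices, duration):
    """Return (start_hour, total_price) of the cheapest contiguous window.

    prices: price per hour (here in cents per kWh, hypothetical values).
    duration: number of consecutive hours the appliance must run.
    Uses a sliding-window sum; the earliest window wins on ties.
    """
    if duration <= 0 or duration > len(prices):
        raise ValueError("duration must be between 1 and len(prices)")
    window = sum(prices[:duration])
    best_start, best_cost = 0, window
    for start in range(1, len(prices) - duration + 1):
        # Slide the window one hour: add the new hour, drop the old one.
        window += prices[start + duration - 1] - prices[start - 1]
        if window < best_cost:
            best_start, best_cost = start, window
    return best_start, best_cost


hourly_prices = [30, 30, 12, 10, 11, 25, 40, 35]  # made-up tariff
print(cheapest_start(hourly_prices, duration=2))
```

    With this made-up tariff, a two-hour load is placed in hours 3 and 4, where the summed price is lowest.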

    Design of a communications interface for a very high performance computer

    PetaFLOPS computing power is the newest goal of Federal Government agencies in the increasingly active supercomputer field. Reaching this performance goal by the year 2007 requires sophisticated parallel processing designs. Effectively creating network interfaces/routers for interprocessor communication in such computer systems requires optimal hardware/software codesign. An interface is presented for the NJIT New Millennium Computing Point Design, a system that targets 100 TeraFLOPS performance by the year 2005. The router handles store-and-forward switching and wormhole routing for the system.
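    The two switching modes named in this abstract differ mainly in how latency grows with distance. The sketch below uses the common first-order textbook model, not figures from the NJIT design: it assumes no contention and no per-hop routing delay, and it counts `hops` as the number of links crossed. The function names and the example sizes are hypothetical.

```python
def store_and_forward_latency(packet_bits, hops, bandwidth):
    """Each node receives the whole packet before forwarding it,
    so the full transfer time is paid on every link."""
    return hops * packet_bits / bandwidth


def wormhole_latency(packet_bits, flit_bits, hops, bandwidth):
    """Only the header flit pays the per-hop cost; the rest of the
    packet follows in a pipeline behind it."""
    return hops * flit_bits / bandwidth + packet_bits / bandwidth


# Hypothetical example: 1024-bit packet, 32-bit flits, 5 links, 1 bit/cycle.
print(store_and_forward_latency(1024, 5, 1))
print(wormhole_latency(1024, 32, 5, 1))
```

    In this model, store-and-forward latency scales with the product of packet size and distance, while wormhole latency is nearly independent of distance when the packet is much larger than a flit.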

    Dynamic adaptive parallel architecture integrates advanced technologies for petaflops-scale computing

    Teraflops-scale computing systems are becoming available to an increasingly broad range of users as the performance of the constituent processing elements increases and their relative cost (e.g. per Mflops) decreases. To the original DOE ASCI Red machine have been added the ASCI Blue systems and additional 1 Teraflops commercial systems at key national centers. Clusters of low-cost PCs employing COTS network technologies (e.g. Beowulf-class systems) will make peak Teraflops performance available for less than 2M in the near future for certain classes of well-behaved problems. Future larger systems include the Japanese Earth Simulator, with a peak performance of 40 Teraflops, and three larger ASCI systems anticipated to provide peak performance of 10, 30, and 100 Teraflops, culminating in 2005. These systems use existing or near-term conventional technologies and architectures with some specialized integration logic and networking. While the peak performance goals can be satisfied through this strategy over the next decade, two major challenges confront the high performance computing community: (1) how to aggressively accelerate performance to the operational regime beyond a Petaflops, and (2) how to achieve high efficiency for a wide range of applications. The Hybrid Technology Multithreaded (HTMT) computer is under development by an interdisciplinary team of investigators to address both problems through an innovative combination of advanced technologies and dynamic adaptive architecture. This paper describes the strategy embodied by the HTMT architecture and discusses the key factors that may enable it to achieve two to three orders of magnitude higher performance than today's largest systems at a cost and power consumption of only two to three times those of the same present-day systems.