36 research outputs found

    A heterogeneous memory organization with minimum energy consumption in 3D chip-multiprocessors

    Get PDF
    Main memories play an important role in overall energy consumption of embedded systems. Using conventional memory technologies in future designs in nanoscale era cause a drastic increase in leakage power consumption and temperature-related problems. Emerging non-volatile memory (NVM) technologies offer many desirable characteristics such as near-zero leakage power, high density and non-volatility. They can significantly mitigate the issue of memory leakage power in future embedded chip-multiprocessor (eCMP) systems. However, they suffer from challenges such as limited write endurance and high write energy consumption which restrict them for adoption in modern memory systems. In this article, we propose a stacked hybrid memory system for 3D chip-multiprocessors to take advantages of both traditional and non-volatile memory technologies. For reaching this target, we present a convex optimization-based model that minimizes the system energy consumption while satisfy endurance constraint in order to design a reliable memory system. Experimental results show that the proposed method improves energy-delay product (EDP) and performance by about 44.8% and 13.8% on average respectively compared with the traditional memory design where single technology is used. © 2016 IEEE

    Hybrid stacked memory architecture for energy efficient embedded chip-multiprocessors based on compiler directed approach

    Get PDF
    Energy consumption becomes the most critical limitation on the performance of nowadays embedded system designs. On-chip memories due to major contribution in overall system energy consumption are always significant issue for embedded systems. Using conventional memory technologies in future designs in nano-scale era causes a drastic increase in leakage power consumption and temperature-related problems. Emerging non-volatile memory (NVM) technologies are promising replacement for conventional memory structure in embedded systems due to its attractive characteristics such as near-zero leakage power, high density and non-volatility. Recent advantages of NVM technologies can significantly mitigate the issue of memory leakage power. However, they introduce new challenges such as limited write endurance and high write energy consumption which restrict them for adoption in modern memory systems. In this article, we propose a stacked hybrid memory system to minimize energy consumption for 3D embedded chip-multiprocessors (eCMP). For reaching this target, we present a convex optimization-based model to distribute data blocks between SRAM and NVM banks based on data access pattern derived by compiler. Our compiler-assisted hybrid memory architecture can achieve up to 51.28 times improvement in lifetime. In addition, experimental results show that our proposed method reduce energy consumption by 56% on average compared to the traditional memory design where single technology is used. © 2015 IEEE

    Throughput-Efficient Network-on-Chip Router Design with STT-MRAM

    Get PDF
    As the number of processor cores on a chip increases with the advance of CMOS technology, there has been a growing need of more efficient Network-on-Chip (NoC) design since communication delay has become a major bottleneck in large-scale multicore systems. In designing efficient input buffers of NoC routers for better performance and power efficiency, Spin-Torque Transfer Magnetic RAM (STT-MRAM) is regarded as a promising solution due to its nature of high density and near-zero leakage power. Previous work that adopts STT-MRAM in designing NoC router input buffer shows a limitation in minimizing the overhead of power consumption, even though it succeeds to some degree in achieving high network throughput by the use of SRAM to hide the long write latency of STT-MRAM. In this thesis, we propose a novel input buffer design that depends solely on STT-MRAM without the need of SRAM to maximize the benefits of low leakage power and area efficiency inherent in STT-MRAM. In addition, we introduce power-efficient buffer refreshing schemes synergized with age-based switch arbitration that gives higher priority to older flits to remove unnecessary refreshing operations. On an average, we observed throughput improvements of 16% on synthetic workloads and benchmarks

    A survey of emerging architectural techniques for improving cache energy consumption

    Get PDF
    The search goes on for another ground breaking phenomenon to reduce the ever-increasing disparity between the CPU performance and storage. There are encouraging breakthroughs in enhancing CPU performance through fabrication technologies and changes in chip designs but not as much luck has been struck with regards to the computer storage resulting in material negative system performance. A lot of research effort has been put on finding techniques that can improve the energy efficiency of cache architectures. This work is a survey of energy saving techniques which are grouped on whether they save the dynamic energy, leakage energy or both. Needless to mention, the aim of this work is to compile a quick reference guide of energy saving techniques from 2013 to 2016 for engineers, researchers and students

    Architectural Support for High-Performance, Power-Efficient and Secure Multiprocessor Systems

    Get PDF
    High performance systems have been widely adopted in many fields and the demand for better performance is constantly increasing. And the need of powerful yet flexible systems is also increasing to meet varying application requirements from diverse domains. Also, power efficiency in high performance computing has been one of the major issues to be resolved. The power density of core components becomes significantly higher, and the fraction of power supply in total management cost is dominant. Providing dependability is also a main concern in large-scale systems since more hardware resources can be abused by attackers. Therefore, designing high-performance, power-efficient and secure systems is crucial to provide adequate performance as well as reliability to users. Adhering to using traditional design methodologies for large-scale computing systems has a limit to meet the demand under restricted resource budgets. Interconnecting a large number of uniprocessor chips to build parallel processing systems is not an efficient solution in terms of performance and power. Chip multiprocessor (CMP) integrates multiple processing cores and caches on a chip and is thought of as a good alternative to previous design trends. In this dissertation, we deal with various design issues of high performance multiprocessor systems based on CMP to achieve both performance and power efficiency while maintaining security. First, we propose a fast and secure off-chip interconnects through minimizing network overheads and providing an efficient security mechanism. Second, we propose architectural support for fast and efficient memory protection in CMP systems, making the best use of the characteristics in CMP environments and multi-threaded workloads. Third, we propose a new router design for network-on-chip (NoC) based on a new memory technique. We introduce hybrid input buffers that use both SRAM and STT-MRAM for better performance as well as power efficiency. Simulation results show that the proposed schemes improve the performance of off-chip networks through reducing the message size by 54% on average. Also, the schemes diminish the overheads of bounds checking operations, thus enhancing the overall performance by 11% on average. Adopting hybrid buffers in NoC routers contributes to increasing the network throughput up to 21%

    STT-MRAM Based NoC Buffer Design

    Get PDF
    As Chip Multiprocessor (CMP) design moves toward many-core architectures, communication delay in Network-on-Chip (NoC) is a major bottleneck in CMP design. An emerging non-volatile memory - STT MRAM (Spin-Torque Transfer Magnetic RAM) which provides substantial power and area savings, near zero leakage power, and displays higher memory density compared to conventional SRAM. But STT-MRAM suffers from inherit drawbacks like multi cycle write latency and high write power consumption. So, these problem have to addressed in order to have an efficient design to incorporate STT-MRAM for NoC input buffer instead of traditional SRAM based input buffer design. Motivated by short intra-router latency, previously proposed write latency reduction technique is explored by sacrificing retention time and a hybrid design of input buffers using both SRAM and STT-MRAM to "hide" the long write latency efficiently is proposed. Considering that simple data migration in the hybrid buffer consumes more dynamic power compared to SRAM, a lazy migration scheme that reduces the dynamic power consumption of the hybrid buffer is also proposed

    OptMem: Dark-silicon aware low latency hybrid memory design

    Get PDF
    In this article, we present a convex optimization model to design a three dimension (3D)stacked hybrid memory system to improve performance in the dark silicon era. Our convex model optimizes numbers and placement of static random access memory (SRAM) and spin-Transfer torque magnetic random-Access memory(STT-RAM) memories on the memory layer to exploit advantages of both technologies. Power consumption that is the main challenge in the dark silicon era is represented as a main constraint in this work and it is satisfied by the detailed optimization model in order to design a dark silicon aware 3D Chip-Multiprocessor (CMP). Experimental results show that the proposed architecture improves the energy consumption and performanceof the 3D CMPabout 25.8% and 12.9% on averagecompared to the Baseline memory design. © 2016 IEEE

    High Performance On-Chip Interconnects Design for Future Many-Core Architectures

    Get PDF
    Switch-based Network-on-Chip (NoC) is a widely accepted inter-core communication infrastructure for Chip Multiprocessors (CMPs). With the continued advance of CMOS technology, the number of cores on a single chip keeps increasing at a rapid pace. It is highly expected that many-core architectures with more than hundreds of processor cores will be commercialized in the near future. In such a large scale CMP system, NoC overheads are more dominant than computation power in determining overall system performance. Also, for modern computational workloads requiring abundant thread level parallelism (TLP), NoC design for highly-parallel, many-core accelerators such as General Purpose Graphics Processing Units (GPGPUs) is of primary importance in harnessing the potential of massive thread- and data-level parallelism. In these contexts, it is critical that NoC provides both low latency and high bandwidth within limited on-chip power and area budgets. In this dissertation, we explore various design issues inherent in future many-core architectures, CMPs and GPGPUs, to achieve both high performance and power efficiency. First, we deal with issues in using a promising next generation memory technology, Spin-Transfer Torque Magnetic RAM (STT-MRAM), for NoC input buffers in CMPs. Using a high density and low leakage memory offers more buffer capacities with the same die footprint, thus helping increase network throughput in NoC routers. However, its long latency and high power consumption in write operations still need to be addressed. Thus, we propose a hybrid design of input buffers using both SRAM and STT-MRAM to hide the long write latency efficiently. Considering that simple data migration in the hybrid buffer consumes more dynamic power compared to SRAM, we provide a lazy migration scheme that reduces the dynamic power consumption of the hybrid buffer. Second, we propose the first NoC router design that uses only STT-MRAM, providing much larger buffer space with less power consumption, while preserving data integrity. To hide the multicycle writes, we employ a multibank STT-MRAM buffer, a virtual channel with multiple banks where every incoming flit is seamlessly pipelined to each bank alternately. Our STT-MRAM design has aggressively reduced the retention time, resulting in a significant reduction in the latency and power overheads of write operations. To ensure data integrity against inadvertent bit flips from the thermal fluctuation during the given retention time, we propose a cost-efficient dynamic buffer refresh scheme combined with Error Correcting Codes (ECC) to detect and correct data corruption. Third, we present schemes for bandwidth-efficient on-chip interconnects in GPGPUs. GPGPUs place a heavy demand on the on-chip interconnect between the many cores and a few memory controllers (MCs). Thus, traffic is highly asymmetric, impacting on-chip resource utilization and system performance. Here, we analyze the communication demands of typical GPGPU applications, and propose efficient NoC designs to meet those demands

    Architectural Support for High-Performance, Power-Efficient and Secure Multiprocessor Systems

    Get PDF
    High performance systems have been widely adopted in many fields and the demand for better performance is constantly increasing. And the need of powerful yet flexible systems is also increasing to meet varying application requirements from diverse domains. Also, power efficiency in high performance computing has been one of the major issues to be resolved. The power density of core components becomes significantly higher, and the fraction of power supply in total management cost is dominant. Providing dependability is also a main concern in large-scale systems since more hardware resources can be abused by attackers. Therefore, designing high-performance, power-efficient and secure systems is crucial to provide adequate performance as well as reliability to users. Adhering to using traditional design methodologies for large-scale computing systems has a limit to meet the demand under restricted resource budgets. Interconnecting a large number of uniprocessor chips to build parallel processing systems is not an efficient solution in terms of performance and power. Chip multiprocessor (CMP) integrates multiple processing cores and caches on a chip and is thought of as a good alternative to previous design trends. In this dissertation, we deal with various design issues of high performance multiprocessor systems based on CMP to achieve both performance and power efficiency while maintaining security. First, we propose a fast and secure off-chip interconnects through minimizing network overheads and providing an efficient security mechanism. Second, we propose architectural support for fast and efficient memory protection in CMP systems, making the best use of the characteristics in CMP environments and multi-threaded workloads. Third, we propose a new router design for network-on-chip (NoC) based on a new memory technique. We introduce hybrid input buffers that use both SRAM and STT-MRAM for better performance as well as power efficiency. Simulation results show that the proposed schemes improve the performance of off-chip networks through reducing the message size by 54% on average. Also, the schemes diminish the overheads of bounds checking operations, thus enhancing the overall performance by 11% on average. Adopting hybrid buffers in NoC routers contributes to increasing the network throughput up to 21%
    corecore