2 research outputs found

    Runtime 3-D Stacked Cache Management for Chip-Multiprocessors

    Three-dimensional (3-D) memory stacking is one of the most promising solutions to memory bandwidth problems in chip multiprocessors. In this work, we propose an efficient runtime 3-D cache management technique that exploits both the lower latency of vertical interconnects and the runtime memory demand of applications, which varies dynamically over time. Experimental results show that the proposed method improves performance by up to 26.7%, and by 13.1% on average, compared with the private cache organization.

    Evaluation of the Memory Communication Traffic in a Hierarchical Cache Model for Massively-Manycore Processors

    The scaling of semiconductor technologies is leading to processors with increasing numbers of cores. A key enabler in manycore systems is the use of Networks-on-Chip (NoC) as a global communication mechanism. The adoption of NoCs in manycore systems requires a shift in focus from computation to communication, as communication is fast becoming the dominant factor in processor performance. Many researchers have focused on direct communication between cores in the NoC; however, in a manycore processor the communication is actually between the cores and the memory hierarchy. In this work, we investigate the memory communication traffic of shared threads in a hierarchical cache architecture. We argue that, for systems with thousands of processor cores, the performance scalability of shared-memory applications in a hierarchical cache architecture depends on the distance between threads sharing memory in terms of the cache hierarchy (the "memory distance"). We present latency and throughput results comparing fat quadtree, concentrated mesh and mesh topologies as a function of the "memory distance" between the threads. Our results using the ITRS physical data for 2023 show that the thread placement model and the distance at which threads are placed significantly affect NoC performance, and that scale-invariant topologies perform better than flat topologies.
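
One way to picture the "memory distance" is as the number of cache levels two threads must climb before they reach a cache they share. The sketch below is only an illustration under stated assumptions, not the model used in the paper: it assumes a quadtree-like hierarchy in which cores are numbered so that each group of `fanout` consecutive cores shares a cache at the next level up. The function name `memory_distance` and the `fanout` parameter are hypothetical.

```python
def memory_distance(core_a: int, core_b: int, fanout: int = 4) -> int:
    """Return the number of cache levels up to the first cache shared by both cores.

    Assumption (not from the paper): cores are numbered so that every group
    of `fanout` consecutive cores shares one cache at the next level.
    A result of 0 means both threads run on the same core.
    """
    if core_a < 0 or core_b < 0:
        raise ValueError("core indices must be non-negative")
    if fanout < 2:
        raise ValueError("fanout must be at least 2")

    level = 0
    # Climb one level at a time until both cores map to the same cache.
    while core_a != core_b:
        core_a //= fanout
        core_b //= fanout
        level += 1
    return level


if __name__ == "__main__":
    # Compare a few hypothetical placements of a pair of sharing threads.
    for a, b in [(0, 0), (0, 3), (0, 4), (0, 16), (0, 1023)]:
        print(f"cores {a} and {b}: memory distance {memory_distance(a, b)}")
```

Under this assumption, placing sharing threads on neighbouring cores keeps the distance small, while placing them far apart forces traffic through higher levels of the hierarchy.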