4 research outputs found

    A Swap-based Cache Set Index Scheme to Leverage both Superpage and Page Coloring Optimizations

    Increasing TLB reach using TCAM cells

    We propose dynamic aggregation of virtual tags in the TLB to increase its coverage and improve the overall miss ratio during address translation. Dynamic aggregation exploits both the spatial and temporal locality inherent in most application programs. To support dynamic aggregation, we introduce the use of ternary-CAM (TCAM) cells at the second-level TLB. The modified TLB architecture increases TLB reach without additional CAM entries. We also adopt bulk prefetching concurrently with the aggregation technique to enhance the benefits due to spatial locality. The performance of the proposed TLB architecture is evaluated using SPEC2000 benchmarks, concentrating on those that show high data TLB miss ratios. Simulation results indicate a reduction in miss ratios of between 59% and 99.99% for all the considered benchmarks except one, which shows a reduction of 10%. We show that the L2 TLB, when enhanced using TCAM cells, is an attractive solution to the high miss ratios exhibited by applications.
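    The idea of a ternary tag can be illustrated with a minimal sketch. It is not the authors' hardware design: the class name `TernaryTag`, its fields, and the example bit pattern are assumptions made for illustration, and the sketch ignores physical-frame contiguity and replacement policy. It only shows how marking low-order tag bits as don't-care lets a single entry match a run of adjacent virtual page numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TernaryTag:
    """Hypothetical model of one TLB tag stored in ternary-CAM cells."""
    value: int      # tag bits; the low `dont_care` bits are ignored on lookup
    dont_care: int  # number of low-order don't-care (X) bits

    def matches(self, vpn: int) -> bool:
        # Compare only the bits above the don't-care region.
        return (vpn >> self.dont_care) == (self.value >> self.dont_care)

    def covered_pages(self) -> int:
        # Each don't-care bit doubles the number of pages one entry can match.
        return 1 << self.dont_care


# One entry with two don't-care bits stands in for four adjacent pages.
entry = TernaryTag(value=0b1011_0000, dont_care=2)
hits = [vpn for vpn in range(0b1011_0000, 0b1011_1000) if entry.matches(vpn)]
```

    In this sketch `hits` holds the four page numbers that share the upper tag bits, while the next four pages in the range miss and would need an entry of their own.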

    An energy efficient TCAM enhanced cache architecture

    Microprocessors are used in a variety of systems ranging from high-performance supercomputers running scientific applications to battery-powered cell phones performing real-time tasks. Due to the large disparity between processor clock speed and main memory access time, most modern processors include several caches, which consume more than half of the total chip area and power budget. As the performance gap between processors and memory has increased, the trend has been to increase the size of the on-chip caches. However, increasing the cache size also increases its access time and energy consumption. This growing power dissipation problem is making traditional cooling and packaging techniques less effective, thus requiring cache designers to focus more on architectural-level energy efficiency than on performance alone. The goal of this thesis is to propose a new cache architecture and to evaluate its efficiency in terms of miss rate, system performance, energy consumption, and area overhead. The proposed architecture employs a few Ternary-CAM (TCAM) cells in the tag array to enable dynamic compression of tag entries containing contiguous values. By dynamically compressing tag entries, the number of entries in the tag array can be reduced by a factor of 2^N, where N is the number of tag bits that can be compressed. The architecture described in this thesis is applicable to any cache structure that uses Content Addressable Memory (CAM) cells to store tag bits. To evaluate the effectiveness of the TCAM Enhanced Cache Architecture for a wide scope of applications, two case studies were performed: the L2 Data-TLB (DTLB) of a high-performance processor and the L1 instruction and data caches of a low-power embedded processor. Results indicate that an L2 DTLB implementing 3-bit tag compression can achieve 93% of the performance of a conventional L2 DTLB of the same size while reducing the on-chip energy consumption by 74% and the total area by 50%. Similarly, an embedded processor cache implementing 2-bit tag compression achieves 99% of the performance of a conventional cache while reducing the on-chip energy consumption by 33% and the total area by 10%.
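    The entry-count saving from tag compression can be sketched in software. The function below is a hypothetical greedy policy, not the thesis's hardware mechanism: `compress_tags` and its `(base, dont_care_bits)` entry format are assumptions for illustration. It merges a fully populated, aligned block of 2^N contiguous tags into one entry with N don't-care bits and leaves every other tag uncompressed.

```python
def compress_tags(tags, n_bits):
    """Return a list of (base, dont_care_bits) entries covering `tags`.

    Hypothetical policy: only complete, aligned blocks of 2**n_bits
    contiguous tags are merged; all other tags keep their own entry.
    """
    block = 1 << n_bits
    present = set(tags)
    covered = set()
    entries = []
    for tag in sorted(present):
        if tag in covered:
            continue
        base = tag & ~(block - 1)          # start of the aligned block
        group = range(base, base + block)
        if all(t in present for t in group):
            entries.append((base, n_bits))  # one ternary entry for the block
            covered.update(group)
        else:
            entries.append((tag, 0))        # exact-match entry
            covered.add(tag)
    return entries


# Eight contiguous tags collapse into one entry; the stray tag stays exact.
example = compress_tags([8, 9, 10, 11, 12, 13, 14, 15, 21], n_bits=3)
```

    With N = 3, the eight contiguous tags occupy one entry instead of eight, which is the best-case 2^N reduction; workloads with less contiguity see a smaller saving.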

    Reevaluating online superpage promotion with hardware support

    Typical translation lookaside buffers (TLBs) can map a far smaller region of memory than application footprints demand, and the cost of handling TLB misses therefore limits the performance of an increasing number of applications. This bottleneck can be mitigated by the use of superpages, multiple adjacent virtual memory pages that can be mapped with a single TLB entry, that extend TLB reach without significantly increasing size or cost. We analyze hardware/software tradeoffs for dynamically creating superpages. This study extends previous work by using execution-driven simulation to compare creating superpages via copying with remapping pages within the memory controller, and by examining how the tradeoffs change when moving from a single-issue to a superscalar processor model. We find that remapping-based promotion outperforms copying-based promotion, often significantly. Copying-based promotion is slightly more effective on superscalar processors than on single-issue processors, and the relative performance of remapping-based promotion on the two platforms is application-dependent.
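    TLB reach is the number of entries multiplied by the memory each entry maps, so the effect of superpages can be shown with a short calculation. The entry count and page sizes below are illustrative assumptions, not figures from the paper, and `tlb_reach` is a hypothetical helper.

```python
def tlb_reach(entries: int, page_bytes: int) -> int:
    """Bytes of memory a TLB can map when every entry maps one page."""
    return entries * page_bytes


KIB = 1024
MIB = 1024 * KIB

# Assumed example: a 64-entry TLB with 4 KiB base pages vs. 4 MiB superpages.
base_reach = tlb_reach(64, 4 * KIB)    # 256 KiB
super_reach = tlb_reach(64, 4 * MIB)   # 256 MiB
gain = super_reach // base_reach       # reach grows by the page-size ratio
```

    The same number of entries covers 1024 times more memory in this example, which is why promotion to superpages can reduce TLB misses without enlarging the TLB.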