Analytical Modeling the Multi-Core Shared Cache Behavior with Considerations of Data-Sharing and Coherence
To mitigate the ever-worsening power wall and memory wall problems,
multi-core architectures with multilevel cache hierarchies have been widely
used in modern processors. However, the complexity of the architectures makes
the modeling of shared caches extremely difficult. In this paper, we propose a
data-sharing aware analytical model for estimating the miss rates of the
downstream shared cache in a multi-core environment. Moreover, the proposed
model can also be integrated with upstream cache analytical models with the
consideration of multi-core private cache coherence effects. The integration
avoids time-consuming full simulations of the cache architecture, which are
required by conventional approaches. We validate our analytical model against
gem5 simulation results for 13 applications from the PARSEC 2.1 benchmark suite.
We compare the L2 cache miss rates with the results from gem5 under 8 hardware
configurations including dual-core and quad-core architectures. The average
absolute error is less than 2% for all configurations. After integration with
the upstream model, the overall average absolute error is 8.03% in 4 hardware
configurations. As an application case of the integrated model, we also
evaluate the miss rates of 57 different cache configurations in multi-core and
multi-level cache scenarios.
Comment: The manuscript has been submitted to Microprocessors and Microsystem
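The accuracy figures above are reported as an average absolute error between modeled and simulated miss rates. The sketch below shows one way such a metric could be computed. It assumes the error is the mean of the absolute differences in miss rate across configurations, which the abstract does not spell out; the function name and the sample values are hypothetical and are not taken from the paper.

```python
def average_absolute_error(predicted, reference):
    """Mean of |predicted - reference| over paired miss rates.

    Assumption: errors are absolute differences of miss rates
    (e.g. 0.10 vs 0.12 gives 0.02), not relative errors.
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - r) for p, r in zip(predicted, reference))
    return total / len(predicted)


# Hypothetical miss rates: analytical model vs. a reference simulator.
model_rates = [0.10, 0.20]
simulated_rates = [0.12, 0.17]
error = average_absolute_error(model_rates, simulated_rates)
```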
Fast Modeling L2 Cache Reuse Distance Histograms Using Combined Locality Information from Software Traces
To mitigate the performance gap between CPU and the main memory, multi-level
cache architectures are widely used in modern processors. Therefore, modeling
the behaviors of the downstream caches becomes a critical part of the processor
performance evaluation in the early stage of Design Space Exploration (DSE). In
this paper, we propose a fast and accurate L2 cache reuse distance histogram
model, which can be used to predict the behaviors of the multi-level cache
architectures where the L1 cache uses the LRU replacement policy and the L2
cache uses LRU/Random replacement policies. The inputs are the profiled L1 reuse
distance histogram and two newly proposed metrics, namely the RST table and the
Hit-RDH, which describe the software traces in more detail.
For a given L1 cache configuration, the profiling results can be
reused for different configurations of the L2 cache. The output of our model is
the L2 cache reuse distance histogram, based on which the L2 cache miss rates
can be evaluated. We compare the L2 cache miss rates with the results from gem5
cycle-accurate simulations of 15 benchmarks chosen from SPEC CPU 2006 and 9
benchmarks from SPEC CPU 2017. The average absolute error is less than 5%,
while the evaluation of each L2 configuration is sped up by almost 30X
for four L2 cache candidates.
Comment: This manuscript has undergone major revision and been re-submitted to Journal of
Systems Architectur
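The abstract above derives miss rates from a reuse distance histogram. As a minimal illustration of that general relationship, and not of the paper's L2 model (whose RST table and Hit-RDH inputs are not detailed here), the sketch below builds a reuse distance histogram from an address trace and estimates the miss rate of a fully associative LRU cache. It assumes the trace is a sequence of cache-line identifiers and ignores set associativity; all names are hypothetical.

```python
from collections import Counter


def reuse_distance_histogram(trace):
    """Map reuse distance -> access count; first-time accesses use key None.

    The reuse distance of an access is the number of distinct lines
    touched since the previous access to the same line.
    """
    stack = []  # LRU stack, most recently used line at the end
    hist = Counter()
    for line in trace:
        if line in stack:
            idx = stack.index(line)
            hist[len(stack) - 1 - idx] += 1
            stack.pop(idx)
        else:
            hist[None] += 1  # cold (compulsory) access
        stack.append(line)
    return hist


def lru_miss_rate(hist, capacity_lines):
    """Miss rate of a fully associative LRU cache holding capacity_lines lines.

    An access misses if it is cold or its reuse distance is >= capacity.
    """
    total = sum(hist.values())
    misses = sum(count for dist, count in hist.items()
                 if dist is None or dist >= capacity_lines)
    return misses / total


trace = ["a", "b", "c", "a", "b", "c"]
hist = reuse_distance_histogram(trace)
rate_two_lines = lru_miss_rate(hist, 2)
rate_three_lines = lru_miss_rate(hist, 3)
```

Because the histogram is independent of cache size, it can be computed once and reused to evaluate several capacities, which is the same reuse idea the abstract describes for different L2 configurations.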