When objects (files) in content delivery network (CDN) caches differ in size,
pursuing an optimal object miss ratio (OMR) by approximating Belady's algorithm
no longer ensures an optimal byte miss ratio (BMR), leaving it unclear how to
achieve a superior BMR in CDNs. To address this issue, we observe
experimentally that there exists a time window in which delaying the eviction
of the object with the longest reuse distance improves BMR without increasing
OMR. As a result, we introduce a deep reinforcement learning (RL) model to
capture this time window by dynamically monitoring the changes in OMR and BMR,
and to implement a BMR-friendly policy within that window. Based on this
policy, we propose a Belady and Size Eviction (LRU-BaSE) algorithm, which
reduces BMR while maintaining OMR. To make LRU-BaSE efficient and practical, we address
the feedback delay problem of RL with a two-pronged approach. On the one hand,
we observe that a rear section of the LRU cache queue contains most of the
eviction candidates, which allows LRU-BaSE to shorten the decision region. On
the other hand, the request distribution on CDNs makes it feasible to divide the
learning region into multiple sub-regions, each of which is learned in less
time and with higher accuracy. In real CDN systems, compared to LRU, LRU-BaSE can
reduce "backing to OS" traffic and access latency by 30.05\% and 17.07\%,
respectively, on average. Simulation results confirm that LRU-BaSE
outperforms state-of-the-art cache replacement policies: its BMR is 0.63\% and
0.33\% lower than those of Belady and Practical Flow-based
Offline Optimal (PFOO), respectively, on average. In addition, compared to
Learning Relaxed Belady (LRB), LRU-BaSE yields relatively stable performance
under workload drift.
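
To illustrate why OMR and BMR diverge once object sizes differ, the following
minimal Python sketch replays a toy trace through a plain byte-capacity LRU
cache and reports both ratios. The function name `simulate_lru`, the trace, and
the capacities are illustrative assumptions; this is not the LRU-BaSE algorithm
itself, only a sketch of how the two metrics are computed.

```python
from collections import OrderedDict


def simulate_lru(trace, capacity_bytes):
    """Replay (object_id, size_bytes) requests through a byte-capacity LRU cache.

    Returns a tuple (object_miss_ratio, byte_miss_ratio).
    Hypothetical helper for illustration only.
    """
    cache = OrderedDict()  # object_id -> size, least recently used first
    used = 0               # bytes currently stored
    misses = 0             # number of missed requests
    missed_bytes = 0       # bytes that had to be fetched on misses
    total_bytes = 0        # bytes requested overall

    for obj, size in trace:
        total_bytes += size
        if obj in cache:
            cache.move_to_end(obj)  # hit: mark as most recently used
            continue
        misses += 1
        missed_bytes += size
        if size > capacity_bytes:
            continue  # object can never fit, so it is not admitted
        while used + size > capacity_bytes:
            _, evicted_size = cache.popitem(last=False)  # evict LRU object
            used -= evicted_size
        cache[obj] = size
        used += size

    return misses / len(trace), missed_bytes / total_bytes


# Toy trace (assumed): one 8-byte object and one 1-byte object.
trace = [("big", 8), ("a", 1), ("a", 1), ("a", 1), ("big", 8)]

omr_large, bmr_large = simulate_lru(trace, capacity_bytes=10)
omr_small, bmr_small = simulate_lru(trace, capacity_bytes=4)
print(f"capacity 10: OMR={omr_large:.3f} BMR={bmr_large:.3f}")
print(f"capacity  4: OMR={omr_small:.3f} BMR={bmr_small:.3f}")
```

With the 10-byte cache, 2 of 5 requests miss but 9 of 19 bytes miss; with the
4-byte cache, 3 of 5 requests miss but 17 of 19 bytes miss. The same eviction
decisions therefore score differently under the two metrics, which is the gap
the abstract refers to.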