Content caching at edge nodes is a promising technique for reducing data
traffic in next-generation wireless networks. Inspired by the success of Deep
Reinforcement Learning (DRL) in solving complicated control problems, this work
presents a DRL-based framework with the Wolpertinger architecture for content
caching at the base station. The proposed framework aims to maximize the
long-term cache hit rate and requires no knowledge of the content popularity
distribution. To evaluate it, we compare its performance with that of other
caching algorithms, including the Least Recently Used (LRU), Least Frequently
Used (LFU), and First-In First-Out (FIFO) caching strategies.
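The three baseline policies can be sketched as a small hit-rate simulation. This is a minimal sketch, assuming single-size contents with integer IDs, a fixed cache capacity, and that every miss is admitted to the cache. The function name `hit_rate`, the LFU variant (request counts kept for every content, including evicted ones, with ties broken by the smallest ID), and the Zipf-distributed trace in the usage section are illustrative assumptions, not the experimental setup of this work.

```python
import random
from collections import Counter, OrderedDict, deque
from typing import Sequence


def hit_rate(policy: str, requests: Sequence[int], capacity: int) -> float:
    """Fraction of requests served from a cache holding at most `capacity` contents."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    hits = 0
    if policy == "LRU":
        cache = OrderedDict()                  # ordered from least to most recently used
        for r in requests:
            if r in cache:
                hits += 1
                cache.move_to_end(r)           # a hit refreshes recency
            else:
                if len(cache) >= capacity:
                    cache.popitem(last=False)  # evict the least recently used content
                cache[r] = None
    elif policy == "FIFO":
        cache, order = set(), deque()          # `order` records insertion sequence
        for r in requests:
            if r in cache:
                hits += 1                      # hits do not change FIFO order
            else:
                if len(cache) >= capacity:
                    cache.remove(order.popleft())  # evict the oldest insertion
                cache.add(r)
                order.append(r)
    elif policy == "LFU":
        cache, freq = set(), Counter()         # assumption: counts cover the whole history
        for r in requests:
            freq[r] += 1
            if r in cache:
                hits += 1
            else:
                if len(cache) >= capacity:
                    # Evict the least frequently requested content; ties go to the smallest ID.
                    cache.remove(min(cache, key=lambda c: (freq[c], c)))
                cache.add(r)
    else:
        raise ValueError(f"unknown policy: {policy}")
    return hits / len(requests)


if __name__ == "__main__":
    # Hypothetical workload: 100 contents with Zipf-like popularity (exponent 0.8).
    rng = random.Random(0)
    contents = list(range(1, 101))
    weights = [1 / c ** 0.8 for c in contents]
    trace = rng.choices(contents, weights=weights, k=10_000)
    for policy in ("LRU", "LFU", "FIFO"):
        print(policy, round(hit_rate(policy, trace, capacity=10), 3))
```

On the short trace `[1, 2, 1, 3, 1, 2]` with a capacity of 2, LRU and LFU each serve two requests from the cache and FIFO serves one, because FIFO evicts content 1 even though it was just requested.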
Since the Wolpertinger architecture effectively limits the size of the action
space, we also compare against a Deep Q-Network (DQN) to assess the impact of
discarding a portion of the actions. Our results show that the proposed
framework achieves a higher short-term cache hit rate, and a higher and more
stable long-term cache hit rate, than the LRU, LFU, and FIFO schemes.
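The action-space reduction can be illustrated with a minimal sketch of generic Wolpertinger action selection: an actor maps the state to a continuous proto-action, the k nearest discrete actions are retrieved, and a critic picks the best of those k. Here `actor`, `critic`, and the action embeddings are hypothetical stand-ins for trained networks and for whatever action encoding the caching framework actually uses. Nearest neighbours are found by a brute-force scan for clarity; very large action spaces would typically call for an approximate index.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def k_nearest_actions(proto: Vector, embeddings: List[Vector], k: int) -> List[int]:
    """Indices of the k discrete actions closest (Euclidean) to the proto-action."""
    # Brute-force scan; ties in distance are broken by action index.
    ranked = sorted((math.dist(proto, emb), i) for i, emb in enumerate(embeddings))
    return [i for _, i in ranked[:k]]


def wolpertinger_select(state, actor: Callable, critic: Callable,
                        embeddings: List[Vector], k: int) -> int:
    """Choose a discrete action by scoring only the k neighbours of the proto-action."""
    proto = actor(state)                      # continuous proto-action from the actor
    candidates = k_nearest_actions(proto, embeddings, k)
    # The critic refines the choice: highest Q(state, action) among the k candidates.
    return max(candidates, key=lambda i: critic(state, embeddings[i]))


if __name__ == "__main__":
    # Toy example (hypothetical): five discrete actions embedded on a line.
    embeddings = [[0.0], [1.0], [2.0], [3.0], [4.0]]
    actor = lambda state: [2.2]                    # stand-in for a trained actor
    critic = lambda state, a: -abs(a[0] - 3.0)     # stand-in Q: the best action sits at 3.0
    for k in (1, 2, 5):
        print(k, wolpertinger_select(None, actor, critic, embeddings, k))
```

With k = 1 the selection stays at the nearest action (index 2). With k = 2 the critic can already move to the better action (index 3). Setting k to the total number of actions recovers exhaustive greedy selection over all actions, as in a DQN, so k trades critic evaluations (runtime) against the risk of discarding good actions.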
The proposed framework also performs competitively with Deep Q-learning while
providing significant savings in runtime.

Comment: 6 pages, 3 figures