7 research outputs found

    Improved policy representation and policy search for proactive content caching in wireless networks

    We study the problem of proactively pushing contents into a finite-capacity cache memory of a user equipment in order to reduce the long-term average energy consumption in a wireless network. We consider an online social network (OSN) framework, in which new contents are generated over time and each content remains relevant to the user for a random time period, called the lifetime of the content. The user accesses the OSN through a wireless network at random time instants to download and consume all the relevant contents. Downloading contents has an energy cost that depends on the channel state and the number of downloaded contents. Our aim is to reduce the long-term average energy consumption by proactively caching contents at favorable channel conditions. In previous work, it was shown that the optimal caching policy is infeasible to compute (even with complete knowledge of a stochastic model describing the system), and a simple family of threshold policies was introduced and optimised using the finite difference method. In this paper we improve upon both components of this approach: we use linear function approximation (LFA) to better approximate the considered family of caching policies, and apply the REINFORCE algorithm to optimise its parameters. Numerical simulations show that the new approach provides a reduction in both the average energy cost and the running time for policy optimisation.
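The combination described above (a threshold policy represented with LFA and trained with REINFORCE) can be sketched in a few lines. Everything below is an illustrative assumption, not the paper's code: the feature vector, the sigmoid caching policy, and the toy 1/gain energy cost are all placeholders.

```python
import math
import random

def features(channel_gain, cache_fill):
    # Simple LFA feature vector (assumed): bias, channel quality, cache occupancy.
    return [1.0, channel_gain, cache_fill]

def cache_prob(theta, phi):
    # Linear score theta . phi squashed to a caching probability (stochastic policy).
    z = sum(t * f for t, f in zip(theta, phi))
    return 1.0 / (1.0 + math.exp(-z))

def reinforce_episode(theta, lr=0.05, steps=50, rng=random.Random(0)):
    # One episode of REINFORCE on a toy caching problem: sample states,
    # sample cache/skip actions, then update theta against the incurred cost.
    grads, costs = [], []
    for _ in range(steps):
        gain = rng.uniform(0.1, 1.0)            # random channel quality
        fill = rng.random()                     # current cache occupancy
        phi = features(gain, fill)
        p = cache_prob(theta, phi)
        a = 1 if rng.random() < p else 0        # sample caching action
        cost = (1.0 / gain) * a                 # toy energy cost: cheaper on good channels
        # Gradient of log pi(a|s) for a Bernoulli-sigmoid policy: (a - p) * phi.
        grads.append([(a - p) * f for f in phi])
        costs.append(cost)
    baseline = sum(costs) / len(costs)          # mean-cost baseline to reduce variance
    for g, c in zip(grads, costs):              # descend, since we minimise cost
        for i in range(len(theta)):
            theta[i] -= lr * (c - baseline) * g[i]
    return theta
```

With repeated episodes, the policy should learn to cache mainly when the channel gain is high, which is the behaviour the abstract attributes to the optimised threshold family.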

    A reinforcement-learning approach to proactive caching in wireless networks.

    We consider a mobile user accessing contents in a dynamic environment, where new contents are generated over time (by the user's contacts) and remain relevant to the user for random lifetimes. The user, equipped with a finite-capacity cache memory, randomly accesses the system, and requests all the relevant contents at the time of access. The system incurs an energy cost associated with the number of contents downloaded and the channel quality at that time. Assuming causal knowledge of the channel quality, the content profile, and the user-access behavior, we model the proactive caching problem as a Markov decision process with the goal of minimizing the long-term average energy cost. We first prove the optimality of a threshold-based proactive caching scheme, which dynamically caches or removes appropriate contents from the memory, prior to their being requested by the user, depending on the channel state. The optimal threshold values depend on the system state, and hence are computationally intractable. Therefore, we propose parametric representations for the threshold values, and use reinforcement-learning algorithms to find near-optimal parametrizations. We demonstrate through simulations that the proposed schemes significantly outperform classical reactive downloading, and perform very close to a genie-aided lower bound. Index Terms: Markov decision process, proactive content caching, policy gradient methods, reinforcement learning.
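The threshold structure proved optimal above admits a very compact statement: download relevant contents proactively whenever the channel quality exceeds a (state-dependent) threshold, up to the cache capacity. The sketch below is a hypothetical minimal version with a fixed threshold; the function name and arguments are assumptions for illustration.

```python
def threshold_cache(channel_gain, threshold, relevant, cache, capacity):
    """Toy threshold-based proactive caching rule (illustrative, not the
    paper's policy): if the channel is good enough, fill the remaining
    cache space with relevant contents; otherwise leave the cache as-is."""
    cache = set(cache)
    if channel_gain >= threshold:
        for content in relevant:
            if len(cache) >= capacity:
                break                     # finite cache: stop when full
            cache.add(content)
    return cache
```

In the paper the threshold itself varies with the system state, which is what makes the exact policy intractable and motivates the parametric approximations trained by reinforcement learning.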

    Reinforcement Learning for Proactive Caching of Contents with Different Demand Probabilities

    A mobile user randomly accessing a dynamic content library over a wireless channel is considered. At each time instant, a random number of contents are added to the library, and each content remains relevant to the user for a random period of time. Contents are classified into finitely many classes such that whenever the user accesses the system, he requests each content randomly with a class-specific demand probability. Contents are downloaded to the user equipment (UE) through a wireless link whose quality also varies randomly with time. The UE has a cache memory of finite capacity, which can be used to proactively store contents before they are requested by the user. Any time contents are downloaded, the system incurs a cost (energy, bandwidth, etc.) that depends on the channel state at the time of download, and scales linearly with the number of contents downloaded. Our goal is to minimize the expected long-term average cost. The problem is modeled as a Markov decision process, and the optimal policy is shown to exhibit a threshold structure; however, since finding the optimal policy is computationally infeasible, parametric approximations to the optimal policy are considered, whose parameters are optimized using the policy gradient method. Numerical simulations show that the performance gain of the resulting scheme over traditional reactive content delivery is significant, and increases with the cache capacity. Comparisons with two performance lower bounds, one computed based on infinite cache capacity and another based on non-causal knowledge of the user access times and content requests, demonstrate that our scheme can perform close to the theoretical optimum.
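The class-specific demand probabilities naturally induce a ranking over contents when deciding which ones to keep in the finite cache. The snippet below is a deliberately simple greedy heuristic assumed for illustration (rank by demand probability, tie-break by remaining lifetime); the paper instead optimizes a parametric threshold policy with policy gradients.

```python
def select_cache(contents, capacity):
    # contents: list of (content_id, demand_prob, remaining_lifetime) tuples.
    # Greedy heuristic (an assumption, not the paper's policy): prefer
    # contents the user is most likely to request, and among equally
    # demanded contents prefer those that stay relevant longer.
    ranked = sorted(contents, key=lambda c: (c[1], c[2]), reverse=True)
    return [c[0] for c in ranked[:capacity]]
```

A policy-gradient version would replace the fixed ranking rule with a scored, parameterized policy and tune its parameters against the realized download costs.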

    Multicast-Aware Proactive Caching in Wireless Networks with Deep Reinforcement Learning

    We consider mobile users randomly requesting contents from a single dynamic content library. A random number of contents are added to the library at every time instant, and each content has a lifetime, after which it becomes irrelevant to the users, and a class-specific request probability with which a user may request it. Multiple requests for a single content are served through a common multicast transmission. Contents can also be proactively stored, before they are requested, in finite-capacity cache memories at the user equipment. Any time a content is transmitted to some users, a cost, which depends on the number of bits transmitted and the channel states of the receiving users at that time instant, is incurred by the system. The goal is to minimize the long-term expected average cost. We model the problem as a Markov decision process and propose a deep reinforcement learning (DRL)-based policy to solve it. The DRL-based policy employs the deep deterministic policy gradient method for training to minimize the long-term average cost. We evaluate the performance of the proposed scheme in comparison to traditional reactive multicast transmission and other multicast-aware caching schemes, and show that the proposed scheme provides significant performance gains.
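The multicast gain described above can be made concrete with a toy cost model: a single common transmission serves every requester but must be decodable by the worst-channel receiver, whereas unicast pays a per-user cost. The function name and the 1/gain cost are assumptions for illustration, not the paper's cost function.

```python
def multicast_cost(bits, receiver_gains, unicast=False):
    # Toy transmission cost (illustrative assumption): cost grows with the
    # number of bits and inversely with channel gain. Multicast sends once,
    # rate-limited by the worst receiver; unicast pays once per receiver.
    if not receiver_gains:
        return 0.0
    if unicast:
        return sum(bits / g for g in receiver_gains)
    return bits / min(receiver_gains)
```

Even in this crude model, multicast never costs more than unicast for the same set of receivers, which is the structural advantage the DRL policy is trained to exploit when deciding what to push into the users' caches.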