Deep Reinforcement Learning for Real-Time Optimization in NB-IoT Networks
NarrowBand-Internet of Things (NB-IoT) is an emerging cellular-based
technology that offers a range of flexible configurations for massive IoT radio
access from groups of devices with heterogeneous requirements. A configuration
specifies the amount of radio resource allocated to each group of devices for
random access and for data transmission. Assuming no knowledge of the traffic
statistics, an important challenge is how to determine, in an online fashion,
the configuration that maximizes the long-term average number of IoT devices
served at each Transmission Time Interval (TTI). Given the
complexity of searching for the optimal configuration, we first develop real-time
configuration selection based on the tabular Q-learning (tabular-Q), the Linear
Approximation based Q-learning (LA-Q), and the Deep Neural Network based
Q-learning (DQN) in the single-parameter single-group scenario. Our results
show that the proposed reinforcement learning based approaches considerably
outperform the conventional heuristic approaches based on load estimation
(LE-URC) in terms of the number of served IoT devices. The results also
indicate that LA-Q and DQN are good alternatives to tabular-Q, achieving
almost the same performance with much less training time. We further advance
LA-Q and DQN via Actions Aggregation (AA-LA-Q and AA-DQN) and via Cooperative
Multi-Agent learning (CMA-DQN) for the multi-parameter multi-group scenario,
thereby solving the problem that Q-learning agents do not converge in
high-dimensional configurations. In this scenario, the advantage of the
proposed Q-learning approaches over the conventional LE-URC approach
grows significantly as the number of configuration dimensions increases, and the
CMA-DQN approach outperforms the other approaches in both throughput and
training efficiency.
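The tabular Q-learning step named in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not define the state, so the sketch assumes a state is a discretized load level, an action is the index of a candidate configuration, and the reward is the number of devices served in the TTI. The function names, table sizes, and the values of alpha, gamma, and epsilon are all hypothetical.

```python
import random


def q_learning_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning step and return the updated Q(state, action)."""
    # Bootstrapped target: immediate reward plus discounted best next value.
    target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


def epsilon_greedy(q, state, epsilon, rng):
    """Pick a random configuration with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q[state]))
    row = q[state]
    return max(range(len(row)), key=row.__getitem__)


# Hypothetical sizes: 2 discretized load levels, 2 candidate configurations.
q_table = [[0.0, 0.0], [1.0, 2.0]]
updated = q_learning_update(q_table, state=0, action=1, reward=1.0,
                            next_state=1, alpha=0.5, gamma=0.5)
greedy = epsilon_greedy(q_table, state=1, epsilon=0.0, rng=random.Random(0))
```

The table holds one entry per state-action pair, so it grows with the product of the state and action counts. That growth is one plausible reason to replace the table with a linear or neural approximator, as LA-Q and DQN do.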
Deep Reinforcement Learning for Wireless Sensor Scheduling in Cyber-Physical Systems
In many Cyber-Physical Systems, we encounter the problem of remote state
estimation of geographically distributed and remote physical processes. This
paper studies the scheduling of sensor transmissions to estimate the states of
multiple remote, dynamic processes. Information from the different sensors has
to be transmitted to a central gateway over a wireless network for monitoring
purposes, where typically fewer wireless channels are available than there are
processes to be monitored. For effective estimation at the gateway, the sensors
need to be scheduled appropriately, i.e., at each time instant one needs to
decide which sensors have network access and which ones do not. To address this
scheduling problem, we formulate an associated Markov decision process (MDP).
This MDP is then solved using a Deep Q-Network, a recent deep reinforcement
learning algorithm that is at once scalable and model-free. We compare our
scheduling algorithm to popular scheduling algorithms such as round-robin and
reduced-waiting-time, among others. Our algorithm is shown to significantly
outperform these algorithms in many example scenarios.
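To make the scheduling decision concrete, here is a minimal sketch of the round-robin baseline mentioned above, not of the Deep Q-Network scheduler itself. It assumes sensors are indexed from 0 and that there are no more channels than sensors. The function name and parameters are hypothetical.

```python
def round_robin_schedule(num_sensors, num_channels, t):
    """Return the indices of the sensors granted a channel at time step t."""
    if not 0 < num_channels <= num_sensors:
        raise ValueError("need 0 < num_channels <= num_sensors")
    # Advance the window by num_channels each step and wrap around.
    start = (t * num_channels) % num_sensors
    return [(start + k) % num_sensors for k in range(num_channels)]


# Three sensors sharing two channels over three consecutive time steps.
schedule = [round_robin_schedule(3, 2, t) for t in range(3)]
```

A learned scheduler would replace the fixed rotation with a choice that depends on the current state of the estimation problem, which is the decision the MDP formulation models.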