Deep Multi-User Reinforcement Learning for Distributed Dynamic Spectrum Access
We consider the problem of dynamic spectrum access for network utility
maximization in multichannel wireless networks. The shared bandwidth is divided
into K orthogonal channels. In the beginning of each time slot, each user
selects a channel and transmits a packet with a certain transmission
probability. After each time slot, each user that has transmitted a packet
receives a local observation indicating whether its packet was successfully
delivered or not (i.e., an ACK signal). The objective is to find a multi-user strategy for
accessing the spectrum that maximizes a certain network utility in a
distributed manner without online coordination or message exchanges between
users. Obtaining an optimal solution for the spectrum access problem is
computationally expensive in general due to the large state space and partial
observability of the states. To tackle this problem, we develop a novel
distributed dynamic spectrum access algorithm based on deep multi-user
reinforcement learning. Specifically, at each time slot, each user maps its
current state to spectrum access actions based on a trained deep-Q network used
to maximize the objective function. A game-theoretic analysis of the system
dynamics is developed to establish design principles for the implementation of
the algorithm. Experimental results demonstrate strong performance of the
algorithm.
Comment: This work has been accepted for publication in the IEEE Transactions on Wireless Communications.
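A minimal sketch of the kind of per-user deep-Q network the abstract describes, assuming a local state built from the previous action and the ACK feedback; the layer sizes, the number of channels K, and the epsilon-greedy selection rule are our own placeholder choices, not the authors' exact design.

```python
# Hypothetical per-user DQN for distributed spectrum access: the local state
# is a one-hot of the previous action (idle or one of K channels) plus the
# ACK bit, and the network scores each of the K+1 possible actions.
import torch
import torch.nn as nn

K = 8  # number of orthogonal channels (placeholder)

class UserDQN(nn.Module):
    def __init__(self, num_channels: int = K, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_channels + 2, hidden),   # (K+1)-hot action + ACK bit
            nn.ReLU(),
            nn.Linear(hidden, num_channels + 1),   # Q-value per action
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

def select_action(q_net: UserDQN, state: torch.Tensor, epsilon: float) -> int:
    # epsilon-greedy choice over {idle, channel 1, ..., channel K}
    if torch.rand(1).item() < epsilon:
        return int(torch.randint(0, K + 1, (1,)).item())
    with torch.no_grad():
        return int(q_net(state).argmax().item())
```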
Pseudo-Rehearsal: Achieving Deep Reinforcement Learning without Catastrophic Forgetting
Neural networks can achieve excellent results in a wide variety of
applications. However, when they attempt to sequentially learn, they tend to
learn the new task while catastrophically forgetting previous ones. We propose
a model that overcomes catastrophic forgetting in sequential reinforcement
learning by combining ideas from continual learning in both the image
classification domain and the reinforcement learning domain. This model
features a dual memory system which separates continual learning from
reinforcement learning and a pseudo-rehearsal system that "recalls" items
representative of previous tasks via a deep generative network. Our model
sequentially learns Atari 2600 games while continuing to perform above human
level and on par with independent models trained separately on each game.
This result is achieved without additional storage requirements as the number
of tasks increases, without storing raw data, and without revisiting past
tasks. In comparison, previous state-of-the-art solutions are substantially
more vulnerable to forgetting on these complex deep reinforcement learning
tasks.
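One way the described pseudo-rehearsal term could be combined with the new-task loss, as a rough sketch; the generator.sample() interface, the frozen copy of the previous network, and the MSE distillation weighting are assumptions for illustration only.

```python
# Schematic pseudo-rehearsal objective: the loss on the current task is
# augmented with a distillation term on states "recalled" from a deep
# generative model of previous tasks.
import torch
import torch.nn.functional as F

def pseudo_rehearsal_loss(policy_net, frozen_old_net, generator,
                          new_task_loss: torch.Tensor,
                          batch_size: int = 32,
                          rehearsal_weight: float = 1.0) -> torch.Tensor:
    pseudo_states = generator.sample(batch_size)      # hypothetical generator API
    with torch.no_grad():
        old_outputs = frozen_old_net(pseudo_states)   # behaviour to preserve
    new_outputs = policy_net(pseudo_states)
    distill = F.mse_loss(new_outputs, old_outputs)    # penalize forgetting
    return new_task_loss + rehearsal_weight * distill
```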
Jam Sessions: Analysis and Experimental Evaluation of Advanced Jamming Attacks in MIMO Networks
In advanced jamming, the adversary intentionally concentrates the available
energy budget on specific critical components (e.g., pilot symbols,
acknowledgement packets, etc.) to (i) increase the jamming effectiveness, as
more targets can be jammed with the same energy budget; and (ii) decrease the
likelihood of being detected, as the channel is jammed for a shorter period of
time. One of the fundamental challenges in designing defense mechanisms against
an advanced jammer is understanding which jamming strategy yields the lowest
throughput, for a given channel condition and a given amount of energy. To the
best of our knowledge, this problem still remains unsolved. To fill this gap,
in this paper we conduct a comparative analysis of several of the most viable
advanced jamming schemes in widely used MIMO networks. We first mathematically model
a number of advanced jamming schemes at the signal processing level, so that a
quantitative relationship between the jamming energy and the jamming effect is
established. Based on the model, theorems are derived on the optimal advanced
jamming scheme for an arbitrary channel condition. The theoretical findings are
validated through extensive simulations and experiments on a 5-radio 2x2 MIMO
testbed. Our results show that the theorems are able to predict jamming
efficiency with high accuracy. Moreover, we show that the theorems can be
incorporated into state-of-the-art reinforcement learning based jamming
algorithms to boost the action exploration phase so that faster convergence is
achieved.
Comment: To appear at ACM MobiHoc 2019, Catania, Italy.
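As a hedged illustration of how theory-predicted jamming efficiencies might warm-start the exploration phase of an RL-based jammer, the toy value-estimation bandit below seeds its estimates with the predictions; the scheme names, numbers, and reward signal are placeholders, not results from the paper.

```python
# Toy bandit over candidate jamming schemes, warm-started with predicted
# efficiencies so early exploration favours schemes the analysis expects to
# work. The reward signal is the observed throughput drop (placeholder).
import random

schemes = ["pilot", "ack", "barrage"]
predicted_efficiency = {"pilot": 0.8, "ack": 0.6, "barrage": 0.3}  # placeholders

q = dict(predicted_efficiency)          # seed estimates with the predictions
counts = {s: 0 for s in schemes}

def choose_scheme(epsilon: float = 0.1) -> str:
    if random.random() < epsilon:
        return random.choice(schemes)
    return max(schemes, key=lambda s: q[s])

def update(scheme: str, observed_throughput_drop: float) -> None:
    # incremental sample-average update of the scheme's estimated effect
    counts[scheme] += 1
    q[scheme] += (observed_throughput_drop - q[scheme]) / counts[scheme]
```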
Toward Packet Routing with Fully-distributed Multi-agent Deep Reinforcement Learning
Packet routing is one of the fundamental problems in computer networks in
which a router determines the next-hop of each packet in the queue to get it as
quickly as possible to its destination. Reinforcement learning (RL) has been
introduced to design autonomous packet routing policies with local information
of stochastic packet arrival and service. However, the curse of dimensionality
of RL prohibits a more comprehensive representation of dynamic network
states, thus limiting its potential benefit. In this paper, we propose a novel
packet routing framework based on \emph{multi-agent} deep reinforcement
learning (DRL) in which each router possesses an \emph{independent} LSTM
recurrent neural network for training and decision making in a \emph{fully
distributed} environment. The LSTM recurrent neural network extracts routing
features from rich information regarding backlogged packets and past actions,
and effectively approximates the value function of Q-learning. We further allow
each router to communicate periodically with its direct neighbors so that a
broader view of the network state can be incorporated. Experimental results show that
our multi-agent DRL policy can strike the delicate balance between
congestion-aware and shortest routes, and significantly reduce the packet
delivery time in general network topologies compared with its counterparts.
Comment: 12 pages, 10 figures.
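A sketch, under assumed sizes, of the per-router recurrent Q-network the abstract outlines: an LSTM summarizes a sequence of local observations and a linear head scores each next-hop neighbor; the observation dimension and neighbor count are illustrative.

```python
# Assumed per-router recurrent Q-network: an LSTM summarizes local
# observations (queue backlog, past actions, neighbor reports) and a linear
# head produces one Q-value per next-hop neighbor.
import torch
import torch.nn as nn

class RouterQNet(nn.Module):
    def __init__(self, obs_dim: int, num_neighbors: int, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_neighbors)   # Q-value per next hop

    def forward(self, obs_seq: torch.Tensor) -> torch.Tensor:
        # obs_seq: (batch, time, obs_dim); use the final hidden state
        _, (h_n, _) = self.lstm(obs_seq)
        return self.head(h_n[-1])

# Example: a router with 4 neighbors greedily picks the next hop.
net = RouterQNet(obs_dim=10, num_neighbors=4)
obs_seq = torch.randn(1, 20, 10)        # 20 recent local observations
next_hop = int(net(obs_seq).argmax(dim=-1))
```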
Learning in the Machine: the Symmetries of the Deep Learning Channel
In a physical neural system, learning rules must be local both in space and
time. In order for learning to occur, non-local information must be
communicated to the deep synapses through a communication channel, the deep
learning channel. We identify several possible architectures for this learning
channel (Bidirectional, Conjoined, Twin, Distinct) and six symmetry challenges:
1) symmetry of architectures; 2) symmetry of weights; 3) symmetry of neurons;
4) symmetry of derivatives; 5) symmetry of processing; and 6) symmetry of
learning rules. Random backpropagation (RBP) addresses the second and third
symmetries, and some of its variations, such as skipped RBP (SRBP), address the
first and the fourth symmetry. Here we address the last two desirable
symmetries showing through simulations that they can be achieved and that the
learning channel is particularly robust to symmetry variations. Specifically,
random backpropagation and its variations can be performed with the same
non-linear neurons used in the main input-output forward channel, and the
connections in the learning channel can be adapted using the same algorithm
used in the forward channel, removing the need for any specialized hardware in
the learning channel. Finally, we provide mathematical results in simple cases
showing that the learning equations in the forward and backward channels
converge to fixed points, for almost any initial conditions. In symmetric
architectures, if the weights in both channels are small at initialization,
adaptation in both channels leads to weights that are essentially symmetric
during and after learning. Biological connections are discussed.
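A minimal numpy sketch of skipped random backpropagation (SRBP) as referenced above: the output error reaches each hidden layer through a fixed random matrix rather than the transposed forward weights; the network sizes, nonlinearity, and learning rate are illustrative assumptions.

```python
# Minimal SRBP sketch: hidden layers receive the output error through fixed
# random feedback matrices B, skipping the intermediate layers.
import numpy as np

rng = np.random.default_rng(0)
sizes = [8, 16, 16, 4]                      # input, two hidden layers, output
W = [rng.normal(0, 0.1, (sizes[i + 1], sizes[i])) for i in range(3)]
B = [rng.normal(0, 0.1, (sizes[i + 1], sizes[-1])) for i in range(2)]  # fixed feedback

def tanh(x):  return np.tanh(x)
def dtanh(x): return 1.0 - np.tanh(x) ** 2

def srbp_step(x, target, lr=0.01):
    # forward pass (same nonlinear neurons as in the forward channel)
    h, z = [x], []
    for Wl in W:
        z.append(Wl @ h[-1])
        h.append(tanh(z[-1]))
    error = h[-1] - target                  # output error
    # deltas: the output layer uses the true error; each hidden layer gets
    # the error projected through its fixed random matrix B
    deltas = [dtanh(z[0]) * (B[0] @ error),
              dtanh(z[1]) * (B[1] @ error),
              dtanh(z[2]) * error]
    for layer in range(3):
        W[layer] -= lr * np.outer(deltas[layer], h[layer])
```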
Application of Machine Learning in Wireless Networks: Key Techniques and Open Issues
As a key technique for enabling artificial intelligence, machine learning
(ML) is capable of solving complex problems without explicit programming.
Motivated by its successful applications to many practical tasks like image
recognition, both industry and the research community have advocated the
applications of ML in wireless communication. This paper comprehensively
surveys the recent advances of the applications of ML in wireless
communication, which are classified as: resource management in the MAC layer,
networking and mobility management in the network layer, and localization in
the application layer. The applications in resource management further include
power control, spectrum management, backhaul management, cache management,
beamformer design and computation resource management, while ML based
networking focuses on the applications in clustering, base station switching
control, user association and routing. Moreover, the literature in each aspect is
organized according to the adopted ML techniques. In addition, several
conditions for applying ML to wireless communication are identified to help
readers decide whether to use ML and which kind of ML techniques to use, and
traditional approaches are also summarized together with their performance
comparison with ML based approaches, based on which the motivations of surveyed
works to adopt ML are clarified. Given the extensiveness of the research
area, challenges and unresolved issues are presented to facilitate future
studies, where ML based network slicing, infrastructure update to support ML
based paradigms, open data sets and platforms for researchers, theoretical
guidance for ML implementation, and so on are discussed.
Comment: 34 pages, 8 figures.
Towards Characterizing Divergence in Deep Q-Learning
Deep Q-Learning (DQL), a family of temporal difference algorithms for
control, employs three techniques collectively known as the `deadly triad' in
reinforcement learning: bootstrapping, off-policy learning, and function
approximation. Prior work has demonstrated that together these can lead to
divergence in Q-learning algorithms, but the conditions under which divergence
occurs are not well-understood. In this note, we give a simple analysis based
on a linear approximation to the Q-value updates, which we believe provides
insight into divergence under the deadly triad. The central point in our
analysis is to consider when the leading order approximation to the deep-Q
update is or is not a contraction in the sup norm. Based on this analysis, we
develop an algorithm which permits stable deep Q-learning for continuous
control without any of the tricks conventionally used (such as target networks,
adaptive gradient optimizers, or using multiple Q functions). We demonstrate
that our algorithm performs above or near state-of-the-art on standard MuJoCo
benchmarks from the OpenAI Gym.
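To make the contraction question concrete, the sketch below checks a simple sufficient condition for a linearized update of the form q' = q + alpha * K (Tq - q) to be a sup-norm contraction, where K stands in for a kernel of Q-value gradient inner products; this bound is our own rough illustration, not the paper's exact criterion.

```python
# Illustrative check: q' = q + alpha * K @ (Tq - q) is a sup-norm contraction
# if ||I - alpha*K||_inf + alpha*gamma*||K||_inf < 1, using that the Bellman
# operator T is a gamma-contraction in the sup norm. This is a rough
# sufficient condition for illustration only.
import numpy as np

def is_sup_norm_contraction(K: np.ndarray, alpha: float, gamma: float) -> bool:
    n = K.shape[0]
    inf_norm = lambda M: np.max(np.sum(np.abs(M), axis=1))   # induced sup-norm
    return inf_norm(np.eye(n) - alpha * K) + alpha * gamma * inf_norm(K) < 1.0

# A kernel close to the identity contracts for a moderate step size, while
# strong off-diagonal coupling (heavy generalization across pairs) may not.
K_example = np.eye(4) + 0.01 * np.ones((4, 4))
print(is_sup_norm_contraction(K_example, alpha=0.5, gamma=0.9))   # True
```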
Energy-Efficient Thermal Comfort Control in Smart Buildings via Deep Reinforcement Learning
Heating, Ventilation, and Air Conditioning (HVAC) is extremely
energy-consuming, accounting for 40% of total building energy consumption.
Therefore, it is crucial to design some energy-efficient building thermal
control policies which can reduce the energy consumption of HVAC while
maintaining the comfort of the occupants. However, implementing such a policy
is challenging, because it involves various influencing factors in a building
environment, which are usually hard to model and may be different from case to
case. To address this challenge, we propose a deep reinforcement learning based
framework for energy optimization and thermal comfort control in smart
buildings. We formulate the building thermal control as a cost-minimization
problem which jointly considers the energy consumption of HVAC and the thermal
comfort of the occupants. To solve the problem, we first adopt a deep neural
network based approach for predicting the occupants' thermal comfort, and then
adopt Deep Deterministic Policy Gradients (DDPG) for learning the thermal
control policy. To evaluate the performance, we implement a building thermal
control simulation system and evaluate the performance under various settings.
The experiment results show that our method can improve the thermal comfort
prediction accuracy, and reduce the energy consumption of HVAC while improving
the occupants' thermal comfort.
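A rough sketch of the joint energy/comfort cost and a DDPG-style deterministic actor consistent with the description above; the comfort predictor, the trade-off weight, and the setpoint bounds are illustrative assumptions rather than the authors' configuration.

```python
# Sketch of the joint cost (HVAC energy plus predicted discomfort) and a
# deterministic actor whose continuous action is a temperature setpoint.
import torch
import torch.nn as nn

def thermal_control_cost(energy_kwh: float, predicted_discomfort: float,
                         lam: float = 10.0) -> float:
    # lam trades off energy against occupant comfort (placeholder value);
    # a DDPG agent would maximize the negative of this cost as its reward
    return energy_kwh + lam * predicted_discomfort

class HVACActor(nn.Module):
    # Deterministic policy: building state -> setpoint in [low, high] deg C
    def __init__(self, state_dim: int, low: float = 16.0, high: float = 26.0):
        super().__init__()
        self.low, self.high = low, high
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 1), nn.Tanh())

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # map the tanh output in (-1, 1) onto the allowed setpoint range
        return self.low + (self.high - self.low) * (self.net(state) + 1) / 2
```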
Deterministic Policy Gradients With General State Transitions
We study a reinforcement learning setting, where the state transition
function is a convex combination of a stochastic continuous function and a
deterministic function. Such a setting generalizes the widely-studied
stochastic state transition setting, namely the setting of deterministic policy
gradient (DPG).
We first give a simple example to illustrate that the deterministic policy
gradient may be infinite under deterministic state transitions, and introduce a
theoretical technique to prove the existence of the policy gradient in this
generalized setting. Using this technique, we prove that the deterministic
policy gradient indeed exists for a certain set of discount factors, and
further prove two conditions that guarantee the existence for all discount
factors. We then derive a closed form of the policy gradient whenever it exists.
Furthermore, to overcome the challenge of high sample complexity of DPG in this
setting, we propose the Generalized Deterministic Policy Gradient (GDPG)
algorithm. The main innovation of the algorithm is a new method of applying
model-based techniques to the model-free algorithm, the deep deterministic
policy gradient algorithm (DDPG). GDPG optimizes the long-term rewards of the
model-based augmented MDP subject to the constraint that the long-term rewards
of the MDP are less than those of the original one.
We finally conduct extensive experiments comparing GDPG with state-of-the-art
methods and the direct model-based extension method of DDPG on several standard
continuous control benchmarks. Results demonstrate that GDPG substantially
outperforms DDPG, the model-based extension of DDPG and other baselines in
terms of both convergence and long-term rewards in most environments.
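For reference, the standard DDPG actor update that GDPG builds on is sketched below; the GDPG-specific model-based augmented MDP and its constraint are not reproduced here, and the actor, critic, and optimizer objects are assumed to be ordinary differentiable networks.

```python
# Reference DDPG actor update that GDPG extends: ascend the deterministic
# policy gradient by minimizing -Q(s, mu_theta(s)).
import torch

def ddpg_actor_step(actor, critic, actor_optimizer,
                    states: torch.Tensor) -> float:
    actions = actor(states)                        # deterministic actions mu_theta(s)
    actor_loss = -critic(states, actions).mean()   # maximize Q <=> minimize -Q
    actor_optimizer.zero_grad()
    actor_loss.backward()
    actor_optimizer.step()
    return float(actor_loss.item())
```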
Personalized Exposure Control Using Adaptive Metering and Reinforcement Learning
We propose a reinforcement learning approach for real-time exposure control
of a mobile camera that is personalizable. Our approach is based on a Markov
Decision Process (MDP). In the camera viewfinder or live preview mode, given
the current frame, our system predicts the change in exposure so as to optimize
the trade-off among image quality, fast convergence, and minimal temporal
oscillation. We model the exposure prediction function as a fully convolutional
neural network that can be trained through Gaussian policy gradient in an
end-to-end fashion. As a result, our system can associate scene semantics with
exposure values; it can also be extended to personalize the exposure
adjustments for a user and device. We improve the learning performance by
incorporating an adaptive metering module that links semantics with exposure.
This adaptive metering module generalizes the conventional spot or matrix
metering techniques. We validate our system using the MIT FiveK and our own
datasets captured using iPhone 7 and Google Pixel. Experimental results show
that our system exhibits stable real-time behavior while improving visual
quality compared to what is achieved through native camera control.
Comment: 17 pages, 20 figures.
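A hedged sketch of the Gaussian policy-gradient step described above, where a predicted mean exposure adjustment parameterizes a Normal distribution and a score-function (REINFORCE-style) loss is formed; the fixed sigma and the reward definition are assumptions for illustration.

```python
# Score-function loss for a Gaussian policy over exposure changes: the
# predicted mean delta-EV parameterizes a Normal distribution, an adjustment
# is sampled, and its log-probability is weighted by the observed reward.
import torch

def gaussian_policy_loss(mean_delta_ev: torch.Tensor,
                         sampled_delta_ev: torch.Tensor,
                         reward: torch.Tensor,
                         sigma: float = 0.1) -> torch.Tensor:
    dist = torch.distributions.Normal(mean_delta_ev, sigma)
    log_prob = dist.log_prob(sampled_delta_ev)
    return -(log_prob * reward).mean()   # minimizing this ascends expected reward
```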