Weight Clipping for Deep Continual and Reinforcement Learning
Many failures in deep continual and reinforcement learning are associated
with increasing magnitudes of the weights, making them hard to change and
potentially causing overfitting. While many methods address these learning
failures, they often change the optimizer or the architecture, a complexity
that hinders widespread adoption in various systems. In this paper, we focus on
learning failures that are associated with increasing weight norm and we
propose a simple technique that can be easily added on top of existing learning
systems: clipping neural network weights to limit them to a specific range. We
study the effectiveness of weight clipping in a series of supervised and
reinforcement learning experiments. Our empirical results highlight the
benefits of weight clipping for generalization, addressing loss of plasticity
and policy collapse, and facilitating learning with a large replay ratio.Comment: Published in the First Reinforcement Learning Conference (RLC 2024).
Code is available at https://github.com/mohmdelsayed/weight-clippin
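The clipping rule the abstract describes is simple to sketch. The following minimal NumPy example (the threshold s, the toy weights, and the helper name are illustrative assumptions, not taken from the paper) projects every weight back into [-s, s] after an ordinary gradient step:

```python
import numpy as np

def clip_weights(params, s):
    """Project every weight array back into the range [-s, s], in place."""
    for p in params:
        np.clip(p, -s, s, out=p)

# Toy usage: one SGD step followed by the clipping step.
w = np.array([0.5, -3.0, 1.2, 7.5])   # weights, some already out of range
grad = np.array([0.1, -0.2, 0.0, 0.3])
w -= 0.1 * grad                        # ordinary gradient step
clip_weights([w], s=1.0)               # keep weights inside [-1, 1]
```

Because the projection runs after each optimizer step, it can be bolted onto any existing training loop without touching the optimizer or architecture, which is the point the abstract emphasizes.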
Learning to Optimize for Reinforcement Learning
In recent years, by leveraging more data, computation, and diverse tasks,
learned optimizers have achieved remarkable success in supervised learning,
outperforming classical hand-designed optimizers. Reinforcement learning (RL)
is essentially different from supervised learning, and in practice, these
learned optimizers do not work well even in simple RL tasks. We investigate
this phenomenon and identify two issues. First, the distribution of
agent gradients is not independent and identically distributed
(non-i.i.d.), which makes meta-training inefficient. Second, due to the
highly stochastic agent-environment interactions, agent gradients have
high bias and variance, which makes it harder to learn an optimizer for
RL. We propose pipeline training
and a novel optimizer structure with a good inductive bias to address these
issues, making it possible to learn an optimizer for reinforcement learning
from scratch. We show that, although only trained in toy tasks, our learned
optimizer can generalize to unseen complex tasks in Brax.
Comment: Published at RLC 2024. For code release, see https://github.com/sail-sg/optim4r
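To make "learned optimizer" concrete: instead of a hand-designed rule such as SGD, the parameter update is produced by a small parameterized function whose own parameters are meta-trained across tasks. The sketch below (the two coefficients, the momentum structure, and all names are illustrative assumptions, not the paper's architecture) shows the inner-loop rollout of such a rule on a toy quadratic:

```python
class TinyLearnedOptimizer:
    """A learnable update rule: step = -(a * grad + b * momentum).

    In a real learned optimizer, a and b (or a small network replacing
    them) would be meta-trained across many tasks; here they are fixed
    to plausible values purely for illustration.
    """
    def __init__(self, a=0.1, b=0.05, beta=0.9):
        self.a, self.b, self.beta = a, b, beta
        self.m = 0.0  # momentum accumulator

    def step(self, x, grad):
        self.m = self.beta * self.m + grad
        return x - (self.a * grad + self.b * self.m)

# Inner loop: apply the (fixed) learned rule to f(x) = x^2.
opt = TinyLearnedOptimizer()
x = 1.0
for _ in range(20):
    x = opt.step(x, grad=2.0 * x)
```

Meta-training would wrap this inner loop and differentiate the final loss with respect to a and b across a distribution of tasks; the paper's contribution is an optimizer structure and training pipeline that make that outer loop tractable for RL gradients.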
Continuous Beam Steering Through Broadside Using Asymmetrically Modulated Goubau Line Leaky-Wave Antennas
The Goubau line is a single-conductor transmission line featuring easy integration and low-loss propagation. Here, we propose a periodic leaky-wave antenna (LWA) based on a planar Goubau transmission line on a thin dielectric substrate. Leaky-wave radiation is generated by introducing periodic modulations along the Goubau line: the surface wave, a slow-wave mode supported by the Goubau line, acquires additional momentum and thereby enters the fast-wave region, where it radiates. With these periodic modulations, the proposed Goubau line LWAs can continuously steer the main beam from backward to forward within the operational frequency range. However, such LWAs usually suffer from low radiation efficiency in the broadside direction. To overcome this drawback, we explore both transversally and longitudinally asymmetrical modulations of the Goubau line. Theoretical analysis, numerical simulations, and experimental results are given in comparison with the symmetrical LWAs. It is demonstrated that the asymmetrical modulations significantly improve the broadside radiation efficiency of the LWAs. Furthermore, the measured results agree well with the numerical ones, experimentally validating the proposed LWA structures. These Goubau line LWAs, demonstrated and validated at microwave frequencies, also show great potential for millimeter-wave and terahertz systems.
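The backward-to-forward scanning described above follows from standard periodic-LWA theory: radiation comes from the n = -1 space harmonic, whose phase constant is beta_-1 = beta_0 - 2*pi/p, and the main-beam angle from broadside satisfies sin(theta) ~ beta_-1 / k0. A sketch with illustrative numbers (the effective index and modulation period below are assumptions, not the paper's measured values):

```python
import math

C_LIGHT = 299_792_458.0  # speed of light, m/s

def beam_angle_deg(f_hz, n_eff, period_m):
    """Main-beam angle (degrees from broadside) of the n = -1 space
    harmonic of a periodically modulated slow-wave line.

    beta0 = n_eff * k0 is the unmodulated surface-wave phase constant;
    radiation requires |beta0 - 2*pi/p| < k0 (the fast-wave region).
    """
    k0 = 2.0 * math.pi * f_hz / C_LIGHT
    beta_m1 = n_eff * k0 - 2.0 * math.pi / period_m
    s = beta_m1 / k0
    if abs(s) > 1.0:
        return None  # outside the fast-wave (radiation) region
    return math.degrees(math.asin(s))

# Illustrative scan with n_eff = 1.3 and p = 30 mm:
# broadside (theta = 0) falls near f = c / (p * n_eff) ~ 7.7 GHz,
# with backward radiation below it and forward radiation above it.
for f_ghz in (6.5, 7.69, 9.0):
    print(f_ghz, "GHz ->", beam_angle_deg(f_ghz * 1e9, 1.3, 0.030))
```

The broadside crossing is exactly where beta_-1 = 0; the efficiency dip there, and its mitigation via asymmetrical modulation, is what the paper addresses.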
Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo
We present a scalable and effective exploration strategy based on Thompson
sampling for reinforcement learning (RL). One of the key shortcomings of
existing Thompson sampling algorithms is the need to perform a Gaussian
approximation of the posterior distribution, which is not a good surrogate in
most practical settings. We instead directly sample the Q function from its
posterior distribution, by using Langevin Monte Carlo, an efficient type of
Markov Chain Monte Carlo (MCMC) method. Our method only needs to perform noisy
gradient descent updates to learn the exact posterior distribution of the Q
function, which makes our approach easy to deploy in deep RL. We provide a
rigorous theoretical analysis for the proposed method and demonstrate that, in
the linear Markov decision process (linear MDP) setting, it has a regret bound
of Õ(d^{3/2} H^{3/2} √T), where d is the dimension of the
feature mapping, H is the planning horizon, and T is the total number of
steps. We apply this approach to deep RL by using the Adam optimizer to perform
gradient updates. Our approach achieves better or comparable results relative to
state-of-the-art deep RL algorithms on several challenging exploration tasks
from the Atari57 suite.
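The "noisy gradient descent" at the heart of this approach is the unadjusted Langevin update theta <- theta - eta * grad U(theta) + sqrt(2 * eta) * xi with xi ~ N(0, I), whose stationary distribution approximates the posterior proportional to exp(-U). A minimal one-dimensional sketch on a Gaussian target (the step size, iteration counts, and target are illustrative, not the paper's setup):

```python
import numpy as np

def langevin_samples(grad_u, theta0, eta, n_steps, rng):
    """Unadjusted Langevin algorithm: noisy gradient descent whose
    stationary distribution approximates p(theta) ~ exp(-U(theta))."""
    theta = theta0
    out = []
    for _ in range(n_steps):
        noise = rng.standard_normal()
        theta = theta - eta * grad_u(theta) + np.sqrt(2.0 * eta) * noise
        out.append(theta)
    return np.array(out)

# Target posterior N(2, 1): U(t) = (t - 2)^2 / 2, so grad U(t) = t - 2.
rng = np.random.default_rng(0)
samples = langevin_samples(lambda t: t - 2.0, 0.0, eta=0.1,
                           n_steps=20_000, rng=rng)[1_000:]  # drop burn-in
```

With a small step size, the empirical mean and variance of the retained samples approach those of the target (here, mean near 2). In the paper's setting, U is the regularized TD loss of the Q function and the same update doubles as the training step, which is why the method slots directly into deep RL.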
Capacitor-Loaded Spoof Surface Plasmon for Flexible Dispersion Control and High-Selectivity Filtering
This letter proposes a new spoof surface plasmon transmission line (SSP-TL) based on a capacitor-loading technique. The new SSP-TL offers flexible, reconfigurable dispersion control and highly selective filtering without requiring any change to its configuration. Moreover, it requires a much smaller linewidth than the conventional SSP-TL to achieve an extremely slow wave (i.e., a highly confined field), which is useful for compact systems. To illustrate the design principle, several examples are designed within the 2-8 GHz frequency range. Both numerical and experimental results are given in comparison with the conventional SSP-TL. It is demonstrated that the proposed technique offers better performance in size reduction and dispersion reconfigurability.
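The slow-wave effect of periodic capacitive loading can be seen from the textbook Bloch dispersion relation for a line of characteristic impedance Z0 loaded with shunt capacitors C at period d: cos(beta*d) = cos(k*d) - (omega*C*Z0/2)*sin(k*d). A sketch with illustrative values (these are not the letter's actual design parameters):

```python
import math

C_LIGHT = 299_792_458.0  # speed of light, m/s

def loaded_beta(f_hz, z0, c_shunt, d, eps_eff=1.0):
    """Bloch phase constant of a line periodically loaded with shunt
    capacitors, from cos(beta*d) = cos(k*d) - (w*C*Z0/2)*sin(k*d).
    Returns None inside a stopband (|cos(beta*d)| > 1)."""
    w = 2.0 * math.pi * f_hz
    k = w * math.sqrt(eps_eff) / C_LIGHT  # unloaded phase constant
    rhs = math.cos(k * d) - 0.5 * w * c_shunt * z0 * math.sin(k * d)
    if abs(rhs) > 1.0:
        return None
    return math.acos(rhs) / d

# Illustrative values: 50-ohm air line, 0.5 pF every 5 mm, at 4 GHz.
f = 4e9
beta_unloaded = 2.0 * math.pi * f / C_LIGHT
beta_loaded = loaded_beta(f, z0=50.0, c_shunt=0.5e-12, d=0.005)
```

Increasing the loading capacitance raises beta at a given frequency (a stronger slow-wave effect) without shrinking the conductor, which is the size-reduction and dispersion-tuning mechanism the letter exploits.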
Hepatic arterial phase and portal venous phase computed tomography for dose calculation of stereotactic body radiation therapy plans in liver cancer: a dosimetric comparison study
Grouping miRNAs of similar functions via weighted information content of gene ontology