58 research outputs found

    Weight Clipping for Deep Continual and Reinforcement Learning

    Many failures in deep continual and reinforcement learning are associated with increasing weight magnitudes, which make the weights hard to change and can cause overfitting. While many methods address these learning failures, they often change the optimizer or the architecture, a complexity that hinders widespread adoption. In this paper, we focus on learning failures associated with increasing weight norm, and we propose a simple technique that can easily be added on top of existing learning systems: clipping neural network weights to limit them to a specific range. We study the effectiveness of weight clipping in a series of supervised and reinforcement learning experiments. Our empirical results highlight the benefits of weight clipping for generalization, for addressing loss of plasticity and policy collapse, and for facilitating learning with a large replay ratio.
    Comment: Published at the First Reinforcement Learning Conference (RLC 2024). Code is available at https://github.com/mohmdelsayed/weight-clippin
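
    As a minimal sketch of the idea (assuming a PyTorch model; the toy network and the threshold kappa are illustrative, and the linked repository is the authoritative implementation), weight clipping is a single projection step added after each optimizer update:

        import torch
        import torch.nn as nn

        # Toy model and optimizer; any existing learning system would do.
        model = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
        optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

        def clip_weights(net: nn.Module, kappa: float = 1.0) -> None:
            # Project every parameter back into [-kappa, kappa].
            with torch.no_grad():
                for p in net.parameters():
                    p.clamp_(-kappa, kappa)

        # One ordinary training step, with clipping as the only addition:
        x, y = torch.randn(8, 4), torch.randint(0, 2, (8,))
        loss = nn.functional.cross_entropy(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        clip_weights(model, kappa=1.0)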

    Learning to Optimize for Reinforcement Learning

    In recent years, by leveraging more data, computation, and diverse tasks, learned optimizers have achieved remarkable success in supervised learning, outperforming classical hand-designed optimizers. Reinforcement learning (RL), however, is fundamentally different from supervised learning, and in practice these learned optimizers do not work well even on simple RL tasks. We investigate this phenomenon and identify two issues. First, the distribution of agent gradients is not independent and identically distributed, which leads to inefficient meta-training. Second, because agent-environment interactions are highly stochastic, agent gradients have high bias and variance, which makes learning an optimizer for RL more difficult. We propose pipeline training and a novel optimizer structure with a good inductive bias to address these issues, making it possible to learn an optimizer for reinforcement learning from scratch. We show that, although trained only on toy tasks, our learned optimizer can generalize to unseen complex tasks in Brax.
    Comment: Published at RLC 2024. For the code release, see https://github.com/sail-sg/optim4r
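
    To make the general concept concrete, here is a toy learned-optimizer sketch in PyTorch: a small network maps per-parameter gradient features to update steps. This is not the paper's optimizer structure or its pipeline-training scheme; the feature choices and names are illustrative only:

        import torch
        import torch.nn as nn

        class LearnedOptimizer(nn.Module):
            # Maps per-parameter gradient features to proposed update steps.
            def __init__(self, hidden: int = 16):
                super().__init__()
                self.net = nn.Sequential(nn.Linear(2, hidden), nn.ReLU(), nn.Linear(hidden, 1))

            def forward(self, grad: torch.Tensor) -> torch.Tensor:
                g = grad.reshape(-1, 1)
                # Log-magnitude and sign are common scale-invariant gradient
                # features in the learned-optimizer literature (illustrative choice).
                feats = torch.cat([torch.log(g.abs() + 1e-8), torch.sign(g)], dim=1)
                return self.net(feats).reshape(grad.shape)

        opt_net = LearnedOptimizer()                       # would be meta-trained in practice
        params = [torch.randn(3, 3, requires_grad=True)]   # stand-in for agent parameters
        loss = (params[0] ** 2).sum()                      # stand-in for an RL objective
        grads = torch.autograd.grad(loss, params)
        with torch.no_grad():
            for p, g in zip(params, grads):
                p -= 1e-2 * opt_net(g)                     # apply the learned update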

    Continuous Beam Steering Through Broadside Using Asymmetrically Modulated Goubau Line Leaky-Wave Antennas

    The Goubau line is a single-conductor transmission line featuring easy integration and low-loss transmission. Here, we propose a periodic leaky-wave antenna (LWA) based on a planar Goubau transmission line on a thin dielectric substrate. Leaky-wave radiation is generated by introducing periodic modulations along the Goubau line. In this way the surface wave, a slow-wave mode supported by the Goubau line, acquires additional momentum and enters the fast-wave region, where it radiates. With these periodic modulations, the proposed Goubau line LWAs can continuously steer the main beam from backward to forward within the operational frequency range. Such LWAs, however, usually suffer from low radiation efficiency in the broadside direction. To overcome this drawback, we explore both transversally and longitudinally asymmetrical modulations of the Goubau line. Theoretical analysis, numerical simulations, and experimental results are compared with those of the symmetrical LWAs. The asymmetrical modulations are shown to significantly improve the radiation efficiency of the LWAs at broadside. Furthermore, the measured results agree well with the numerical ones, experimentally validating the proposed LWA structures. These novel Goubau line LWAs, demonstrated and validated at microwave frequencies, also show great potential for millimeter-wave and terahertz systems.
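
    The backward-to-forward steering follows the standard periodic-LWA relation: the n = -1 space harmonic, beta_-1 = beta_0 - 2*pi/p, radiates at sin(theta) = beta_-1/k0 once it enters the fast-wave region. The short sketch below evaluates this relation with assumed values for the modulation period and the effective index of the Goubau surface wave (not the paper's measured parameters):

        import numpy as np

        c = 3e8        # speed of light (m/s)
        p = 12e-3      # modulation period (m) -- assumed
        n_eff = 1.8    # effective index of the Goubau surface wave -- assumed

        f = np.linspace(10e9, 18e9, 5)         # frequency sweep (Hz)
        k0 = 2 * np.pi * f / c                 # free-space wavenumber
        beta_m1 = n_eff * k0 - 2 * np.pi / p   # n = -1 space harmonic

        theta = np.degrees(np.arcsin(np.clip(beta_m1 / k0, -1.0, 1.0)))
        for fi, ti in zip(f, theta):
            print(f"{fi / 1e9:5.1f} GHz -> main beam at {ti:+6.1f} deg")
        # With these assumed parameters the beam sweeps from backward (about
        # -44 deg) through broadside (near 13.9 GHz) to forward (about +24 deg).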

    Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo

    We present a scalable and effective exploration strategy based on Thompson sampling for reinforcement learning (RL). One key shortcoming of existing Thompson sampling algorithms is the need for a Gaussian approximation of the posterior distribution, which is not a good surrogate in most practical settings. Instead, we directly sample the Q function from its posterior distribution using Langevin Monte Carlo, an efficient type of Markov chain Monte Carlo (MCMC) method. Our method only needs to perform noisy gradient descent updates to learn the exact posterior distribution of the Q function, which makes it easy to deploy in deep RL. We provide a rigorous theoretical analysis of the proposed method and demonstrate that, in the linear Markov decision process (linear MDP) setting, it has a regret bound of $\tilde{O}(d^{3/2}H^{5/2}\sqrt{T})$, where $d$ is the dimension of the feature mapping, $H$ is the planning horizon, and $T$ is the total number of steps. We apply this approach to deep RL by using the Adam optimizer to perform gradient updates. Our approach achieves better or comparable results relative to state-of-the-art deep RL algorithms on several challenging exploration tasks from the Atari57 suite.
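
    The core update is easy to state: an ordinary stochastic-gradient step on the TD loss plus injected Gaussian noise, so the iterates sample from the Q posterior instead of collapsing to a point estimate. The sketch below uses vanilla Langevin dynamics for clarity (the paper pairs the noisy updates with the Adam optimizer); the network, step size, and inverse temperature are assumed values:

        import torch
        import torch.nn as nn

        q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
        eta, inv_temp = 1e-3, 1e4   # step size and inverse temperature -- assumed

        def lmc_step(td_loss: torch.Tensor) -> None:
            # theta <- theta - eta * grad + sqrt(2 * eta / inv_temp) * N(0, I)
            q_net.zero_grad()
            td_loss.backward()
            with torch.no_grad():
                for p in q_net.parameters():
                    p -= eta * p.grad
                    p += (2 * eta / inv_temp) ** 0.5 * torch.randn_like(p)

        # Usage with a dummy TD loss (states s, actions a, targets y as stand-ins):
        s, a, y = torch.randn(32, 4), torch.randint(0, 2, (32, 1)), torch.randn(32, 1)
        lmc_step(((q_net(s).gather(1, a) - y) ** 2).mean())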

    Capacitor-Loaded Spoof Surface Plasmon for Flexible Dispersion Control and High-Selectivity Filtering

    This letter proposes a new spoof surface plasmon transmission line (SSP-TL) based on a capacitor-loading technique. The new SSP-TL features flexible, reconfigurable dispersion control and highly selective filtering without resorting to configuration changes. Moreover, it requires a much smaller linewidth than the conventional SSP-TL to achieve an extremely slow wave (i.e., a highly confined field), which is quite useful for compact systems. To illustrate the design principle, several examples are designed within the 2-8 GHz frequency range. Both numerical and experimental results are given in comparison with the conventional SSP-TL. The proposed technique is demonstrated to provide better performance in size reduction and dispersion reconfigurability.
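
    A quick way to see why capacitive loading gives dispersion control is the standard ABCD/Bloch analysis of a line periodically loaded with shunt capacitors: cos(beta*p) = cos(k*p) - (omega*C*Z0/2)*sin(k*p), so increasing C slows the guided wave without requiring a narrower line. The sketch below evaluates this relation with assumed parameters, not values taken from the letter:

        import numpy as np

        c0, Z0, p = 3e8, 50.0, 5e-3   # light speed, line impedance, period -- assumed
        f0 = 5e9                      # evaluation frequency, inside the 2-8 GHz range
        k = 2 * np.pi * f0 / c0       # wavenumber on the unloaded line (TEM assumed)

        for C in (0.1e-12, 0.3e-12, 0.5e-12):   # loading capacitances -- assumed
            cos_bp = np.cos(k * p) - (2 * np.pi * f0 * C * Z0 / 2) * np.sin(k * p)
            beta = np.arccos(np.clip(cos_bp, -1.0, 1.0)) / p
            print(f"C = {C * 1e12:.1f} pF -> slow-wave factor beta/k0 = {beta / k:.2f}")
        # Larger C yields a larger slow-wave factor (about 1.15 -> 1.60 here),
        # i.e., a slower, more confined wave for the same line geometry.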