
    Towards a Theoretical Foundation of Policy Optimization for Learning Control Policies

    Gradient-based methods have been widely used for system design and optimization in diverse application domains. Recently, there has been a renewed interest in studying theoretical properties of these methods in the context of control and reinforcement learning. This article surveys some of the recent developments on policy optimization, a gradient-based iterative approach for feedback control synthesis popularized by the successes of reinforcement learning. We take an interdisciplinary perspective in our exposition that connects control theory, reinforcement learning, and large-scale optimization. We review a number of recently developed theoretical results on the optimization landscape, global convergence, and sample complexity of gradient-based methods for various continuous control problems such as the linear quadratic regulator (LQR), $\mathcal{H}_\infty$ control, risk-sensitive control, linear quadratic Gaussian (LQG) control, and output feedback synthesis. In conjunction with these optimization results, we also discuss how direct policy optimization handles stability and robustness concerns in learning-based control, two main desiderata in control engineering. We conclude the survey by pointing out several challenges and opportunities at the intersection of learning and control.
    Comment: To appear in Annual Review of Control, Robotics, and Autonomous Systems
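The abstract above describes direct policy optimization for continuous control, with LQR as the canonical example. A minimal sketch of the idea, using a two-point zeroth-order (sample-based) gradient estimate on the LQR cost of a static state-feedback gain: the system matrices, initial gain, and all step sizes below are illustrative assumptions, not values from the survey.

```python
import numpy as np

# Hypothetical 2-state, 1-input discrete-time system (illustrative values).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.eye(1)

def lqr_cost(K, horizon=200, n_rollouts=20, seed=0):
    """Average finite-horizon quadratic cost of the policy u = -K x."""
    rng = np.random.default_rng(seed)  # common random numbers across evaluations
    total = 0.0
    for _ in range(n_rollouts):
        x = rng.standard_normal((2, 1))
        for _ in range(horizon):
            u = -K @ x
            total += float(x.T @ Q @ x + u.T @ R @ u)
            x = A @ x + B @ u
    return total / n_rollouts

def policy_gradient_step(K, step=1e-3, radius=1e-2, n_dirs=10, seed=1):
    """One two-point zeroth-order gradient step on the LQR cost J(K)."""
    rng = np.random.default_rng(seed)
    grad = np.zeros_like(K)
    for _ in range(n_dirs):
        U = rng.standard_normal(K.shape)
        U /= np.linalg.norm(U)
        # Finite-difference directional derivative along U.
        grad += (lqr_cost(K + radius * U) - lqr_cost(K - radius * U)) / (2 * radius) * U
    grad *= K.size / n_dirs
    return K - step * grad

K = np.array([[0.5, 1.0]])  # stabilizing initial gain (assumption; check rho(A - BK) < 1)
cost_before = lqr_cost(K)
for i in range(20):
    K = policy_gradient_step(K, seed=i)
cost_after = lqr_cost(K)
```

The surveyed results concern exactly this setting: the LQR cost is nonconvex in K, yet gradient methods enjoy global convergence guarantees when started from a stabilizing gain.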

    AI-Based Q-Learning Approach for Performance Optimization in MIMO-NOMA Wireless Communication Systems

    In this paper, we investigate the performance enhancement of Multiple-Input Multiple-Output Non-Orthogonal Multiple Access (MIMO-NOMA) wireless communication systems using an Artificial Intelligence (AI) based Q-Learning reinforcement learning approach. The primary challenge addressed is the optimization of power allocation in a MIMO-NOMA system, a complex task given the non-convex nature of the problem. Our proposed Q-Learning approach adaptively adjusts the power allocation strategy for proximal and distant users, optimizing the trade-off between various conflicting metrics and significantly improving the system’s performance. Compared to traditional power allocation strategies, our approach showed superior performance across three principal parameters: spectral efficiency, achievable sum rate, and energy efficiency. Specifically, our methodology achieved approximately a 140% increase in the achievable sum rate and about a 93% improvement in energy efficiency at a transmitted power of 20 dB, while also enhancing spectral efficiency by approximately 88.6% at 30 dB transmitted power. These results underscore the potential of reinforcement learning techniques, particularly Q-Learning, as practical solutions for complex optimization problems in wireless communication systems. Future research may investigate the inclusion of enhanced channel simulations and network limitations into the machine learning framework to assess the feasibility and resilience of such intelligent approaches.
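The abstract above applies Q-Learning to the power-split decision between near and far NOMA users. A minimal sketch under strong simplifying assumptions: a toy two-user downlink with fixed average channel gains, a discretized power-split action space, and a stateless (single-state, gamma = 0) Q-Learning update with sum rate as the reward. All numerical parameters are illustrative, not the paper's simulation setup.

```python
import numpy as np

rng = np.random.default_rng(0)
alphas = np.linspace(0.05, 0.45, 9)   # candidate power fractions for the near user
P, noise = 10.0, 1.0                  # total transmit power and noise power (assumptions)
g_near, g_far = 4.0, 1.0              # average channel gains, near user stronger (assumptions)

def sum_rate(alpha):
    """Two-user NOMA sum rate with SIC at the near user.

    The far user decodes its signal treating the near user's power as
    interference; the near user cancels the far user's signal first.
    """
    r_near = np.log2(1 + alpha * P * g_near / noise)
    r_far = np.log2(1 + (1 - alpha) * P * g_far / (alpha * P * g_far + noise))
    return r_near + r_far

# Stateless Q-Learning over the discrete power splits (epsilon-greedy).
q = np.zeros(len(alphas))
eps, lr = 0.2, 0.1
for _ in range(2000):
    a = rng.integers(len(alphas)) if rng.random() < eps else int(np.argmax(q))
    # Noisy reward models small channel fluctuations around the average gains.
    reward = sum_rate(alphas[a]) + 0.05 * rng.standard_normal()
    q[a] += lr * (reward - q[a])      # gamma = 0: no successor state in this toy setting

best_alpha = alphas[int(np.argmax(q))]
```

The full problem in the paper is far richer (MIMO channels, multiple conflicting metrics, per-user constraints), but the same update rule extends to a multi-state Q-table once channel conditions are quantized into states.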