
    Finite-time analysis of single-timescale actor-critic

    Actor-critic methods have achieved significant success in many challenging applications. However, their finite-time convergence is still poorly understood in the most practical, single-timescale form. Existing analyses of single-timescale actor-critic have been limited to i.i.d. sampling or the tabular setting for simplicity. We investigate the more practical online single-timescale actor-critic algorithm on a continuous state space, where the critic uses linear function approximation and updates with a single Markovian sample per actor step. Previous analyses have been unable to establish convergence in such a challenging scenario. We demonstrate that the online single-timescale actor-critic method provably finds an $\epsilon$-approximate stationary point with $\widetilde{\mathcal{O}}(\epsilon^{-2})$ sample complexity under standard assumptions, which can be further improved to $\mathcal{O}(\epsilon^{-2})$ under i.i.d. sampling. Our novel framework systematically evaluates and controls the error propagation between the actor and the critic. It offers a promising approach for analyzing other single-timescale reinforcement learning algorithms as well.
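To make "single-timescale, one Markovian sample per actor step" concrete, here is a minimal sketch under our own assumptions: a two-state toy MDP (action 1 pays 1, action 0 pays 0, next state uniform), a softmax actor, and a linear TD(0) critic. The MDP, the one-hot feature map `features`, the discount, and the step sizes are all hypothetical; the paper's continuous-state setting and step-size schedule are not reproduced. The point is structural: each iteration draws one transition from the ongoing trajectory, and the critic and actor each take one step with step sizes of the same order (a fixed ratio `beta / alpha`), with no inner critic loop and no faster critic timescale.

```python
import math
import random

N_STATES, N_ACTIONS = 2, 2
GAMMA = 0.9  # discount factor (hypothetical)


def features(s):
    """Critic feature map phi(s); one-hot here, but any fixed map keeps the critic linear."""
    phi = [0.0] * N_STATES
    phi[s] = 1.0
    return phi


def policy(theta, s):
    """Softmax policy pi(.|s) from per-state action preferences theta[s]."""
    m = max(theta[s])
    exps = [math.exp(p - m) for p in theta[s]]
    z = sum(exps)
    return [e / z for e in exps]


def env_step(rng, s, a):
    """Toy MDP (hypothetical): action 1 pays 1, action 0 pays 0; next state is uniform."""
    reward = 1.0 if a == 1 else 0.0
    return reward, rng.randrange(N_STATES)


def single_timescale_ac(steps=20000, alpha=0.05, beta=0.1, seed=0):
    rng = random.Random(seed)
    theta = [[0.0] * N_ACTIONS for _ in range(N_STATES)]  # actor parameters
    w = [0.0] * len(features(0))                           # critic weights
    s = 0
    for _ in range(steps):
        probs = policy(theta, s)
        a = rng.choices(range(N_ACTIONS), weights=probs)[0]
        r, s_next = env_step(rng, s, a)  # one Markovian sample per iteration
        phi = features(s)
        v = sum(wi * fi for wi, fi in zip(w, phi))
        v_next = sum(wi * fi for wi, fi in zip(w, features(s_next)))
        delta = r + GAMMA * v_next - v  # TD error, shared by critic and actor
        # Critic: one linear TD(0) step.
        w = [wi + beta * delta * fi for wi, fi in zip(w, phi)]
        # Actor: one policy-gradient step, using delta as the advantage estimate.
        for b in range(N_ACTIONS):
            grad_log = (1.0 if b == a else 0.0) - probs[b]
            theta[s][b] += alpha * delta * grad_log
        s = s_next  # keep following the same trajectory (no resets)
    return theta, w
```

Swapping `features` for any other fixed feature map of the state keeps the critic linear; only the length of `w` changes.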

    Heterogeneous Federated Learning on a Graph

    Federated learning, where algorithms are trained across multiple decentralized devices without sharing local data, is increasingly popular in distributed machine learning practice. Typically, a graph structure $G$ exists behind local devices for communication. In this work, we consider parameter estimation in federated learning with data distribution and communication heterogeneity, as well as limited computational capacity of local devices. We encode the distribution heterogeneity by parametrizing distributions on local devices with a set of distinct $p$-dimensional vectors. We then propose to jointly estimate parameters of all devices under the $M$-estimation framework with the fused Lasso regularization, encouraging equal estimates of parameters on connected devices in $G$. We provide a general result for our estimator depending on $G$, which can be further calibrated to obtain convergence rates for various specific problem setups. Surprisingly, our estimator attains the optimal rate under a certain graph fidelity condition on $G$, as if we could aggregate all samples sharing the same distribution. If the graph fidelity condition is not met, we propose an edge selection procedure via multiple testing to ensure optimality. To ease the burden of local computation, a decentralized stochastic version of ADMM is provided, with convergence rate $O(T^{-1}\log T)$, where $T$ denotes the number of iterations. We highlight that our algorithm transmits only parameters along edges of $G$ at each iteration, without requiring a central machine, which preserves privacy. We further extend it to the case where devices are randomly inaccessible during the training process, with a similar algorithmic convergence guarantee. The computational and statistical efficiency of our method is evidenced by simulation experiments and the 2020 US presidential election data set. Comment: 61 pages, 4 figures
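The estimator above combines per-device losses with a penalty on parameter differences across the edges of $G$. A minimal sketch of evaluating that objective is below, assuming a least-squares loss on each device, per-device averaging, and the Euclidean norm of each edge difference (a group-fused variant). The abstract does not specify the loss, the norm, or any edge weights, so these choices and the name `fused_lasso_objective` are hypothetical. In formula form the sketch computes $\sum_i \frac{1}{2n_i}\|y_i - X_i\beta_i\|_2^2 + \lambda \sum_{(i,j)\in E} \|\beta_i - \beta_j\|_2$.

```python
import math


def fused_lasso_objective(data, betas, edges, lam):
    """data[i] = (X_i, y_i) with X_i a list of rows; betas[i] is a list of p floats;
    edges is a list of (i, j) pairs from the communication graph G (hypothetical layout)."""
    loss = 0.0
    for (X, y), beta in zip(data, betas):
        # Per-device least-squares loss, averaged over that device's n_i samples.
        resid = [yi - sum(xj * bj for xj, bj in zip(row, beta)) for row, yi in zip(X, y)]
        loss += sum(e * e for e in resid) / (2 * len(y))
    # Fusion penalty: Euclidean norm of the parameter difference on every edge.
    penalty = sum(
        math.sqrt(sum((u - v) ** 2 for u, v in zip(betas[i], betas[j])))
        for i, j in edges
    )
    return loss + lam * penalty
```

Because the edge norm is non-differentiable where $\beta_i = \beta_j$, minimizing this objective can set connected devices' estimates exactly equal, which is how the penalty encourages equal estimates along edges of $G$.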

    TDLE: 2-D LiDAR Exploration With Hierarchical Planning Using Regional Division

    Exploration systems are critical for enhancing the autonomy of robots. Due to the unpredictability of the future planning space, existing methods either adopt an inefficient greedy strategy or require substantial computing resources to obtain a global solution. In this work, we address the challenge of obtaining global exploration routes with minimal computing resources. A hierarchical planning framework dynamically divides the planning space into subregions and arranges their order to provide global guidance for exploration. Indicators that are compatible with the subregion order are used to choose specific exploration targets, thereby taking estimates of the spatial structure into account and extending the planning space to unknown regions. Extensive simulations and field tests demonstrate the efficacy of our method in comparison to existing 2-D LiDAR-based approaches. Our code has been made publicly available for further investigation. Comment: Accepted in IEEE International Conference on Automation Science and Engineering (CASE) 202
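The two-level idea above (order the subregions globally, then pick a concrete target consistent with that order) can be sketched as follows. This is not TDLE's algorithm: the abstract does not say how subregions are ordered or which indicators are used, so the greedy nearest-neighbour tour over subregion centroids and the closest-frontier rule below are stand-in heuristics, and the names `order_subregions` and `pick_target` are hypothetical.

```python
import math


def order_subregions(robot, centroids):
    """Greedy nearest-neighbour tour over subregion centroids (stand-in heuristic)."""
    remaining = dict(enumerate(centroids))
    order, pos = [], robot
    while remaining:
        # Visit the closest not-yet-ordered subregion next.
        idx = min(remaining, key=lambda i: math.dist(pos, remaining[i]))
        order.append(idx)
        pos = remaining.pop(idx)
    return order


def pick_target(robot, frontiers_by_region, order):
    """Closest frontier in the first subregion (in tour order) that still has frontiers."""
    for idx in order:
        frontiers = frontiers_by_region.get(idx, [])
        if frontiers:
            return min(frontiers, key=lambda f: math.dist(robot, f))
    return None  # no frontiers left anywhere: exploration is finished
```

Replacing the greedy tour with an exact or near-optimal tour (a small travelling-salesman problem over the centroids) buys a better global order at a higher computational cost, which is the trade-off the abstract highlights.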

    Optimization Landscape of Policy Gradient Methods for Discrete-time Static Output Feedback

    In recent years, significant progress has been made in understanding the optimization landscape of policy gradient methods for optimal control of linear time-invariant (LTI) systems. Compared with state-feedback control, output-feedback control is more prevalent, since in many practical settings the underlying state of the system may not be fully observed. This paper analyzes the optimization landscape of policy gradient methods applied to static output feedback (SOF) control of discrete-time LTI systems with quadratic cost. We begin by establishing crucial properties of the SOF cost: coercivity, $L$-smoothness, and an $M$-Lipschitz continuous Hessian. Despite the absence of convexity, we leverage these properties to derive novel results on convergence (at a nearly dimension-free rate) to stationary points for three policy gradient methods: the vanilla policy gradient method, the natural policy gradient method, and the Gauss-Newton method. Moreover, we prove that the vanilla policy gradient method converges linearly to local minima when initialized near such minima. The paper concludes with numerical examples that validate our theoretical findings. These results not only characterize the performance of gradient descent on the SOF problem but also provide insights into the effectiveness of general policy gradient methods in reinforcement learning.
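The vanilla policy gradient iteration can be illustrated on a deliberately tiny case. The sketch below is written under our own assumptions: a scalar plant $x_{t+1} = a x_t + b u_t$, output $y_t = c x_t$, static output feedback $u_t = -k y_t$, and cost $\sum_t (q x_t^2 + r u_t^2)$ with $\mathbb{E}[x_0^2] = \sigma_0$. The plant numbers and the step size are hypothetical. For a stabilizing gain ($|a - bkc| < 1$), the cost is $J(k) = \sigma_0 (q + r c^2 k^2)/(1 - a_k^2)$ with $a_k = a - bkc$, and it blows up at the stability boundary, which is coercivity in one dimension.

```python
import math


def sof_cost(k, a=1.2, b=1.0, c=2.0, q=1.0, r=1.0, sigma0=1.0):
    """Infinite-horizon cost J(k) of u = -k*y for the scalar toy plant (hypothetical numbers)."""
    ak = a - b * k * c  # closed-loop pole
    if abs(ak) >= 1.0:
        return math.inf  # non-stabilizing gain: cost is unbounded
    p = (q + r * (c * k) ** 2) / (1.0 - ak ** 2)  # scalar Lyapunov solution P_k
    return p * sigma0


def sof_grad(k, a=1.2, b=1.0, c=2.0, q=1.0, r=1.0, sigma0=1.0):
    """dJ/dk = 2 c (r k c - b P_k a_k) Sigma_k, with Sigma_k = sigma0 / (1 - a_k^2)."""
    ak = a - b * k * c
    p = (q + r * (c * k) ** 2) / (1.0 - ak ** 2)
    sigma_k = sigma0 / (1.0 - ak ** 2)  # accumulated state second moment
    return 2.0 * c * (r * k * c - b * p * ak) * sigma_k


def vanilla_pg(k0=0.5, eta=0.02, iters=500):
    """Plain gradient descent on J(k), starting from a stabilizing gain."""
    k = k0
    for _ in range(iters):
        k -= eta * sof_grad(k)
        if not math.isfinite(sof_cost(k)):
            raise RuntimeError("step left the set of stabilizing gains; reduce eta")
    return k


k_hat = vanilla_pg()
print(k_hat, sof_cost(k_hat))
```

With a scalar output, static output feedback is equivalent to state feedback, so the iterate should approach the scalar LQR gain divided by $c$; in higher dimensions no such reduction exists, and the landscape can have multiple stationary points.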

    Determination of parameters of a potential model for tetraquark study by studying all S-wave mesons

    The masses of low-lying S-wave mesons are evaluated in a constituent quark model (CQM) that employs a Cornell-like potential and a one-gluon-exchange spin-spin interaction. To make the model applicable to both the light and heavy quark sectors, we introduce mass-dependent coupling coefficients. The model has four free parameters, which are determined by comparing the theoretical results with experimental data. The established model, with a single set of parameters, may be applied to study higher excited meson states as well as multiquark systems in both the light and heavy quark sectors.
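For orientation, the textbook forms of the two interaction terms named above are shown below. This is a generic sketch assuming the standard Cornell potential and the Fermi-Breit contact spin-spin term. The paper's mass-dependent coupling coefficients and its four fitted parameters are not given in the abstract, so $\alpha_s$, $b$, $C$ and the (usually smeared) delta function here are placeholders.

```latex
V(r) = -\frac{4}{3}\frac{\alpha_s}{r} + b\,r + C,
\qquad
V_{ss}(r) = \frac{32\pi\alpha_s}{9\,m_1 m_2}\,\delta^{3}(\mathbf{r})\,\mathbf{S}_1\cdot\mathbf{S}_2,
\qquad
\mathbf{S}_1\cdot\mathbf{S}_2 = \tfrac{1}{2}\left[S(S+1) - \tfrac{3}{2}\right]
= \begin{cases} -\tfrac{3}{4}, & S = 0 \ (\text{pseudoscalar}),\\[2pt] +\tfrac{1}{4}, & S = 1 \ (\text{vector}). \end{cases}
```

Since the expectation value of $\delta^{3}(\mathbf{r})$ is $|\psi(0)|^2 > 0$, the contact term lowers the $S=0$ state and raises the $S=1$ state, producing the pseudoscalar-vector splitting within each S-wave pair.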