    Finite-time analysis of single-timescale actor-critic

    Actor-critic methods have achieved significant success in many challenging applications. However, their finite-time convergence is still poorly understood in the most practical single-timescale form. Existing analyses of single-timescale actor-critic have been limited to i.i.d. sampling or the tabular setting for simplicity. We investigate the more practical online single-timescale actor-critic algorithm on continuous state space, where the critic assumes linear function approximation and updates with a single Markovian sample per actor step. Previous analysis has been unable to establish convergence for such a challenging scenario. We demonstrate that the online single-timescale actor-critic method provably finds an $\epsilon$-approximate stationary point with $\widetilde{\mathcal{O}}(\epsilon^{-2})$ sample complexity under standard assumptions, which can be further improved to $\mathcal{O}(\epsilon^{-2})$ under i.i.d. sampling. Our novel framework systematically evaluates and controls the error propagation between the actor and critic. It offers a promising approach for analyzing other single-timescale reinforcement learning algorithms as well.
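
The update pattern described in this abstract (one Markovian sample per step, critic and actor updated together with step sizes of the same order) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration, not the paper's setup: the 2-state, 2-action toy MDP, the one-hot features standing in for linear function approximation, the softmax policy, and the names `softmax` and `run_actor_critic`. The paper itself treats continuous state spaces.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def run_actor_critic(steps=5000, step_size=0.05, gamma=0.9, seed=0):
    """Online single-timescale actor-critic on a hypothetical toy MDP.

    Assumed dynamics: the next state is drawn uniformly at random, and the
    reward is 1 for action 1 and 0 for action 0, in every state.
    """
    rng = random.Random(seed)
    n_states, n_actions = 2, 2
    theta = [[0.0] * n_actions for _ in range(n_states)]  # actor: softmax logits
    w = [0.0] * n_states  # critic: linear weights on one-hot state features

    state = 0
    for _ in range(steps):
        probs = softmax(theta[state])
        action = 0 if rng.random() < probs[0] else 1
        reward = float(action)
        next_state = rng.randrange(n_states)

        # TD error computed from a single sample along one trajectory.
        td_error = reward + gamma * w[next_state] - w[state]

        # Critic and actor use the same step size (single timescale).
        w[state] += step_size * td_error
        for b in range(n_actions):
            grad_log = (1.0 if b == action else 0.0) - probs[b]
            theta[state][b] += step_size * td_error * grad_log

        state = next_state

    return theta, w
```

Calling `run_actor_critic()` returns the learned logits and critic weights; in this toy problem the policy should come to favor action 1 in both states, since that action always earns the higher reward.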

    Heterogeneous Federated Learning on a Graph

    Federated learning, where algorithms are trained across multiple decentralized devices without sharing local data, is increasingly popular in distributed machine learning practice. Typically, a graph structure $G$ exists behind local devices for communication. In this work, we consider parameter estimation in federated learning with data distribution and communication heterogeneity, as well as limited computational capacity of local devices. We encode the distribution heterogeneity by parametrizing distributions on local devices with a set of distinct $p$-dimensional vectors. We then propose to jointly estimate parameters of all devices under the $M$-estimation framework with fused Lasso regularization, encouraging equal parameter estimates on devices connected in $G$. We provide a general result for our estimator depending on $G$, which can be further calibrated to obtain convergence rates for various specific problem setups. Surprisingly, our estimator attains the optimal rate under a certain graph fidelity condition on $G$, as if we could aggregate all samples sharing the same distribution. If the graph fidelity condition is not met, we propose an edge selection procedure via multiple testing to ensure optimality. To ease the burden of local computation, a decentralized stochastic version of ADMM is provided, with convergence rate $O(T^{-1}\log T)$, where $T$ denotes the number of iterations. We highlight that our algorithm transmits only parameters along edges of $G$ at each iteration, without requiring a central machine, which preserves privacy. We further extend it to the case where devices are randomly inaccessible during the training process, with a similar algorithmic convergence guarantee. The computational and statistical efficiency of our method is evidenced by simulation experiments and the 2020 US presidential election data set.
    Comment: 61 pages, 4 figures
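
For orientation, a graph-fused $M$-estimation objective of the kind this abstract describes is often written as below. This is a generic sketch, not the paper's formulation: the loss $\ell$, the per-device sample sizes $n_i$, the observations $z_{ij}$, the edge set $E$ of $G$, the tuning parameter $\lambda$, and the choice of norm in the penalty are all assumed notation that the abstract does not specify.

```latex
\[
  (\hat\theta_1, \dots, \hat\theta_m)
  \in \operatorname*{arg\,min}_{\theta_1, \dots, \theta_m \in \mathbb{R}^p}
  \; \sum_{i=1}^{m} \frac{1}{n_i} \sum_{j=1}^{n_i} \ell(\theta_i; z_{ij})
  \; + \; \lambda \sum_{(i,k) \in E} \lVert \theta_i - \theta_k \rVert
\]
```

The first term fits each device's parameter to its own data, while the penalty shrinks the difference between parameters on devices joined by an edge, which is what pushes connected devices toward equal estimates.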

    Optimization Landscape of Policy Gradient Methods for Discrete-time Static Output Feedback

    Significant progress has recently been made in understanding the optimization landscape of policy gradient methods for optimal control of linear time-invariant (LTI) systems. Compared with state-feedback control, output-feedback control is more prevalent since the underlying state of the system may not be fully observed in many practical settings. This paper analyzes the optimization landscape of policy gradient methods applied to static output feedback (SOF) control in discrete-time LTI systems subject to quadratic cost. We begin by establishing crucial properties of the SOF cost, encompassing coercivity, L-smoothness, and M-Lipschitz continuous Hessian. Despite the absence of convexity, we leverage these properties to derive novel findings regarding convergence (and nearly dimension-free rate) to stationary points for three policy gradient methods: the vanilla policy gradient method, the natural policy gradient method, and the Gauss-Newton method. Moreover, we prove that the vanilla policy gradient method converges linearly to local minima when initialized near such minima. The paper concludes by presenting numerical examples that validate our theoretical findings. These results not only characterize the performance of gradient descent for optimizing the SOF problem but also provide insights into the effectiveness of general policy gradient methods within the realm of reinforcement learning.
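
To make the SOF setting concrete, here is a minimal sketch of vanilla policy gradient on a scalar discrete-time system, where the cost has a closed form. The system parameters, the initial state x0 = 1, the step size, and the function names are all assumptions chosen for illustration; the paper analyzes general matrix-valued LTI systems, and in the scalar case output feedback reduces to a rescaled state feedback.

```python
def sof_cost(k, a=1.2, b=1.0, c=0.5, q=1.0, r=1.0):
    """Infinite-horizon cost of the output-feedback law u_t = -k * y_t, x0 = 1.

    Assumed scalar system: x_{t+1} = a*x_t + b*u_t, y_t = c*x_t, with stage
    cost q*x_t**2 + r*u_t**2. Returns inf if the closed loop is unstable,
    which reflects the coercivity of the cost near the stability boundary.
    """
    a_cl = a - b * c * k  # closed-loop pole
    if abs(a_cl) >= 1.0:
        return float("inf")
    return (q + r * (c * k) ** 2) / (1.0 - a_cl ** 2)


def sof_gradient(k, a=1.2, b=1.0, c=0.5, q=1.0, r=1.0):
    """Analytic derivative of sof_cost with respect to k (stable region only)."""
    a_cl = a - b * c * k
    num = q + r * (c * k) ** 2
    den = 1.0 - a_cl ** 2
    d_num = 2.0 * r * c * c * k
    d_den = 2.0 * b * c * a_cl
    return (d_num * den - num * d_den) / den ** 2


def vanilla_policy_gradient(k0, step=0.05, iters=500):
    """Plain gradient descent on the gain k, started from a stabilizing k0."""
    k = k0
    for _ in range(iters):
        k -= step * sof_gradient(k)
    return k
```

Starting from the stabilizing gain `k0 = 2.0`, `vanilla_policy_gradient(2.0)` settles at a gain of about 1.587, which agrees with the gain obtained from the scalar Riccati equation for these assumed parameters.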

    Cooling and Crack Suppression of Bone Material Drilling Based on Microtextured Bit Modeled on Dung Beetle

    In recent years, the number of patients with orthopedic diseases such as cervical spondylosis has increased, raising the demand for orthopedic surgery. However, thermal necrosis and bone cracks caused by surgery severely restrict the development of orthopedic surgery. When a cutting tool drills bone during surgery, high temperatures lead to cell death and cracks form easily, causing secondary damage that impairs recovery. To address these problems, this paper designs a bionic drill bit based on the microstructure of the dung beetle's head and back. The microstructure configuration parameters were optimized by numerical analysis, and the bionic bits were prepared with an optical fiber laser marking machine. Through drilling tests, mathematical models of drilling temperature and crack generation based on the microstructure characteristic parameters were established using infrared thermal imaging and acoustic emission signal technology, and the cooling mechanism and crack suppression strategy were studied. The experimental results show that at a speed of 60 m/min, the cooling effects of bionic bits T1 and T2 are 15.31% and 19.78%, respectively, and both bits show an obvious crack suppression effect. This research provides a new idea for the precise and efficient machining of bone materials, and the results will help improve the design, manufacturing technology, and theoretical research in the field of bone drilling tools.

    Lattice distortion inducing exciton splitting and coherent quantum beating in CsPbI3 perovskite quantum dots

    Anisotropic exchange-splitting in semiconductor quantum dots (QDs) results in bright-exciton fine-structure-splitting (FSS) important for quantum information processing. Direct measurement of FSS usually requires single or few QDs at liquid-helium temperatures because of its sensitivity to QD size and shape, whereas measuring and controlling FSS at the ensemble level seem to be impossible unless all the dots are made nearly the same. Here we report strong bright-exciton FSS up to 1.6 meV in solution-processed CsPbI3 perovskite QDs, manifested as quantum beats in ensemble-level transient absorption at liquid-nitrogen to room temperatures. The splitting is robust to QD size and shape heterogeneity, and increases with decreasing temperature, pointing towards a mechanism associated with orthorhombic distortion of the perovskite lattice. Effective-mass-approximation calculations reveal an intrinsic "fine-structure gap" that agrees well with the observed FSS. This gap stems from an avoided crossing of bright excitons confined in orthorhombically distorted QDs that are bounded by the pseudocubic {100} family of planes.

    Large-scale single-photon imaging

    Benefiting from their single-photon sensitivity, single-photon avalanche diode (SPAD) arrays have been widely applied in various fields such as fluorescence lifetime imaging and quantum computing. However, large-scale high-fidelity single-photon imaging remains a big challenge, due to the complex hardware manufacturing process and heavy noise disturbance of SPAD arrays. In this work, we introduce deep learning into SPAD, enabling super-resolution single-photon imaging over an order of magnitude, with significant enhancement of bit depth and imaging quality. We first studied the complex photon flow model of SPAD electronics to accurately characterize multiple physical noise sources, and collected a real SPAD image dataset (64 $\times$ 32 pixels, 90 scenes, 10 different bit depths, 3 different illumination flux levels, 2790 images in total) to calibrate noise model parameters. With this real-world physical noise model, we synthesized, for the first time, a large-scale realistic single-photon image dataset (image pairs of 5 different resolutions with maximum megapixels, 17250 scenes, 10 different bit depths, 3 different illumination flux levels, 2.6 million images in total) for subsequent network training. To tackle the severe super-resolution challenge of SPAD inputs with low bit depth, low resolution, and heavy noise, we further built a deep transformer network with a content-adaptive self-attention mechanism and gated fusion modules, which can mine global contextual features to remove multi-source noise and extract full-frequency details. We applied the technique to a series of experiments including macroscopic and microscopic imaging, microfluidic inspection, and Fourier ptychography. The experiments validate the technique's state-of-the-art super-resolution SPAD imaging performance, with more than 5 dB superiority in PSNR compared to existing methods.