172,004 research outputs found

    Efficient Reinforcement Learning Using Recursive Least-Squares Methods

    Full text link
    The recursive least-squares (RLS) algorithm is one of the most well-known algorithms used in adaptive filtering, system identification and adaptive control. Its popularity is mainly due to its fast convergence speed, which is considered to be optimal in practice. In this paper, RLS methods are used to solve reinforcement learning problems, where two new reinforcement learning algorithms using linear value function approximators are proposed and analyzed. The two algorithms are called RLS-TD(lambda) and Fast-AHC (Fast Adaptive Heuristic Critic), respectively. RLS-TD(lambda) can be viewed as the extension of RLS-TD(0) from lambda=0 to general lambda within interval [0,1], so it is a multi-step temporal-difference (TD) learning algorithm using RLS methods. The convergence with probability one and the limit of convergence of RLS-TD(lambda) are proved for ergodic Markov chains. Compared to the existing LS-TD(lambda) algorithm, RLS-TD(lambda) has advantages in computation and is more suitable for online learning. The effectiveness of RLS-TD(lambda) is analyzed and verified by learning prediction experiments of Markov chains with a wide range of parameter settings. The Fast-AHC algorithm is derived by applying the proposed RLS-TD(lambda) algorithm in the critic network of the adaptive heuristic critic method. Unlike conventional AHC algorithm, Fast-AHC makes use of RLS methods to improve the learning-prediction efficiency in the critic. Learning control experiments of the cart-pole balancing and the acrobot swing-up problems are conducted to compare the data efficiency of Fast-AHC with conventional AHC. From the experimental results, it is shown that the data efficiency of learning control can also be improved by using RLS methods in the learning-prediction process of the critic. The performance of Fast-AHC is also compared with that of the AHC method using LS-TD(lambda). Furthermore, it is demonstrated in the experiments that different initial values of the variance matrix in RLS-TD(lambda) are required to get better performance not only in learning prediction but also in learning control. The experimental results are analyzed based on the existing theoretical work on the transient phase of forgetting factor RLS methods

    Harnessing machine learning for fiber-induced nonlinearity mitigation in long-haul coherent optical OFDM

    Get PDF
    © 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).Coherent optical orthogonal frequency division multiplexing (CO-OFDM) has attracted a lot of interest in optical fiber communications due to its simplified digital signal processing (DSP) units, high spectral-efficiency, flexibility, and tolerance to linear impairments. However, CO-OFDM’s high peak-to-average power ratio imposes high vulnerability to fiber-induced non-linearities. DSP-based machine learning has been considered as a promising approach for fiber non-linearity compensation without sacrificing computational complexity. In this paper, we review the existing machine learning approaches for CO-OFDM in a common framework and review the progress in this area with a focus on practical aspects and comparison with benchmark DSP solutions.Peer reviewe

    Temporal phase unwrapping using deep learning

    Full text link
    The multi-frequency temporal phase unwrapping (MF-TPU) method, as a classical phase unwrapping algorithm for fringe projection profilometry (FPP), is capable of eliminating the phase ambiguities even in the presence of surface discontinuities or spatially isolated objects. For the simplest and most efficient case, two sets of 3-step phase-shifting fringe patterns are used: the high-frequency one is for 3D measurement and the unit-frequency one is for unwrapping the phase obtained from the high-frequency pattern set. The final measurement precision or sensitivity is determined by the number of fringes used within the high-frequency pattern, under the precondition that the phase can be successfully unwrapped without triggering the fringe order error. Consequently, in order to guarantee a reasonable unwrapping success rate, the fringe number (or period number) of the high-frequency fringe patterns is generally restricted to about 16, resulting in limited measurement accuracy. On the other hand, using additional intermediate sets of fringe patterns can unwrap the phase with higher frequency, but at the expense of a prolonged pattern sequence. Inspired by recent successes of deep learning techniques for computer vision and computational imaging, in this work, we report that the deep neural networks can learn to perform TPU after appropriate training, as called deep-learning based temporal phase unwrapping (DL-TPU), which can substantially improve the unwrapping reliability compared with MF-TPU even in the presence of different types of error sources, e.g., intensity noise, low fringe modulation, and projector nonlinearity. We further experimentally demonstrate for the first time, to our knowledge, that the high-frequency phase obtained from 64-period 3-step phase-shifting fringe patterns can be directly and reliably unwrapped from one unit-frequency phase using DL-TPU
    • …
    corecore