Efficient Reinforcement Learning Using Recursive Least-Squares Methods
The recursive least-squares (RLS) algorithm is one of the most well-known
algorithms used in adaptive filtering, system identification and adaptive
control. Its popularity is mainly due to its fast convergence speed, which is
considered to be optimal in practice. In this paper, RLS methods are used to
solve reinforcement learning problems, where two new reinforcement learning
algorithms using linear value function approximators are proposed and analyzed.
The two algorithms are called RLS-TD(lambda) and Fast-AHC (Fast Adaptive
Heuristic Critic), respectively. RLS-TD(lambda) can be viewed as the extension
of RLS-TD(0) from lambda=0 to general lambda within the interval [0,1], so it is a
multi-step temporal-difference (TD) learning algorithm using RLS methods. The
convergence with probability one and the limit of convergence of RLS-TD(lambda)
are proved for ergodic Markov chains. Compared to the existing LS-TD(lambda)
algorithm, RLS-TD(lambda) has advantages in computation and is more suitable
for online learning. The effectiveness of RLS-TD(lambda) is analyzed and
verified by learning-prediction experiments on Markov chains with a wide range
of parameter settings. The Fast-AHC algorithm is derived by applying the
proposed RLS-TD(lambda) algorithm in the critic network of the adaptive
heuristic critic method. Unlike the conventional AHC algorithm, Fast-AHC makes use
of RLS methods to improve the learning-prediction efficiency in the critic.
Learning control experiments of the cart-pole balancing and the acrobot
swing-up problems are conducted to compare the data efficiency of Fast-AHC with
conventional AHC. The experimental results show that the data
efficiency of learning control can also be improved by using RLS methods in the
learning-prediction process of the critic. The performance of Fast-AHC is also
compared with that of the AHC method using LS-TD(lambda). Furthermore, it is
demonstrated in the experiments that different initial values of the variance
matrix in RLS-TD(lambda) are required to get better performance not only in
learning prediction but also in learning control. The experimental results are
analyzed based on the existing theoretical work on the transient phase of
forgetting-factor RLS methods.
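
A minimal Python sketch of how a recursive least-squares TD(lambda) update might look, assuming the usual LS-TD(lambda) normal equations are updated one sample at a time with the Sherman-Morrison identity. The class name, the parameter names (delta for the initial scale of the matrix P, mu for the forgetting factor) and the default values are illustrative assumptions, not the paper's notation or experimental settings.

```python
import numpy as np


class RLSTDLambda:
    """Illustrative RLS-style TD(lambda) with a linear value function V(s) = w . phi(s)."""

    def __init__(self, n_features, gamma=0.9, lam=0.5, delta=100.0, mu=1.0):
        self.gamma = gamma            # discount factor
        self.lam = lam                # trace-decay parameter lambda in [0, 1]
        self.mu = mu                  # forgetting factor (1.0 means no forgetting)
        self.w = np.zeros(n_features)            # weight vector
        self.P = delta * np.eye(n_features)      # initial matrix P_0 = delta * I (assumed form)
        self.z = np.zeros(n_features)            # eligibility trace

    def start_episode(self):
        # Reset the trace at the start of each episode.
        self.z[:] = 0.0

    def update(self, phi, reward, phi_next):
        # Accumulate the eligibility trace for the current state.
        self.z = self.gamma * self.lam * self.z + phi
        # Feature difference that appears in the TD normal equations.
        d = phi - self.gamma * phi_next
        Pz = self.P @ self.z
        dP = d @ self.P
        # Gain vector from the Sherman-Morrison rank-one update.
        k = Pz / (self.mu + d @ Pz)
        # Correct the weights using the one-step residual.
        self.w = self.w + k * (reward - d @ self.w)
        # Rank-one update of P, with O(n^2) cost and no matrix inversion.
        self.P = (self.P - np.outer(k, dP)) / self.mu
        return self.w


if __name__ == "__main__":
    # Hypothetical two-state cycle: 0 -> 1 (reward 1), 1 -> 0 (reward 0).
    features = np.eye(2)
    agent = RLSTDLambda(n_features=2, gamma=0.5, lam=0.5)
    state = 0
    for _ in range(2000):
        nxt = 1 - state
        agent.update(features[state], 1.0 if state == 0 else 0.0, features[nxt])
        state = nxt
    print(agent.w)
```

In this sketch a large delta weakens the implicit regularization from P_0, while a small delta keeps early estimates close to the initial weights, which is one way to see why the initial value of this matrix can matter for transient performance.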
Harnessing machine learning for fiber-induced nonlinearity mitigation in long-haul coherent optical OFDM
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Coherent optical orthogonal frequency division multiplexing (CO-OFDM) has attracted a lot of interest in optical fiber communications due to its simplified digital signal processing (DSP) units, high spectral efficiency, flexibility, and tolerance to linear impairments. However, CO-OFDM’s high peak-to-average power ratio makes it highly vulnerable to fiber-induced nonlinearities. DSP-based machine learning has been considered a promising approach for fiber nonlinearity compensation without sacrificing computational complexity. In this paper, we review the existing machine learning approaches for CO-OFDM in a common framework and survey the progress in this area, with a focus on practical aspects and comparison with benchmark DSP solutions.
Peer reviewed
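
The peak-to-average power ratio mentioned above can be illustrated with a short Python sketch. It assumes a plain baseband OFDM symbol built by an inverse FFT over QPSK-loaded subcarriers; the function names, the subcarrier count and the modulation format are hypothetical choices for illustration and are not taken from the reviewed systems.

```python
import numpy as np


def papr_db(x):
    """Peak-to-average power ratio of a complex signal, in dB."""
    power = np.abs(x) ** 2
    return 10.0 * np.log10(power.max() / power.mean())


def ofdm_symbol(n_subcarriers, rng):
    """One time-domain OFDM symbol with random QPSK on every subcarrier."""
    bits = rng.integers(0, 2, size=(n_subcarriers, 2))
    qpsk = ((2 * bits[:, 0] - 1) + 1j * (2 * bits[:, 1] - 1)) / np.sqrt(2.0)
    # The inverse FFT sums the subcarriers; aligned phases create large peaks.
    return np.fft.ifft(qpsk)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    values = [papr_db(ofdm_symbol(256, rng)) for _ in range(1000)]
    print("mean PAPR (dB):", np.mean(values))
    print("max PAPR (dB):", np.max(values))
```

A constant-envelope signal has a PAPR of 0 dB, whereas N subcarriers adding in phase give the worst case of 10*log10(N) dB, which is why high-power peaks that excite fiber nonlinearity become more likely as the subcarrier count grows.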
Temporal phase unwrapping using deep learning
The multi-frequency temporal phase unwrapping (MF-TPU) method, as a classical
phase unwrapping algorithm for fringe projection profilometry (FPP), is capable
of eliminating the phase ambiguities even in the presence of surface
discontinuities or spatially isolated objects. For the simplest and most
efficient case, two sets of 3-step phase-shifting fringe patterns are used: the
high-frequency one is for 3D measurement and the unit-frequency one is for
unwrapping the phase obtained from the high-frequency pattern set. The final
measurement precision or sensitivity is determined by the number of fringes
used within the high-frequency pattern, under the precondition that the phase
can be successfully unwrapped without triggering the fringe order error.
Consequently, in order to guarantee a reasonable unwrapping success rate, the
fringe number (or period number) of the high-frequency fringe patterns is
generally restricted to about 16, resulting in limited measurement accuracy. On
the other hand, using additional intermediate sets of fringe patterns can
unwrap the phase with higher frequency, but at the expense of a prolonged
pattern sequence. Inspired by recent successes of deep learning techniques for
computer vision and computational imaging, in this work we report that
deep neural networks can learn to perform TPU after appropriate training, an
approach termed deep-learning-based temporal phase unwrapping (DL-TPU), which can
substantially improve the unwrapping reliability compared with MF-TPU even in
the presence of different types of error sources, e.g., intensity noise, low
fringe modulation, and projector nonlinearity. We further experimentally
demonstrate for the first time, to our knowledge, that the high-frequency phase
obtained from 64-period 3-step phase-shifting fringe patterns can be directly
and reliably unwrapped from one unit-frequency phase using DL-TPU.
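
The classical two-frequency baseline described above can be sketched in Python as follows. This is an illustrative sketch of conventional MF-TPU, not of the DL-TPU network; it assumes 3-step fringes of the form I_n = A + B*cos(phi - 2*pi*n/3) and a unit-frequency phase that spans a single period, and the function names are hypothetical.

```python
import numpy as np


def wrapped_phase_3step(i0, i1, i2):
    """Wrapped phase in (-pi, pi] from three fringe images with 2*pi/3 shifts."""
    return np.arctan2(np.sqrt(3.0) * (i1 - i2), 2.0 * i0 - i1 - i2)


def unwrap_with_unit_frequency(phi_high, phi_unit, n_fringes):
    """Unwrap the high-frequency phase using an unambiguous unit-frequency phase."""
    # Fringe order: scale the unit-frequency phase up and round the difference.
    order = np.rint((n_fringes * phi_unit - phi_high) / (2.0 * np.pi))
    return phi_high + 2.0 * np.pi * order


if __name__ == "__main__":
    n_fringes = 16
    x = np.linspace(0.01, 0.99, 500)
    phi_unit_true = 2.0 * np.pi * x - np.pi          # one period across the field
    shifts = 2.0 * np.pi * np.arange(3) / 3.0

    def fringes(phase):
        # Noise-free synthetic fringe images with assumed offset and modulation.
        return [0.5 + 0.4 * np.cos(phase - s) for s in shifts]

    phi_unit = wrapped_phase_3step(*fringes(phi_unit_true))
    phi_high = wrapped_phase_3step(*fringes(n_fringes * phi_unit_true))
    absolute = unwrap_with_unit_frequency(phi_high, phi_unit, n_fringes)
    print(np.max(np.abs(absolute - n_fringes * phi_unit_true)))
```

Because the unit-frequency phase is multiplied by the fringe number before rounding, any error in it is amplified by the same factor, and the fringe order is wrong once the combined phase error exceeds pi. This is consistent with the restriction on the fringe number noted in the abstract.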