Adaptive Critic Designs
We discuss a variety of adaptive critic designs (ACDs) for neurocontrol. These are suitable for learning in noisy, nonlinear, and nonstationary environments. They have common roots as generalizations of dynamic programming for neural reinforcement learning approaches. Our discussion of these origins leads to an explanation of three design families: heuristic dynamic programming (HDP), dual heuristic programming (DHP), and globalized dual heuristic programming (GDHP). The main emphasis is on DHP and GDHP as advanced ACDs. We suggest two new modifications of the original GDHP design that are currently the only working implementations of GDHP. They promise to be useful for many engineering applications in the areas of optimization and optimal control. Based on one of these modifications, we present a unified approach to all ACDs. This leads to a generalized training procedure for ACDs.
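As a much-simplified illustration of the heuristic dynamic programming (HDP) principle underlying these designs, the sketch below trains a linear critic toward the Bellman target U(s) + gamma*J(s'). The scalar plant, feature map, and learning rate are illustrative assumptions, not the designs described in the paper.

```python
import numpy as np

# Toy HDP-style critic: J(s) = w . phi(s) is trained toward the
# Bellman target U(s) + gamma * J(s_next) on a stable scalar plant.
rng = np.random.default_rng(0)
gamma = 0.9

def utility(s):
    return s ** 2                    # instantaneous cost U(s)

def step(s):
    return 0.5 * s                   # assumed plant: s_{t+1} = 0.5 * s_t

def phi(s):
    return np.array([s ** 2, 1.0])   # critic features

w = np.zeros(2)                      # critic weights
lr = 0.1
for _ in range(5000):
    s = rng.uniform(-1, 1)
    s_next = step(s)
    target = utility(s) + gamma * (phi(s_next) @ w)   # Bellman target
    w += lr * (target - phi(s) @ w) * phi(s)          # move J(s) toward it

print(w)   # w[0] -> 1 / (1 - 0.25 * gamma) ≈ 1.29, w[1] -> 0
```

On this toy plant the exact cost-to-go is s^2 / (1 - 0.25*gamma), so the first weight should settle near 1.29; the critic's training error vanishes at that fixed point.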
Approximation with Random Bases: Pro et Contra
In this work we discuss the problem of selecting suitable approximators from
families of parameterized elementary functions that are known to be dense in a
Hilbert space of functions. We consider and analyze published procedures, both
randomized and deterministic, for selecting elements from these families that
have been shown to ensure a rate of convergence in norm of order
$O(1/\sqrt{N})$, where $N$ is the number of elements. We show that both randomized and
deterministic procedures are successful if additional information about the
families of functions to be approximated is provided. In the absence of such
additional information one may observe exponential growth of the number of
terms needed to approximate the function and/or extreme sensitivity of the
outcome of the approximation to parameters. Implications of our analysis for
applications of neural networks in modeling and control are illustrated with
examples.
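A minimal numerical illustration of approximation with random bases (an illustrative setup, not one of the procedures analyzed in the paper): the inner parameters of tanh units are drawn at random, and only the outer weights are fitted by least squares.

```python
import numpy as np

# Approximate a smooth target f on [0, 1] with N randomly parameterized
# sigmoid bases g_i(x) = tanh(a_i * x + b_i); only the outer weights c_i
# are fitted. The distributions of a_i, b_i are arbitrary choices here.
rng = np.random.default_rng(1)
N = 50
x = np.linspace(0, 1, 200)
f = np.sin(2 * np.pi * x)                  # target function

a = rng.uniform(-10, 10, N)                # random inner parameters
b = rng.uniform(-10, 10, N)
G = np.tanh(np.outer(x, a) + b)            # design matrix, shape (200, N)

c, *_ = np.linalg.lstsq(G, f, rcond=None)  # fit outer weights only
err = np.max(np.abs(G @ c - f))
print(f"max error with N={N} random bases: {err:.2e}")
```

This shows the favorable case; the paper's point is that without additional information about the target family, the number of terms needed can grow exponentially or the fit can become extremely sensitive to the random parameters.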
Conservative Thirty Calendar Day Stock Prediction Using a Probabilistic Neural Network
We describe a system that predicts significant short-term price movement in a single stock utilizing conservative strategies. We use preprocessing techniques and then train a probabilistic neural network to predict only price gains large enough to create a significant profit opportunity. Our primary objective is to limit false predictions (known in the pattern recognition literature as false alarms). False alarms are more significant than missed opportunities, because false alarms acted upon lead to losses. We achieve false alarm rates as low as 5.7% with the correct system design and parameterization.
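The conservative decision rule can be sketched with a minimal Parzen-window probabilistic neural network on synthetic data; the features, kernel width, and threshold below are assumptions, not the paper's parameterization. A high threshold on the posterior for the "gain" class is what keeps the false alarm rate low.

```python
import numpy as np

# Minimal PNN (Parzen-window classifier) sketch on synthetic 2-D features.
rng = np.random.default_rng(2)
sigma = 0.5                                  # kernel width (assumed)

gain = rng.normal([2, 2], 0.5, (100, 2))     # class 1: "significant gain"
no_gain = rng.normal([0, 0], 0.5, (100, 2))  # class 0

def density(x, patterns):
    # Gaussian-kernel density estimate from stored training patterns
    d2 = np.sum((patterns - x) ** 2, axis=1)
    return np.mean(np.exp(-d2 / (2 * sigma ** 2)))

def predict(x, threshold=0.9):
    pg, pn = density(x, gain), density(x, no_gain)
    post = pg / (pg + pn)                    # posterior for "gain" (equal priors)
    return post > threshold                  # predict gain only when confident

print(predict(np.array([2.1, 1.9])))   # near the gain cluster -> True
print(predict(np.array([1.0, 1.0])))   # ambiguous midpoint -> False (rejected)
```

Raising the threshold trades missed opportunities for fewer false alarms, which is exactly the direction the paper argues for.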
Training Winner-Take-All Simultaneous Recurrent Neural Networks
The winner-take-all (WTA) network is useful in database management, very large scale integration (VLSI) design, and digital processing. The synthesis procedure for a WTA network on a single-layer, fully connected architecture with a sigmoid transfer function is still not fully explored. We discuss the use of simultaneous recurrent networks (SRNs) trained by Kalman filter algorithms for the task of finding the maximum among N numbers. The simulation demonstrates the effectiveness of our training approach under conditions of a shared-weight SRN architecture. A more general SRN also succeeds in solving a real classification application on car engine data.
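The WTA computation itself can be shown with a hand-set lateral-inhibition recurrent network (the weights are fixed by hand here rather than trained with a Kalman filter, as in the paper):

```python
import numpy as np

# Winner-take-all by iterated lateral inhibition: each unit excites
# itself and is inhibited by the summed activity of all other units.
# After settling, only the unit with the largest input stays active.
def wta(inputs, steps=100, lr=0.1):
    x = np.array(inputs, dtype=float)
    for _ in range(steps):
        inhibition = x.sum() - x          # inhibition from all other units
        x = np.clip(x + lr * (x - inhibition), 0.0, 1.0)
    return x

a = wta([0.3, 0.9, 0.5, 0.1])
print(np.argmax(a))          # -> 1, index of the maximum input
print(np.count_nonzero(a))   # -> 1, only the winner remains active
```

A unit grows only while its activity exceeds the summed activity of its competitors, so the losers decay to zero first and the winner then saturates at 1.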
Adaptive Critic Design in Learning to Play Game of Go
This paper examines the performance of an HDP-type adaptive critic design (ACD) for the game of Go. Go is an ideal problem domain for exploring machine learning; it has simple rules but requires complex strategies to play well. All current commercial Go programs are knowledge-based implementations; they utilize input features and pattern matching along with minimax-type search techniques. But the extremely high branching factor limits their capabilities, and they are very weak compared with programs for other games such as chess. In this paper, the Go-playing ACD consists of a critic network and an action network. The HDP-type critic network learns to predict the cumulative utility function of the current board position from training games, and the action network chooses the next move that maximizes the critic's next-step cost-to-go. After about 6000 training games against a public domain program, WALLY, the network (playing WHITE) began to win some of the games and showed slow but steady improvement on test games.
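The action network's role, choosing the move whose resulting position the critic scores best, can be sketched with a toy state space and a hypothetical critic (the real system uses trained neural networks over Go board positions):

```python
# Toy afterstate selection: states are integers, a move adds an offset,
# and the "critic" scores the afterstate. All names here are illustrative.
def choose_move(state, legal_moves, critic):
    # evaluate each afterstate with the critic; play the best-scoring move
    return max(legal_moves, key=lambda m: critic(state + m))

critic = lambda s: -(s - 5) ** 2          # toy critic: prefers states near 5
print(choose_move(2, [1, 2, 3], critic))  # -> 3 (afterstate 5 scores highest)
```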
Neurocontroller Alternatives for Fuzzy Ball-and-Beam Systems with Nonuniform Nonlinear Friction
The ball-and-beam problem is a benchmark for testing control algorithms. Zadeh proposed (1994) a twist to the problem, which, he suggested, would require a fuzzy logic controller. This experiment uses a beam, partially covered with a sticky substance, increasing the difficulty of predicting the ball's motion. We complicated this problem even more by not using any information concerning the ball's velocity. Although it is common to use the first differences of the ball's consecutive positions as a measure of velocity and explicit input to the controller, we preferred to exploit recurrent neural networks, inputting only consecutive positions instead. We have used truncated backpropagation through time with the node-decoupled extended Kalman filter (NDEKF) algorithm to update the weights in the networks. Our best neurocontroller uses a form of approximate dynamic programming called an adaptive critic design. A hierarchy of such designs exists. Our system uses dual heuristic programming (DHP), an upper-level design. To our best knowledge, our results are the first use of DHP to control a physical system. It is also the first system we know of to respond to Zadeh's challenge. We do not claim this neural network control algorithm is the best approach to this problem, nor do we claim it is better than a fuzzy controller. It is instead a contribution to the scientific dialogue about the boundary between the two overlapping disciplines.
Comparative Study of Stock Trend Prediction using Time Delay, Recurrent and Probabilistic Neural Networks
Three networks are compared for low false alarm stock trend predictions. Short-term trends, particularly attractive for neural network analysis, can be used profitably in scenarios such as option trading, but only with significant risk. Therefore, we focus on limiting false alarms, which improves the risk/reward ratio by preventing losses. To predict stock trends, we exploit time delay, recurrent, and probabilistic neural networks (TDNN, RNN, and PNN, respectively), utilizing conjugate gradient and multistream extended Kalman filter training for TDNN and RNN. We also discuss different predictability analysis techniques and perform an analysis of predictability based on a history of daily closing prices. Our results indicate that all the networks are feasible, the primary preference being one of convenience.
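For the time-delay network, the input construction is a sliding window over daily closes; the window length and up/down target below are illustrative choices, not the paper's exact setup.

```python
import numpy as np

# Build TDNN-style training examples: each input is the last k closes,
# and the target marks whether the next close exceeds the current one.
def make_windows(prices, k=5):
    X = np.array([prices[i:i + k] for i in range(len(prices) - k)])
    y = (np.array(prices[k:]) > np.array(prices[k - 1:-1])).astype(int)
    return X, y

prices = [10, 11, 10, 12, 13, 12, 14, 15]
X, y = make_windows(prices, k=5)
print(X.shape)   # (3, 5): three windows of five closes each
print(y)         # 1 wherever the next close is higher than the current one
```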
Advanced Adaptive Critic Designs
We present a unified approach to a family of Adaptive Critic Designs (ACDs). ACDs approximate dynamic programming for optimal control and decision making in noisy, nonlinear, or nonstationary environments. This family consists of Heuristic Dynamic Programming (HDP), Dual Heuristic Programming (DHP), and Globalized Dual Heuristic Programming (GDHP), as well as their Action-Dependent forms (denoted by the prefix AD) [1]. The most powerful of these designs reported previously is GDHP [2, 3]. After pointing out problems of the simple ACDs, we describe advanced ACDs and introduce ADGDHP. We also propose a general training procedure for ACDs and discuss some important research issues.
Convergence Of Critic-Based Training
This paper discusses convergence issues when training adaptive critic designs (ACDs) to control dynamic systems expressed as Markov sequences. We critically review two published convergence results for critic-based training and propose to shift emphasis toward more practically valuable convergence proofs. We show a possible way to prove convergence of ACD training. 1. INTRODUCTION We study ACDs with neural networks in the domain of Markov sequences. A family of ACDs exists and is extensively described in [1, 2]. The most significant difference among ACDs lies in the type of critic they use. The simplest ACDs, e.g., Q-learning, employ a J critic, a function that evaluates the long-term performance of the closed-loop system. (J is a common designator for the cost-to-go in dynamic programming, hence the critic's notation.) In contrast, advanced ACDs use derivative critics, i.e., critics outputting the derivatives of J with respect to the states of the system. Convergence of ACD training is both important and multifaceted.
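The J critic described here can be illustrated in its simplest tabular form: TD(0) training of the cost-to-go on a small Markov chain. The chain, costs, and learning rate are illustrative assumptions, not the paper's neural-network setting.

```python
import numpy as np

# Tabular J critic on a 3-state Markov chain, trained by the TD(0) rule
#   J(s) <- J(s) + lr * (U(s) + gamma * J(s') - J(s)),
# and compared with the exact solution of the Bellman equation.
rng = np.random.default_rng(3)
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])     # transition probabilities (assumed)
U = np.array([1.0, 0.0, 2.0])       # per-state cost (assumed)
gamma = 0.9

J = np.zeros(3)
s, lr = 0, 0.05
for _ in range(60000):
    s_next = rng.choice(3, p=P[s])
    J[s] += lr * (U[s] + gamma * J[s_next] - J[s])   # TD(0) update
    s = s_next

J_exact = np.linalg.solve(np.eye(3) - gamma * P, U)  # Bellman equation
print(np.round(J, 1), np.round(J_exact, 1))
```

In this tabular, fixed-policy case, convergence of the TD(0) iteration is well understood; the paper's concern is the much harder question of convergence when the critic is a neural network trained jointly with a controller.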