9 research outputs found

    Using Reinforcement Learning to Tackle the Autonomous Recharging Problem

    Navigation and object interaction are two difficult, crucial tasks for autonomous mobile vehicles. These tasks are made even more challenging when the vehicle is in an unfamiliar environment. The task of autonomous refueling in an arbitrary environment encompasses both the navigation and object interaction tasks. In this paper, we propose a reinforcement learning model and training procedure which can take the first steps toward efficiently learning to seek out and dock at a charging station using only a single on-board monocular camera. More specifically, we address the task of rotating in place to find the charging station and keep it centered in the camera's field of view. Our results show that, using this method, the vehicle is able to find and maintain focus on the charging station with a success rate of 99.4%.

    Detecting wash trade in financial market using digraphs and dynamic programming

    Wash trade refers to the illegal activities of traders who utilise carefully designed limit orders to manually increase trading volumes in order to create a false impression of an active market. As one of the primary forms of market abuse, wash trade can be extremely damaging to the proper functioning and integrity of capital markets. Existing work focuses on collusive clique detection based on certain assumptions about trading behaviours. Effective approaches for analysing and detecting wash trade in a real-life market have yet to be developed. This paper analyses and conceptualises the basic structures of trading collusion in a wash trade by using a directed graph of traders. A novel method is then proposed to detect the potential wash trade activities involved in a financial instrument by first recognizing the suspiciously matched orders and then further identifying the collusions among the traders who submit such orders. Both steps are formulated as a simplified form of the Knapsack problem, which can be solved by dynamic programming approaches. The proposed approach is evaluated on seven stock datasets from NASDAQ and the London Stock Exchange. Experimental results show that the proposed approach can effectively detect all primary wash trade scenarios across the selected datasets.
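
    The abstract above says both detection steps reduce to a simplified Knapsack problem solved by dynamic programming. As a rough illustration only, the sketch below assumes that the simplified form is subset-sum: given positive integer order volumes, find a subset that adds up exactly to a target volume. The function name `find_matching_subset` and the sample volumes are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Optional, Tuple


def find_matching_subset(volumes: List[int], target: int) -> Optional[List[int]]:
    """Return indices of orders whose volumes sum exactly to `target`, or None.

    Assumes positive integer volumes (an assumption of this sketch).
    """
    # parent[s] records how sum s was first reached: (previous sum, order index).
    parent: Dict[int, Optional[Tuple[int, int]]] = {0: None}
    for i, v in enumerate(volumes):
        # Iterate over a snapshot so each order is used at most once.
        for s in list(parent.keys()):
            new_sum = s + v
            if new_sum <= target and new_sum not in parent:
                parent[new_sum] = (s, i)
    if target not in parent:
        return None
    # Walk back from the target to recover the chosen orders.
    chosen: List[int] = []
    s = target
    while parent[s] is not None:
        prev_sum, i = parent[s]
        chosen.append(i)
        s = prev_sum
    return sorted(chosen)


if __name__ == "__main__":
    sell_volumes = [300, 500, 200, 700]  # hypothetical sell-order volumes
    print(find_matching_subset(sell_volumes, 1000))
```

    The table of reachable sums grows with the target volume, so the running time is proportional to the number of orders times the target, which is the usual pseudo-polynomial cost of this kind of dynamic program.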

    Coarse and fine identification of collusive clique in financial market

    Collusive transactions refer to the activity whereby traders use carefully designed trades to illegally manipulate the market. They do this by increasing specific trading volumes, thus creating a false impression that a market is more active than it actually is. The traders involved in collusive transactions are termed a collusive clique. A collusive clique and its activities can cause substantial damage to the market's integrity and have attracted much attention from regulators around the world in recent years. Much of the current research focuses on detection based on a number of assumptions about how a normal market behaves. There is, clearly, a lack of effective decision-support tools with which to identify potential collusive cliques in a real-life setting. The study in this paper examined the structures of the traders in all transactions and proposed two approaches to detect potential collusive cliques and their activities. The first approach targeted the overall collusive trend of the traders. This is particularly useful when regulators seek a general overview of how traders gather together for their transactions. The second approach accurately detected parcel-passing style collusive transactions on the market by analyzing the relations between the traders and the transacted volumes. On the one hand, the two proposed approaches provided complete coverage for collusive transaction identification, which can fulfill the different types of requirements of the regulation, i.e. MiFID II. On the other hand, they showed a novel application of well-known computational algorithms to solving a real and complex financial problem. The two proposed approaches were evaluated using real financial data drawn from the NYSE and CME Group. Experimental results suggested that these approaches successfully identified all primary collusive clique scenarios in all selected datasets and thus showed the effectiveness and stability of the novel application.

    Adaptive dynamic programming with eligibility traces and complexity reduction of high-dimensional systems

    This dissertation investigates the application of a variety of computational intelligence techniques, particularly clustering and adaptive dynamic programming (ADP) designs, especially heuristic dynamic programming (HDP) and dual heuristic programming (DHP). Moreover, one-step temporal-difference learning (TD(0)) and n-step TD learning (TD(λ)), with their gradients, are utilized as learning algorithms to train and online-adapt the families of ADP. The dissertation is organized into seven papers. The first paper demonstrates the robustness of model order reduction (MOR) for simulating complex dynamical systems. Agglomerative hierarchical clustering based on performance evaluation is introduced for MOR. This method computes the reduced-order denominator of the transfer function by clustering system poles in a hierarchical dendrogram. Several numerical examples of reduction techniques are taken from the literature to compare with our work. In the second paper, HDP is combined with the Dyna algorithm for path planning. The third paper uses DHP with an eligibility trace parameter (λ) to track a reference trajectory under uncertainties for a nonholonomic mobile robot by using a first-order Sugeno fuzzy neural network structure for the critic and actor networks. In the fourth and fifth papers, a stability analysis for a model-free action-dependent HDP(λ) is demonstrated with batch- and online-implementation learning, respectively. The sixth work combines two different gradient prediction levels of critic networks, and convergence proofs are provided. The seventh paper develops two hybrid recurrent fuzzy neural network structures for both critic and actor networks. They use a novel n-step gradient temporal-difference (gradient of TD(λ)) of an advanced ADP algorithm called value-gradient learning (VGL(λ)), and convergence proofs are given. Furthermore, the seventh paper is the first to combine the single network adaptive critic with VGL(λ). --Abstract, page iv
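
    To make the difference between TD(0) and TD(λ) mentioned above concrete, here is a minimal tabular sketch of TD(λ) with accumulating eligibility traces. It is an assumption-laden illustration: the dissertation trains neural-network critics, whereas this sketch uses a plain value table, and the function name `td_lambda_episode` and the three-state chain are hypothetical.

```python
from typing import List, Optional, Tuple

Transition = Tuple[int, float, Optional[int]]  # (state, reward, next_state or None if terminal)


def td_lambda_episode(values: List[float], episode: List[Transition],
                      alpha: float = 0.1, gamma: float = 1.0,
                      lam: float = 0.8) -> List[float]:
    """Update `values` in place over one episode using tabular TD(lambda)."""
    traces = [0.0] * len(values)
    for state, reward, next_state in episode:
        v_next = 0.0 if next_state is None else values[next_state]
        delta = reward + gamma * v_next - values[state]  # one-step TD error
        traces[state] += 1.0                             # accumulating trace
        for s in range(len(values)):
            values[s] += alpha * delta * traces[s]       # credit recently visited states
            traces[s] *= gamma * lam                     # decay every trace
    return values


if __name__ == "__main__":
    # Hypothetical chain 0 -> 1 -> 2 -> terminal, with reward 1 on the last step.
    chain = [(0, 0.0, 1), (1, 0.0, 2), (2, 1.0, None)]
    print(td_lambda_episode([0.0, 0.0, 0.0], chain, alpha=0.5, lam=0.5))
    print(td_lambda_episode([0.0, 0.0, 0.0], chain, alpha=0.5, lam=0.0))
```

    With `lam=0.0` the traces vanish after each step and only the last visited state is updated, which is TD(0). With `lam > 0` the final reward is propagated back to earlier states within the same episode.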

    Intelligent Learning Control System Design Based on Adaptive Dynamic Programming

    The adaptive dynamic programming (ADP) controller is a powerful neural-network-based control technique that has been investigated, designed, and tested in a wide range of applications for solving optimal control problems in complex systems. Good performance of an ADP controller usually requires long training periods, because its data usage efficiency is low: it discards samples once they have been used. Experience replay is a powerful technique with the potential to accelerate the training process of learning and control. However, its existing design cannot be directly used for model-free ADP design, because it focuses on the forward temporal difference (TD) information (e.g., state-action pair) between the current time step and the future time step, and needs a model network to predict future information. Moreover, the uniform random sampling used for experience replay is not an efficient way to learn. Prioritized experience replay (PER) presents important transitions more frequently and has proven to be efficient in the learning process. In order to shorten the long training periods of the ADP controller, the first goal of this thesis is to avoid the use of a model network or identifier of the system. Specifically, the experience tuple is designed with one-step-backward state-action information, so the TD can be obtained from the previous time step and the current time step. The proposed approach is tested on two case studies: cart-pole and triple-link pendulum balancing tasks. The proposed approach improved the required average trial to succeed by 26.5% for cart-pole and 43% for triple-link. The second goal of this thesis is to integrate the efficient learning capability of PER into ADP. A detailed theoretical analysis is presented in order to verify the stability of the proposed control technique. The proposed approach improved the required average trial to succeed compared to a traditional ADP controller by 60.56% for cart-pole and 56.89% for triple-link balancing tasks.
    The final goal of this thesis is to validate the ADP controller in a smart grid, to improve the current control performance of a virtual synchronous machine (VSM) at sudden load changes and a single line-to-ground fault, and to reduce harmonics in shunt active filters (SAF) under different loading conditions. The ADP controller produced the fastest response time, low overshoot and, in general, the best performance in comparison to the traditional current controller. In SAF, the ADP controller reduced the total harmonic distortion (THD) of the source current by an average of 18.41% compared to a traditional current controller alone.
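
    The abstract contrasts uniform sampling with prioritized experience replay. The sketch below shows one common way to prioritize, proportional sampling, where a transition with TD error d is drawn with probability proportional to (|d| + eps)^alpha. This is an assumed variant chosen for illustration: the thesis does not state its exact priority rule here, and the names `sampling_probabilities`, `sample_batch`, `alpha` and `eps` are hypothetical.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sampling_probabilities(td_errors: Sequence[float], alpha: float = 0.6,
                           eps: float = 1e-3) -> List[float]:
    """Turn TD errors into sampling probabilities (proportional prioritization)."""
    # eps keeps transitions with zero TD error from never being replayed.
    priorities = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


def sample_batch(buffer: Sequence[T], td_errors: Sequence[float], batch_size: int,
                 rng: random.Random, alpha: float = 0.6) -> List[T]:
    """Draw a batch with replacement, favouring transitions with large TD error."""
    probs = sampling_probabilities(td_errors, alpha)
    return rng.choices(buffer, weights=probs, k=batch_size)


if __name__ == "__main__":
    # Hypothetical transitions and their most recent TD errors.
    buffer = ["t0", "t1", "t2", "t3"]
    td_errors = [0.05, 1.2, 0.3, 0.0]
    print(sampling_probabilities(td_errors))
    print(sample_batch(buffer, td_errors, batch_size=3, rng=random.Random(0)))
```

    Setting `alpha=0.0` gives every transition the same probability, which recovers the uniform sampling the abstract describes as inefficient.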

    Goal representation heuristic dynamic programming on maze navigation

    No full text
    Goal representation heuristic dynamic programming (GrHDP) is proposed in this paper to demonstrate online learning in the Markov decision process. In addition to the (external) reinforcement signal in the literature, we develop an adaptive internal goal/reward representation for the agent with the proposed goal network. Specifically, we keep the actor-critic design in heuristic dynamic programming (HDP) and include a goal network to represent the internal goal signal, to further help the value function approximation. We evaluate our proposed GrHDP algorithm on two 2-D maze navigation problems, and later on one 3-D maze navigation problem. Compared to the traditional HDP approach, the learning performance of the agent is improved with our proposed GrHDP approach. In addition, we also include the learning performance of two other reinforcement learning algorithms, namely Sarsa(λ) and Q-learning, on the same benchmarks for comparison. Furthermore, in order to demonstrate the theoretical guarantee of our proposed method, we provide a characteristics analysis of the convergence of the neural network weights in our GrHDP approach. © 2012 IEEE
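
    Q-learning is one of the baselines named in the abstract above. The following is a minimal tabular Q-learning sketch on a tiny grid maze, offered only to illustrate that baseline. It does not implement GrHDP or its goal network, and the maze layout, the reward of 1 at the goal, the hyperparameters, and the names `step`, `q_update` and `train` are all assumptions of this sketch.

```python
import random
from typing import Dict, List, Tuple

Pos = Tuple[int, int]
ACTIONS: List[Pos] = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(maze: List[str], pos: Pos, action: Pos) -> Tuple[Pos, float, bool]:
    """Move in the maze; walls ('#') and borders leave the agent in place."""
    r, c = pos[0] + action[0], pos[1] + action[1]
    if not (0 <= r < len(maze) and 0 <= c < len(maze[0])) or maze[r][c] == "#":
        r, c = pos
    done = maze[r][c] == "G"
    return (r, c), (1.0 if done else 0.0), done


def q_update(q: Dict[Pos, List[float]], state: Pos, a: int, reward: float,
             next_state: Pos, done: bool, alpha: float = 0.5, gamma: float = 0.9) -> None:
    """Q-learning update: bootstrap from the best action in the next state."""
    best_next = 0.0 if done else max(q[next_state])
    q[state][a] += alpha * (reward + gamma * best_next - q[state][a])


def train(maze: List[str], start: Pos, episodes: int = 200, epsilon: float = 0.2,
          max_steps: int = 100, seed: int = 0) -> Dict[Pos, List[float]]:
    """Run epsilon-greedy Q-learning and return the learned Q-table."""
    rng = random.Random(seed)
    q = {(r, c): [0.0] * 4 for r in range(len(maze)) for c in range(len(maze[0]))}
    for _ in range(episodes):
        pos = start
        for _ in range(max_steps):
            if rng.random() < epsilon:
                a = rng.randrange(4)  # explore
            else:
                best = max(q[pos])    # exploit, breaking ties at random
                a = rng.choice([i for i in range(4) if q[pos][i] == best])
            nxt, reward, done = step(maze, pos, ACTIONS[a])
            q_update(q, pos, a, reward, nxt, done)
            pos = nxt
            if done:
                break
    return q


if __name__ == "__main__":
    maze = ["S.", "#G"]  # hypothetical 2x2 maze: start, free cell, wall, goal
    q_table = train(maze, start=(0, 0))
    print(q_table[(0, 0)], q_table[(0, 1)])
```

    Sarsa differs from this update in one place: instead of `max(q[next_state])` it bootstraps from the value of the action the agent actually takes next, which makes it an on-policy method.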

    Goal Representation Heuristic Dynamic Programming on Maze Navigation

    No full text

    IEEE Transactions on Neural Networks and Learning Systems: Vol. 24, No. 12, December 2013

    No full text
    Canonical Correlation Analysis on Data With Censoring and Error Information - J. Sun and S. Keates
    Highly Accurate Moving Object Detection in Variable Bit Rate Video-Based Traffic Monitoring Systems - S.-C. Huang and B.-H. Chen
    Recurrent Neural Collective Classification - D. D. Monner and J. A. Reggia
    Online Selective Kernel-Based Temporal Difference Learning - X. Chen, Y. Gao, and R. Wang
    Stability and Synchronization of Discrete-Time Neural Network With Switching Parameters, and Time-Varying Delays - L. Wu, Z. Feng, and J. Lam
    Artificial Endocrine Controller for Power Management in Robotic Systems - C. Sauze and M. Neal
    Operator Control of Interneural Computing Machines - M.-H. Shih and F.-S. Tsai
    Multiple Graph Label Propagation by Sparse Integration - M. Karasuyama and H. Mamitsuka
    Universal Blind Image Quality Assessment Metrics Via Natural Scene Statistics and Multiple Kernel Learning - X. Gao, F. Gao, D. Tao, and X. Li
    H∞ State Estimation for Complex Networks With Uncertain Inner Coupling and Incomplete Measurements - B. Shen, Z. Wang, D. Ding, and H. Shu
    Goal Representation Heuristic Dynamic Programming on Maze Navigation - Z. Ni, H. He, J. Wen, and X. Xu
    Accelerated Canonical Polyadic Decomposition Using Mode Reduction - G. Zhou, A. Cichocki, and S. Xie
    Hardware Friendly Probabilistic Spiking Neural Network With Long-Term and Short-Term Plasticity - H.-Y. Hsieh and K.-T. Tang
    Neural Network Architecture for Cognitive Navigation in Dynamic Environments - J. A. Villacorta-Atienza and V. A. Makarov
    An Equivalence Between Adaptive Dynamic Programming With a Critic and Backpropagation Through Time - M. Fairbank, E. Alonso, and D. Prokhorov
    Semisupervised Multitask Learning With Gaussian Processes - G. Skolidis and G. Sanguinetti
    BRIEF PAPERS
    Nonlinear Projection Trick in Kernel Methods: An Alternative to the Kernel Trick - N. Kwak
    ANNOUNCEMENTS
    IEEE WCCI 2014
    Etc