
    Cooperative Active Learning based Dual Control for Exploration and Exploitation in Autonomous Search

    In this paper, a multi-estimator based, computationally efficient algorithm is developed for autonomous search in an unknown environment with an unknown source. Unlike existing approaches, which require massive computational power to support nonlinear Bayesian estimation and a complex decision-making process, the proposed cooperative active learning based dual control for exploration and exploitation (COAL-DCEE) performs source estimation and path planning efficiently. Multiple cooperative estimators are deployed for the environment learning process, which improves search performance and robustness against noisy measurements. The number of estimators used in COAL-DCEE is much smaller than the number of particles required for Bayesian estimation in information-theoretic approaches, so the computational load is significantly reduced. As an important feature of this study, the convergence and performance of COAL-DCEE are established in relation to the characteristics of sensor noise and turbulence disturbances. Numerical and experimental studies have been carried out to verify the effectiveness of the proposed framework. Compared with existing approaches, COAL-DCEE not only provides a convergence guarantee but also yields comparable search performance with far less computational power.
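
    As a rough illustration of the multi-estimator idea, the following Python snippet is a toy sketch, not the paper's algorithm: it searches a hypothetical quadratic concentration field with a handful of shared-measurement estimators instead of a particle filter. The field model, the linear reparameterisation, the NLMS updates, and all gains are assumptions, and the sampled-member steering is a stand-in for the paper's dual-control term, chosen so that ensemble disagreement itself produces exploration.

        import numpy as np

        rng = np.random.default_rng(0)

        # Hypothetical concentration field (a stand-in for the paper's dispersion
        # model): readings peak at the unknown source location `source`.
        source = np.array([3.0, -2.0])

        def measure(pos, noise_std=0.1):
            return 10.0 - np.sum((pos - source) ** 2) + noise_std * rng.standard_normal()

        # For this field, y + |pos|^2 is linear in w = [10 - |source|^2, 2*source],
        # so each of a few estimators can run normalized LMS (no particle filter).
        n_est = 8                                    # a handful, not thousands of particles
        W = rng.normal(0.0, 1.0, size=(n_est, 3))    # ensemble of parameter beliefs
        pos = np.zeros(2)
        lr, gain, coop = 0.5, 0.2, 0.01

        for t in range(400):
            z = measure(pos) + np.sum(pos ** 2)
            phi = np.array([1.0, pos[0], pos[1]])
            mean_w = W.mean(axis=0)
            for i in range(n_est):
                # NLMS step on this estimator's prediction error, plus a weak
                # cooperative pull toward the ensemble mean.
                err = z - W[i] @ phi
                W[i] += lr * err * phi / (1.0 + phi @ phi) + coop * (mean_w - W[i])
            # Dual control: head for one sampled member's believed source, so
            # disagreement drives exploration and agreement means exploitation.
            believed = W[rng.integers(n_est), 1:] / 2.0
            pos += gain * (believed - pos)

        print("estimated source:", np.round(W.mean(axis=0)[1:] / 2.0, 2), "true:", source)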

    Dual Control of Exploration and Exploitation for Auto-Optimisation Control with Active Learning

    Optimal operation in environments with unknowns and uncertainties is highly desirable yet critically challenging across numerous fields. This paper develops a dual control framework for exploration and exploitation (DCEE) to solve auto-optimisation problems in such complex settings. In general, there is a fundamental conflict between tracking an unknown optimal operational condition and parameter identification. The DCEE framework stands out by eliminating the need for additional perturbation signals, a common requirement in existing adaptive control methods. Instead, it inherently incorporates an exploration mechanism, actively probing the uncertain environment to diminish belief uncertainty. An ensemble-based multi-estimator approach is developed to learn the environmental parameters and, at the same time, quantify the estimation uncertainty in real time. The control action is devised with dual effects: it not only minimises the tracking error between the current state and the believed unknown optimal operational condition but also reduces belief uncertainty by proactively exploring the environment. Formal properties of the proposed DCEE framework, such as convergence, are established. A numerical example is used to validate the effectiveness of the proposed DCEE, and simulation results for maximum power point tracking are provided to further demonstrate the potential of this new framework in real-world applications.
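
    The dual effect can be illustrated with a one-dimensional toy, my own construction rather than the paper's formulation: an ensemble of estimators learns the unknown optimum of a quadratic cost, and the set-point tracks a randomly sampled member's belief, so residual disagreement produces the probing motion without any injected perturbation signal. The quadratic cost, the linear-in-parameters rewrite, the NLMS updates, and the Thompson-style member sampling are all assumptions standing in for the paper's ensemble estimators and expected-cost control law.

        import numpy as np

        rng = np.random.default_rng(1)

        # Hypothetical quadratic operating cost with unknown optimum theta_true,
        # a stand-in for the paper's unknown performance function.
        theta_true = 2.5

        def measure_cost(x, noise_std=0.05):
            return (x - theta_true) ** 2 + noise_std * rng.standard_normal()

        # Since y - x^2 = theta^2 - 2*theta*x is linear in w = [theta^2, -2*theta],
        # each ensemble member runs normalized LMS on z = w . [1, x].
        n_est, lr = 10, 0.5
        W = rng.normal(0.0, 1.0, size=(n_est, 2))    # ensemble belief over w
        x = 0.0

        for t in range(300):
            z = measure_cost(x) - x ** 2
            phi = np.array([1.0, x])
            for i in range(n_est):
                err = z - W[i] @ phi
                W[i] += lr * err * phi / (1.0 + phi @ phi)   # per-estimator NLMS step
            opts = -W[:, 1] / 2.0                  # each member's believed optimum
            # Dual effect without injected perturbations: track one sampled
            # member's belief, so residual disagreement itself produces the
            # probing motion, and probing in turn shrinks the disagreement.
            x += 0.3 * (opts[rng.integers(n_est)] - x)

        print(f"set-point {x:.3f}, believed optimum {opts.mean():.3f} "
              f"(spread {opts.std():.3f}), true {theta_true}")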

    Off-Policy Temporal Difference Learning For Robotics And Autonomous Systems

    Reinforcement learning (RL) is a rapidly advancing field with applications in autonomous vehicles, medicine, finance, and several other areas. In particular, off-policy temporal difference (TD) learning, a specific class of RL techniques, has been widely used in a variety of autonomous tasks. However, significant challenges remain before it can be applied successfully to real-world problems. In this thesis, we address several major challenges in off-policy TD learning. In the first part of the thesis, we introduce an efficient method for learning complex stand-up motions of humanoid robots by Q-learning. Standing up after falling is an essential ability for humanoid robots, yet it is difficult to learn flexible stand-up motions for various fallen positions due to the complexity of the task. We reduce the sample complexity of learning by applying a clustering method and exploiting the bilateral symmetry of humanoid robots. The learned policy is demonstrated both in simulation and on a physical robot. The greedy update of Q-learning, however, often causes over-optimism and instability. In the second part of the thesis, we propose a novel Bayesian approach to Q-learning, called ADFQ, which mitigates these issues by providing a principled way of updating Q-values based on the uncertainty of Q-belief distributions. The algorithm converges to Q-learning as the uncertainty approaches zero, and its low computational complexity allows it to be extended with neural networks. Both ADFQ and its neural-network extension outperform comparable algorithms by reducing estimation bias and converging faster to optimal Q-values. In the last part of the thesis, we apply off-policy TD methods to the active information acquisition problem, in which an autonomous agent is tasked with acquiring information about targets of interest. Off-policy TD learning addresses two classical challenges in this problem: dependence on a system model and the difficulty of computing information-theoretic cost functions over a long planning horizon. In particular, we introduce a method for learning a unified policy for in-sight tracking, navigation, and exploration. The policy shows robust behavior when tracking agile and anomalous targets with a partially known target model.
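
    The ADFQ part of the thesis lends itself to a small illustration. The sketch below is a simplified, Kalman-style uncertainty-weighted TD update, not the thesis's assumed-density-filtering derivation: it keeps a Gaussian belief (mean, variance) per Q-value on an invented chain MDP, learns off-policy from a uniform behavior policy with a greedy target, and takes smaller steps as the belief variance shrinks, echoing the abstract's point that the update approaches (converged) Q-learning as uncertainty vanishes. The environment, obs_var, and all constants are made up for illustration.

        import numpy as np

        rng = np.random.default_rng(2)

        # Toy 1-D chain MDP (a stand-in, not an environment from the thesis):
        # states 0..4, actions {0: left, 1: right}, reward 1 on reaching state 4.
        n_s, n_a, gamma = 5, 2, 0.9

        def step(s, a):
            s2 = min(n_s - 1, s + 1) if a == 1 else max(0, s - 1)
            return s2, float(s2 == n_s - 1), s2 == n_s - 1

        mu = np.zeros((n_s, n_a))    # mean of the Gaussian Q-belief
        var = np.ones((n_s, n_a))    # variance of the Gaussian Q-belief
        obs_var = 0.1                # assumed noise on the TD target

        for episode in range(500):
            s = 0
            for _ in range(200):                     # cap episode length
                a = int(rng.integers(n_a))           # uniform behavior policy (off-policy)
                s2, r, done = step(s, a)
                target = r if done else r + gamma * mu[s2].max()   # greedy target policy
                # Kalman-style precision weighting: uncertain Q-values take
                # large steps; as var -> 0 the update shrinks, mirroring
                # ADFQ's reduction to (converged) Q-learning.
                k = var[s, a] / (var[s, a] + obs_var)
                mu[s, a] += k * (target - mu[s, a])
                var[s, a] *= 1.0 - k                 # posterior variance shrinks
                s = s2
                if done:
                    break

        print(np.round(mu, 2))   # greedy action per state should point right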