3 research outputs found

    Apprentissage Intelligent des Robots Mobiles dans la Navigation Autonome (Intelligent Learning of Mobile Robots in Autonomous Navigation)

    Modern robots are designed to assist or replace human beings in performing complicated planning and control operations, and the capability of autonomous navigation in a dynamic environment is an essential requirement for mobile robots. To alleviate the tedious task of manually programming a robot, this dissertation contributes to the design of intelligent robot control that endows mobile robots with a learning ability in autonomous navigation tasks. First, we consider robot learning from expert demonstrations. A neural network framework is proposed as the inference mechanism to learn a policy offline from a dataset extracted from experts. Second, we turn to the robot's ability to learn by itself, without expert demonstrations. We apply reinforcement learning techniques to acquire and optimize a control strategy during the interaction between the learning robot and the unknown environment. A neural network is also incorporated to allow fast generalization, which helps the learning converge in far fewer episodes than traditional methods require. Finally, we study how the robot can learn the potential rewards underlying the states from optimal or suboptimal expert demonstrations. We propose an algorithm based on inverse reinforcement learning. A nonlinear policy representation is designed, and the max-margin method is applied to refine the rewards and generate an optimal control policy. The three proposed methods have been successfully implemented on autonomous navigation tasks for mobile robots in unknown and dynamic environments.

    An online adaptive learning algorithm for optimal trade execution in high-frequency markets

    A thesis submitted in fulfilment of the requirements for the degree of Doctor of Philosophy in the Faculty of Science, School of Computer Science and Applied Mathematics, University of the Witwatersrand. October 2016. Automated algorithmic trade execution is a central problem in modern financial markets; however, finding and navigating optimal trajectories in this system is a non-trivial task. Many authors have developed exact analytical solutions by making simplifying assumptions regarding governing dynamics; however, for practical feasibility and robustness, a more dynamic approach is needed to capture the spatial and temporal system complexity and adapt as intraday regimes change. This thesis aims to consolidate four key ideas: 1) the financial market as a complex adaptive system, where purposeful agents with varying system visibility collectively and simultaneously create and perceive their environment as they interact with it; 2) spin glass models as a tractable formalism to model phenomena in this complex system; 3) the multivariate Hawkes process as a candidate governing process for limit order book events; and 4) reinforcement learning as a framework for online, adaptive learning. Combined with the data and computational challenges of developing an efficient, machine-scale trading algorithm, we present a feasible scheme which systematically encodes these ideas. We first determine the efficacy of the proposed learning framework, under the conjecture of approximate Markovian dynamics in the equity market. We find that a simple lookup table Q-learning algorithm, with discrete state attributes and discrete actions, is able to improve post-trade implementation shortfall by adapting a typical static arrival-price volume trajectory with respect to prevailing market microstructure features streaming from the limit order book.
To enumerate a scale-specific state space whilst avoiding the curse of dimensionality, we propose a novel approach to detect the intraday temporal financial market state at each decision point in the Q-learning algorithm, inspired by the complex adaptive system paradigm. A physical analogy to the ferromagnetic Potts model at thermal equilibrium is used to develop a high-speed maximum likelihood clustering algorithm, appropriate for measuring critical or near-critical temporal states in the financial system. State features are studied to extract time-scale-specific state signature vectors, which serve as low-dimensional state descriptors and enable online state detection. To assess the impact of agent interactions on the system, a multivariate Hawkes process is used to measure the resiliency of the limit order book with respect to liquidity-demand events of varying size. By studying the branching ratios associated with key quote replenishment intensities following trades, we ensure that the limit order book is expected to be resilient with respect to the maximum permissible trade executed by the agent. Finally, we present a feasible scheme for unsupervised state discovery, state detection and online learning for high-frequency quantitative trading agents faced with a multifeatured, asynchronous market data feed. We provide a technique for enumerating the state space at the scale at which the agent interacts with the system, incorporating the effects of a live trading agent on limit order book dynamics into the market data feed, and hence the perceived state evolution.
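
The lookup-table Q-learning mentioned in this abstract can be illustrated with a minimal sketch. This is not the thesis's trading algorithm: the five-state chain environment, the names `chain_step`, `q_learning`, `GOAL` and `ACTIONS`, and all hyperparameter values are assumptions chosen only to show the tabular update with discrete states and discrete actions.

```python
import random
from collections import defaultdict

GOAL = 4            # hypothetical terminal state of a 5-state chain (states 0..4)
ACTIONS = (-1, +1)  # discrete actions: move left or move right


def chain_step(state, action):
    """Toy environment (an assumption): returns (next_state, reward, done)."""
    next_state = min(max(state + action, 0), GOAL)
    if next_state == GOAL:
        return next_state, 1.0, True   # reaching the goal ends the episode
    return next_state, -0.01, False    # small cost for every other step


def q_learning(step, actions, episodes=500, alpha=0.1, gamma=0.95,
               epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = defaultdict(float)  # lookup table keyed by (state, action)
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon, otherwise act greedily.
            if rng.random() < epsilon:
                action = rng.choice(actions)
            else:
                action = max(actions, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action)
            # Bootstrap from the best action in the next state (0 if terminal).
            best_next = 0.0 if done else max(q[(next_state, a)] for a in actions)
            q[(state, action)] += alpha * (
                reward + gamma * best_next - q[(state, action)]
            )
            state = next_state
    return q


q_table = q_learning(chain_step, ACTIONS)
greedy_policy = {s: max(ACTIONS, key=lambda a: q_table[(s, a)])
                 for s in range(GOAL)}
```

In this toy setting the greedy policy read off the table moves right in every non-terminal state. A real execution agent would replace the chain with state attributes derived from the limit order book, which is the part the thesis itself addresses.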

    MDPs with Non-Deterministic Policies

    Markov Decision Processes (MDPs) have been extensively studied and used in the context of planning and decision-making, and many methods exist to find the optimal policy for problems modelled as MDPs. Although finding the optimal policy is sufficient in many domains, in certain applications such as decision support systems where the policy is executed by a human (rather than a machine), finding all possible near-optimal policies might be useful as it provides more flexibility to the person executing the policy. In this paper we introduce the new concept of non-deterministic MDP policies, and address the question of finding near-optimal non-deterministic policies. We propose two solutions to this problem, one based on a Mixed Integer Program and the other one based on a search algorithm. We include experimental results obtained from applying this framework to optimize treatment choices in the context of a medical decision support system.
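
The idea of a policy that offers a set of acceptable actions per state, rather than a single action, can be sketched as follows. This is neither the Mixed Integer Program nor the search algorithm proposed in the paper, and it carries no guarantee on worst-case performance when the person executing the policy picks freely among the offered actions; it simply runs value iteration and keeps every action whose optimal Q-value lies within a chosen slack of the best one. The two-state treatment MDP, the function names and the `slack` parameter are all hypothetical.

```python
def value_iteration(transitions, rewards, gamma=0.9, tol=1e-9):
    """Return (values, q) for a finite MDP.

    transitions[s][a] is a list of (probability, next_state) pairs;
    rewards[s][a] is the immediate reward for taking a in s.
    """
    values = {s: 0.0 for s in transitions}
    while True:
        q = {
            s: {
                a: rewards[s][a] + gamma * sum(p * values[s2] for p, s2 in outcomes)
                for a, outcomes in transitions[s].items()
            }
            for s in transitions
        }
        new_values = {s: max(q[s].values()) for s in transitions}
        if max(abs(new_values[s] - values[s]) for s in transitions) < tol:
            return new_values, q
        values = new_values


def near_optimal_action_sets(q, slack):
    """Keep, per state, every action whose Q-value is within `slack` of the best."""
    return {
        s: sorted(a for a, v in q_s.items() if v >= max(q_s.values()) - slack)
        for s, q_s in q.items()
    }


# Hypothetical two-state treatment problem, for illustration only.
transitions = {
    "sick": {
        "drug_a": [(1.0, "well")],
        "drug_b": [(1.0, "well")],
        "wait": [(1.0, "sick")],
    },
    "well": {"wait": [(1.0, "well")]},
}
rewards = {
    "sick": {"drug_a": 1.0, "drug_b": 0.95, "wait": 0.0},
    "well": {"wait": 0.0},
}

values, q = value_iteration(transitions, rewards)
action_sets = near_optimal_action_sets(q, slack=0.07)
```

Here both drugs are offered in the "sick" state because their Q-values differ by less than the slack, while waiting is excluded. Thresholding optimal Q-values this way is only a heuristic starting point, which is why the paper formulates the problem explicitly.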