Apprentissage Intelligent des Robots Mobiles dans la Navigation Autonome
Modern robots are designed to assist or replace human beings in complicated planning and control operations, and the capability of autonomous navigation in a dynamic environment is an essential requirement for mobile robots. To alleviate the tedious task of manually programming a robot, this dissertation contributes to the design of intelligent robot control that endows mobile robots with a learning ability in autonomous navigation tasks. First, we consider robot learning from expert demonstrations. A neural network framework is proposed as the inference mechanism to learn a policy offline from a dataset extracted from expert demonstrations. Then we turn to the robot's ability to learn on its own, without expert demonstrations. We apply reinforcement learning techniques to acquire and optimize a control strategy through the interaction between the learning robot and the unknown environment. A neural network is again incorporated to enable fast generalization, helping the learning converge in far fewer episodes than traditional methods. Finally, we study robot learning of the latent rewards underlying the states from optimal or suboptimal expert demonstrations. We propose an algorithm based on inverse reinforcement learning: a nonlinear policy representation is designed, and the max-margin method is applied to refine the rewards and generate an optimal control policy. The three proposed methods have been successfully implemented on autonomous navigation tasks for mobile robots in unknown and dynamic environments.
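The first contribution, learning a control policy offline from expert demonstrations, is a form of behavioural cloning. A minimal sketch of the idea, using a linear least-squares policy in place of the thesis's neural network and a synthetic, hypothetical expert controller:

```python
import numpy as np

# Synthetic expert demonstrations: state = (distance, bearing) to a goal,
# action = the expert's steering command. The expert law here is a
# hypothetical proportional controller, stated only for illustration.
rng = np.random.default_rng(0)
expert_gains = np.array([0.5, -1.2])
states = rng.uniform(-1.0, 1.0, size=(200, 2))
actions = states @ expert_gains

# Behavioural cloning: fit a policy offline that imitates the expert.
weights, *_ = np.linalg.lstsq(states, actions, rcond=None)

# The learned policy reproduces the expert on held-out states.
test_states = rng.uniform(-1.0, 1.0, size=(10, 2))
predicted = test_states @ weights
expert = test_states @ expert_gains
print(np.allclose(predicted, expert))  # True
```

The thesis replaces the linear map with a neural network so the policy can capture nonlinear expert behaviour, but the offline fit-to-demonstrations structure is the same.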
An online adaptive learning algorithm for optimal trade execution in high-frequency markets
A thesis submitted in fulfilment of the requirements for the degree of Doctor of Philosophy
in the Faculty of Science, School of Computer Science and Applied Mathematics
University of the Witwatersrand. October 2016.
Automated algorithmic trade execution is a central problem in modern financial
markets; however, finding and navigating optimal trajectories in this system is a
non-trivial task. Many authors have derived exact analytical solutions by making
simplifying assumptions about the governing dynamics, but for practical feasibility
and robustness a more dynamic approach is needed: one that captures the spatial and
temporal complexity of the system and adapts as intraday regimes change.
This thesis aims to consolidate four key ideas: 1) the financial market as a complex
adaptive system, where purposeful agents with varying system visibility collectively and
simultaneously create and perceive their environment as they interact with it; 2) spin
glass models as a tractable formalism to model phenomena in this complex system; 3) the
multivariate Hawkes process as a candidate governing process for limit order book events;
and 4) reinforcement learning as a framework for online, adaptive learning. Addressing
the data and computational challenges of developing an efficient, machine-scale
trading algorithm, we present a feasible scheme that systematically encodes these ideas.
We first determine the efficacy of the proposed learning framework, under the conjecture
of approximate Markovian dynamics in the equity market. We find that a simple
lookup-table Q-learning algorithm, with discrete state attributes and discrete actions,
is able to reduce post-trade implementation shortfall by adapting a typical static
arrival-price volume trajectory to prevailing market-microstructure features streamed
from the limit order book.
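The lookup-table Q-learning scheme described above can be sketched in miniature. The environment below is a toy deterministic chain, not the market setting of the thesis; the update rule is the standard tabular one:

```python
import numpy as np

# Tabular Q-learning on a toy chain MDP: 5 states in a row, actions
# 0 = move left, 1 = move right, reward only for reaching the right end.
# (A stand-in for the discrete market-state / action grid in the thesis.)
n_states, n_actions = 5, 2
alpha, gamma, eps = 0.5, 0.9, 0.3
Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(1)

def step(s, a):
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s2 == n_states - 1 else 0.0
    return s2, r

for _ in range(2000):
    s = int(rng.integers(n_states))  # random start for broad exploration
    for _ in range(10):
        # epsilon-greedy action selection
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(Q[s]))
        s2, r = step(s, a)
        # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s2, a')
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2

print(np.argmax(Q, axis=1))  # greedy policy: move right in every state
```

The agent in the thesis uses the same update, but its states are discretised microstructure features and its actions adjust the child-order volume trajectory.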
To enumerate a scale-specific state space whilst avoiding the curse of dimensionality, we
propose a novel approach to detect the intraday temporal financial market state at each
decision point in the Q-learning algorithm, inspired by the complex adaptive system
paradigm. A physical analogy to the ferromagnetic Potts model at thermal equilibrium
is used to develop a high-speed maximum likelihood clustering algorithm, appropriate
for measuring critical or near-critical temporal states in the financial system. State
features are studied to extract time-scale-specific state signature vectors, which serve as
low-dimensional state descriptors and enable online state detection.
To assess the impact of agent interactions on the system, a multivariate Hawkes process is
used to measure the resiliency of the limit order book with respect to liquidity-demand
events of varying size. By studying the branching ratios associated with key quote
replenishment intensities following trades, we ensure that the limit order book is expected
to be resilient with respect to the maximum permissible trade executed by the agent.
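The resiliency criterion rests on a standard Hawkes-process fact: for an exponential excitation kernel phi(t) = alpha * exp(-beta * t), the branching ratio is the kernel's integral, n = alpha / beta, and the process is subcritical (stable) when n < 1; in the multivariate case the condition is on the spectral radius of the matrix of kernel integrals. A sketch with illustrative, hypothetical parameters (the thesis estimates them from limit-order-book event data):

```python
import numpy as np

# Univariate Hawkes with kernel phi(t) = alpha * exp(-beta * t):
# branching ratio n = integral of phi = alpha / beta.
alpha, beta = 0.8, 2.0
n = alpha / beta  # expected number of events directly triggered by one event
print(n)          # 0.4 < 1: subcritical, i.e. the book replenishes (resilient)

# Multivariate case: stability requires the spectral radius of the
# matrix of kernel integrals (alpha_ij / beta_ij) to be below one.
A = np.array([[0.3, 0.1],
              [0.2, 0.4]])  # hypothetical kernel-integral matrix
spectral_radius = max(abs(np.linalg.eigvals(A)))
print(spectral_radius < 1)  # True: the multivariate process is stationary
```

Branching ratios near one indicate a near-critical book, which is why the agent's maximum permissible trade size is tied to this measurement.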
Finally we present a feasible scheme for unsupervised state discovery, state detection
and online learning for high-frequency quantitative trading agents faced with a
multi-featured, asynchronous market data feed. We provide a technique for enumerating the
state space at the scale at which the agent interacts with the system, incorporating the
effects of a live trading agent on limit order book dynamics into the market data feed,
and hence the perceived state evolution.
MDPs with Non-Deterministic Policies
Markov Decision Processes (MDPs) have been extensively studied and used in the context of planning and decision-making, and many methods exist to find the optimal policy for problems modelled as MDPs. Although finding the optimal policy suffices in many domains, in certain applications such as decision-support systems, where the policy is executed by a human rather than a machine, finding all near-optimal policies can be useful, as it gives the person executing the policy more flexibility. In this paper we introduce the new concept of non-deterministic MDP policies and address the question of finding near-optimal non-deterministic policies. We propose two solutions to this problem, one based on a Mixed Integer Program and the other on a search algorithm. We include experimental results obtained by applying this framework to optimize treatment choices in the context of a medical decision-support system.
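The core object here, a set-valued "non-deterministic" policy, can be illustrated on a toy MDP: solve for the optimal action values, then keep, per state, every action within a tolerance of the best one. This thresholding sketch only illustrates the notion; the paper's actual constructions use a Mixed Integer Program and a search algorithm, and the MDP below is hypothetical:

```python
import numpy as np

# Toy MDP: 3 states, 2 actions, reward only in state 2.
n_s, n_a, gamma, eps = 3, 2, 0.9, 0.2

# P[a, s, s'] = transition probabilities; R[s, a] = immediate rewards.
P = np.array([[[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]],
              [[0.1, 0.9, 0.0], [0.0, 0.1, 0.9], [0.0, 0.0, 1.0]]])
R = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 1.0]])

# Value iteration on Q: Q(s,a) = R(s,a) + gamma * sum_s' P(s'|s,a) V(s')
Q = np.zeros((n_s, n_a))
for _ in range(500):
    V = Q.max(axis=1)
    Q = R + gamma * np.einsum('ast,t->sa', P, V)

# Non-deterministic policy: in each state, all actions within eps of optimal.
policy = [sorted(int(a) for a in np.flatnonzero(Q[s] >= Q[s].max() - eps))
          for s in range(n_s)]
print(policy)  # state 2 admits both actions; states 0 and 1 keep only action 1
```

A human executing this policy retains a free choice wherever the action set is not a singleton, while the overall value stays provably close to optimal.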