    Adversarial recovery of agent rewards from latent spaces of the limit order book

    Inverse reinforcement learning has proved its ability to explain state-action trajectories of expert agents by recovering their underlying reward functions in increasingly challenging environments. Recent advances in adversarial learning have allowed inverse RL to be extended to applications with non-stationary environment dynamics unknown to the agents, with arbitrary reward-function structures, and with improved handling of the ambiguities inherent in the ill-posed nature of inverse RL. This is particularly relevant in real-time applications in stochastic environments involving risk, such as volatile financial markets. Moreover, recent work on the simulation of complex environments enables learning algorithms to engage with real market data through simulations of its latent-space representations, avoiding costly exploration of the original environment. In this paper, we explore whether adversarial inverse RL algorithms can be adapted and trained within such latent-space simulations from real market data, while maintaining their ability to recover agent rewards robust to variations in the underlying dynamics, and to transfer them to new regimes of the original environment. Comment: Published as a workshop paper at the NeurIPS 2019 Workshop on Robust AI in Financial Services, 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.
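    The abstract does not commit to a specific algorithm, so as a concrete reference point, below is a minimal sketch of the discriminator at the heart of AIRL (Fu et al., 2018), a common adversarial inverse-RL method of the kind described. The network sizes, tensor shapes, and the `policy_log_prob` input are illustrative assumptions, not details from the paper.

```python
# Minimal AIRL-style discriminator sketch (assumed shapes and hyperparameters).
import torch
import torch.nn as nn

class AIRLDiscriminator(nn.Module):
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        # f(s, a) plays the role of the recovered reward function.
        self.f = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action, policy_log_prob):
        # D = exp(f) / (exp(f) + pi(a|s)), computed in log space:
        # logit(D) = f(s, a) - log pi(a|s).
        f = self.f(torch.cat([state, action], dim=-1)).squeeze(-1)
        return f - policy_log_prob  # discriminator logits

    def reward(self, state, action):
        # After training, f itself serves as the recovered reward.
        with torch.no_grad():
            return self.f(torch.cat([state, action], dim=-1)).squeeze(-1)

disc = AIRLDiscriminator(state_dim=8, action_dim=2)
opt = torch.optim.Adam(disc.parameters(), lr=3e-4)
bce = nn.BCEWithLogitsLoss()

def disc_step(expert, agent):
    # Each batch is a (state, action, log_prob) tuple of tensors;
    # expert transitions are labelled 1, policy transitions 0.
    logits_e = disc(*expert)
    logits_a = disc(*agent)
    loss = bce(logits_e, torch.ones_like(logits_e)) + \
           bce(logits_a, torch.zeros_like(logits_a))
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Illustrative call with random stand-in batches.
batch = lambda: (torch.randn(16, 8), torch.randn(16, 2), torch.randn(16))
disc_step(batch(), batch())
```

    Because the recovered reward f is disentangled from the policy term, it is the component one would hope transfers to new regimes of the original environment, which is the question the paper studies.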

    Planning with learned ignorance-aware models

    One of the goals of artificial intelligence research is to create decision-makers (i.e., agents) that improve from experience (i.e., data) collected through interaction with an environment. Models of the environment (i.e., world models) are an explicit way for agents to represent their knowledge, enabling them to make counterfactual predictions and plans without requiring additional environment interactions. Although agents that plan with a perfect model of the environment have led to impressive demonstrations, e.g., super-human performance in board games, they are limited to problems for which their designer can specify a perfect model. Therefore, learning models from experience holds the promise of going beyond the scope of their designers' reach, giving rise to a self-improving virtuous cycle of (i) learning a model from past experience; (ii) planning with the learned model; and (iii) interacting with the environment, collecting new experience. Ideally, learned models should generalise to situations beyond their training regime. Nonetheless, this is ambitious and often unrealistic when finite data is used for learning the models, leading to generally imperfect models with which naive planning could be catastrophic in novel, out-of-training-distribution situations. A more pragmatic goal is to have agents that are aware of and quantify their lack of knowledge (i.e., ignorance or epistemic uncertainty). In this thesis, we motivate, propose, and demonstrate the effectiveness of novel ignorance-aware agents that plan with learned models. Naively applying powerful planning algorithms to learned models can yield negative results when the planning algorithm exploits the model's imperfections in out-of-training-distribution situations. This phenomenon is often termed overoptimisation and can be addressed by optimising ignorance-augmented objectives, called knowledge equivalents. We verify the validity of our ideas and methods in a number of problem settings, including learning from (i) expert demonstrations (imitation learning, §3); (ii) sub-optimal demonstrations (social learning, §4); and (iii) interaction with an environment with rewards (reinforcement learning, §5). Our empirical evidence is based on simulated autonomous-driving environments, continuous control and video games from pixels, and didactic small-scale grid-worlds. Throughout the thesis, we use neural networks to parameterise the (learnable) models and either use existing scalable approximate ignorance-quantification methods from deep learning, such as ensembles, or introduce novel planning-specific ways to quantify the agents' ignorance. The main chapters of this thesis are based on publications (Filos et al., 2020, 2021, 2022).
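    As a toy illustration of the central idea, the sketch below scores candidate plans with an ignorance-augmented objective: predicted return minus a penalty on disagreement across an ensemble of learned dynamics models. The linear ensemble members, reward, penalty weight `beta`, and random-shooting planner are all illustrative assumptions; the thesis's own methods and environments differ.

```python
# Ensemble-disagreement planning sketch (all models and constants assumed).
import numpy as np

rng = np.random.default_rng(0)

def make_member():
    # Stand-in for a learned neural dynamics model: s' = A s + B a.
    # Members are fitted slightly differently, so they disagree off-distribution.
    A = np.eye(2) + 0.05 * rng.normal(size=(2, 2))
    B = 0.1 * rng.normal(size=(2, 1))
    return lambda s, a: A @ s + B @ a

ensemble = [make_member() for _ in range(5)]

def reward(s):
    return -float(s @ s)  # e.g. drive the state to the origin

def score(plan, s0, beta=1.0, horizon=5):
    # Roll the plan through every ensemble member; penalise disagreement,
    # a simple proxy for epistemic uncertainty (the agent's "ignorance").
    states = [s0.copy() for _ in ensemble]
    ret, penalty = 0.0, 0.0
    for t in range(horizon):
        a = plan[t]
        states = [m(s, a) for m, s in zip(ensemble, states)]
        ret += np.mean([reward(s) for s in states])
        penalty += np.mean(np.var(states, axis=0))
    return ret - beta * penalty

# Random-shooting planner over the ignorance-augmented objective.
s0 = np.array([1.0, -1.0])
plans = rng.uniform(-1, 1, size=(64, 5, 1))
best = max(plans, key=lambda p: score(p, s0))
print("best plan score:", score(best, s0))
```

    With `beta = 0` this reduces to naive planning, which is exactly the setting where overoptimisation exploits the model's imperfections; the penalty term is what keeps the planner inside the region the models actually know.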

    Algorithmic Trading and Reinforcement Learning: Robust methodologies for AI in finance

    The application of reinforcement learning (RL) to algorithmic trading is, in many ways, a perfect match. Trading is fundamentally a problem of making decisions under uncertainty, and reinforcement learning is a family of methods for solving such problems. Indeed, many researchers have explored this space and, for the most part, validated RL's ability to find effective solutions and its importance in studying the behaviour of agents in markets. In spite of this, many of the methods available today fail to meet expectations when evaluated in realistic environments. There are a number of reasons for this: partial observability, credit assignment, and non-stationary dynamics. Unlike video games, the state and action spaces are often unstructured and unbounded, which poses challenges around knowledge representation and task invariance. As a final hurdle, traders also need RL to handle risk-sensitive objectives with solid human interpretation to be used reliably in practice. All of these together make for an exceptionally challenging domain that poses fascinating questions about the efficacy of RL and the techniques one can use to address these issues. This dissertation makes several contributions towards two core themes that underlie the challenges mentioned above. The first, epistemic uncertainty, covers modelling challenges such as misspecification and robustness. The second relates to aleatoric risk and safety in the presence of intrinsic randomness. These are studied in depth, and we summarise below the key findings and insights developed during the course of the PhD. The first part of the thesis investigates the use of data and historical reconstruction as a platform for learning strategies in limit order book markets. The advantages and limitations of this class of model are explored and practical insights provided. It is demonstrated that these methods make minimal assumptions about the market's dynamics, but are restricted in their ability to perform counterfactual simulations. Computational aspects of reconstruction are discussed, and a highly performant library is provided for running experiments. The second chapter in this part of the thesis builds upon historical reconstruction by applying value-based RL methods to market making. We first propose an intuitive and effective reward function for both risk-neutral and risk-sensitive learning and justify it through variance analysis. Eligibility traces are shown to solve the credit assignment problem observed in past work, and a comparison of different state-of-the-art algorithms (each with different assumptions) is provided. We then propose a factored state representation which incorporates market microstructure and benefits from improved stability and asymptotic performance compared with benchmark algorithms from the literature. In the second part, we explore an alternative branch of modelling techniques based on explicit stochastic processes. Here, we focus on policy gradient methods, introducing a family of likelihood functions that are effective in trading domains and studying their properties. Four key problem domains are introduced along with their solution concepts and baseline methods. In the second chapter of part two, we use adversarial reinforcement learning to derive epistemically robust strategies. The market making model of Avellaneda and Stoikov (2008) is recast as a zero-sum, two-player game between the market maker and the market. We study the theoretical properties of a one-shot projection, and empirically evaluate the dynamics of the full stochastic game. We show that the resulting algorithms are robust to discrepancies between train- and test-time price/execution dynamics, and that the resulting strategies dominate performance in all cases. The final results chapter addresses the intrinsic risk of trading and portfolio management by framing the problems explicitly as constrained Markov decision processes. A downside risk measure based on lower partial moments is proposed, and a tractable linear bound derived for application in temporal-difference learning. This proxy has a natural interpretation and favourable variance properties. An extension of previous work to use natural policy gradients is then explored. The value of these two techniques is demonstrated empirically for a multi-armed bandit and two trading scenarios. The result is a practical algorithm for learning downside risk-averse strategies.
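    Two of the ingredients above are standard enough to sketch concretely: TD(λ) with accumulating eligibility traces over a linear (factored) state representation, and a lower-partial-moment downside risk measure. Everything in the snippet below, from the toy dynamics and placeholder reward to the hyperparameters, is an illustrative assumption rather than the dissertation's actual model.

```python
# TD(lambda) with eligibility traces, plus a lower-partial-moment risk measure.
import numpy as np

rng = np.random.default_rng(1)
n_features, alpha, gamma, lam = 8, 0.05, 0.99, 0.8
w = np.zeros(n_features)            # linear value function: V(s) = w . phi(s)

def phi(s):
    # Stand-in for a factored feature map over market-microstructure state.
    return np.tanh(s)

for episode in range(200):
    s = rng.normal(size=n_features)
    z = np.zeros(n_features)        # eligibility trace
    for t in range(50):
        s_next = 0.9 * s + 0.1 * rng.normal(size=n_features)
        r = -abs(s[0])              # placeholder reward, e.g. an inventory penalty
        delta = r + gamma * (w @ phi(s_next)) - w @ phi(s)  # TD error
        z = gamma * lam * z + phi(s)  # trace spreads credit back through time
        w += alpha * delta * z
        s = s_next

def lower_partial_moment(returns, target=0.0, order=2):
    # Downside risk E[max(0, target - R)^order], the kind of quantity the
    # final chapter bounds linearly for use in temporal-difference learning.
    d = np.maximum(0.0, target - np.asarray(returns))
    return float(np.mean(d ** order))

print("LPM(2):", lower_partial_moment(rng.normal(size=1000)))
```

    The trace vector z is what resolves the credit assignment problem the abstract mentions: a reward realised many steps after a quoting decision still updates the features that were active when the decision was made.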

    Understanding and avoiding AI failures: A practical guide

    As AI technologies increase in capability and ubiquity, AI accidents are becoming more common. Based on normal accident theory, high reliability theory, and open systems theory, we create a framework for understanding the risks associated with AI applications. This framework is designed to direct attention to pertinent system properties without requiring unwieldy amounts of accuracy. In addition, we use AI safety principles to quantify the unique risks of increased intelligence and human-like qualities in AI. Together, these two fields give a more complete picture of the risks of contemporary AI. By focusing on system properties near accidents instead of seeking a root cause of accidents, we identify where attention should be paid to safety for current-generation AI systems.

    Modern applications of machine learning in quantum sciences

    In these Lecture Notes, we provide a comprehensive introduction to the most recent advances in the application of machine learning methods in quantum sciences. We cover the use of deep learning and kernel methods in supervised, unsupervised, and reinforcement learning algorithms for phase classification, representation of many-body quantum states, quantum feedback control, and quantum circuit optimization. Moreover, we introduce and discuss more specialized topics such as differentiable programming, generative models, the statistical approach to machine learning, and quantum machine learning. Comment: 268 pages, 87 figures. Comments and feedback are very welcome. Figures and tex files are available at https://github.com/Shmoo137/Lecture-Note
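    To give a flavour of one listed task, the sketch below does phase classification with a kernel method: an RBF support-vector machine separating low- and high-temperature 2D Ising configurations. The crude single-spin-flip Metropolis sampler, lattice size, and temperatures are illustrative assumptions, not taken from the Lecture Notes.

```python
# Kernel-method phase classification on toy 2D Ising data (assumed setup).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
L = 8  # lattice side

def sample_ising(T, sweeps=100):
    # Crude single-spin-flip Metropolis sampler at temperature T (J = 1).
    s = rng.choice([-1, 1], size=(L, L))
    for _ in range(sweeps * L * L):
        i, j = rng.integers(L, size=2)
        nb = s[(i+1) % L, j] + s[(i-1) % L, j] + s[i, (j+1) % L] + s[i, (j-1) % L]
        dE = 2 * s[i, j] * nb  # energy change from flipping spin (i, j)
        if dE <= 0 or rng.random() < np.exp(-dE / T):
            s[i, j] *= -1
    return s.ravel()

# Label 0: ordered phase (T well below T_c ~ 2.27); label 1: disordered.
X = np.array([sample_ising(T) for T in [1.5] * 40 + [3.5] * 40])
y = np.array([0] * 40 + [1] * 40)
clf = SVC(kernel="rbf").fit(X, y)
print("train accuracy:", clf.score(X, y))
```

    The same pipeline with a deep network in place of the SVM is the supervised-learning route to phase classification that the notes discuss alongside kernel methods.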

    Causally-Inspired Generalizable Deep Learning Methods under Distribution Shifts

    Deep learning methods have achieved remarkable success in various areas of artificial intelligence, due to their powerful distribution-matching capabilities. However, these successes rely heavily on the i.i.d. assumption, i.e., that the data distributions of the training and test datasets are the same. As a result, current deep learning methods typically generalize poorly under distribution shift, i.e., on test data whose distribution differs from the training data. This significantly hinders the application of deep learning methods to real-world scenarios, as the distribution of test data is not always the same as the training distribution in our rapidly evolving world. This thesis discusses how to construct deep learning methods that generalize under distribution shift. To achieve this, the thesis first models a prediction task as a structural causal model (SCM), which represents the relationships between variables as a directed acyclic graph. In an SCM, some variables are easily changed across domains while others are not. However, deep learning methods often unintentionally mix invariant variables with easily changed variables, causing the learned model to deviate from the true one and resulting in poor generalization under distribution shift. To remedy this issue, we propose specific algorithms to model the invariant part of the SCM with deep learning methods, and experimentally show that this helps the trained model generalize to different distributions of the same task. Finally, we propose to identify and model the variant information in the new test distribution so that the trained deep learning model can be fully adapted accordingly. We show that the method extends to several practical applications, such as classification under label shift, image translation under semantic shift, robot control under dynamics generalization, and adapting large language models to visual question-answering tasks.
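    The thesis proposes its own algorithms, but a well-known instance of the same principle of isolating the invariant part of the causal graph so that predictions survive distribution shift is the IRMv1 penalty of Arjovsky et al. (2019). The sketch below uses that stand-in, with assumed data shapes and weights; it is not the thesis's method.

```python
# IRMv1-style invariance penalty sketch (assumed data, network, and weights).
import torch
import torch.nn as nn

def irm_penalty(logits, y):
    # Gradient of the per-environment risk w.r.t. a dummy scale factor;
    # it is small when the classifier is simultaneously optimal in every
    # environment, i.e. when it relies on invariant variables only.
    w = torch.tensor(1.0, requires_grad=True)
    loss = nn.functional.binary_cross_entropy_with_logits(logits * w, y)
    grad = torch.autograd.grad(loss, [w], create_graph=True)[0]
    return grad ** 2

net = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def train_step(envs, lam=10.0):
    # `envs` is a list of (x, y) batches, one per training distribution.
    risk, penalty = 0.0, 0.0
    for x, y in envs:
        logits = net(x).squeeze(-1)
        risk = risk + nn.functional.binary_cross_entropy_with_logits(logits, y)
        penalty = penalty + irm_penalty(logits, y)
    loss = risk + lam * penalty
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Illustrative call with two synthetic "environments" of float 0/1 labels.
envs = [(torch.randn(32, 10), torch.randint(0, 2, (32,)).float())
        for _ in range(2)]
train_step(envs)
```

    Minimising risk alone would happily exploit easily changed (spurious) variables; the penalty term is what pushes the model toward features whose predictive relationship holds across every training distribution.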