
    History of the Class of '74

    Class history written by Thomas Spooner Jr., which would have been presented at a Class Day or Commencement for the graduating Class of 1874.

    The Sound of Sweetness on the Grand Union Canal

    On 11th March 2015, the gallant Tom Spooner and brave Simon King struck a course northwest on the Grand Union Canal, exploiting the terrain and the psycho-social boundaries imposed upon them by the city. They have left this message in the hope that others may come and join them in a great urban gathering at the Twyford aqueduct to celebrate their autonomy. - Message in a bottle

    Consensus Multiplicative Weights Update: Learning to Learn using Projector-based Game Signatures

    Cheung and Piliouras (2020) recently showed that two variants of the Multiplicative Weights Update method, OMWU and MWU, display opposite convergence properties depending on whether the game is zero-sum or cooperative. Inspired by this work and the recent literature on learning to optimize for single functions, we introduce a new framework for learning last-iterate convergence to Nash equilibria in games, where the update rule's coefficients (learning rates) along a trajectory are learnt by a reinforcement learning policy that is conditioned on the nature of the game: the game signature. We construct the latter using a new decomposition of two-player games into eight components corresponding to commutative projection operators, generalizing and unifying recent game concepts studied in the literature. We compare the performance of various update rules when their coefficients are learnt, and show that the RL policy is able to exploit the game signature across a wide range of game types. In doing so, we introduce CMWU, a new algorithm that extends consensus optimization to the constrained case and has local convergence guarantees for zero-sum bimatrix games, and we show that it enjoys competitive performance both on zero-sum games with constant coefficients and across a spectrum of games when its coefficients are learnt.
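    Since the abstract turns on the behaviour of MWU itself, a minimal sketch of the vanilla update for a zero-sum bimatrix game may help; the payoff matrix, learning rate, and horizon below are illustrative, and the consensus and projection terms that define CMWU are not shown.

    ```python
    import numpy as np

    def mwu_step(strategy, payoffs, eta):
        """One Multiplicative Weights Update step on the probability simplex."""
        weights = strategy * np.exp(eta * payoffs)  # reweight actions by payoff
        return weights / weights.sum()              # renormalise to a distribution

    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 3))  # row player's payoffs; column player gets -A

    x = np.full(3, 1 / 3)  # row player's mixed strategy
    y = np.full(3, 1 / 3)  # column player's mixed strategy
    eta = 0.05             # constant learning rate; the paper learns these via RL

    avg_x = np.zeros(3)
    for _ in range(5000):
        # Simultaneous update: both steps use the pre-update strategies.
        x, y = mwu_step(x, A @ y, eta), mwu_step(y, -A.T @ x, eta)
        avg_x += x

    print("time-averaged row strategy:", avg_x / 5000)
    ```

    In zero-sum games the last iterate of plain MWU with constant coefficients need not converge (only the time average does), which is precisely the failure mode that learning the coefficients from the game signature is meant to address.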

    Algorithmic Trading and Reinforcement Learning: Robust methodologies for AI in finance

    The application of reinforcement learning (RL) to algorithmic trading is, in many ways, a perfect match. Trading is fundamentally a problem of making decisions under uncertainty, and reinforcement learning is a family of methods for solving such problems. Indeed, many researchers have explored this space and, for the most part, validated RL's ability to find effective solutions and its importance in studying the behaviour of agents in markets. In spite of this, many of the methods available today fail to meet expectations when evaluated in realistic environments. There are a number of reasons for this: partial observability, credit assignment and non-stationary dynamics. Unlike video games, the state and action spaces are often unstructured and unbounded, which poses challenges around knowledge representation and task invariance. As a final hurdle, traders also need RL to handle risk-sensitive objectives with solid human interpretation if it is to be used reliably in practice. All of these together make for an exceptionally challenging domain that poses fascinating questions about the efficacy of RL and the techniques one can use to address these issues.

    This dissertation makes several contributions towards two core themes that underlie the challenges mentioned above. The first, epistemic uncertainty, covers modelling challenges such as misspecification and robustness. The second relates to aleatoric risk and safety in the presence of intrinsic randomness. These are studied in depth, and the key findings and insights developed during the course of the PhD are summarised below.

    The first part of the thesis investigates the use of data and historical reconstruction as a platform for learning strategies in limit order book markets. The advantages and limitations of this class of model are explored and practical insights provided. It is demonstrated that these methods make minimal assumptions about the market's dynamics, but are restricted in their ability to perform counterfactual simulations. Computational aspects of reconstruction are discussed, and a highly performant library is provided for running experiments. The second chapter in this part builds upon historical reconstruction by applying value-based RL methods to market making. We first propose an intuitive and effective reward function for both risk-neutral and risk-sensitive learning and justify it through variance analysis. Eligibility traces are shown to solve the credit assignment problem observed in past work, and a comparison of different state-of-the-art algorithms (each with different assumptions) is provided. We then propose a factored state representation which incorporates market microstructure and benefits from improved stability and asymptotic performance compared with benchmark algorithms from the literature.

    In the second part, we explore an alternative branch of modelling techniques based on explicit stochastic processes. Here, we focus on policy gradient methods, introducing a family of likelihood functions that are effective in trading domains and studying their properties. Four key problem domains are introduced along with their solution concepts and baseline methods. In the second chapter of part two, we use adversarial reinforcement learning to derive epistemically robust strategies. The market making model of Avellaneda and Stoikov (2008) is recast as a zero-sum, two-player game between the market maker and the market. We study the theoretical properties of a one-shot projection, and empirically evaluate the dynamics of the full stochastic game. We show that the resulting algorithms are robust to discrepancies between train- and test-time price/execution dynamics, and that the resulting strategies dominate performance in all cases.

    The final results chapter addresses the intrinsic risk of trading and portfolio management by framing the problems explicitly as constrained Markov decision processes. A downside risk measure based on lower partial moments is proposed, and a tractable linear bound derived for application in temporal-difference learning. This proxy has a natural interpretation and favourable variance properties. An extension of previous work to use natural policy gradients is then explored. The value of these two techniques is demonstrated empirically for a multi-armed bandit and two trading scenarios. The result is a practical algorithm for learning downside risk-averse strategies.
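    As a concrete point of reference for the final chapter, the sketch below computes the empirical lower partial moment that the downside risk measure is based on, under its standard definition; the target, order, and toy return streams are illustrative, and the thesis's linear bound and temporal-difference integration are not reproduced here.

    ```python
    import numpy as np

    def lower_partial_moment(returns, target=0.0, order=2):
        """Empirical lower partial moment: mean(max(target - R, 0) ** order).

        Only shortfalls below `target` contribute, so the statistic
        penalises downside outcomes while ignoring upside variability.
        """
        shortfall = np.maximum(target - np.asarray(returns), 0.0)
        return float(np.mean(shortfall ** order))

    # Two toy return streams with equal mean (0) and variance (1) but very
    # different left tails.
    rng = np.random.default_rng(1)
    symmetric = rng.normal(0.0, 1.0, size=100_000)
    heavy_left = 1.0 - rng.exponential(1.0, size=100_000)

    print(lower_partial_moment(symmetric))   # ~0.50 for a standard normal
    print(lower_partial_moment(heavy_left))  # ~0.74: markedly more downside risk
    ```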

    Parameterized temporal exploration problems

    In this paper we study the fixed-parameter tractability of the problem of deciding whether a given temporal graph G admits a temporal walk that visits all vertices (temporal exploration) or, in some problem variants, a certain subset of the vertices. Formally, a temporal graph is a sequence G = ⟨G1, . . . , GL⟩ of graphs with V(Gt) = V(G) and E(Gt) ⊆ E(G) for all t ∈ [L] and some underlying graph G, and a temporal walk is a time-respecting sequence of edge traversals. We consider both the strict variant, in which edges must be traversed in strictly increasing timesteps, and the non-strict variant, in which an arbitrary number of edges can be traversed in each timestep. For both variants, we give FPT algorithms for the problem of finding a temporal walk that visits a given set X of vertices, parameterized by |X|, and for the problem of finding a temporal walk that visits at least k distinct vertices in V(G), parameterized by k. We also show W[2]-hardness for a set version of the temporal exploration problem for both variants. For the non-strict variant, we give an FPT algorithm for the temporal exploration problem parameterized by the lifetime of the input graph, and we show that the temporal exploration problem can be solved in polynomial time if the graph in each timestep has at most two connected components.
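    The model in the abstract is easy to make concrete. The sketch below computes single-source reachability under the strict variant, where at most one edge may be traversed per timestep and timesteps must strictly increase; it illustrates the definitions only and is not one of the paper's FPT algorithms.

    ```python
    def strict_reachable(snapshots, start):
        """Vertices reachable from `start` by some strict temporal walk.

        `snapshots` is the sequence <G1, ..., GL> from the abstract, given
        as one iterable of undirected edges (u, v) per timestep.
        """
        reached = {start}
        for edges in snapshots:      # timesteps t = 1, ..., L in order
            frontier = set()
            for u, v in edges:
                if u in reached:
                    frontier.add(v)
                if v in reached:
                    frontier.add(u)
            # Vertices first reached at time t may only move again at t' > t,
            # so they are merged in only after this timestep's edges are done.
            reached |= frontier
        return reached

    # t=1 offers edge (0, 1); t=2 offers (1, 2); t=3 offers (0, 3).
    print(strict_reachable([{(0, 1)}, {(1, 2)}, {(0, 3)}], start=0))  # {0, 1, 2, 3}
    ```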

    Bayesian optimisation of restriction zones for bluetongue control.

    We investigate the restriction of animal movements as a method to control the spread of bluetongue, an infectious disease of livestock that is becoming increasingly prevalent due to the onset of climate change. We derive control policies for the UK that minimise the number of infected farms during an outbreak, using Bayesian optimisation and a simulation-based model of bluetongue. Two cases are presented: first, the region of introduction is randomly selected from England and Wales in order to find a generalised strategy. This "national" model is shown to be just as effective at subduing the spread of bluetongue as the current strategy of the UK government. Our proposed controls are simpler to implement, affect fewer farms in the process and, in so doing, minimise the potential economic implications. Second, we consider policies that are tailored to the specific region in which the first infection was detected. Seven different regions in the UK were explored, and improvements in efficiency from the use of specialised policies are presented. As a consequence of the increasing temperatures associated with climate change, efficient control measures for vector-borne diseases such as this are expected to become increasingly important. Our work demonstrates the potential value of using Bayesian optimisation in developing cost-effective disease management strategies.
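    To make the method concrete, here is a minimal sketch of a Bayesian optimisation loop of the kind described, assuming a hypothetical one-dimensional control (the radius of the restriction zone) and a stand-in `simulate_outbreak` objective in place of the actual bluetongue simulator; the Gaussian-process surrogate and expected-improvement acquisition are one standard choice, not necessarily the paper's.

    ```python
    import numpy as np
    from scipy.stats import norm
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import Matern

    rng = np.random.default_rng(7)

    def simulate_outbreak(radius):
        """Hypothetical stand-in for the simulator: infected farms vs radius (km)."""
        return (radius - 37.0) ** 2 / 50.0 + 120.0 + rng.normal(0.0, 1.0)

    bounds = (5.0, 100.0)
    X = rng.uniform(*bounds, size=(5, 1))   # initial design points
    y = np.array([simulate_outbreak(r) for r in X.ravel()])
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

    for _ in range(20):
        gp.fit(X, y)
        grid = np.linspace(*bounds, 500).reshape(-1, 1)
        mu, sigma = gp.predict(grid, return_std=True)
        # Expected improvement for minimisation of infected-farm counts.
        z = (y.min() - mu) / np.maximum(sigma, 1e-9)
        ei = (y.min() - mu) * norm.cdf(z) + sigma * norm.pdf(z)
        x_next = float(grid[np.argmax(ei), 0])
        X = np.vstack([X, [[x_next]]])
        y = np.append(y, simulate_outbreak(x_next))

    i = int(np.argmin(y))
    print(f"best radius ~ {X[i, 0]:.1f} km, infected farms ~ {y[i]:.0f}")
    ```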