
    Applications of machine learning in finance: analysis of international portfolio flows using regime-switching models

    Recent advances in machine learning are finding commercial applications across many sectors, not least the financial industry. This thesis explores applications of machine learning in quantitative finance through two approaches. First, the current state of the art is evaluated through an extensive review of recent quantitative finance literature: themes and technologies are identified and classified, and the key use cases emerging from the literature are highlighted. Machine learning is found to enable deeper analysis of financial data, the modelling of complex nonlinear relationships within data, and the incorporation of alternative data into the investment process; it also makes possible innovations in backtesting and performance metrics. Second, demonstrating a practical application of machine learning in quantitative finance, regime-switching models are applied to analyse and extract information from international portfolio flows. The models capture properties of international portfolio flows previously documented in the literature, such as the persistence of flows relative to returns and a relationship between flows and returns, and they identify structural breaks and persistent regime shifts in investor behaviour. The inferred regimes exhibit characteristic flows and returns. To determine whether the extracted information could aid the investment process, a portfolio of global assets was constructed, with positions determined by a flow-based regime-switching model. In walk-forward out-of-sample tests on daily and weekly data, the portfolio outperforms two benchmarks: a buy & hold strategy and the MSCI World Index.
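    The abstract does not reproduce the thesis's model specification. As a rough illustration of the technique it names, the minimal Python sketch below fits a two-regime Markov-switching model to synthetic flow-like data using statsmodels; the regime count, the switching variance, and the data itself are illustrative assumptions, not the author's setup.

        # Minimal two-regime Markov-switching sketch on synthetic flow-like data.
        # The regime count, switching variance, and the data are illustrative
        # assumptions; this is not the thesis's actual specification.
        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(0)
        calm = rng.normal(0.1, 0.5, 500)        # quiet regime: small, steady flows
        turbulent = rng.normal(-0.3, 2.0, 250)  # turbulent regime: volatile outflows
        flows = np.concatenate([calm, turbulent, calm])

        # Two regimes with switching mean and variance.
        model = sm.tsa.MarkovRegression(flows, k_regimes=2, trend="c",
                                        switching_variance=True)
        result = model.fit()

        # Smoothed probability of regime 0 at each time step; a simple position
        # rule could go long an asset only when this probability is high.
        prob_calm = result.smoothed_marginal_probabilities[:, 0]
        print(result.summary())

    A walk-forward test in the spirit of the thesis would refit the model on a rolling window, so that each position is set using only past data.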

    Machine learning for financial applications: self-organising maps, hierarchical clustering and dynamic time-warping for portfolio construction

    This study investigates how modern machine learning (ML) techniques can be used to advance the field of quantitative investing. A broad literature review evaluated the common applications of ML in finance and the ML algorithms being used. The results show that ML is commonly applied to return forecasting, portfolio construction, ethics, fraud detection, decision making, language processing, and sentiment analysis, with neural networks and support vector machines identified as popular algorithms. A second review, focused on recent ML research in quantitative finance, finds three primary areas: return forecasting, portfolio construction, and risk management. Informed by both reviews, a practical ML experiment was carried out as a proof of concept of ML for financial applications. Two ML techniques, self-organising maps and hierarchical clustering, are used to analyse market return data and equity flow data (provided by State Street Global Markets) and to create portfolios from the insights derived. The resulting portfolios were tested in terms of risk, profitability and stability. Stable regimes and profitable portfolios are obtained, and the results show that portfolios built by analysing equity flow data consistently outperform those built by analysing return data.
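    As an illustration of the two techniques named in the abstract, the sketch below trains a self-organising map on synthetic asset features and then clusters its codebook vectors hierarchically. The minisom library, the feature construction, and the cluster count are assumptions; the study's dynamic time-warping step and the State Street flow data are not reproduced.

        # Illustrative sketch only: a self-organising map over synthetic asset
        # features, with hierarchical clustering of the SOM codebook vectors.
        import numpy as np
        from minisom import MiniSom                       # pip install minisom
        from scipy.cluster.hierarchy import linkage, fcluster

        rng = np.random.default_rng(1)
        features = rng.normal(size=(200, 10))             # 200 assets x 10 features (assumed)

        som = MiniSom(6, 6, input_len=10, sigma=1.0, learning_rate=0.5,
                      random_seed=1)
        som.train_random(features, 1000)

        # Cluster the 36 codebook vectors into a handful of candidate regimes.
        codebook = som.get_weights().reshape(-1, 10)
        labels = fcluster(linkage(codebook, method="ward"), t=4,
                          criterion="maxclust")

        # Map each asset to its winning SOM node, then to that node's cluster;
        # assets sharing a cluster could be grouped during portfolio construction.
        nodes = [som.winner(f) for f in features]
        asset_cluster = [labels[i * 6 + j] for (i, j) in nodes]

    Clustering the small codebook rather than the raw assets is a common design choice with SOMs: the map smooths noise first, and the hierarchy is then built over a few dozen prototypes instead of hundreds of series.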

    Estimating Dependency, Monitoring and Knowledge Discovery in High-Dimensional Data Streams

    Data Mining, the process of extracting knowledge from massive data sets, has a phenomenal impact on our society and now affects nearly every aspect of our lives: from the layout of our local grocery store, to the ads and product recommendations we receive, the availability of treatments for common diseases, the prevention of crime, and the efficiency of industrial production processes. However, Data Mining remains difficult when (1) data is high-dimensional, i.e., has many attributes, and (2) data comes as a stream. Extracting knowledge from high-dimensional data streams is challenging because one must cope with two orthogonal sets of difficulties. On the one hand, the effects of the so-called "curse of dimensionality" bog down the performance of statistical methods and lead to increasingly complex Data Mining problems. On the other hand, the statistical properties of data streams may evolve in unexpected ways, a phenomenon known in the community as "concept drift". Thus, one needs to update one's knowledge about data over time, i.e., to monitor the stream. While previous work addresses high-dimensional data sets and data streams to some extent, the intersection of both has received much less attention. Yet, extracting knowledge in this setting is advantageous for many industrial applications: identifying patterns from high-dimensional data streams in real time may lead to larger production volumes or reduced operational costs. The goal of this dissertation is to bridge this gap. We first focus on dependency estimation, a fundamental task of Data Mining. Typically, one estimates dependency by quantifying the strength of statistical relationships. We identify the requirements for dependency estimation in high-dimensional data streams and propose a new estimation framework, Monte Carlo Dependency Estimation (MCDE), that fulfils them all. We show that MCDE leads to efficient dependency monitoring. Then, we generalise the task of monitoring by introducing the Scaling Multi-Armed Bandit (S-MAB) algorithms, which extend the Multi-Armed Bandit (MAB) model. We show that our algorithms can efficiently monitor statistics by leveraging user-specific criteria. Finally, we describe applications of our contributions to Knowledge Discovery. We propose an algorithm, Streaming Greedy Maximum Random Deviation (SGMRD), which exploits our new methods to extract patterns, e.g., outliers, from high-dimensional data streams. We also present a new approach, which we name kj-Nearest Neighbours (kj-NN), to detect outlying documents within massive text corpora. We support our algorithmic contributions with theoretical guarantees as well as extensive experiments against both synthetic and real-world data, and we demonstrate the benefits of our methods in real-world use cases. Overall, this dissertation establishes fundamental tools for Knowledge Discovery in high-dimensional data streams, which help with many industrial applications, e.g., anomaly detection or predictive maintenance. To facilitate the application of our results and future research, we publicly release our implementations, experiments, and benchmark data via open-source platforms.
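    The abstract does not detail MCDE's internals. The simplified sketch below only conveys the general idea behind Monte Carlo dependency estimation: repeatedly condition on random slices of the other attributes and test whether a target attribute's distribution changes. The test choice (Kolmogorov-Smirnov), the slice design, and the scoring are illustrative assumptions, not the dissertation's method.

        # Simplified Monte Carlo dependency estimation sketch (not the
        # dissertation's MCDE): average, over random conditioning slices,
        # a two-sample test statistic comparing the target attribute's
        # conditional distribution against its marginal distribution.
        import numpy as np
        from scipy.stats import ks_2samp

        def mc_dependency(data, target, n_iter=100, slice_frac=0.5, rng=None):
            rng = rng or np.random.default_rng()
            n, d = data.shape
            scores = []
            for _ in range(n_iter):
                # Random conditioning: a contiguous slice on one randomly
                # chosen non-target attribute.
                dim = rng.choice([j for j in range(d) if j != target])
                order = np.argsort(data[:, dim])
                width = int(slice_frac * n)
                start = rng.integers(0, n - width)
                idx = order[start:start + width]
                stat, _ = ks_2samp(data[idx, target], data[:, target])
                scores.append(stat)
            return float(np.mean(scores))  # higher = stronger evidence of dependency

        rng = np.random.default_rng(2)
        x = rng.normal(size=1000)
        dependent = np.column_stack([x, x**2 + 0.1 * rng.normal(size=1000)])
        independent = rng.normal(size=(1000, 2))
        print(mc_dependency(dependent, target=1, rng=rng))    # expected: clearly higher
        print(mc_dependency(independent, target=1, rng=rng))  # expected: near the noise floor

    Because each iteration touches only one random slice, this style of estimator scales to repeated monitoring over a stream, which is the property the dissertation exploits.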

    Profitable Bandits

    Originally motivated by default risk management applications, this paper investigates a novel problem, referred to here as the profitable bandit problem. At each step, an agent chooses a subset of the K ≥ 1 possible actions; for each action chosen, she pays the sum of a random number of costs and receives the sum of a random number of rewards. Her objective is to maximize her cumulated profit. We adapt and study three well-known strategies for this purpose, proved to be most efficient in other settings: kl-UCB, Bayes-UCB and Thompson Sampling. For each of them, we prove a finite-time regret bound which, together with a lower bound we also obtain, establishes asymptotic optimality in some cases. We also compare the three strategies from both theoretical and empirical perspectives, giving simple, self-contained proofs that emphasize their similarities as well as their differences. While both Bayesian strategies automatically adapt to the geometry of information, the numerical experiments carried out show a slight advantage for Thompson Sampling in practice.
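    As a rough illustration of one of the three strategies, the sketch below runs Thompson Sampling on a heavily simplified profitable-bandit instance with Bernoulli rewards and known fixed unit costs; the paper's general setting (sums of a random number of costs and rewards per action) and its kl-UCB and Bayes-UCB strategies are not reproduced, and all parameters are assumptions.

        # Thompson Sampling sketch for a simplified profitable-bandit instance:
        # each arm pays reward 1 with unknown probability p and costs a known
        # 0.5 per play; at each step the agent plays every arm whose sampled
        # expected profit is positive. Illustrative only, not the paper's model.
        import numpy as np

        rng = np.random.default_rng(3)
        K = 5
        p = rng.uniform(0.2, 0.8, K)          # unknown reward probabilities
        cost = np.full(K, 0.5)                # known per-play cost (assumed)

        alpha = np.ones(K)                    # Beta posterior parameters
        beta = np.ones(K)
        profit = 0.0

        for t in range(10_000):
            sampled_p = rng.beta(alpha, beta)         # one posterior draw per arm
            played = sampled_p > cost                 # play arms with positive sampled profit
            rewards = rng.binomial(1, p) * played
            profit += rewards.sum() - cost[played].sum()
            alpha += rewards                          # posterior update, played arms only
            beta += played & (rewards == 0)
        print(f"cumulated profit: {profit:.1f}")
        print("arms with true positive profit:", np.where(p > cost)[0])

    Playing a subset rather than a single arm is what distinguishes this setting from the classical bandit: each arm can be assessed against the profitability threshold independently, so the sampled posterior means are compared to the cost rather than to each other.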