385 research outputs found

    Machine Learning in Football Betting: Prediction of Match Results Based on Player Characteristics

    In recent times, football (soccer) has attracted growing attention across continents and reached unexpected dimensions. In the course of this development, the number of bookmakers offering bets on the outcome of football games has expanded enormously, a trend further strengthened by the development of the World Wide Web. In this context, one could generate positive returns over time with a betting strategy that successfully identifies overvalued betting odds. Because of the large number of matches played around the globe, football in particular has great potential for such a strategy. This paper uses machine learning to forecast the outcome of football games based on match and player attributes. A simulation study covering all matches of the five greatest European football leagues and the corresponding second leagues between 2006 and 2018 shows that an ensemble strategy achieves statistically and economically significant returns of 1.58% per match. Furthermore, the combination of different machine learning algorithms was outperformed neither by the individual machine learning approaches nor by a linear regression model or naive betting strategies, such as always betting on a home win.
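The idea of betting only on overvalued odds can be illustrated with a minimal sketch. It assumes decimal odds and a model that outputs outcome probabilities; the function names, the `min_edge` threshold, and the match data are hypothetical and are not taken from the paper.

```python
def expected_return(model_prob: float, decimal_odds: float) -> float:
    """Expected profit per unit staked, given a model probability and decimal odds."""
    return model_prob * decimal_odds - 1.0


def pick_value_bets(matches, min_edge=0.0):
    """Return (match_id, outcome, edge) for outcomes whose expected return exceeds min_edge."""
    bets = []
    for match_id, probs, odds in matches:
        for outcome in probs:
            edge = expected_return(probs[outcome], odds[outcome])
            if edge > min_edge:
                bets.append((match_id, outcome, edge))
    return bets


# Hypothetical model probabilities and bookmaker odds (illustration only).
matches = [
    ("m1", {"home": 0.50, "draw": 0.25, "away": 0.25},
           {"home": 2.20, "draw": 3.40, "away": 3.60}),
    ("m2", {"home": 0.40, "draw": 0.30, "away": 0.30},
           {"home": 2.30, "draw": 3.20, "away": 3.10}),
]
value_bets = pick_value_bets(matches, min_edge=0.02)
```

With these made-up numbers only the home outcome of `m1` qualifies, because its model probability times its odds exceeds 1 by more than the threshold.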

    Limit order books in statistical arbitrage and anomaly detection

    This thesis proposes methods exploiting the vast informational content of limit order books (LOBs). The first part of this thesis discovers LOB inefficiencies that are sources of statistical arbitrage for high-frequency traders. Chapter 1 develops new theoretical relationships between cross-listed stocks such that their prices are arbitrage-free. Price deviations are captured by a novel strategy that is then evaluated in a new backtesting environment enabling the study of latency and its importance for high-frequency traders. Chapter 2 empirically demonstrates the existence of lead-lag arbitrage at high frequency. Lead-lag relationships have been well documented in the past, but no study has shown their true economic potential. An original econometric model is proposed to forecast returns on the lagging asset, and does so accurately out-of-sample, resulting in short-lived arbitrage opportunities. In both chapters, the discovered LOB inefficiencies are shown to be profitable, thus providing a better understanding of high-frequency traders’ activities. The second part of this thesis investigates anomalous patterns in LOBs. Chapter 3 studies the performance of machine learning methods in the detection of fraudulent orders. Because of the large amount of LOB data generated daily, trade frauds are challenging to catch, and very few cases are available to fit detection models. A novel unsupervised deep learning-based framework is proposed to discern abnormal LOB behavior in this difficult context. It is asset-independent and can evolve alongside markets, providing better fraud detection capabilities to market regulators.

    Is media just noise? The link between media factors and stock performance

    PURPOSE OF THE STUDY: Interest in media analytics has increased significantly among practitioners and academics alike. The hot topic is whether qualitative texts contain information relevant to stock financials and, if they do, whether the impact can be used to earn abnormal returns. To answer this, we study the impact media factors have on financial metrics in a novel specification that combines all the major media factors in a holistic media model. To transform qualitative text information into a "sentiment score", we develop a new methodology to estimate sentiment more accurately than currently prevailing methods. DATA AND METHODOLOGY: Our study focuses on the S&P 100 constituents between 2006 and 2011. As a source of qualitative texts, we use major news publications and earnings announcements retrieved from the LexisNexis database using a web scraper program developed for the purpose of this study. We retrieve the financial data for our study from the Thomson Reuters Datastream database. To estimate investor sentiment, we employ both the customary word count and our novel Linearized Phrase-Structure methodology. For word count, we test the Harvard Psychological dictionary and a finance-specific dictionary by Loughran and McDonald (2011). As our data is panel in nature, we analyze the correlations in our error terms in line with Petersen (2009), first without clustering and then clustering by firm and by time. We find a time effect in our error terms, and therefore employ a Fama-MacBeth (1973) methodology with clustering done in quarters. To ensure that no single methodological choice drives our results, we rerun our analysis with a multitude of alternative specifications. RESULTS: We find that Linearized Phrase-Structure (LPS) outperforms the predominant naïve word count methodology. We also find that researchers who employ word counts should use context-dependent dictionaries, such as Loughran and McDonald's (2011). In terms of our main variables, we find that the existing media factors are not mutually exclusive and impact financial metrics in concert. We do not find a statistically significant relationship between sentiment and abnormal returns. However, we find a relationship between aggregate market news volume and abnormal returns, and also between sentiment and abnormal volatility. We infer that our findings support limited attention theory and provide evidence against market efficiency.
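The customary word-count approach mentioned in this abstract can be sketched as follows. This is a minimal illustration, assuming a score of (positive minus negative matches) divided by total tokens; the tiny word lists are hypothetical placeholders, not the Harvard or Loughran and McDonald dictionaries, and the abstract does not specify the exact scoring formula used in the study.

```python
import re

# Hypothetical toy word lists (illustration only, not a real dictionary).
POSITIVE = {"gain", "growth", "strong", "beat"}
NEGATIVE = {"loss", "decline", "weak", "miss"}


def word_count_sentiment(text: str) -> float:
    """Return (positive - negative) word matches divided by the token count."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0  # no words, no sentiment signal
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)


score = word_count_sentiment("Strong growth offset a weak quarter")
```

A context-dependent dictionary would simply swap in different word lists; the scoring logic stays the same.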

    Ensembling and Dynamic Asset Selection for Risk-Controlled Statistical Arbitrage

    In recent years, machine learning algorithms have been successfully employed to identify hidden patterns of financial market behavior and, consequently, have opened up a land of opportunities for financial applications such as algorithmic trading. In this paper, we propose a statistical arbitrage trading strategy with two key elements: an ensemble of regression algorithms for asset return prediction, followed by a dynamic asset selection. More specifically, we construct an extremely heterogeneous ensemble ensuring model diversity by using state-of-the-art machine learning algorithms, data diversity by using a feature selection process, and method diversity by using individual models for each asset as well as models that learn cross-sectionally across multiple assets. Their predictions are then fed into a quality assurance mechanism that prunes assets with poor forecasting performance in the previous periods. We evaluate the approach on historical data of component stocks of the S&P 500 index. By performing an in-depth risk-return analysis, we show that this setup outperforms highly competitive trading strategies considered as baselines. Experimentally, we show that the dynamic asset selection enhances overall trading performance in terms of both return and risk. Moreover, the proposed approach yielded superior results during both financial turmoil and massive market growth periods, and it proved generally applicable to any risk-balanced trading strategy aiming to exploit different asset classes.
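The two elements described here, an ensemble forecast followed by pruning of poorly forecast assets, can be sketched roughly as below. This is only an illustration under stated assumptions: the ensemble is a plain average, the pruning criterion is mean absolute error against a fixed threshold `max_mae`, and the tickers and numbers are made up. The abstract does not state which combination rule or quality metric the authors use.

```python
from statistics import mean


def ensemble_forecast(model_preds):
    """Average the return forecasts of several models for one asset."""
    return mean(model_preds)


def select_assets(past_forecasts, past_returns, max_mae):
    """Keep assets whose mean absolute forecast error over past periods is <= max_mae."""
    kept = []
    for asset, forecasts in past_forecasts.items():
        errors = [abs(f - r) for f, r in zip(forecasts, past_returns[asset])]
        if mean(errors) <= max_mae:
            kept.append(asset)
    return sorted(kept)


# Hypothetical past ensemble forecasts and realized returns per asset.
past_forecasts = {"AAA": [0.010, -0.020, 0.000], "BBB": [0.030, 0.020, -0.040]}
past_returns = {"AAA": [0.012, -0.015, 0.002], "BBB": [-0.020, -0.010, 0.030]}

tradable = select_assets(past_forecasts, past_returns, max_mae=0.01)
next_forecast = ensemble_forecast([0.01, 0.02, 0.03])
```

In this toy data the hypothetical asset `BBB` is pruned because its past forecasts were far from realized returns, so only `AAA` remains eligible for trading.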

    Analysis of frequent trading effects of various machine learning models

    In recent years, high-frequency trading has emerged as a crucial strategy in stock trading. This study aims to develop an advanced high-frequency trading algorithm and compare the performance of three different mathematical models: the combination of the cross-entropy loss function and the quasi-Newton algorithm, the FCNN model, and the vector machine. The proposed algorithm employs neural network predictions to generate trading signals and execute buy and sell operations based on specific conditions. By harnessing the power of neural networks, the algorithm enhances the accuracy and reliability of the trading strategy. To assess the effectiveness of the algorithm, the study evaluates the performance of the three mathematical models. The combination of the cross-entropy loss function and the quasi-Newton algorithm is a widely utilized logistic regression approach. The FCNN model, on the other hand, is a deep learning algorithm that can extract and classify features from stock data. Meanwhile, the vector machine is a supervised learning algorithm recognized for achieving improved classification results by mapping data into high-dimensional spaces. By comparing the performance of these three models, the study aims to determine the most effective approach for high-frequency trading. This research makes a valuable contribution by introducing a novel methodology for high-frequency trading, thereby providing investors with a more accurate and reliable stock trading strategy

    Stock Trend Prediction Using Candlestick Charting and Ensemble Machine Learning Techniques with a Novelty Feature Engineering Scheme

    Stock market forecasting is a knotty and challenging task due to the highly noisy, nonparametric, complex and chaotic nature of stock price time series. With a simple eight-trigram feature engineering scheme of the inter-day candlestick patterns, we construct a novel ensemble machine learning framework for daily stock pattern prediction, combining traditional candlestick charting with the latest artificial intelligence methods. Several machine learning techniques, including deep learning methods, are applied to stock data to predict the direction of the closing price. Based on the trained results, the framework can select a suitable machine learning prediction method for each pattern. The investment strategy is constructed according to the ensemble machine learning techniques. Empirical results for China’s stock market from 2000 to 2017 confirm that our feature engineering has effective predictive power, with a prediction accuracy of more than 60% for some trend patterns. Measures such as big data, feature standardization, and elimination of abnormal data can effectively mitigate data noise. An investment strategy based on our forecasting framework excels, in theory, in both individual-stock and portfolio performance. However, transaction costs have a significant impact on investment. Additional technical indicators can improve the forecast accuracy to varying degrees; momentum indicators in particular improve forecasting accuracy in most cases.
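The abstract does not define the eight-trigram scheme. One hypothetical way to obtain eight inter-day classes is to encode three binary comparisons between consecutive days (is today's high, low, and close above yesterday's?) as a 3-bit code. The sketch below shows only that assumed encoding; the paper's actual pattern definitions may differ, and the price values are made up.

```python
def inter_day_pattern(prev: dict, curr: dict) -> int:
    """Encode three binary day-over-day comparisons as an integer in 0..7.

    Assumed bit layout: high is bit 2, low is bit 1, close is bit 0.
    """
    bits = (
        curr["high"] > prev["high"],
        curr["low"] > prev["low"],
        curr["close"] > prev["close"],
    )
    return sum(int(b) << i for i, b in enumerate(reversed(bits)))


# Hypothetical daily bars (illustration only).
day1 = {"high": 10.5, "low": 9.8, "close": 10.2}
day2 = {"high": 10.9, "low": 10.0, "close": 10.7}   # higher high, low and close
day3 = {"high": 10.4, "low": 10.1, "close": 10.0}   # only the low is higher

pattern_up = inter_day_pattern(day1, day2)
pattern_mixed = inter_day_pattern(day2, day3)
```

Each resulting code could then serve as a categorical feature, or as a key for choosing a per-pattern model, in the spirit of the framework described above.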