1,713 research outputs found

    Detecting market manipulation in stock market data

    Anomaly detection is an extensively researched problem with diverse applications in many domains. It is the process of finding data points or patterns that do not conform to expected behavior within a dataset. Solutions to this problem have used techniques from disciplines such as statistics, machine learning, data mining, spectral theory and information theory. In the case of stock market data, the input is a non-linear, complex time series, which renders statistical methods ineffective. The aim of this thesis is to detect anomalies within the Standard and Poor and Qatar Stock Exchange using the behavior of similar time series. Many works on stock market manipulation focus on supervised learning techniques, which require labeled datasets. The labeling process requires substantial effort, and anomalous behavior is dynamic in nature. For those reasons, an unsupervised market manipulation detection technique is desirable. The Contextual Anomaly Detector (CAD) is an unsupervised method that finds anomalies by looking at similarly behaving time series and using them to predict expected values. When the predicted value differs from the actual value in the time series by more than a certain threshold, the point is considered an anomaly. This thesis examines the CAD and implements a different preprocessing step to improve recall and precision.

    Limit order books in statistical arbitrage and anomaly detection

    This thesis proposes methods exploiting the vast informational content of limit order books (LOBs). The first part of this thesis uncovers LOB inefficiencies that are sources of statistical arbitrage for high-frequency traders. Chapter 1 develops new theoretical relationships between cross-listed stocks that must hold for their prices to be arbitrage free. Price deviations are captured by a novel strategy that is then evaluated in a new backtesting environment enabling the study of latency and its importance for high-frequency traders. Chapter 2 empirically demonstrates the existence of lead-lag arbitrage at high frequency. Lead-lag relationships have been well documented in the past, but no study has shown their true economic potential. An original econometric model is proposed to forecast returns on the lagging asset; it does so accurately out of sample, resulting in short-lived arbitrage opportunities. In both chapters, the discovered LOB inefficiencies are shown to be profitable, thus providing a better understanding of high-frequency traders' activities. The second part of this thesis investigates anomalous patterns in LOBs. Chapter 3 studies the performance of machine learning methods in the detection of fraudulent orders. Because of the large amount of LOB data generated daily, trade frauds are challenging to catch, and very few cases are available to fit detection models. A novel unsupervised deep learning-based framework is proposed to discern abnormal LOB behavior in this difficult context. It is asset independent and can evolve alongside markets, providing better fraud detection capabilities to market regulators.


    Artificial neural networks have been proposed as useful tools for time series analysis in a variety of applications. They are capable of providing good solutions to a range of problems, including classification and prediction. However, time series analysis must take into account that the variables are time-dependent and highly correlated. The main aim of this research work is to investigate and develop efficient dynamic neural networks to deal with data analysis issues. This research work proposes a novel dynamic self-organised multilayer neural network based on the immune algorithm for financial time series prediction and biomedical signal classification, combining the properties of both recurrent and self-organised neural networks. The first case study addressed in this thesis is the prediction of financial time series. The financial time series signal takes the form of historical prices of different companies. Predicting future prices enables businesses to make profits by forecasting, or simply guessing, these prices from historical data. However, the financial time series signal exhibits highly random behaviour that is non-stationary and nonlinear in nature, so predicting this type of time series is very challenging. In this thesis, a number of experiments have been simulated to evaluate the ability of the designed recurrent neural network to forecast the future value of financial time series. The forecasts made by the proposed network yield substantial profits on historical financial signals compared with the self-organised hidden layer inspired by the immune algorithm and multilayer perceptron neural networks. These results suggest that the proposed dynamic neural networks have a better ability to capture the chaotic movement in financial signals.
The second case study addressed in this thesis is the prediction of preterm birth and the diagnosis of preterm labour. One of the most challenging tasks currently facing the healthcare community is the identification of preterm labour, which has significant implications for both healthcare and the economy. Premature birth occurs when the baby is born before completion of the 37-week gestation period. Incomplete understanding of the physiology of the uterus and parturition makes the prediction of premature labour a difficult task. Early prediction of preterm births could help to improve prevention through appropriate medical and lifestyle interventions. One promising method is electrohysterography, which records uterine electrical activity during pregnancy. In this thesis, the proposed dynamic neural network has been used to distinguish between term and preterm labour using uterine signals. The results indicate that the proposed network achieves improved classification accuracy in comparison to the benchmarked neural network architectures.