Automatic surveillance of abnormal trading behaviours/patterns (ATPs) in capital
markets is essential to protect the capital of legitimate traders from price distortion
of financial assets. Detection of ATPs involves finding single (one trading order
with large trading volume and long cancellation time, e.g. several minutes) or
sequential (correlated multiple trading orders with small volume and short cancellation
time, e.g. milliseconds) anomalies in trading data. However, accurate and
timely identification of ATPs remains an open challenge because trading data are
high-volume, high-frequency and unlabelled. In this research, we have investigated
anomaly detection approaches to address these challenges and fill the knowledge
gap through the following four contributions:
Firstly, we have performed a literature review and conducted a thorough benchmark
evaluation of existing state-of-the-art anomaly detection algorithms (i.e. Artificial
Neural Network Auto Encoder, Isolation Forest, Local Outlier Factor (LOF),
Histogram-based Outlier Score (HBOS), Angle-based Outlier Detection (ABOD), Principal
Component Analysis (PCA) and K-Nearest Neighbors (KNN)) using publicly
available datasets from different domains such as health and finance. The experimental
results show that Isolation Forest, HBOS and PCA are robust algorithms
in terms of both high detection performance (Area Under the ROC Curve (AUC) =
0.95) and low computational time on large datasets.
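
For readers unfamiliar with the AUC metric quoted above, the following is a minimal sketch of how it can be computed from anomaly scores and ground-truth labels. It uses the rank interpretation of AUC (the probability that a randomly chosen anomaly receives a higher score than a randomly chosen normal point). The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data or code from this research.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (anomaly, normal) pairs ranked correctly.

    labels: 1 for anomaly, 0 for normal (hypothetical toy encoding).
    scores: higher means more anomalous. Ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one anomaly and one normal point")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: one normal point (score 0.8) outranks one anomaly (score 0.7).
labels = [0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.8, 0.7, 0.9]
auc = roc_auc(labels, scores)
print(auc)  # 7 of 8 pairs are ranked correctly, so this prints 0.875
```

This pairwise form is quadratic in the number of points and is meant only to make the definition concrete; a sort-based computation would be preferable on large datasets.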
Secondly, as one of the major contributions of this research, we have proposed
a novel generic unsupervised anomaly detection model, which can be applied to
anomaly detection of both financial and non-financial datasets. The essence of the
proposed model consists in partitioning a bounded D-dimensional space (e.g. the
unit hyper-cube I^D) by a sequence of random shapes, so that each data point is
trapped either inside or outside each shape, followed by probabilistic modelling of
each data point's pattern of falling inside or outside. Anomalous data which are rare and
different from the rest of the dataset will be assigned a higher anomaly score.
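
The general idea described above can be illustrated with a minimal sketch. This is not the model proposed in this research: the choice of random balls as shapes, the radius range, the Laplace-smoothed frequency estimate and the averaged negative log-probability score are all assumptions made purely for illustration, as are the names `random_ball`, `inside` and `anomaly_scores`.

```python
import math
import random


def random_ball(dim, rng):
    """Draw a hypothetical random shape: a ball with its centre in the unit hyper-cube."""
    centre = [rng.random() for _ in range(dim)]
    radius = rng.uniform(0.1, 0.5) * math.sqrt(dim)  # assumed radius range
    return centre, radius


def inside(point, ball):
    centre, radius = ball
    return math.dist(point, centre) <= radius


def anomaly_scores(data, n_shapes=200, seed=0):
    """Score each point by how rare its inside/outside pattern is.

    For every random shape, the fraction of points falling inside is
    estimated (with Laplace smoothing), and each point accumulates
    -log(probability of the side it fell on). Rare sides cost more.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    n = len(data)
    scores = [0.0] * n
    for _ in range(n_shapes):
        ball = random_ball(dim, rng)
        flags = [inside(x, ball) for x in data]
        p_in = (sum(flags) + 1) / (n + 2)  # smoothed, so 0 < p_in < 1
        for i, is_in in enumerate(flags):
            scores[i] += -math.log(p_in if is_in else 1.0 - p_in)
    return [s / n_shapes for s in scores]


# Toy data: a tight cluster near the centre of the unit square plus one far point.
rng = random.Random(42)
data = [
    [min(max(rng.gauss(0.5, 0.05), 0.0), 1.0) for _ in range(2)]
    for _ in range(200)
]
data.append([0.95, 0.05])  # the planted anomaly, at index 200
scores = anomaly_scores(data)
top = max(range(len(scores)), key=scores.__getitem__)
print(top, round(scores[top], 3))
```

In this toy setting the isolated point is frequently separated from the cluster by a random ball, so it lands on the rare side far more often than any cluster point and accumulates the largest score.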
Thirdly, to investigate the robustness of the proposed anomaly detection model,
we have performed a thorough sensitivity analysis under different hyper-parameter
settings (e.g. the number and the shape of the random shapes) and different
publicly available datasets. The results show that the model performance stabilises
as the number of random shapes increases. Furthermore, the choice of shape
can affect the performance of the algorithm and therefore needs to be optimised
for a given dataset. The results also indicate that the algorithm's computational time
increases linearly with the number of random shapes, which indicates that the
algorithm scales well enough to detect anomalies in a timely manner.
Finally, we have applied the proposed algorithm to real Bitcoin prices as a case
study and tested, evaluated and compared its performance with the benchmark
algorithms (Auto Encoder, Isolation Forest, LOF, HBOS, ABOD, PCA and
KNN). The results show that the proposed algorithm achieves AUC = 0.94. Compared
with the benchmark algorithms, it outperforms them by 8.5 percent while
maintaining low computational time.