The paper introduces Supervised Embedding and Clustering Anomaly Detection
(SEMC-AD), a method designed to efficiently identify faulty alarm logs in a
mobile network and alleviate the challenges of manual monitoring caused by the
growing volume of alarm logs. SEMC-AD employs a supervised embedding approach
based on deep neural networks, using historical alarm logs and their labels
to extract a numerical representation for each log. This addresses the
imbalanced classification problem caused by the small proportion of anomalies
in the dataset, without resorting to one-hot encoding. The robustness of the embedding
is evaluated by plotting the two most significant principal components of the
embedded alarm logs, revealing that anomalies form distinct clusters with
similar embeddings. Multivariate Gaussian clustering is then applied to
these components, identifying clusters with a high ratio of anomalies to normal
alarms (above 90%) and labeling them as the anomaly group. A new alarm log is
classified as an anomaly if the two most significant principal components of
its embedded vector fall within an anomaly-labeled cluster. Performance
evaluation demonstrates that SEMC-AD
outperforms conventional random forest and gradient boosting methods without
embedding: SEMC-AD detects 99% of anomalies, whereas random forest and
XGBoost only detect 86% and 81% of anomalies, respectively. While supervised
classification methods may excel in labeled datasets, the results demonstrate
that SEMC-AD is more effective at classifying anomalies in datasets with
numerous categorical features, significantly enhancing anomaly detection,
reducing operator burden, and improving network maintenance.
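
The clustering and classification steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the embeddings are taken as given (synthetic vectors stand in for the output of the embedding network), scikit-learn's `PCA` and `GaussianMixture` stand in for the principal component projection and Gaussian clustering, the 90% threshold is read as the share of anomalies among a cluster's members, and a new log "falls within" a cluster when that cluster is its most probable mixture component. The function names `fit_anomaly_clusters` and `classify` are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture


def fit_anomaly_clusters(embeddings, labels, n_clusters=3, threshold=0.9, seed=0):
    """Project embeddings to 2 principal components, cluster them, and
    flag clusters dominated by anomalies.

    embeddings: (n_logs, dim) array, assumed to come from a trained embedding model.
    labels: (n_logs,) array with 1 for anomaly and 0 for normal.
    """
    pca = PCA(n_components=2).fit(embeddings)
    components = pca.transform(embeddings)
    # Several initializations (n_init) make the fit less sensitive to a bad start.
    gmm = GaussianMixture(n_components=n_clusters, n_init=5, random_state=seed)
    assignments = gmm.fit(components).predict(components)

    anomaly_clusters = set()
    for k in range(n_clusters):
        members = labels[assignments == k]
        # Assumption: "above 90%" is interpreted as the anomaly share of the cluster.
        if members.size > 0 and members.mean() > threshold:
            anomaly_clusters.add(k)
    return pca, gmm, anomaly_clusters


def classify(new_embeddings, pca, gmm, anomaly_clusters):
    """Return True for logs whose most probable cluster is anomaly-labeled."""
    assignments = gmm.predict(pca.transform(new_embeddings))
    return np.isin(assignments, list(anomaly_clusters))


# Synthetic stand-in data: two groups of normal logs and one small anomaly group.
rng = np.random.default_rng(0)
dim = 8
center_a = np.zeros(dim)
center_b = np.array([5.0] * 4 + [0.0] * 4)
center_anomaly = np.array([0.0] * 4 + [5.0] * 4)

embeddings = np.vstack([
    center_a + rng.normal(0.0, 0.2, (200, dim)),
    center_b + rng.normal(0.0, 0.2, (200, dim)),
    center_anomaly + rng.normal(0.0, 0.2, (30, dim)),
])
labels = np.concatenate([np.zeros(400), np.ones(30)])

pca, gmm, anomaly_clusters = fit_anomaly_clusters(embeddings, labels)

# Two new logs: one near the anomaly group, one near a normal group.
new_logs = np.vstack([center_anomaly, center_a])
predictions = classify(new_logs, pca, gmm, anomaly_clusters)
print(predictions)
```

The number of mixture components (`n_clusters`) is a free parameter in this sketch; in practice it would need to be chosen for the data at hand, for example by comparing fits with an information criterion.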