GKM-OD: Gaussian knowledge based modelling for outlier detection

Abstract

Outlier detection is a critical process in data engineering. Leveraging machine learning techniques for outlier detection enables the handling of large-scale, high-dimensional data, enhancing detection accuracy and efficiency. Traditional methods typically model data directly in the data space. However, these approaches often struggle to accurately distinguish inliers from outliers when dealing with complex data distributions. A Gaussian Mixture Model (GMM) can flexibly fit complex, multi-peak distributions using multiple Gaussian components and effectively identify outliers through probabilistic modelling. We introduce a novel outlier detection approach, which improves detection efficiency by indirectly modelling data in a latent space using a GMM. This approach aligns with a growing trend in AI, notably advocated by Yann LeCun, that emphasises decision-making and learning in latent representation spaces instead of depending on raw token or feature spaces. To this end, we design an encoder-decoder neural network with a GMM as the decision layer, enabling effective identification of outliers through probabilistic modelling. Our method not only addresses practical needs in anomaly detection but also contributes to this broader trend of latent space modelling as a step toward more autonomous and generalisable learning systems. Extensive evaluations on public and proprietary datasets demonstrate that our method outperforms existing approaches, including DAGMM and ECOD, highlighting its superiority in accuracy.
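
The idea of scoring outliers with a GMM in a latent space can be illustrated with a minimal sketch. This is not the GKM-OD architecture: a linear PCA projection stands in for the learned encoder, the decoder is omitted, and the GMM is fitted separately by EM on clean training data rather than trained jointly with the network. The function names (fit_encoder, encode, fit_gmm, outlier_scores), the diagonal covariances, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def fit_encoder(X, k):
    """Assumed stand-in for a learned encoder: top-k PCA axes of the training data."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k].T

def encode(X, mu, W):
    """Map data-space points into the k-dimensional latent space."""
    return (X - mu) @ W

def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def log_gauss_diag(Z, means, variances):
    """Log density of every row of Z under every diagonal Gaussian; shape (n, K)."""
    diff = Z[:, None, :] - means[None, :, :]
    quad = np.sum(diff ** 2 / variances[None], axis=2)
    log_norm = np.sum(np.log(2 * np.pi * variances), axis=1)[None, :]
    return -0.5 * (quad + log_norm)

def fit_gmm(Z, K, n_iter=100, seed=0):
    """Fit a diagonal-covariance GMM with plain EM (farthest-point initialisation)."""
    rng = np.random.default_rng(seed)
    n, _ = Z.shape
    means = [Z[rng.integers(n)]]
    for _ in range(K - 1):
        d2 = np.min([np.sum((Z - m) ** 2, axis=1) for m in means], axis=0)
        means.append(Z[np.argmax(d2)])
    means = np.array(means)
    variances = np.tile(Z.var(axis=0) + 1e-6, (K, 1))
    weights = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        log_p = log_gauss_diag(Z, means, variances) + np.log(weights)
        resp = np.exp(log_p - logsumexp(log_p, axis=1)[:, None])
        # M-step: re-estimate weights, means, and per-dimension variances.
        Nk = resp.sum(axis=0) + 1e-12
        weights = Nk / n
        means = (resp.T @ Z) / Nk[:, None]
        second_moment = (resp.T @ Z ** 2) / Nk[:, None]
        variances = np.maximum(second_moment - means ** 2, 1e-6)
    return weights, means, variances

def outlier_scores(Z, weights, means, variances):
    """Negative log-likelihood under the mixture: higher means more anomalous."""
    log_p = log_gauss_diag(Z, means, variances) + np.log(weights)
    return -logsumexp(log_p, axis=1)

# Synthetic data (illustrative only): two inlier clusters, outliers in the gap between them.
rng = np.random.default_rng(0)
dim = 6
train = np.vstack([rng.normal(+3.0, 0.5, (200, dim)), rng.normal(-3.0, 0.5, (200, dim))])
test_in = np.vstack([rng.normal(+3.0, 0.5, (50, dim)), rng.normal(-3.0, 0.5, (50, dim))])
test_out = rng.normal(0.0, 0.5, (10, dim))

mu, W = fit_encoder(train, k=2)
params = fit_gmm(encode(train, mu, W), K=2)
scores_in = outlier_scores(encode(test_in, mu, W), *params)
scores_out = outlier_scores(encode(test_out, mu, W), *params)
print("max inlier score:", scores_in.max(), "min outlier score:", scores_out.min())
```

In this toy setting a single Gaussian would place its mean exactly where the outliers sit, whereas the two-component mixture assigns that region low density. A threshold on the score (for example, a high percentile of the training scores) would then turn the scores into inlier/outlier decisions.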

Queen's University Belfast Research Portal

Last time updated on 27/11/2025

This paper was published in Queen's University Belfast Research Portal.

Licence: http://creativecommons.org/licenses/by/4.0/