A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets
The term "outlier" can generally be defined as an observation that is significantly different from
the other values in a data set. The outliers may be instances of error or indicate events. The
task of outlier detection aims at identifying such outliers in order to improve the analysis of
data and further discover interesting and useful knowledge about unusual events within numerous
application domains. In this paper, we report on contemporary unsupervised outlier detection
techniques for multiple types of data sets and provide a comprehensive taxonomy framework and
two decision trees to select the most suitable technique based on the data set. Furthermore, we
highlight the advantages, disadvantages and performance issues of each class of outlier detection
techniques under this taxonomy framework.
Clustering of BPJS National Health Insurance Participant Using DBSCAN Algorithm
In the current era of Big Data, obtaining data is no longer difficult because it can be accessed easily over the internet, much of it open access. A large amount of data can cause many problems, such as values that deviate too far from the average (outliers). The method used to handle outliers is DBSCAN, a density-based clustering algorithm. DBSCAN can be applied in various fields, one of which is the social sector, in this case participation in JKN BPJS Health in West Nusa Tenggara. This study examines the distribution of BPJS Health participation groups and detects outliers so that noisy objects are not included in any cluster. The results of the study using the DBSCAN algorithm show that the optimal epsilon value is around 0.37, found by observing the knee of a curve, with MinPts 3 and a highest silhouette value of 0.2763. The highest JKN BPJS participation is in cluster 1 with 5 sub-districts, the second highest is in cluster 3 with 5 sub-districts, and the lowest is in cluster 2 with 93 sub-districts. Thirteen sub-districts are not included in any cluster because they are noise.
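The density-based procedure described in this abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation and does not use their data: the toy points, the function names (`dbscan`, `region_query`, `k_distances`), and the convention that MinPts counts the point itself are all assumptions made for illustration. Only the parameter values (epsilon 0.37, MinPts 3) are taken from the abstract.

```python
from math import dist

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point, keep expanding
                queue.extend(j_neighbors)
    return labels


def k_distances(points, k):
    """Sorted distance from each point to its k-th nearest neighbor.

    Plotting this list and looking for the knee is a common way to pick eps.
    """
    out = []
    for p in points:
        d = sorted(dist(p, q) for q in points)
        out.append(d[k])  # d[0] is the distance from p to itself
    return sorted(out)


# Hypothetical toy data: two tight groups and one isolated point.
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
    (1.0, 1.0), (1.1, 1.0), (1.0, 1.1), (1.1, 1.1),
    (5.0, 5.0),
]
labels = dbscan(points, eps=0.37, min_pts=3)
```

On this toy input the two groups form separate clusters and the isolated point is labelled as noise, which mirrors how the study keeps noisy sub-districts out of every cluster. Real data would normally be scaled first, and the silhouette value would be computed on the clustered points to compare parameter choices.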
An Application of Hierarchical Gaussian Processes to the Detection of Anomalies in Star Light Curves
- …