56,591 research outputs found

    Knowledge Reused Outlier Detection

    Considerable effort has been invested in unsupervised outlier detection research, which is conducted on unlabeled data sets under abnormality assumptions. With abundant related labeled data available as auxiliary information, we consider transferring knowledge from the labeled source data to facilitate unsupervised outlier detection on the target data set. To make full use of the source knowledge, the source and target data are combined for joint clustering and outlier detection, using the cluster structure of the source data as a constraint. To achieve this, the categorical utility function is employed to regularize the partitions of the target data so that they are consistent with the source data labels. With an augmented matrix, the problem is solved by a K-means-based method with a rigorous mathematical formulation and a theoretical convergence guarantee. We used four real-world data sets and eight outlier detection methods of different kinds for extensive experiments and comparison. The results demonstrate the effectiveness and significant improvements of the proposed method in terms of outlier detection and cluster validity metrics. Moreover, a parameter analysis is provided as a practical guide, and a noisy source label analysis shows that the proposed method can handle real applications in which source labels are noisy.

    SubCOID: an Attempt to Explore Cluster-Outlier Iterative Detection Approach to Multi-Dimensional Data Analysis in Subspace

    Many data mining algorithms focus on clustering methods, and many approaches have also been designed for outlier detection. We observe that, in many situations, clusters and outliers are concepts whose meanings are inseparable from each other, especially for data sets with noise. Clusters and outliers should be treated as concepts of equal importance in data analysis. In our previous work [22] we proposed a cluster-outlier iterative detection algorithm in the full data space. However, in high-dimensional spaces, not all dimensions may be relevant to a given cluster or outlier. In this paper we extend our work to subspaces, aiming to detect clusters and outliers in noisy data from another perspective. Each cluster is associated with its own subset of dimensions, and so is each outlier. The partition, the subsets of dimensions, and the qualities of clusters are detected and adjusted according to the intra-relationship within clusters and the inter-relationship between clusters and outliers, and vice versa. This process is performed iteratively until a termination condition is reached. The algorithm can be applied in many fields, such as pattern recognition, data clustering, and signal processing.

    Outlier Based Fraud Detection System

    Outlier detection is a vital data mining task that aims to detect outliers in a given data set. The analysis or detection of outlier data is referred to as outlier mining. In data mining, outlier detection is the identification of unusual or distant data records that might require further investigation or analysis. This paper presents data-driven methods for various fraud detection systems, based on a literature review, fraudulent activities or cases, and comparative research. Several outlier detection techniques have been introduced that require input parameters from the user. The goal of the proposed work is to partition the input data set into a number of clusters using the K-NN algorithm. The clusters are then given as input to two outlier detection methods, namely a cluster-based outlier algorithm and the Local Outlier Factor algorithm. The performance evaluation confirms that our approach to finding local outliers can be practically implemented.

    Splitting hybrid Make-To-Order and Make-To-Stock demand profiles

    In this paper a demand time series is analysed to support Make-To-Stock (MTS) and Make-To-Order (MTO) production decisions. Using a purely MTS production strategy based on the given demand can lead to unnecessarily high inventory levels; it is therefore necessary to identify likely MTO episodes. This research proposes a novel outlier detection algorithm based on special density measures. We divide the histogram of the time series into three clusters. One cluster, with frequent low volumes, covers MTS items, whilst a second, with high volumes, is dedicated to MTO items. The third cluster lies between the other two, and its elements are assigned to either the MTO or the MTS class. The algorithm can be applied to a variety of time series, both stationary and non-stationary. We use empirical data from manufacturing to study the extent of inventory savings. The percentage of MTO items is reflected in the inventory savings, which averaged 18.1%.
    Comment: demand analysis; time series; outlier detection; production strategy; Make-To-Order (MTO); Make-To-Stock (MTS); 15 pages, 9 figures

    Anomaly Detection pada Intrusion Detection System (IDS) Menggunakan Metode Clustering / Anomaly Detection on Intrusion Detection System (IDS) by Clustering Method

    An Intrusion Detection System (IDS) is a set of techniques and methods for detecting activities that occur at the network and host level. An IDS follows one of two approaches: signature-based intrusion detection and anomaly detection. The first approach has a significant weakness: it can only detect intrusions that have already been defined. Anomaly detection, in addition to using previously defined data, can also analyze anomalous patterns in incoming network packets, but if the wrong parameters are chosen it will frequently raise false alarms. Anomalies in incoming packets can be analyzed with an outlier detection scheme, which analyzes the packets using several algorithms, one of which is clustering. The clustering algorithm clusters the data and marks the smallest cluster, which is then treated as an anomaly. This final project builds an implementation of intrusion detection for a computer system or network using anomaly detection with a cluster-based outlier detection algorithm. Clustering is performed on network connection records. The implementation uses HTML, PHP scripts, and the MySQL DBMS. Evaluation of the anomaly detection system shows that the detection results depend strongly on three things: the data set chosen for analysis; the maximum allowed distance from a cluster center to each data point that belongs to that cluster, commonly called the cluster radius; and the ratio of intrusion data to normal data in the data set.
    Keywords: Intrusion Detection System (IDS), clustering, anomaly detection, outlier detection scheme
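
The clustering step described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation (which the abstract says uses PHP and MySQL): the function names, the fixed-center "leader" clustering rule, the `min_fraction` threshold used to decide which clusters count as small, and the sample records are all hypothetical. Only the cluster radius and the idea of flagging the smallest clusters come from the abstract.

```python
from math import dist  # Euclidean distance, Python 3.8+


def radius_cluster(points, radius):
    """Assign each point to the first cluster whose center lies within `radius`.

    A point that fits no existing cluster starts a new one and becomes its
    center. Centers are not updated afterwards (a simplifying assumption).
    """
    centers, members = [], []
    for idx, point in enumerate(points):
        for c, center in enumerate(centers):
            if dist(point, center) <= radius:
                members[c].append(idx)
                break
        else:
            centers.append(point)
            members.append([idx])
    return centers, members


def flag_small_clusters(points, radius, min_fraction):
    """Return sorted indices of points in clusters holding less than
    `min_fraction` of all records; these are treated as anomalies."""
    _, members = radius_cluster(points, radius)
    total = len(points)
    flagged = [i for group in members if len(group) / total < min_fraction
               for i in group]
    return sorted(flagged)


# Hypothetical two-feature connection records: five similar, one far away.
records = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.1, 0.1), (0.3, 0.2), (5.0, 5.0)]
anomalies = flag_small_clusters(records, radius=1.0, min_fraction=0.2)
print(anomalies)
```

The sketch also shows why the radius matters: with a very large radius every record falls into one cluster and nothing is flagged, while a very small radius splits normal traffic into many tiny clusters and flags most of it.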

    Infrequent pattern detection for reliable network traffic analysis using robust evolutionary computation

    Anomaly detection is very important in many domains, including cybersecurity, where datasets contain many rare anomalies or infrequent patterns. Detection of infrequent patterns is computationally expensive. Cybersecurity datasets consist of many features, most of them irrelevant, which lowers the classification performance of machine learning algorithms. Hence, a feature selection (FS) approach, i.e., selecting only the relevant features, is an essential preprocessing step in cybersecurity data analysis. Although many FS approaches have been proposed in the literature, cooperative co-evolution (CC)-based FS approaches can be more suitable for cybersecurity data preprocessing in a Big Data scenario. Accordingly, in this paper, we apply our previously proposed CC-based FS with random feature grouping (CCFSRFG) to a benchmark cybersecurity dataset as the preprocessing step. The dataset with the original features and the dataset with a reduced number of features were both used for infrequent pattern detection. Experimental analysis was performed and evaluated using 10 unsupervised anomaly detection techniques; the proposed approach is therefore termed Unsupervised Infrequent Pattern Detection (UIPD). We then compared the experimental results with and without FS in terms of true positive rate (TPR). The analysis indicates that the highest rate of TPR improvement, 385.91% when using FS, was achieved by the cluster-based local outlier factor (CBLOF) for detection of the backdoor infrequent pattern. Furthermore, the highest overall infrequent pattern detection TPR was improved by 61.47% for all infrequent patterns using the clustering-based multivariate Gaussian outlier score (CMGOS) with FS.
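
The abstract does not state how the TPR improvement is computed. Because a TPR cannot exceed 100%, a figure such as 385.91% is most plausibly a relative change against the no-FS baseline; the sketch below assumes that reading. The function names and the counts in the example are hypothetical and are not taken from the paper.

```python
def true_positive_rate(true_positives, false_negatives):
    """TPR (recall) = TP / (TP + FN)."""
    return true_positives / (true_positives + false_negatives)


def relative_improvement_pct(baseline, improved):
    """Relative change from `baseline` to `improved`, in percent.

    Assumed definition: (improved - baseline) / baseline * 100.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive for a relative change")
    return (improved - baseline) / baseline * 100.0


# Made-up counts for illustration only.
tpr_without_fs = true_positive_rate(true_positives=10, false_negatives=90)  # 0.10
tpr_with_fs = true_positive_rate(true_positives=40, false_negatives=60)     # 0.40
gain = relative_improvement_pct(tpr_without_fs, tpr_with_fs)
print(f"{gain:.2f}%")
```

Under this definition a small baseline TPR inflates the percentage, so a very large relative gain can still correspond to a modest absolute TPR.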

    Anomaly Detection In Blockchain

    Anomaly detection has been a well-studied area for a long time. Its applications in the financial sector have aided in identifying suspicious activities of hackers. With advancements in the financial domain such as blockchain and artificial intelligence, it has become more challenging to deceive financial systems, yet many fraudulent cases have still emerged. Many artificial intelligence techniques have been proposed to deal with the anomaly detection problem; some results appear promising, but no solution is clearly superior. This thesis aims to bridge the gap between artificial intelligence and blockchain by applying various anomaly detection techniques to the transactional network data of a public financial blockchain, Bitcoin. It also presents an overview of blockchain technology and its application in the financial sector in light of anomaly detection. Furthermore, it extracts the transactional data of the Bitcoin blockchain and analyses it for malicious transactions using unsupervised machine learning techniques. A range of algorithms, including isolation forest, histogram-based outlier score (HBOS), cluster-based local outlier factor (CBLOF), principal component analysis (PCA), K-means, deep autoencoder networks, and an ensemble method, are evaluated and compared.
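
As an illustration of one of the listed techniques, the following is a minimal sketch of a histogram-based outlier score with equal-width bins. It is a simplified variant written for this listing, not the thesis's implementation: the function name, the `n_bins` default, and the sample rows are assumptions. Each feature gets its own histogram, bin heights are normalized so the tallest bin equals 1, and a record's score is the sum over features of the log of the inverse normalized height.

```python
from math import log


def hbos_scores(rows, n_bins=5):
    """Histogram-based outlier scores using equal-width bins per feature.

    Higher scores mean the record falls into sparser bins.
    """
    scores = [0.0] * len(rows)
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        lo, hi = min(column), max(column)
        # A constant feature gives zero width; fall back to one shared bin.
        width = (hi - lo) / n_bins or 1.0
        counts = [0] * n_bins
        bin_of = []
        for value in column:
            b = min(int((value - lo) / width), n_bins - 1)  # clamp the maximum
            bin_of.append(b)
            counts[b] += 1
        peak = max(counts)
        for i, b in enumerate(bin_of):
            # peak / counts[b] is the inverse of the normalized bin height.
            scores[i] += log(peak / counts[b])
    return scores


# Hypothetical transactions with two numeric features; the last one is unusual.
rows = [(1.0, 10.0), (1.1, 10.2), (0.9, 9.9), (1.05, 10.1), (8.0, 30.0)]
scores = hbos_scores(rows)
most_anomalous = max(range(len(rows)), key=scores.__getitem__)
print(most_anomalous)
```

Because the score treats features independently, it is cheap to compute but cannot detect records that are unusual only in the combination of their feature values.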

    A comparison of two-stage segmentation methods for choice-based conjoint data: a simulation study.

    Due to the increasing interest in market segmentation in modern marketing research, several methods for dealing with consumer heterogeneity and for revealing market segments have been described in the literature. In this study, the authors compare eight two-stage segmentation methods that aim to uncover consumer segments by classifying subject-specific indicator values. Four different indicators are used as a segmentation basis. The forces, which are subject-aggregated gradient values of the likelihood function, and the dfbetas, an outlier detection measure, are two indicators that express a subject’s effect on the estimation of the aggregate partworths in the conditional logit model. Although the conditional logit model is generally estimated at the aggregate level, this research obtains individual-level partworth estimates for segmentation purposes. The respondents’ raw choices are the final indicator values. The authors classify the indicators by means of cluster analysis and latent class models. The goal of the study is to compare the segmentation performance of the methods with respect to their success rate, membership recovery and segment mean parameter recovery. With regard to the individual-level estimates, the authors obtain poor segmentation results both with cluster and latent class analysis. The cluster methods based on the forces, the dfbetas and the choices yield good and similar results. Classification of the forces and the dfbetas deteriorates with the use of latent class analysis, whereas latent class modeling of the choices outperforms its cluster counterpart.
    Keywords: Two-stage segmentation methods; Choice-based conjoint analysis; Conditional logit model; Market segmentation; Latent class analysis