    Unsupervised Intrusion Detection with Cross-Domain Artificial Intelligence Methods

    Cybercrime is a major concern for corporations, business owners, governments and citizens, and it continues to grow in spite of increasing investments in security and fraud prevention. The main challenges in this research field are detecting unknown attacks and reducing the false positive rate. The aim of this research work was to target both problems by leveraging four artificial intelligence techniques. The first technique is a novel unsupervised learning method based on skip-gram modeling. It was designed, developed and tested against a public dataset with popular intrusion patterns. High accuracy and a low false positive rate were achieved without prior knowledge of attack patterns. The second technique is a novel unsupervised learning method based on topic modeling. It was applied to three related domains (network attacks, payment fraud, IoT malware traffic). High accuracy was achieved in all three scenarios, even though the malicious activity differs significantly from one domain to another. The third technique is a novel unsupervised learning method based on deep autoencoders, with feature selection performed by a supervised method, random forest. The results obtained showed that this technique can outperform other similar techniques. The fourth technique is based on an MLP neural network, and is applied to alert reduction in fraud prevention. This method automates manual reviews previously done by human experts, without significantly impacting accuracy.

    A K-means geometric SMOTE with data complexity analysis for imbalanced dataset

    Many binary class datasets in real-life applications are affected by the class imbalance problem. Data complexities such as noisy examples, class overlap and small disjuncts are observed to play a key role in producing poor classification performance. These complexities tend to exist in tandem with the class imbalance problem. Synthetic Minority Oversampling Technique (SMOTE) is a well-known method to re-balance the number of examples in imbalanced datasets. However, this technique cannot effectively tackle data complexities and can even magnify them. Therefore, various SMOTE variants have been proposed to overcome the downsides of SMOTE. Furthermore, no existing study has yet identified the correlation between the N1 complexity measure and classification measures such as geometric mean (G-Mean) and F1-Score. This study aims: (i) to identify suitable complexity measures that correlate with performance measures, (ii) to propose a new SMOTE variant, K-Means Geometric SMOTE (KM-GSMOTE), that incorporates complexity measures during the synthetic data generation task, and (iii) to evaluate KM-GSMOTE in terms of classification performance. A series of experiments was conducted to evaluate classification performance in terms of G-Mean and F1-Score, as well as the N1 complexity, of benchmark SMOTE variants and KM-GSMOTE. The performance of KM-GSMOTE was evaluated on 6 benchmark binary datasets from the UCI repository. KM-GSMOTE records the highest percentage of average differences in G-Mean (22.76%) and F1-Score (15.13%) for the SVM classifier. A correlation between classification measures and the N1 complexity measure was observed in the experimental results. The contributions of this study are (i) the introduction of KM-GSMOTE, which combines complexity measurement with model selection to pick models with the best classification performance and lower complexity value, and (ii) the observation of a connection between classification performance and the complexity measure, showing that as the N1 complexity measure decreases, the likelihood of obtaining substantial classification performance increases.
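For readers unfamiliar with the quantities named in this abstract, the following is a minimal Python sketch, not the authors' implementation. It shows the standard definitions of G-Mean and F1-Score computed from confusion-matrix counts, together with the basic interpolation step of plain SMOTE (not KM-GSMOTE, whose details the abstract does not give). The function names `g_mean`, `f1_score` and `smote_interpolate` are illustrative assumptions.

```python
import math
import random


def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity (minority recall) and specificity."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def f1_score(tp, fn, fp):
    """Harmonic mean of precision and recall, written in terms of counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def smote_interpolate(x, neighbor, rng):
    """Plain SMOTE step: pick a random point on the segment from a minority
    example x to one of its minority-class neighbours (hypothetical helper)."""
    gap = rng.random()  # uniform in [0, 1)
    return [xi + gap * (ni - xi) for xi, ni in zip(x, neighbor)]


if __name__ == "__main__":
    # Made-up counts for illustration only; they are not results from the study.
    print(round(g_mean(tp=8, fn=2, tn=90, fp=10), 4))
    print(round(f1_score(tp=8, fn=2, fp=10), 4))
    print(smote_interpolate([0.0, 0.0], [1.0, 2.0], random.Random(0)))
```

Both metrics depend on minority-class recall, which is why they are commonly preferred over plain accuracy when classes are imbalanced.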