
    Control chart patterns recognition with constrained data

    Recognition and classification of non-random patterns in manufacturing process data can provide clues to the possible causes of product defects. Early detection of abnormal process patterns, particularly in highly precise and rapid automated manufacturing, is necessary to avoid wastage and catastrophic failures. Towards this end, researchers have proposed various control chart pattern recognition (CCPR) methods. Most existing control chart pattern recognizers assume that data is fully available and complete. In reality, however, process data streams may be constrained by missing, imbalanced or inadequate data arising from acquisition and measurement problems, erroneous entries and technical failures during the data acquisition process. The aim of this study is to investigate and develop an effective recognition scheme capable of handling constrained control chart patterns. Various scenarios of data constraints involving missing rates, missing mechanisms, dataset size and imbalance rate were investigated. The proposed scheme comprises the following key components: (i) characterization of the input data stream, (ii) imputation and feature extraction, and (iii) alternative recognition schemes. The proposed scheme was developed and tested to recognize the constrained patterns, namely, random, increasing/decreasing trend, upward/downward shift and cyclic patterns. The effect of design parameters on the recognition performance was examined. Exponentially-Weighted Moving Average (EWMA) imputation, oversampling and Fuzzy Information Decomposition (FID) were investigated. This research revealed that some constraints in the dataset can eventually change the distribution and violate the normality assumption. The performance of alternative designs was compared using mean square error, percentage of correct recognition, confusion matrix, average run length (ARL), t-test, sensitivity, specificity and G-mean. The results demonstrated that the scheme with an ANNfuzzy recognizer trained using FID-treated constrained patterns significantly reduces false alarms and has better discriminative ability. The proposed scheme was verified and validated through comparative studies with published works. This research can be further extended by, amongst others, investigating an adaptive fuzzy router that assigns an incoming input data stream to an appropriate scheme matching the complexity of the constrained data stream.
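As a rough illustration of the EWMA imputation named in the abstract above, the sketch below fills each missing observation with the current exponentially weighted moving average of the preceding values. The abstract does not give the thesis's exact procedure, so the function name `ewma_impute`, the smoothing weight `alpha`, the choice to seed the average with the first observed value, and the choice not to update the average on imputed points are all assumptions made for illustration.

```python
from typing import List, Optional


def ewma_impute(values: List[Optional[float]], alpha: float = 0.2) -> List[float]:
    """Fill missing entries (None) with the running EWMA of earlier observations.

    Hypothetical helper: alpha and the seeding rule are illustrative assumptions,
    not parameters taken from the thesis.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("at least one observed value is required")

    ewma = observed[0]  # seed the average with the first observed value
    filled: List[float] = []
    for v in values:
        if v is None:
            # Missing point: substitute the current EWMA, leave the average unchanged.
            filled.append(ewma)
        else:
            # Observed point: keep it and update the running average.
            ewma = alpha * v + (1.0 - alpha) * ewma
            filled.append(v)
    return filled


if __name__ == "__main__":
    stream = [10.0, None, 12.0, None]
    print(ewma_impute(stream, alpha=0.5))
```

A larger `alpha` makes the imputed value follow recent observations more closely, which may matter for trend and shift patterns; the abstract does not state which setting the study used.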

    Anomaly-based network intrusion detection enhancement by prediction threshold adaptation of binary classification models

    Network traffic exhibits a high level of variability over short periods of time. This variability degrades the performance (accuracy) of anomaly-based network Intrusion Detection Systems (IDS) that are built using predictive models in a batch-learning setup. This thesis investigates how adapting the discriminating threshold of model predictions to the evaluated traffic improves the detection rates of these intrusion detection models. Specifically, it studies the adaptability of three well-known Machine Learning algorithms: C5.0, Random Forest, and Support Vector Machine. The ability of these algorithms to adapt their prediction thresholds was assessed and analysed under different scenarios that simulated real-world settings using the prospective sampling approach. A new dataset (STA2018) was generated for this thesis and used for the analysis. This thesis has demonstrated empirically the importance of threshold adaptation in improving the accuracy of detection models when training and evaluation (test) traffic have different statistical properties. Further investigation was undertaken to analyse the effects of feature selection and data balancing on a model’s accuracy when evaluation traffic with different significant features was used. The effect of threshold adaptation on reducing the accuracy degradation of these models was statistically analysed. The results showed that, of the three compared algorithms, Random Forest was the most adaptable and had the highest detection rates. This thesis then extended the analysis to apply threshold adaptation on sampled traffic subsets, using different sample sizes, sampling strategies and label error rates. This investigation showed the robustness of the Random Forest algorithm in identifying the best threshold. The Random Forest algorithm needed a sample of only 0.05% of the original evaluation traffic to identify a discriminating threshold with an overall accuracy rate of nearly 90% of that of the optimal threshold. "This research was supported and funded by the Government of the Sultanate of Oman represented by the Ministry of Higher Education and the Sultan Qaboos University." -- p. i
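The threshold adaptation described in the abstract above can be pictured as re-selecting the cut-off applied to a model's prediction scores, using a small labelled sample of the traffic being evaluated. The sketch below is only one plausible reading: the function name `best_threshold`, the use of plain accuracy as the selection criterion, and the candidate set (every distinct score plus a cut-off above all scores) are assumptions, not the procedure reported in the thesis.

```python
from typing import Sequence, Tuple


def best_threshold(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (threshold, accuracy) maximising accuracy on a labelled sample.

    scores: model scores for the positive (attack) class, higher means more suspicious.
    labels: ground truth, 1 for attack and 0 for normal traffic.
    Hypothetical helper; the selection criterion is an illustrative assumption.
    """
    if len(scores) != len(labels) or len(scores) == 0:
        raise ValueError("scores and labels must be non-empty and of equal length")

    # Candidate cut-offs: each distinct score, plus one that flags nothing.
    candidates = sorted(set(scores)) + [float("inf")]

    best_t, best_acc = candidates[0], -1.0
    for t in candidates:
        # A record is predicted as an attack when its score reaches the threshold.
        correct = sum((s >= t) == bool(y) for s, y in zip(scores, labels))
        acc = correct / len(scores)
        if acc > best_acc:  # strict comparison keeps the lowest threshold on ties
            best_t, best_acc = t, acc
    return best_t, best_acc


if __name__ == "__main__":
    sample_scores = [0.10, 0.30, 0.35, 0.80]
    sample_labels = [0, 0, 1, 1]
    print(best_threshold(sample_scores, sample_labels))
```

In practice the scores would come from a trained classifier (for example, class probabilities from a Random Forest), and the labelled sample would be a small subset of the evaluation traffic, as the abstract describes.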