47 research outputs found

    An adaptive behavioral-based incremental batch learning malware variants detection model using concept drift detection and sequential deep learning

    Malware variants are major emerging threats to cybersecurity because of the damage they can cause to computer systems. Many solutions have been proposed for detecting malware variants. However, accurate detection is challenging because malware variants evolve constantly, which causes concept drift. Existing malware detection solutions assume that the mapping learned from historical malware features will remain valid for new and future malware. The relationship between input features and the class label has been treated as stationary, which does not hold for ever-evolving malware variants. Malware features change dynamically due to code obfuscation, mutation, and modifications that malware authors make to shift the feature distribution and evade detection, rendering the detection model obsolete and ineffective. This study presents an Adaptive behavioral-based Incremental Batch Learning Malware Variants Detection model using concept drift detection and sequential deep learning (AIBL-MVD) to accommodate new malware variants. Malware behaviors were extracted using dynamic analysis by running the malware files in a sandbox environment and collecting their Application Programming Interface (API) traces. The malware samples were sorted by their first-time appearance to capture how the variants' characteristics change over time. The base classifier was then trained on a subset of historical malware samples using a sequential deep learning model. The new malware samples were mixed with a subset of old data and gradually introduced to the learning model in an adaptive batch size incremental learning manner to address the catastrophic forgetting dilemma of incremental learning. A statistical process control technique was used to detect concept drift as the signal for incrementally updating the model, which also reduces the frequency of model updates. Results from extensive experiments show that the proposed model is superior in detection rate and efficiency to the static model, periodic retraining approaches, and the fixed batch size incremental learning approach. The model maintains an average detection accuracy of 99.41% on new malware and malware variants with a low updating frequency of 1.35 times per month.
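The abstract above does not say which statistical process control rule is used. As a rough illustration only, the sketch below assumes a DDM-style rule: it tracks the running error rate p and its standard deviation s, and flags drift when p + s exceeds the best level seen so far by three standard deviations. The class name, thresholds, and the synthetic error stream are all hypothetical.

```python
import math


class DriftDetector:
    """DDM-style statistical process control over a stream of prediction errors.

    Assumption: the 2-sigma warning / 3-sigma drift thresholds are illustrative,
    not values taken from the paper.
    """

    def __init__(self, warn_sigmas=2.0, drift_sigmas=3.0, min_samples=30):
        self.warn_sigmas = warn_sigmas
        self.drift_sigmas = drift_sigmas
        self.min_samples = min_samples
        self.reset()

    def reset(self):
        self.n = 0
        self.errors = 0
        self.best = math.inf  # lowest p + s observed so far
        self.best_p = 0.0
        self.best_s = 0.0

    def update(self, is_error):
        """Feed one prediction outcome; return 'stable', 'warning' or 'drift'."""
        self.n += 1
        self.errors += int(is_error)
        p = self.errors / self.n
        s = math.sqrt(p * (1 - p) / self.n)
        if self.n < self.min_samples:
            return "stable"  # too few samples for a meaningful estimate
        if p + s < self.best:
            self.best, self.best_p, self.best_s = p + s, p, s
        if p + s > self.best_p + self.drift_sigmas * self.best_s:
            self.reset()  # start fresh statistics for the new concept
            return "drift"
        if p + s > self.best_p + self.warn_sigmas * self.best_s:
            return "warning"
        return "stable"


# Synthetic stream: 10% errors for 200 samples, then 50% errors.
stream = [i % 10 == 0 for i in range(200)] + [i % 2 == 0 for i in range(200)]
detector = DriftDetector()
states = [detector.update(e) for e in stream]
first_drift = states.index("drift")
```

In a pipeline like the one described, a "drift" signal would be the trigger for an incremental model update, so the model is retrained only when the error statistics actually change.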

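    Performance Comparison of Data Sampling Techniques for Classifying COVID-19-Infected Patients Using Chest X-Rays (Perbandingan Performa Teknik Sampling Data untuk Klasifikasi Pasien Terinfeksi Covid-19 Menggunakan Rontgen Dada)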

    The COVID-19 virus proved deadly and shocked the world. One of its consequences is respiratory infection. The solution put forward for this problem is to predict COVID-19 infection by classifying chest X-ray data. One challenging issue in this field is the imbalance between the number of infected and uninfected chest X-rays. With imbalanced data, classification tends to ignore the classes with fewer samples. To overcome this problem, data sampling techniques are used to balance the data. For this reason, several data sampling techniques are evaluated in this study: Random Undersampling (RUS), Random Oversampling (ROS), Combination of Over-Undersampling (COUS), Synthetic Minority Over-sampling Technique (SMOTE), and Tomek Link (T-Link). The study uses Support Vector Machines (SVM) for classification because of their high accuracy. The evaluation selects the technique with the highest accuracy and Area Under Curve (AUC). The best sampling technique found was SMOTE, with an accuracy of 99% and an AUC of 99.32%. SMOTE is the best data sampling technique for the classification of COVID-19 chest X-ray data.
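To make the SMOTE idea mentioned above concrete, here is a minimal pure-Python sketch of its core step: each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its k nearest minority neighbours. The function name, the toy data, and the choice k=3 are assumptions for illustration; the study's actual implementation and settings are not given in the abstract.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours.

    Assumes `minority` holds at least two distinct points given as tuples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Toy minority class with two features per sample (hypothetical values).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_new=6)
```

Because every synthetic point is an interpolation, it always falls inside the region spanned by existing minority samples, which is what distinguishes SMOTE from plain random oversampling (which only duplicates samples).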

    Performance evaluation of botnet detection using machine learning techniques

    Cybersecurity is seriously threatened by botnets, which are controlled networks of compromised computers. The evolving techniques used by botnet operators make it difficult for traditional botnet identification methods to keep up. In recent years, machine learning has become increasingly effective as a means of identifying and mitigating these threats. This study uses the CTU-13 dataset, which is frequently used in cybersecurity research, to offer a machine learning-based method for botnet detection. CTU-13 consists of real network traffic recorded in a network environment that had been attacked by a botnet. The dataset is used to train a variety of machine learning algorithms, including a decision tree, a regression model, naïve Bayes, and a neural network model, to categorize network traffic as botnet-related or benign. We employ a number of criteria, such as accuracy, precision, and sensitivity, to measure how well each model categorizes both known and previously unseen botnet traffic patterns. Experimental results show that the machine learning-based approach detects botnets accurately. The suggested system's high detection rates and low false positive rates demonstrate its potential for real-world use.
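The evaluation criteria named in this abstract (accuracy, precision, sensitivity, and false positive rate) all follow from the four confusion-matrix counts. The sketch below computes them for a binary botnet/benign labelling; the function name and the toy label vectors are hypothetical and are not results from the paper.

```python
def detection_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, sensitivity and false positive rate."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # botnet flow correctly flagged
            else:
                fp += 1  # benign flow wrongly flagged
        else:
            if truth == positive:
                fn += 1  # botnet flow missed
            else:
                tn += 1  # benign flow correctly passed

    def ratio(num, den):
        return num / den if den else 0.0  # avoid division by zero

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),  # also called recall or detection rate
        "false_positive_rate": ratio(fp, fp + tn),
    }


# Toy labels: 1 = botnet traffic, 0 = benign traffic.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
scores = detection_metrics(y_true, y_pred)
```

For these toy labels there are 3 true positives, 1 false negative, 1 false positive, and 3 true negatives, so accuracy, precision, and sensitivity are each 0.75 and the false positive rate is 0.25.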

    Evaluating Sampling Techniques for Healthcare Insurance Fraud Detection in Imbalanced Dataset

    Detecting fraud in the healthcare insurance dataset is challenging due to severe class imbalance, where fraud cases are rare compared to non-fraud cases. Various techniques have been applied to address this problem, such as oversampling and undersampling methods. However, there is a lack of comparison and evaluation of these sampling methods. Therefore, the research contribution of this study is to conduct a comprehensive evaluation of the different sampling methods in different class distributions, utilizing multiple evaluation metrics, including , , , Precision, and Recall. In addition, a model evaluation approach is proposed to address the issue of inconsistent scores across metrics. This study employs a real-world dataset with the XGBoost algorithm alongside widely used data sampling techniques such as Random Oversampling and Undersampling, SMOTE, and Instance Hardness Threshold. Results indicate that Random Oversampling and Undersampling perform well in the 50% distribution, while SMOTE and Instance Hardness Threshold are more effective in the 70% distribution. Instance Hardness Threshold performs best in the 90% distribution. The 70% distribution is more robust with SMOTE and Instance Hardness Threshold, particularly in producing consistent scores across metrics, although these methods have longer computation times. These models consistently performed well across all evaluation metrics, indicating their ability to generalize to new unseen data in both the minority and majority classes. The study also identifies key features such as costs, diagnosis codes, type of healthcare service, gender, and severity level of diseases, which are important for accurate healthcare insurance fraud detection. These findings could help healthcare providers make informed decisions with lower risks. A well-performing fraud detection model ensures the accurate classification of fraud and non-fraud cases. The findings can also be used by healthcare insurance providers to develop more effective fraud detection and prevention strategies.

    Exploring the use of conversational agents to improve cyber situational awareness in the Internet of Things (IoT).

    The Internet of Things (IoT) is an emerging paradigm, which aims to extend the power of the Internet beyond computers and smartphones to a vast and growing range of "things" - devices, processes and environments. The result is an interconnected world where humans and devices interact with each other, establishing a smart environment for the continuous exchange of information and services. Billions of everyday devices such as home appliances, surveillance cameras, wearables and doorbells, enriched with computational and networking capabilities, have already been connected to the Internet. However, as the IoT has grown, the demand for low-cost, easy-to-deploy devices has also increased, leading to the production of millions of insecure Internet-connected smart devices. Many of these devices can be easily exploited and leveraged to perform large-scale attacks on the Internet, such as the recently witnessed botnet attacks. Since these attacks often target consumer-level products, which commonly lack a screen or user interface, it can be difficult for users to identify signs of infection and be aware of devices that have been compromised. This thesis presents four studies which collectively explored how user awareness of threats in consumer IoT networks could be improved. Maintaining situational awareness of what is happening within a home network is challenging, not least because malicious activity often occurs in devices which are not easily monitored. This thesis evaluated the effectiveness of conversational agents in improving Cyber Situational Awareness. In doing so, it presented the first study to investigate their ability to help users improve their perception of smart device activity, comprehend this in the context of their home environment, and project this knowledge to determine if a threat had occurred or may occur in the future. The research demonstrated how a BLSTM-RNN with word embedding could be used to extract semantic meaning from packets to perform deep packet inspection and detect IoT botnet activity. Specifically, it showed how the model's use of contextual information from both the past and the future enabled better predictions about the current state (packet), owing to the sequential nature of network traffic. In addition, a cross-sectional study examined users' awareness and perception of threats and found that, although users value security and privacy, they found it difficult to identify threats and infected devices. Finally, novel cross-sectional and longitudinal studies evaluated the use of conversational agents and demonstrated them to be an effective and efficient method of improving Cyber Situational Awareness. In particular, this was shown to be true when using a multi-modal approach that combines aural, verbal and visual modalities.

    An intelligent context-aware threat detection and response model for smart cyber-physical systems

    Smart cities, businesses, workplaces, and even residences have all been brought together by the Internet of Things (IoT). The types and characteristics of IoT devices vary by industry in the Industry 4.0 context, and their number has increased rapidly in recent years, especially in smart homes. These devices can expose users to serious cyber threats because of a variety of computing constraints and weaknesses in how security-by-design has been applied. The smart home network testbed presented in this study is used to evaluate and validate the protection of the smart cyber-physical system. The context-aware threat intelligence and response model identifies the states of the connected smart devices to distinguish between typical real-world scenarios and attack scenarios. It then dynamically writes specific rules for protection against potential cyber threats. The context-aware model is trained on the IoT Research and Innovation Lab - Smart Home System (IRIL-SHS) testbed dataset. The labeled dataset is used to build a random forest model, which is subsequently used to train and test the effectiveness and performance of the context-aware threat intelligence SHS model. Finally, the model's logic is used to derive rules for inclusion in Suricata signatures and in the firewall rulesets of the response system. Significant values of the measuring parameters were found in the results. The presented model can be used for the real-time security of smart home cyber-physical systems and offers a view of the security challenges facing Industry 4.0.

    Improving Accuracy of Intrusion Detection Model Using PCA and optimized SVM

    Intrusion detection is essential for securing different network domains and is mostly used for locating and tracing intruders. Traditional intrusion detection models suffer from many problems, such as low detection capability against unknown network attacks, a high false alarm rate, and insufficient analysis capability. Hence the major scope of research in this domain is to develop an intrusion detection model with improved accuracy and reduced training time. This paper proposes a hybrid intrusion detection model that integrates principal component analysis (PCA) and a support vector machine (SVM). The novelty of the paper is the optimization of the kernel parameters of the SVM classifier using an automatic parameter selection technique. This technique optimizes the punishment factor (C) and the kernel parameter gamma (γ), thereby improving the accuracy of the classifier and reducing the training and testing time. The experimental results obtained on the NSL KDD and gurekddcup datasets show that the proposed technique performs better, with higher accuracy, faster convergence speed and better generalization. Minimal resources are consumed because the classifier requires only a reduced feature set as input for optimum classification. A comparative analysis of hybrid models with the proposed model is also performed.
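For readers unfamiliar with the two parameters being tuned, the standard soft-margin SVM formulation shows where they enter. This assumes a radial basis function (RBF) kernel, which the abstract does not state explicitly; $\phi$ is the feature map induced by the kernel and $\xi_i$ are slack variables.

```latex
\min_{w,\,b,\,\xi} \;\; \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
y_i \bigl( w^{\top} \phi(x_i) + b \bigr) \ge 1 - \xi_i, \qquad \xi_i \ge 0,

K(x, x') = \phi(x)^{\top} \phi(x') = \exp\!\bigl( -\gamma \, \lVert x - x' \rVert^{2} \bigr).
```

A larger C penalizes misclassified training points more heavily, while a larger γ narrows the influence of each support vector, so the two are usually selected jointly.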