566 research outputs found

    Intrusion detection by machine learning = Behatolás detektálás gépi tanulás által

    Since the early days of information technology, there have been many stakeholders who used the technological capabilities for their own benefit, be it legal operations, or illegal access to computational assets and sensitive information. Every year, businesses invest large amounts of effort into upgrading their IT infrastructure, yet, even today, they are unprepared to protect their most valuable assets: data and knowledge. This lack of protection was the main reason for the creation of this dissertation. During this study, intrusion detection, a field of information security, is evaluated through the use of several machine learning models performing signature and hybrid detection. This is a challenging field, mainly due to the high velocity and imbalanced nature of network traffic. To construct machine learning models capable of intrusion detection, the applied methodologies were the CRISP-DM process model designed to help data scientists with the planning, creation and integration of machine learning models into a business information infrastructure, and design science research interested in answering research questions with information technology artefacts. The two methodologies have a lot in common, which is further elaborated in the study. The goals of this dissertation were two-fold: first, to create an intrusion detector that could provide a high level of intrusion detection performance measured using accuracy and recall and second, to identify potential techniques that can increase intrusion detection performance. Out of the designed models, a hybrid autoencoder + stacking neural network model managed to achieve detection performance comparable to the best models that appeared in the related literature, with good detections on minority classes. To achieve this result, the techniques identified were synthetic sampling, advanced hyperparameter optimization, model ensembles and autoencoder networks. 
In addition, the dissertation sets up a soft hierarchy among the different detection techniques in terms of performance and provides a brief outlook on potential future practical applications of network intrusion detection models.
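The abstract measures detection performance with both accuracy and recall and stresses that network traffic is imbalanced. A minimal sketch of why the two metrics are reported together follows. The class names, the 95/5 split, and the always-benign detector are hypothetical and do not come from the dissertation.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of all flows that were labelled correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def recall_per_class(y_true, y_pred):
    """Recall per class: correct predictions divided by true members of the class."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}

# Hypothetical, heavily imbalanced traffic: 95 benign flows and 5 attack flows.
y_true = ["benign"] * 95 + ["attack"] * 5
# A degenerate detector that labels every flow as benign.
y_pred = ["benign"] * 100

print(accuracy(y_true, y_pred))          # 0.95
print(recall_per_class(y_true, y_pred))  # {'benign': 1.0, 'attack': 0.0}
```

The detector reaches 95% accuracy while missing every attack, which is why recall on the minority classes is the more informative figure for this kind of data.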

    Analysis of Theoretical and Applied Machine Learning Models for Network Intrusion Detection

    Network Intrusion Detection System (IDS) devices play a crucial role in the realm of network security. These systems generate alerts for security analysts by performing signature-based and anomaly-based detection on malicious network traffic. However, there are several challenges when configuring and fine-tuning these IDS devices for high accuracy and precision. Machine learning utilizes a variety of algorithms and unique dataset input to generate models for effective classification. These machine learning techniques can be applied to IDS devices to classify and filter anomalous network traffic. This combination of machine learning and network security provides improved automated network defense by developing highly-optimized IDS models that utilize unique algorithms for enhanced intrusion detection. Machine learning models can be trained using a combination of machine learning algorithms, network intrusion datasets, and optimization techniques. This study sought to identify which variation of these parameters yielded the best-performing network intrusion detection models, measured by their accuracy, precision, recall, and F1 score metrics. Additionally, this research aimed to validate theoretical models’ metrics by applying them in a real-world environment to see if they perform as expected. This research utilized a quantitative experimental study design to organize a two-phase approach to train and test a series of machine learning models for network intrusion detection by utilizing Python scripting, the scikit-learn library, and Zeek IDS software. The first phase involved optimizing and training 105 machine learning models by testing a combination of seven machine learning algorithms, five network intrusion datasets, and three optimization methods. These 105 models were then fed into the second phase, where the models were applied in a machine learning IDS pipeline to observe how the models performed in an implemented environment. 
The results of this study identify which algorithms, datasets, and optimization methods generate the best-performing models for network intrusion detection. This research also showcases the need to utilize various algorithms and datasets, since no individual algorithm or dataset consistently achieved high metric scores independent of other training variables. Additionally, this research indicates that optimization during model development is highly recommended; however, there may not be a need to test multiple optimization methods, since they did not typically impact the yielded models' overall categorization of success or failure. Lastly, this study's results strongly indicate that theoretical machine learning models will most likely perform significantly worse when applied in an implemented IDS ML pipeline environment. This study can be utilized by other industry professionals and research academics in the fields of information security and machine learning to generate better, highly optimized models for their work environments or experimental research.
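The figure of 105 models follows from crossing seven algorithms with five datasets and three optimization methods (7 × 5 × 3 = 105). A small sketch of that experiment grid and of the four reported metrics is given below. The abstract names only the counts, so the list entries are placeholders, and the confusion-matrix counts in the usage line are invented for illustration.

```python
from itertools import product

# Placeholder names: the abstract gives the counts (7, 5, 3), not the members.
algorithms = [f"algorithm_{i}" for i in range(1, 8)]
datasets = [f"dataset_{i}" for i in range(1, 6)]
optimizations = [f"optimization_{i}" for i in range(1, 4)]

# One model per (algorithm, dataset, optimization method) combination.
experiment_grid = list(product(algorithms, datasets, optimizations))
print(len(experiment_grid))  # 105

def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Invented counts, used only to show the call.
print(classification_metrics(tp=8, fp=2, tn=85, fn=5))
```

Storing one metrics dictionary per grid entry makes it straightforward to group results by algorithm, dataset, or optimization method afterwards.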

    Towards a Reliable Comparison and Evaluation of Network Intrusion Detection Systems Based on Machine Learning Approaches

    Presently, we are living in a hyper-connected world where millions of heterogeneous devices continuously share information in different application contexts such as wellness, improved communications, and digital businesses. However, the larger the number of devices and connections, the higher the risk of security threats. To counteract malicious behaviours and preserve essential security services, Network Intrusion Detection Systems (NIDSs) are the most widely used line of defence in communications networks. Nevertheless, there is no standard methodology to evaluate and fairly compare NIDSs. Most proposals omit crucial steps regarding NIDS validation, which makes their comparison hard or even impossible. This work first presents a comprehensive study of recent NIDSs based on machine learning approaches, concluding that almost all of them fail to comply with what the authors of this paper consider mandatory steps for a reliable comparison and evaluation of NIDSs. Second, a structured methodology is proposed and assessed on the UGR'16 dataset to test its suitability for addressing network attack detection problems. The recommended guideline and steps will help the research community to fairly assess NIDSs, although building a definitive framework is not a trivial task and, therefore, some extra effort should still be made to further improve its understandability and usability.

    Applying Machine Learning to Cyber Security

    Intrusion Detection Systems (IDSs) are nowadays a very important part of a system. In recent years, many methods have been proposed to implement this kind of security measure against cyber attacks, including methods based on Machine Learning and Data Mining. In this work we discuss in detail the family of anomaly-based IDSs, which are able to detect previously unseen attacks, paying particular attention to adherence to the FAIR principles. These principles include the Accessibility and the Reusability of software. Moreover, as the purpose of this work is to assess the state of the art, we have selected three approaches according to their reproducibility and compared their performance in a common experimental setting. Lastly, a real-world use case has been analyzed, resulting in the proposal of an unsupervised ML model for pre-processing and analyzing web server logs. The proposed solution uses clustering and outlier detection techniques to detect attacks in an unsupervised way.

    Network Intrusion Detection System using Deep Learning Technique

    The rise in internet usage in recent times has led to tremendous development in computer networks, with large volumes of information transported daily. This development has generated many security threats and privacy concerns for networks and data. To tackle these issues, several protective measures have been developed, including Intrusion Detection Systems (IDSs). An IDS forms a major backbone of network security and provides an extra layer of security on top of other defence mechanisms in a network. However, existing signature-based IDSs such as Snort and the like are unable to detect unknown and novel threats. Anomaly detection-based IDSs that use Machine Learning (ML) approaches do not scale when presented with enormous amounts of data: during modelling, the runtime increases as the dataset size increases, which demands high computational resources to meet the runtime requirements. This thesis proposes a Feedforward Deep Neural Network (FFDNN) for an intrusion detection system that performs binary classification on the popular NSL-KDD (Knowledge Discovery and Data Mining) dataset. The model was developed with the Keras API integrated into TensorFlow in Google's Colaboratory software environment. Three variants of FFDNNs were trained using the NSL-KDD dataset. Each network architecture consisted of two hidden layers, with 64 and 32; 32 and 16; and 512 and 256 neurons respectively, each with the ReLU activation function. The sigmoid activation function for binary classification was used in the output layer, and the prediction loss function was the binary cross-entropy. Regularization was set to a dropout rate of 0.2 and the Adam optimizer was used. The deep neural networks were trained for 16, 20, and 20 epochs respectively, with batch sizes of 256, 64, and 128. After evaluating the performance of the FFDNNs on the training data, predictions were made on the test data, and accuracies of 89%, 84%, and 87% were achieved. 
The experiment was also conducted on the same training dataset (NSL-KDD) using conventional machine learning algorithms (Random Forest, K-nearest neighbor, Logistic regression, Decision tree, and Naïve Bayes), whose predictions on the test data gave accuracies of 81%, 76%, 77%, 77%, and 77%, respectively. The performance of the FFDNNs was calculated using several important metrics (FPR, FAR, F1 Measure, Precision) and compared to that of the conventional ML algorithms. The outcome shows that the deep neural networks performed best: their dense architecture made them scalable to the large size of the dataset and offered a faster run time during training, in contrast to the slow run time of the conventional ML algorithms. This implies that when the dataset is large and faster computation is required, an FFDNN is the better choice for accuracy.
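The architecture described in this abstract (two ReLU hidden layers, a single sigmoid output, binary cross-entropy loss) can be illustrated with a dependency-free forward pass. This is only a sketch of the layer structure, not the thesis's Keras code: training with Adam and the 0.2 dropout are omitted, the weights are random and untrained, and the input width of 41 is an assumption based on the raw NSL-KDD feature count (the abstract does not state the encoded input size).

```python
import math
import random

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W x + b)."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def build_ffdnn(n_features, hidden=(64, 32), seed=0):
    """Random, untrained weights for n_features -> hidden[0] -> hidden[1] -> 1."""
    rng = random.Random(seed)
    sizes = [n_features, *hidden, 1]
    return [([[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)],
             [0.0] * n_out)
            for n_in, n_out in zip(sizes, sizes[1:])]

def predict_proba(model, features):
    """Forward pass: ReLU on hidden layers, sigmoid on the output unit."""
    activations = features
    for index, (weights, biases) in enumerate(model):
        is_output = index == len(model) - 1
        activations = dense(activations, weights, biases,
                            sigmoid if is_output else relu)
    return activations[0]

def binary_cross_entropy(y_true, p, eps=1e-12):
    """Loss for one sample with label y_true in {0, 1} and predicted probability p."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))

def parameter_count(model):
    return sum(len(w) * len(w[0]) + len(b) for w, b in model)

N_FEATURES = 41  # assumption: raw NSL-KDD feature count, before any encoding
model = build_ffdnn(N_FEATURES, hidden=(64, 32))
probability = predict_proba(model, [0.5] * N_FEATURES)
print(parameter_count(model))  # 4801
print(binary_cross_entropy(1, probability))
```

Passing `hidden=(32, 16)` or `hidden=(512, 256)` gives the layer sizes of the other two variants mentioned in the abstract.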