1,500 research outputs found

    Are Intrusion Detection Studies Evaluated Consistently? A Systematic Literature Review

    Get PDF
    Cyberinfrastructure is increasingly becoming target of a wide spectrum of attacks from Denial of Service to large-scale defacement of the digital presence of an organization. Intrusion Detection System (IDSs) provide administrators a defensive edge over intruders lodging such malicious attacks. However, with the sheer number of different IDSs available, one has to objectively assess the capabilities of different IDSs to select an IDS that meets specific organizational requirements. A prerequisite to enable such an objective assessment is the implicit comparability of IDS literature. In this study, we review IDS literature to understand the implicit comparability of IDS literature from the perspective of metrics used in the empirical evaluation of the IDS. We identified 22 metrics commonly used in the empirical evaluation of IDS and constructed search terms to retrieve papers that mention the metric. We manually reviewed a sample of 495 papers and found 159 of them to be relevant. We then estimated the number of relevant papers in the entire set of papers retrieved from IEEE. We found that, in the evaluation of IDSs, multiple different metrics are used and the trade-off between metrics is rarely considered. In a retrospective analysis of the IDS literature, we found the the evaluation criteria has been improving over time, albeit marginally. The inconsistencies in the use of evaluation metrics may not enable direct comparison of one IDS to another

    Tools and algorithms to advance interactive intrusion analysis via Machine Learning and Information Retrieval

    Get PDF
    We consider typical tasks that arise in the intrusion analysis of log data from the perspectives of Machine Learning and Information Retrieval, and we study a number of data organization and interactive learning techniques to improve the analyst\u27s efficiency. In doing so, we attempt to translate intrusion analysis problems into the language of the abovementioned disciplines and to offer metrics to evaluate the effect of proposed techniques. The Kerf toolkit contains prototype implementations of these techniques, as well as data transformation tools that help bridge the gap between the real world log data formats and the ML and IR data models. We also describe the log representation approach that Kerf prototype tools are based on. In particular, we describe the connection between decision trees, automatic classification algorithms and log analysis techniques implemented in Kerf

    Author Matching Classification with Anomaly Detection Approach for Bibliomethric Repository Data

    Get PDF
    Authors name disambiguation (AND) is a complex problem in the process of identifying an author in a digital library (DL). The AND data classification process is very much determined by the grouping process and data processing techniques before entering the classifier algorithm. In general, the data pre-processing technique used is pairwise and similarity to do author matching. In a large enough data set scale, the pairwise technique used in this study is to do a combination of each attribute in the AND dataset and by defining a binary class for each author matching combination, where the unequal author is given a value of 0 and the same author is given a value of 1. The technique produces very high imbalance data where class 0 becomes 98.9% of the amount of data compared to 1.1% of class 1. The results bring up an analysis in which class 1 can be considered and processed as data anomaly of the whole data. Therefore, anomaly detection is the method chosen in this study using the Isolation Forest algorithm as its classifier. The results obtained are very satisfying in terms of accuracy which can reach 99.5%

    Shallow and deep networks intrusion detection system : a taxonomy and survey

    Get PDF
    Intrusion detection has attracted a considerable interest from researchers and industries. The community, after many years of research, still faces the problem of building reliable and efficient IDS that are capable of handling large quantities of data, with changing patterns in real time situations. The work presented in this manuscript classifies intrusion detection systems (IDS). Moreover, a taxonomy and survey of shallow and deep networks intrusion detection systems is presented based on previous and current works. This taxonomy and survey reviews machine learning techniques and their performance in detecting anomalies. Feature selection which influences the effectiveness of machine learning (ML) IDS is discussed to explain the role of feature selection in the classification and training phase of ML IDS. Finally, a discussion of the false and true positive alarm rates is presented to help researchers model reliable and efficient machine learning based intrusion detection systems

    Abstraction, aggregation and recursion for generating accurate and simple classifiers

    Get PDF
    An important goal of inductive learning is to generate accurate and compact classifiers from data. In a typical inductive learning scenario, instances in a data set are simply represented as ordered tuples of attribute values. In our research, we explore three methodologies to improve the accuracy and compactness of the classifiers: abstraction, aggregation, and recursion;Firstly, abstraction is aimed at the design and analysis of algorithms that generate and deal with taxonomies for the construction of compact and robust classifiers. In many applications of the data-driven knowledge discovery process, taxonomies have been shown to be useful in constructing compact, robust, and comprehensible classifiers. However, in many application domains, human-designed taxonomies are unavailable. We introduce algorithms for automated construction of taxonomies inductively from both structured (such as UCI Repository) and unstructured (such as text and biological sequences) data. We introduce AVT-Learner, an algorithm for automated construction of attribute value taxonomies (AVT) from data, and Word Taxonomy Learner (WTL), an algorithm for automated construction of word taxonomy from text and sequence data. We describe experiments on the UCI data sets and compare the performance of AVT-NBL (an AVT-guided Naive Bayes Learner) with that of the standard Naive Bayes Learner (NBL). Our results show that the AVTs generated by AVT-Learner are compeitive with human-generated AVTs (in cases where such AVTs are available). AVT-NBL using AVTs generated by AVT-Learner achieves classification accuracies that are comparable to or higher than those obtained by NBL; and the resulting classifiers are significantly more compact than those generated by NBL. Similarly, our experimental results of WTL and WTNBL on protein localization sequences and Reuters newswire text categorization data sets show that the proposed algorithms can generate Naive Bayes classifiers that are more compact and often more accurate than those produced by standard Naive Bayes learner for the Multinomial Model;Secondly, we apply aggregation to construct features as a multiset of values for the intrusion detection task. For this task, we propose a bag of system calls representation for system call traces and describe misuse and anomaly detection results on the University of New Mexico (UNM) and MIT Lincoln Lab (MIT LL) system call sequences with the proposed representation. With the feature representation as input, we compare the performance of several machine learning techniques for misuse detection and show experimental results on anomaly detection. The results show that standard machine learning and clustering techniques using the simple bag of system calls representation based on the system call traces generated by the operating system\u27s kernel is effective and often performs better than approaches that use foreign contiguous sequences in detecting intrusive behaviors of compromised processes;Finally, we construct a set of classifiers by recursive application of the Naive Bayes learning algorithms. Naive Bayes (NB) classifier relies on the assumption that the instances in each class can be described by a single generative model. This assumption can be restrictive in many real world classification tasks. We describe recursive Naive Bayes learner (RNBL), which relaxes this assumption by constructing a tree of Naive Bayes classifiers for sequence classification, where each individual NB classifier in the tree is based on an event model (one model for each class at each node in the tree). In our experiments on protein sequences, Reuters newswire documents and UC-Irvine benchmark data sets, we observe that RNBL substantially outperforms NB classifier. Furthermore, our experiments on the protein sequences and the text documents show that RNBL outperforms C4.5 decision tree learner (using tests on sequence composition statistics as the splitting criterion) and yields accuracies that are comparable to those of support vector machines (SVM) using similar information

    Forging a deep learning neural network intrusion detection framework to curb the distributed denial of service attack

    Get PDF
    Today’s popularity of the internet has since proven an effective and efficient means of information sharing. However, this has consequently advanced the proliferation of adversaries who aim at unauthorized access to information being shared over the internet medium. These are achieved via various means one of which is the distributed denial of service attacks-which has become a major threat to the electronic society. These are carefully crafted attacks of large magnitude that possess the capability to wreak havoc at very high levels and national infrastructures. This study posits intelligent systems via the use of machine learning frameworks to detect such. We employ the deep learning approach to distinguish between benign exchange of data and malicious attacks from data traffic. Results shows consequent success in the employment of deep learning neural network to effectively differentiate between acceptable and non-acceptable data packets (intrusion) on a network data traffic

    OSTINATO: Cross-host Attack Correlation Through Attack Activity Similarity Detection

    Full text link
    Modern attacks against enterprises often have multiple targets inside the enterprise network. Due to the large size of these networks and increasingly stealthy attacks, attacker activities spanning multiple hosts are extremely difficult to correlate during a threat-hunting effort. In this paper, we present a method for an efficient cross-host attack correlation across multiple hosts. Unlike previous works, our approach does not require lateral movement detection techniques or host-level modifications. Instead, our approach relies on an observation that attackers have a few strategic mission objectives on every host that they infiltrate, and there exist only a handful of techniques for achieving those objectives. The central idea behind our approach involves comparing (OS agnostic) activities on different hosts and correlating the hosts that display the use of similar tactics, techniques, and procedures. We implement our approach in a tool called Ostinato and successfully evaluate it in threat hunting scenarios involving DARPA-led red team engagements spanning 500 hosts and in another multi-host attack scenario. Ostinato successfully detected 21 additional compromised hosts, which the underlying host-based detection system overlooked in activities spanning multiple days of the attack campaign. Additionally, Ostinato successfully reduced alarms generated from the underlying detection system by more than 90%, thus helping to mitigate the threat alert fatigue problemComment: 21 pages, 5 figure

    Exploring Lightweight Deep Learning Solution for Malware Detection in IoT Constraint Environment

    Get PDF
    : The present era is facing the industrial revolution. Machine-to-Machine (M2M) communication paradigm is becoming prevalent. Resultantly, the computational capabilities are being embedded in everyday objects called things. When connected to the internet, these things create an Internet of Things (IoT). However, the things are resource-constrained devices that have limited computational power. The connectivity of the things with the internet raises the challenges of the security. The user sensitive information processed by the things is also susceptible to the trusability issues. Therefore, the proliferation of cybersecurity risks and malware threat increases the need for enhanced security integration. This demands augmenting the things with state-of-the-art deep learning models for enhanced detection and protection of the user data. Existingly, the deep learning solutions are overly complex, and often overfitted for the given problem. In this research, our primary objective is to investigate a lightweight deep-learning approach maximizes the accuracy scores with lower computational costs to ensure the applicability of real-time malware monitoring in constrained IoT devices. We used state-of-the-art Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), and Bi-directional LSTM deep learning algorithm on a vanilla configuration trained on a standard malware dataset. The results of the proposed approach show that the simple deep neural models having single dense layer and a few hundred trainable parameters can eliminate the model overfitting and achieve up to 99.45% accuracy, outperforming the overly complex deep learning models.publishedVersio
    corecore