7,482 research outputs found

    Payload-based anomaly detection in HTTP traffic

    Full text link
    University of Technology, Sydney. Faculty of Engineering and Information Technology.Internet provides quality and convenience to human life but at the same time it provides a platform for network hackers and criminals. Intrusion Detection Systems (IDSs) have been proven to be powerful methods for detecting anomalies in the network. Traditional IDSs based on signatures are unable to detect new (zero days) attacks. Anomaly-based systems are alternative to signature based systems. However, present anomaly detection systems suffer from three major setbacks: (a) Large number of false alarms, (b) Very high volume of network traffic due to high data rates (Gbps), and (c) Inefficiency in operation. In this thesis, we address above issues and develop efficient intrusion detection frameworks and models which can be used in detecting a wide variety of attacks including web-based attacks. Our proposed methods are designed to have very few false alarms. We also address Intrusion Detection as a Pattern Recognition problem and discuss all aspects that are important in realizing an anomaly-based IDS. We present three payload-based anomaly detectors, including Geometrical Structure Anomaly Detection (GSAD), Two-Tier Intrusion Detection system using Linear Discriminant Analysis (LDA), and Real-time Payload-based Intrusion Detection System (RePIDS), for intrusion detection. These detectors perform deep-packet analysis and examine payload content using n-gram text categorization and Mahalanobis Distance Map (MDM) techniques. An MDM extracts hidden correlations between the features within each payload and among packet payloads. GSAD generates model of normal network payload as geometrical structure using MDMs in a fully automatic and unsupervised manner. We have implemented the GSAD model in HTTP environment for web-based applications. For efficient operation of IDSs, the detection speed is a key point. Current IDSs examine a large number of data features to detect intrusions and misuse patterns. Hence, for quickly and accurately identifying anomalies of Internet traffic, feature reduction becomes mandatory. We have proposed two models to address this issue, namely two-tier intrusion detection model and RePIDS. Two-tier intrusion detection model uses Linear Discriminant Analysis approach for feature reduction and optimal feature selection. It uses MDM technique to create a model of normal network payload using an extracted feature set. RePIDS uses a 3-tier Iterative Feature Selection Engine (IFSEng) to reduce dimensionality of the raw dataset using Principal Component Analysis (PCA) technique. IFSEng extracts the most significant features from the original feature set and uses mathematical and graphical methods for optimal feature subset selection. Like two-tier intrusion detection model, RePIDS then uses MDM technique to generate a model of normal network payload using extracted features. We test the proposed IDSs on two publicly available datasets of attacks and normal traffic. Experimental results confirm the effectiveness and validation of our proposed solutions in terms of detection rate, false alarm rate and computational complexity

    N-gram Based Text Categorization Method for Improved Data Mining

    Get PDF
    Though naïve Bayes text classifiers are widely used because of its simplicity and effectiveness, the techniques for improving performances of these classifiers have been rarely studied. Naïve Bayes classifiers which are widely used for text classification in machine learning are based on the conditional probability of features belonging to a class, which the features are selected by feature selection methods. However, its performance is often imperfect because it does not model text well, and by inappropriate feature selection and some disadvantages of the Naive Bayes itself. Sentiment Classification or Text Classification is the act of taking a set of labeled text documents, learning a correlation between a document’s contents and its corresponding labels and then predicting the labels of a set of unlabeled test documents as best as possible. Text Classification is also sometimes called Text Categorization. Text classification has many applications in natural language processing tasks such as E-mail filtering, Intrusion detection systems, news filtering, prediction of user preferences, and organization of documents. The Naive Bayes model makes strong assumptions about the data: it assumes that words in a document are independent. This assumption is clearly violated in natural language text: there are various types of dependences between words induced by the syntactic, semantic, pragmatic and conversational structure of a text. Also, the particular form of the probabilistic model makes assumptions about the distribution of words in documents that are violated in practice. We address this problem and show that it can be solved by modeling text data differently using N-Grams. N-gram Based Text Categorization is a simple method based on statistical information about the usage of sequences of words. We conducted an experiment to demonstrate that our simple modification is able to improve the performance of Naive Bayes for text classification significantly. Keywords: Data Mining, Text Classification, Text Categorization, Naïve Bayes, N-Grams

    Applications of Machine Learning to Threat Intelligence, Intrusion Detection and Malware

    Get PDF
    Artificial Intelligence (AI) and Machine Learning (ML) are emerging technologies with applications to many fields. This paper is a survey of use cases of ML for threat intelligence, intrusion detection, and malware analysis and detection. Threat intelligence, especially attack attribution, can benefit from the use of ML classification. False positives from rule-based intrusion detection systems can be reduced with the use of ML models. Malware analysis and classification can be made easier by developing ML frameworks to distill similarities between the malicious programs. Adversarial machine learning will also be discussed, because while ML can be used to solve problems or reduce analyst workload, it also introduces new attack surfaces
    • …
    corecore