5 research outputs found

    A Review on Various Methods of Intrusion Detection System

    Intrusion detection is an essential commercial technology as well as a dynamic area of research and development, driven by demand. Modern intrusion detection systems still suffer from limitations in time sensitivity. The main requirement is to develop a system capable of handling large volumes of network data in order to detect attacks more accurately and proactively. Research conducted on the KDDCUP99 dataset produced a distinct set of attributes for each of the four major attack types. Without reducing the number of features, detecting attack patterns within the data is more difficult for rule generation, forecasting, or classification. The goal of this research is to present a new method that compares the proportions of correctly and incorrectly classified instances together with the features selected. Data mining is used to clean, classify, and examine large amounts of network data; because a large volume of network traffic requires processing, data mining techniques are employed. Techniques such as clustering, classification, and association rules are proving useful for analyzing network traffic. This paper presents a survey of data mining techniques applied to intrusion detection systems for the effective identification of both known and unknown attack patterns, thereby helping users to develop secure information systems. Keywords: IDS, Data Mining, Machine Learning, Clustering, Classification DOI: 10.7176/CEIS/11-1-02 Publication date: January 31st 2020
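    The abstract above pairs feature reduction with classification on KDDCUP99-style data but gives no code; the sketch below is only a minimal illustration of that workflow using scikit-learn, with synthetic data standing in for the real dataset and all parameter choices assumed rather than taken from the paper.

        # Minimal sketch: feature selection followed by classification,
        # standing in for the KDDCUP99 workflow described in the abstract.
        # Synthetic data replaces the real dataset so the example is
        # self-contained.
        from sklearn.datasets import make_classification
        from sklearn.feature_selection import SelectKBest, mutual_info_classif
        from sklearn.model_selection import train_test_split
        from sklearn.tree import DecisionTreeClassifier

        # Stand-in for KDD-style traffic records: 41 features, 4 attack classes.
        X, y = make_classification(n_samples=2000, n_features=41,
                                   n_informative=8, n_classes=4,
                                   random_state=0)

        # Reduce the feature set before classification, as the abstract advocates.
        selector = SelectKBest(mutual_info_classif, k=8)
        X_reduced = selector.fit_transform(X, y)

        X_tr, X_te, y_tr, y_te = train_test_split(X_reduced, y, random_state=0)
        clf = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

        # Report the proportions the abstract asks the method to compare.
        correct = clf.score(X_te, y_te)
        print(f"correctly classified:   {correct:.2%}")
        print(f"incorrectly classified: {1 - correct:.2%}")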

    Intrusion detection system with machine-learning-assisted maintenance of attack databases

    Intrusion detection systems (IDS) aim to detect attacks in communication networks. As such, they are an element of interest for providing security in network management, given the assumption that security holes exist in hardware and software systems. There are, moreover, open-source rule-based intrusion detection systems whose main disadvantage is the technical effort required to maintain the rule database. This document analyzes the techniques most widely used in intrusion detection systems and reuses rule-based intrusion detection systems to propose an intrusion detection system with machine-learning-assisted maintenance of attack databases.
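    The abstract does not describe an implementation, so the following is a purely hypothetical sketch of machine-learning-assisted rule maintenance: an anomaly detector trained on traffic judged benign by the current rule base flags records that may warrant new rules. The model choice, feature vectors, and workflow are assumptions, not the proposed system.

        # Hypothetical sketch of ML-assisted rule maintenance: an anomaly
        # detector flags traffic that the existing rule base missed, and the
        # flagged records are queued as candidate material for new attack-
        # database entries. All names and data here are illustrative.
        import numpy as np
        from sklearn.ensemble import IsolationForest

        rng = np.random.default_rng(0)
        normal_traffic = rng.normal(0, 1, size=(500, 10))  # stand-in features
        live_traffic = rng.normal(0, 1.5, size=(50, 10))

        # Train on traffic already judged benign by the current rule base.
        detector = IsolationForest(random_state=0).fit(normal_traffic)

        # Records scored as anomalous become candidate entries for the attack
        # database, pending analyst confirmation.
        candidates = live_traffic[detector.predict(live_traffic) == -1]
        print(f"{len(candidates)} flows queued for rule-base review")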

    Abstraction, aggregation and recursion for generating accurate and simple classifiers

    An important goal of inductive learning is to generate accurate and compact classifiers from data. In a typical inductive learning scenario, instances in a data set are simply represented as ordered tuples of attribute values. In our research, we explore three methodologies to improve the accuracy and compactness of the classifiers: abstraction, aggregation, and recursion.

    First, abstraction is aimed at the design and analysis of algorithms that generate and deal with taxonomies for the construction of compact and robust classifiers. In many applications of the data-driven knowledge discovery process, taxonomies have been shown to be useful in constructing compact, robust, and comprehensible classifiers. However, in many application domains, human-designed taxonomies are unavailable. We introduce algorithms for the automated construction of taxonomies inductively from both structured (such as the UCI Repository) and unstructured (such as text and biological sequence) data. We introduce AVT-Learner, an algorithm for the automated construction of attribute value taxonomies (AVT) from data, and Word Taxonomy Learner (WTL), an algorithm for the automated construction of word taxonomies from text and sequence data. We describe experiments on the UCI data sets and compare the performance of AVT-NBL (an AVT-guided Naive Bayes Learner) with that of the standard Naive Bayes Learner (NBL). Our results show that the AVTs generated by AVT-Learner are competitive with human-generated AVTs (in cases where such AVTs are available). AVT-NBL using AVTs generated by AVT-Learner achieves classification accuracies that are comparable to or higher than those obtained by NBL, and the resulting classifiers are significantly more compact than those generated by NBL. Similarly, our experimental results for WTL and WTNBL on protein localization sequences and the Reuters newswire text categorization data sets show that the proposed algorithms can generate Naive Bayes classifiers that are more compact, and often more accurate, than those produced by the standard Naive Bayes learner for the multinomial model.

    Second, we apply aggregation to construct features as a multiset of values for the intrusion detection task. For this task, we propose a bag of system calls representation for system call traces and describe misuse and anomaly detection results on the University of New Mexico (UNM) and MIT Lincoln Lab (MIT LL) system call sequences using the proposed representation. With this feature representation as input, we compare the performance of several machine learning techniques for misuse detection and show experimental results on anomaly detection. The results show that standard machine learning and clustering techniques using the simple bag of system calls representation, based on the system call traces generated by the operating system's kernel, are effective and often perform better than approaches that use foreign contiguous sequences in detecting intrusive behaviors of compromised processes.

    Finally, we construct a set of classifiers by recursive application of the Naive Bayes learning algorithm. The Naive Bayes (NB) classifier relies on the assumption that the instances in each class can be described by a single generative model. This assumption can be restrictive in many real-world classification tasks. We describe the recursive Naive Bayes learner (RNBL), which relaxes this assumption by constructing a tree of Naive Bayes classifiers for sequence classification, where each individual NB classifier in the tree is based on an event model (one model for each class at each node in the tree). In our experiments on protein sequences, Reuters newswire documents, and UC-Irvine benchmark data sets, we observe that RNBL substantially outperforms the NB classifier. Furthermore, our experiments on the protein sequences and the text documents show that RNBL outperforms the C4.5 decision tree learner (using tests on sequence composition statistics as the splitting criterion) and yields accuracies comparable to those of support vector machines (SVM) using similar information.
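    The bag of system calls representation described above reduces a trace to a multiset of per-call counts, discarding order. A minimal sketch of that encoding follows; the call vocabulary and traces are invented for illustration.

        # Minimal sketch of the bag of system calls representation: each trace
        # is reduced to a fixed-length vector of per-call counts, discarding
        # ordering information. Vocabulary and traces are invented.
        from collections import Counter

        VOCAB = ["open", "read", "write", "close", "mmap", "exec"]

        def bag_of_system_calls(trace):
            """Encode a system call trace as a fixed-length count vector."""
            counts = Counter(trace)
            return [counts.get(call, 0) for call in VOCAB]

        normal = ["open", "read", "read", "close"]
        suspicious = ["open", "exec", "exec", "mmap", "write"]

        print(bag_of_system_calls(normal))      # [1, 2, 0, 1, 0, 0]
        print(bag_of_system_calls(suspicious))  # [1, 0, 1, 0, 1, 2]

    The resulting fixed-length vectors can then be fed to any standard classifier or clustering method for misuse or anomaly detection, as the abstract reports.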

    Diagnosis of complex systems by comparison of alarm lists: application to the LHC control systems

    In the context of the CERN Large Hadron Collider (LHC), a large number of control systems have been built based on industrial control and SCADA solutions. Beyond the complexity of these systems, a large number of sensors and actuators are controlled, which makes the monitoring and diagnosis of this equipment a continuous and real challenge for human operators. Even with the existing SCADA monitoring tools, critical situations trigger alarm avalanches in the supervision layer that make diagnosis more difficult. This thesis proposes a decision-support methodology based on the use of historical data. Past fault signatures, represented by the alarm lists they triggered, are compared with the alarm list of the fault to be diagnosed using pattern matching methods. Two approaches are considered. In the first, the order of appearance is not taken into account; the alarm lists are represented by a binary vector and compared with each other using an original weighted distance, where every alarm is weighted according to its ability to characterize each past fault. The second approach takes the alarm order into account and uses a symbolic sequence to represent the faults. The comparison between sequences is then made with an adapted version of the Needleman-Wunsch algorithm widely used in bioinformatics. The two methods are tested on artificial data and on data extracted from a very realistic simulator of one of the CERN systems. Both methods show good results.
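    The thesis's exact weighting scheme is not reproduced here, so the sketch below illustrates the order-free approach with a generic weighted Hamming distance over binary alarm vectors; the alarm names, fault signatures, and weights are invented for illustration.

        # Illustrative sketch of the order-free approach: fault signatures are
        # binary alarm vectors compared with a weighted distance. The weights
        # (one per alarm) are a generic stand-in; the thesis derives them from
        # each alarm's ability to characterize past faults.
        import numpy as np

        ALARMS = ["PUMP_TRIP", "LOW_FLOW", "HIGH_TEMP", "VALVE_STUCK"]

        # Known fault signatures: rows are past faults, columns are alarms.
        signatures = np.array([
            [1, 1, 0, 0],   # fault A
            [0, 1, 1, 0],   # fault B
            [0, 0, 1, 1],   # fault C
        ])
        weights = np.array([0.9, 0.4, 0.7, 0.8])  # discriminative power per alarm

        def weighted_distance(x, y, w):
            """Weighted Hamming distance between two binary alarm vectors."""
            return float(np.sum(w * (x != y)))

        observed = np.array([1, 1, 1, 0])  # alarm list of the fault to diagnose
        distances = [weighted_distance(observed, s, weights) for s in signatures]
        print("closest past fault:", ["A", "B", "C"][int(np.argmin(distances))])

    The order-aware variant would instead align the observed alarm sequence against each stored sequence with a Needleman-Wunsch-style alignment score, as the abstract describes.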

    Towards the design of efficient error detection mechanisms

    The pervasive nature of modern computer systems has led to an increase in our reliance on such systems to provide correct and timely services. Moreover, as the functionality of computer systems is increasingly defined in software, it is imperative that software be dependable. It has previously been shown that a fault-intolerant software system can be made fault-tolerant through the design and deployment of software mechanisms implementing abstract artefacts known as error detection mechanisms (EDMs) and error recovery mechanisms (ERMs); hence the design of these components is central to the design of dependable software systems. The EDM design problem, which relates to the construction of a boolean predicate over a set of program variables, is inherently difficult, with current approaches relying on system specifications and the experience of software engineers. As this process necessarily entails the identification and incorporation of program variables by an error detection predicate, this thesis addresses the EDM design problem from a novel variable-centric perspective, with the research presented supporting the thesis that, where it exists under the assumed system model, an efficient EDM consists of a set of critical variables. In particular, this research proposes (i) a metric suite that can be used to generate a relative ranking of the program variables in a software system with respect to their criticality, (ii) a systematic approach for the generation of highly efficient error detection predicates for EDMs, and (iii) an approach for dependability enhancement based on the protection of critical variables using software wrappers that implement error detection and correction predicates known to be efficient. This research substantiates the thesis that an efficient EDM contains a set of critical variables on the basis that (i) the proposed metric suite is able, through application of an appropriate threshold, to identify critical variables, (ii) efficient EDMs can be constructed based only on the critical variables identified by the metric suite, and (iii) the criticality of the identified variables can be shown to extend across a software module, such that an efficient EDM designed for that module should seek to determine the correctness of the identified variables.
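    The abstract describes the metric suite only abstractly; the following is a hypothetical sketch of the third contribution, a software wrapper that evaluates a boolean error detection predicate over a set of assumed critical variables after each wrapped operation. The variable names and predicate are illustrative, not taken from the thesis.

        # Hypothetical sketch of an error detection mechanism (EDM) as a
        # software wrapper: a boolean predicate over assumed critical
        # variables is checked after the wrapped function runs.
        from functools import wraps

        def edm(predicate):
            """Wrap a function with an error detection predicate over its state."""
            def decorate(fn):
                @wraps(fn)
                def wrapper(state, *args, **kwargs):
                    result = fn(state, *args, **kwargs)
                    if not predicate(state):
                        raise RuntimeError(f"EDM violation after {fn.__name__}")
                    return result
                return wrapper
            return decorate

        # Predicate over assumed critical variables: altitude and speed.
        def flight_invariant(state):
            return state["altitude"] >= 0 and 0 <= state["speed"] <= 350

        @edm(flight_invariant)
        def update_state(state, d_alt, d_speed):
            state["altitude"] += d_alt
            state["speed"] += d_speed

        state = {"altitude": 1000.0, "speed": 240.0}
        update_state(state, -50.0, 5.0)     # passes the check
        update_state(state, -2000.0, 0.0)   # raises: altitude went negative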