518 research outputs found

    Feature Grouping-based Feature Selection

    Get PDF

    Performance Evaluation of Network Anomaly Detection Systems

    Get PDF
    Nowadays, there is a huge and growing concern about security in information and communication technology (ICT) among the scientific community, because any attack or anomaly in the network can greatly affect many domains such as national security, private data storage, social welfare, and economic issues. Anomaly detection is therefore a broad research area, and many different techniques and approaches for this purpose have emerged through the years. Attacks, problems, and internal failures, when not detected early, may badly harm an entire network system. Thus, this thesis presents an autonomous profile-based anomaly detection system based on the statistical method Principal Component Analysis (PCADS-AD). This approach creates a network profile called Digital Signature of Network Segment using Flow Analysis (DSNSF) that denotes the predicted normal behavior of network traffic activity through historical data analysis. That digital signature is used as a threshold for volume anomaly detection, flagging disparities from the normal traffic trend. The proposed system uses seven traffic flow attributes: bits, packets, and number of flows to detect problems, and source and destination IP addresses and ports to provide the network administrator with the information needed to solve them. Evaluation techniques, the addition of a different anomaly detection approach, and comparisons with other methods, all performed in this thesis using real network traffic data, showed good traffic prediction by the DSNSF and encouraging false-alarm and detection-accuracy results for the detection scheme.
These results seek to advance the state of the art in anomaly detection methods and strategies, addressing challenges that emerge from the constant growth in complexity, speed, and size of today's large-scale networks; the low complexity and agility of the proposed system also make it suitable for real-time detection.
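The profile-as-threshold idea described above can be sketched in a few lines. This is a minimal illustration only: a per-interval historical mean stands in for the PCA-built DSNSF, and the data, function names, and 50% tolerance are all hypothetical.

```python
# Sketch of profile-based volume anomaly detection (hypothetical simplification:
# a per-interval historical mean stands in for the PCA-built DSNSF profile).

def build_profile(history):
    """history: list of days, each a list of traffic volumes per time interval."""
    intervals = zip(*history)  # group the same interval across days
    return [sum(v) / len(v) for v in intervals]

def detect_anomalies(profile, today, tolerance=0.5):
    """Flag intervals deviating from the profile by more than tolerance (50%)."""
    alerts = []
    for i, (expected, observed) in enumerate(zip(profile, today)):
        if expected and abs(observed - expected) / expected > tolerance:
            alerts.append(i)
    return alerts

history = [[100, 200, 150], [110, 190, 160], [90, 210, 140]]
profile = build_profile(history)                   # [100.0, 200.0, 150.0]
print(detect_anomalies(profile, [105, 410, 145]))  # interval 1 doubles: [1]
```

In the thesis the profile additionally carries source/destination attributes so an alert can point the administrator at the offending flows, not just the offending interval.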

    Machine learning for network based intrusion detection: an investigation into discrepancies in findings with the KDD cup '99 data set and multi-objective evolution of neural network classifier ensembles from imbalanced data.

    Get PDF
    For the last decade it has become commonplace to evaluate machine learning techniques for network based intrusion detection on the KDD Cup '99 data set. This data set has served well to demonstrate that machine learning can be useful in intrusion detection. However, it has undergone some criticism in the literature, and it is out of date. Therefore, some researchers question the validity of the findings reported based on this data set. Furthermore, as identified in this thesis, there are also discrepancies in the findings reported in the literature. In some cases the results are contradictory. Consequently, it is difficult to analyse the current body of research to determine the value in the findings. This thesis reports on an empirical investigation to determine the underlying causes of the discrepancies. Several methodological factors, such as choice of data subset, validation method and data preprocessing, are identified and are found to affect the results significantly. These findings have also enabled a better interpretation of the current body of research. Furthermore, the criticisms in the literature are addressed and future use of the data set is discussed, which is important since researchers continue to use it due to a lack of better publicly available alternatives. Due to the nature of the intrusion detection domain, there is an extreme imbalance among the classes in the KDD Cup '99 data set, which poses a significant challenge to machine learning. In other domains, researchers have demonstrated that well known techniques such as Artificial Neural Networks (ANNs) and Decision Trees (DTs) often fail to learn the minor class(es) due to class imbalance. However, this has not been recognized as an issue in intrusion detection previously. This thesis reports on an empirical investigation that demonstrates that it is the class imbalance that causes the poor detection of some classes of intrusion reported in the literature. 
An alternative approach to training ANNs is proposed in this thesis, using Genetic Algorithms (GAs) to evolve the weights of the ANNs, referred to as an Evolutionary Neural Network (ENN). When employing evaluation functions that calculate fitness proportionally to the instances of each class, thereby avoiding a bias towards the major class(es) in the data set, significantly improved true positive rates are obtained whilst maintaining a low false positive rate. These findings demonstrate that the difficulties of learning from imbalanced data are not due to limitations of the ANNs, but rather of the training algorithm. Moreover, the ENN is capable of detecting a class of intrusion reported in the literature to be undetectable by ANNs. One limitation of the ENN is a lack of control over the classification trade-off the ANNs obtain. This is identified as a general issue with current approaches to creating classifiers: striving for a single best classifier with the highest accuracy may give an unfruitful classification trade-off, as demonstrated clearly in this thesis. Therefore, an extension of the ENN is proposed, using a Multi-Objective GA (MOGA), which treats the classification rate on each class as a separate objective. This approach produces a Pareto front of non-dominated solutions that exhibit different classification trade-offs, from which the user can select one with the desired properties. The multi-objective approach is also utilised to evolve classifier ensembles, which yields an improved Pareto front of solutions. Furthermore, the selection of classifier members for the ensembles is investigated, demonstrating how this affects the performance of the resultant ensembles; this is key to explaining why some classifier combinations fail to give fruitful solutions.
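The class-proportional fitness idea can be illustrated with a toy GA. This is a sketch under stated assumptions: a two-weight linear model stands in for the ANN, and the data and GA settings are invented; only the balanced fitness function reflects the approach described above.

```python
import random

random.seed(0)

# Toy imbalanced data: 12 majority-class (0) points near x=0, 3 minority (1) near x=5.
data = [([0.1 * i], 0) for i in range(12)] + [([5.0 + 0.1 * i], 1) for i in range(3)]

def predict(w, x):
    return 1 if w[0] * x[0] + w[1] > 0 else 0

def balanced_fitness(w):
    """Average of per-class accuracies, so each class contributes equally
    regardless of how many instances it has."""
    per_class = {}
    for x, y in data:
        hit, n = per_class.get(y, (0, 0))
        per_class[y] = (hit + (predict(w, x) == y), n + 1)
    return sum(hit / n for hit, n in per_class.values()) / len(per_class)

# Plain generational GA over the two weights: keep the best, mutate them.
pop = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(20)]
for _ in range(50):
    pop.sort(key=balanced_fitness, reverse=True)
    pop = pop[:10] + [[w + random.gauss(0, 0.3) for w in p] for p in pop[:10]]
best = max(pop, key=balanced_fitness)
```

A fitness of plain overall accuracy would reward predicting class 0 everywhere (12/15 correct); the balanced form makes the 3 minority instances count as much as the 12 majority ones.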

    Unsupervised Intrusion Detection with Cross-Domain Artificial Intelligence Methods

    Get PDF
    Cybercrime is a major concern for corporations, business owners, governments, and citizens, and it continues to grow in spite of increasing investments in security and fraud prevention. The main challenges in this research field are detecting unknown attacks and reducing the false positive ratio. The aim of this research work was to target both problems by leveraging four artificial intelligence techniques. The first technique is a novel unsupervised learning method based on skip-gram modeling. It was designed, developed, and tested against a public dataset with popular intrusion patterns. A high accuracy and a low false positive rate were achieved without prior knowledge of attack patterns. The second technique is a novel unsupervised learning method based on topic modeling. It was applied to three related domains (network attacks, payments fraud, IoT malware traffic). A high accuracy was achieved in all three scenarios, even though the malicious activity differs significantly from one domain to the other. The third technique is a novel unsupervised learning method based on deep autoencoders, with feature selection performed by a supervised method, random forest. The results showed that this technique can outperform other similar techniques. The fourth technique is based on an MLP neural network and is applied to alert reduction in fraud prevention. This method automates manual reviews previously done by human experts without significantly impacting accuracy.
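The pattern shared by these unsupervised detectors can be sketched compactly: model "normal" data only, score new samples by how far they deviate, and alert above a threshold calibrated on the normal data. In this sketch a simple centroid distance stands in for the skip-gram, topic, and autoencoder models; all numbers are illustrative.

```python
# Unsupervised anomaly scoring: learn a compact model of normal behavior,
# then flag samples the model describes poorly (here, distance to the
# centroid of normal training data; the threshold is set from normal data).

def centroid(samples):
    dims = zip(*samples)
    return [sum(d) / len(d) for d in dims]

def score(c, x):
    return sum((a - b) ** 2 for a, b in zip(c, x)) ** 0.5

normal = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [1.1, 1.0]]
c = centroid(normal)
threshold = max(score(c, x) for x in normal) * 1.5  # slack over worst normal

def is_anomalous(x):
    return score(c, x) > threshold

print(is_anomalous([1.0, 1.05]), is_anomalous([6.0, 0.2]))  # False True
```

No attack labels are used anywhere, which is what lets such a detector generalize to unknown attacks; the false-positive ratio is then governed by how the threshold slack is chosen.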

    Performance Metrics for Network Intrusion Systems

    Get PDF
    Intrusion systems have been the subject of considerable research during the past 33 years, since the original work of Anderson. Much has been published attempting to improve their performance using advanced data processing techniques including neural nets, statistical pattern recognition and genetic algorithms. Whilst some significant improvements have been achieved they are often the result of assumptions that are difficult to justify, and comparing performance between different research groups is difficult. The thesis develops a new approach to defining performance focussed on comparing intrusion systems and technologies. A new taxonomy is proposed in which the type of output and the data scale over which an intrusion system operates are used for classification. The inconsistencies and inadequacies of existing definitions of detection are examined and five new intrusion levels are proposed from analogy with other detection-based technologies. These levels are known as detection, recognition, identification, confirmation and prosecution, each representing an increase in the information output from, and functionality of, the intrusion system. These levels are contrasted over four physical data scales, from application/host through to enterprise networks, introducing and developing the concept of a footprint as a pictorial representation of the scope of an intrusion system. An intrusion is now defined as “an activity that leads to the violation of the security policy of a computer system”. Five different intrusion technologies are illustrated using the footprint with current challenges also shown to stimulate further research. Integrity in the presence of mixed trust data streams at the highest intrusion level is identified as particularly challenging. Two metrics new to intrusion systems are defined to quantify performance and further aid comparison.
Sensitivity is introduced to define the basic detectability of an attack in terms of a single parameter, rather than the usual four currently in use. Selectivity is used to describe the ability of an intrusion system to discriminate between attack types. These metrics are quantified experimentally for network intrusion using the DARPA 1999 dataset and SNORT. Only nine of the 58 attack types present were detected with sensitivities in excess of 12 dB, indicating that detection performance on the attack types present in this dataset remains a challenge. The measured selectivity was also poor, indicating that only three of the attack types could be confidently distinguished. The highest value of selectivity was 3.52, significantly lower than the theoretical limit of 5.83 for the evaluated system. Options for improving selectivity and sensitivity through additional measurements are examined.
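The abstract does not give the thesis's exact definition of sensitivity, but a single-parameter, decibel-scaled detectability measure can be illustrated with an assumed SNR-style ratio of true-positive to false-positive rates. This formula is a hypothetical stand-in, not the thesis's own.

```python
import math

# Illustrative single-parameter detectability in decibels (assumption: the
# thesis's actual definition is not stated in the abstract; here we take the
# ratio of true-positive to false-positive rates, SNR-style).
def sensitivity_db(tpr, fpr):
    return 10 * math.log10(tpr / fpr)

# A detector catching 80% of attacks with a 5% false-positive rate:
print(round(sensitivity_db(0.80, 0.05), 2))  # 12.04
```

Whatever the exact definition, the appeal of a dB scale is that it collapses the usual four-number confusion-matrix description into one comparable figure per attack type.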

    Who wrote this scientific text?

    No full text
    The IEEE bibliographic database contains a number of proven duplications with indication of the original paper(s) copied. This corpus is used to test a method for the detection of hidden intertextuality (commonly called "plagiarism"). The intertextual distance, combined with a sliding window and various classification techniques, identifies these duplications with a very low risk of error. These experiments also show that several factors blur the identity of the scientific author, including variable group authorship and the high levels of intertextuality accepted, and sometimes desired, in scientific papers on the same topic.
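Intertextual distance in this line of work (following Labbé) compares the word-frequency profiles of two texts, scaled to the shorter one; a low distance between sliding windows of two papers signals possible duplication. A minimal sketch, with the windowing itself omitted:

```python
from collections import Counter

def intertextual_distance(tokens_a, tokens_b):
    """Labbé-style intertextual distance in [0, 1]: 0 = identical frequency
    profiles, 1 = no shared vocabulary. The shorter text sets the scale."""
    if len(tokens_a) > len(tokens_b):
        tokens_a, tokens_b = tokens_b, tokens_a
    fa, fb = Counter(tokens_a), Counter(tokens_b)
    scale = len(tokens_a) / len(tokens_b)  # shrink B's counts to A's length
    words = set(fa) | set(fb)
    return sum(abs(fa[w] - fb[w] * scale) for w in words) / (2 * len(tokens_a))

same = "the cat sat on the mat".split()
print(intertextual_distance(same, same))  # 0.0
```

In the duplication-detection setting, each window of a suspect paper would be compared against windows of candidate originals, with distances near zero flagged for inspection.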

    Fault Detection and Isolation of Wind Turbines using Immune System Inspired Algorithms

    Get PDF
    Recently, research on renewable sources of energy has been growing intensively, mainly due to the potential depletion of fossil fuels and associated environmental concerns such as pollution and greenhouse gas emissions. Wind energy is one of the fastest growing sources of renewable energy, and policy makers in both developing and developed countries have built their vision of future energy supply around wind power. The increase in the number of wind turbines, as well as in their size, has led to growing attention to health and condition monitoring and fault diagnosis of wind turbine systems and their components. In this thesis, two immune-inspired algorithms are used to perform Fault Detection and Isolation (FDI) of a Wind Turbine (WT): the Negative Selection Algorithm (NSA) and the Dendritic Cell Algorithm (DCA). First, an NSA-based fault diagnosis methodology is proposed in which a hierarchical bank of NSAs is used to detect and isolate both individual and simultaneously occurring faults common to wind turbines. A smoothing moving-window filter is then utilized to further improve the reliability and performance of the proposed FDI scheme. Moreover, the performance of the proposed scheme is compared with a state-of-the-art data-driven technique, the Support Vector Machine (SVM), to demonstrate the advantages of the proposed NSA-based FDI scheme. Finally, a nonparametric statistical comparison test is implemented to evaluate the proposed methodology against the SVM under various fault severities. In the second part, another immune-inspired methodology, the Dendritic Cell Algorithm (DCA), is used to perform online FDI of sensor faults. A noise filter is also designed to attenuate measurement noise, resulting in better FDI results.
The proposed DCA-based FDI scheme is then compared with the previously developed NSA-based FDI scheme, and a nonparametric statistical comparison test is again performed. Both immune-inspired frameworks are applied to a well-known wind turbine benchmark model to validate the effectiveness of the proposed methodologies.
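The negative-selection step itself is compact enough to sketch. Assumptions to flag: 1-D sensor readings, a uniform "self" operating range, and invented radius and counts; the thesis applies hierarchical banks of NSAs to a wind turbine benchmark model.

```python
import random

random.seed(1)

# Negative Selection Algorithm sketch: detectors are random points that fail
# to match any "self" (normal) sample; a reading matched by any surviving
# detector is flagged as a fault.

SELF = [random.uniform(0.4, 0.6) for _ in range(50)]  # normal operating range
RADIUS = 0.05                                         # match threshold

def matches(detector, sample):
    return abs(detector - sample) < RADIUS

detectors = []
while len(detectors) < 30:
    d = random.uniform(0.0, 1.0)
    if not any(matches(d, s) for s in SELF):  # the negative selection step
        detectors.append(d)

def is_fault(sample):
    return any(matches(d, sample) for d in detectors)
```

Because detectors are censored against normal data only, no fault examples are needed at training time; isolating *which* fault occurred is what motivates the thesis's hierarchical bank of such detectors.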

    Holistic Network Defense: Fusing Host and Network Features for Attack Classification

    Get PDF
    This work presents a hybrid network-host monitoring strategy, which fuses data from both the network and the host to recognize malware infections. This work focuses on three categories: Normal, Scanning, and Infected. The network-host sensor fusion is accomplished by extracting 248 features from network traffic using the Fullstats Network Feature generator and from the host using text mining, looking at the frequency of the 500 most common strings and analyzing them as word vectors. Improvements to detection performance are made by synergistically fusing network features obtained from IP packet flows with host features obtained by text-mining port, processor, and logon information, among others. In addition, the work compares three different machine learning algorithms and updates the script required to obtain network features. The hybrid method outperformed host-only classification by 31.7% and network-only classification by 25%. The new approach also reduces the number of alerts while remaining accurate compared with the commercial IDS SNORT. These results make alert classification messages understandable even to the most typical users.
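The fusion step itself is just feature concatenation per observation window; here a toy nearest-centroid rule stands in for the three machine learning algorithms compared in the work, and all feature values and centroids are invented.

```python
def fuse(net, host):
    """Concatenate network-flow and host text-mining feature vectors
    (the work uses 248 network features and 500 host string frequencies)."""
    return net + host

def nearest_class(centroids, x):
    """Stand-in classifier: pick the class whose centroid is closest in the
    fused feature space."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(c, x))
    return min(centroids, key=lambda label: dist(centroids[label]))

# Hypothetical centroids: [flows/s, mean pkt size] + [freq("cmd.exe"), freq("logon")]
centroids = {
    "Normal":   fuse([10.0, 500.0], [0.0, 1.0]),
    "Scanning": fuse([900.0, 60.0], [0.0, 1.0]),
    "Infected": fuse([50.0, 400.0], [8.0, 30.0]),
}
print(nearest_class(centroids, fuse([880.0, 70.0], [0.0, 1.0])))  # Scanning
```

The point of fusing is visible even in this toy: Scanning and Infected separate on network features, while Infected is distinguished from Normal mostly by the host-side string frequencies.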

    Intrusion Detection from Heterogenous Sensors

    Get PDF
    Nowadays, protecting computer systems and networks against various distributed and multi-step attacks has been a vital challenge for their owners. One of the essential threats to the security of such computer infrastructures is attacks by malicious individuals, from inside and outside of the system environment, seeking to abuse available services or reveal confidential information. Consequently, managing and supervising computer systems is a considerable challenge, as new threats and attacks are discovered on a daily basis. Intrusion Detection Systems (IDSs) play a key role in the surveillance and monitoring of computer network infrastructures. These systems inspect events occurring in computer systems and networks and, in case of any malicious behavior, generate alerts describing the attacks' details. However, there are a number of shortcomings that need to be addressed to make them reliable enough in real-world situations.
One of the fundamental challenges in real-world IDS is the large number of redundant, non-relevant, and false positive alerts that they generate, making it a difficult task for security administrators to determine and identify the alerts that are real and important. Part of the problem is that most IDSs do not take into account contextual information (type of systems, applications, users, networks, etc.), so a large portion of the alerts are non-relevant in that, even though they correctly recognize an intrusion, the intrusion fails to reach its objectives. Additionally, relying on only one type of detection sensor is not adequate to detect newer and more complicated attacks, and as a result many current IDSs are unable to detect them. This is especially important with respect to targeted attacks that try to avoid detection by conventional IDSs and other security products. While many system administrators successfully incorporate context information and many different types of sensors and logs into their analysis, an important problem with this approach is the lack of automation in both storage and analysis. In order to address these problems in IDS applicability, various IDS types have been proposed in recent years, and commercial off-the-shelf (COTS) IDS products have found their way into the Security Operations Centers (SOCs) of many large organizations. From a general perspective, these works can be categorized into: machine learning based approaches, including Bayesian networks, data mining methods, decision trees, neural networks, etc.; alert correlation and alert fusion based approaches; context-aware intrusion detection systems; distributed intrusion detection systems; and ontology-based intrusion detection systems. Since these works each focus on only one or a few of the IDS challenges, to the best of our knowledge the problem as a whole has not been resolved.
Hence, there is no comprehensive work addressing all the mentioned challenges of modern intrusion detection systems. For example, works that utilize machine learning approaches only classify events based on features of the behavior observed for one type of event, and they do not take into account contextual information and event interrelationships. Most of the proposed alert correlation techniques consider correlation only across multiple sensors of the same type having a common event and alert semantics (homogeneous correlation), leaving it to security administrators to perform correlation across heterogeneous types of sensors. Context-aware approaches employ only limited aspects of the underlying context. The lack of accurate evaluation based on data sets that encompass modern, complex attack scenarios is another major shortcoming of most of the proposed approaches. The goal of this thesis is to design an event correlation system that can correlate across several heterogeneous types of sensors and logs (e.g. IDS/IPS, firewall, database, operating system, anti-virus, web proxy, routers, etc.) in order to detect complex attacks that leave traces in various systems, and to incorporate context information into the analysis in order to reduce false positives. To this end, our contributions can be split into four main parts: 1) We propose Pasargadae, a comprehensive context-aware and ontology-based event correlation framework that automatically performs event correlation by reasoning on information collected from various resources. Pasargadae uses ontologies to represent and store information on events, context and vulnerability information, and attack scenarios, and uses simple ontology logic rules written in the Semantic Query-Enhanced Web Rule Language (SQWRL) to correlate various information and filter out non-relevant alerts, duplicate alerts, and false positives.
2) We propose a meta-event based, topological-sort based, and semantic-based event correlation approach that employs Pasargadae to perform event correlation across events collected from several sensors distributed in a computer network. 3) We propose a semantic-based, context-aware alert fusion approach that relies on some of the subcomponents of Pasargadae to perform fusion of heterogeneous alerts collected from heterogeneous IDSs. 4) In order to show the flexibility of Pasargadae, we use it to implement some other proposed alert and event correlation approaches. The sum of these contributions represents a significant improvement in the applicability and reliability of IDS in real-world situations. In order to test the performance and flexibility of the proposed event correlation approach, we need to address the lack of experimental infrastructure suitable for network security. A study of the literature shows that current experimental approaches are not appropriate for generating high-fidelity network data. Consequently, in order to accomplish a comprehensive evaluation, we first conduct our experiments on two separate case-study scenarios, inspired by the DARPA 2000 and UNB ISCX IDS evaluation data sets. Next, as a complete field study, we employ Pasargadae in a real computer network for a two-week period to inspect its detection capabilities on ground-truth network traffic. The results obtained show that, compared to other existing IDS improvements, the proposed contributions significantly improve IDS performance (detection rate) while reducing false-positive, non-relevant, and duplicate alerts.
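The heterogeneous-correlation idea can be sketched with a dict-based join: events from different sensor types are grouped when they share a source address within a short time window, and singleton groups are discarded as non-relevant. Pasargadae does this with ontologies and SQWRL rules over much richer context; the event fields and window here are illustrative only.

```python
# Toy heterogeneous event correlation: group events from different sensor
# types that share a source IP within a time window; drop singleton groups.

events = [
    {"sensor": "ids",      "src": "10.0.0.5", "t": 100, "msg": "exploit attempt"},
    {"sensor": "firewall", "src": "10.0.0.5", "t": 103, "msg": "outbound conn"},
    {"sensor": "ids",      "src": "10.0.0.9", "t": 200, "msg": "port scan"},
]

def correlate(events, window=10):
    """Keep only groups that span more than one sensor type and fit in
    `window` seconds; everything else is treated as non-relevant noise."""
    groups = {}
    for e in sorted(events, key=lambda e: e["t"]):
        groups.setdefault(e["src"], []).append(e)
    return {
        src: grp for src, grp in groups.items()
        if len({e["sensor"] for e in grp}) > 1
        and grp[-1]["t"] - grp[0]["t"] <= window
    }

print(list(correlate(events)))  # ['10.0.0.5']
```

The full system additionally reasons over context (asset type, known vulnerabilities, attack scenarios) before deciding an alert group is worth raising, which is where the false-positive reduction comes from.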

    L'intertextualité dans les publications scientifiques

    No full text
    The IEEE bibliographic database contains a number of proven duplications with indication of the originals copied. This corpus is used to test an authorship attribution method. Combining intertextual distance with a sliding window and various classification techniques identifies these duplications with a very low risk of error. This experiment also shows that several factors blur the identity of the scientific author, notably research groups of variable composition and a high degree of intertextuality that is accepted, or even sought, in scientific writing.