37 research outputs found

    Accurately Identifying New QoS Violation Driven by High-Distributed Low-Rate Denial of Service Attacks Based on Multiple Observed Features

    Get PDF
    We propose using multiple observed features of network traffic to identify new highly distributed low-rate quality-of-service (QoS) violations, so that detection accuracy may be further improved. For the multiple observed features, we choose the F feature in the TCP packet header as a microscopic feature, and the P feature and D feature of network traffic as macroscopic features. Based on these features, we establish a multistream fused hidden Markov model (MF-HMM) to detect stealthy low-rate denial-of-service (LDoS) attacks hidden in legitimate network background traffic. In addition, the threshold value is dynamically adjusted using the Kaufman algorithm. Our experiments show that the additive effect of combining multiple features effectively reduces the false-positive rate. The average detection rate of MF-HMM represents a significant 23.39% and 44.64% improvement over the typical power spectrum density (PSD) algorithm and the nonparametric cumulative sum (CUSUM) algorithm, respectively.
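The abstract does not give the exact feature definitions or the Kaufman algorithm's update rule, but the core idea of fusing several per-interval traffic features into one score checked against a dynamically adjusted threshold can be sketched roughly as follows (the naive averaging fusion and the EWMA-style threshold update are illustrative assumptions, not the authors' method):

```python
def detect(feature_streams, alpha=0.1, k=3.0):
    """Fuse several per-interval traffic features into one anomaly score
    and flag intervals where the score exceeds an adaptive threshold.

    The threshold tracks an exponentially weighted mean and variance of
    the score, loosely in the spirit of dynamic threshold adjustment;
    it is only updated on non-alarm intervals so attacks do not poison
    the baseline.
    """
    mean, var = 0.0, 1.0
    alarms = []
    for i, features in enumerate(feature_streams):
        score = sum(features) / len(features)  # naive fusion of normalized features
        if score > mean + k * var ** 0.5:
            alarms.append(i)
        else:
            mean = (1 - alpha) * mean + alpha * score
            var = (1 - alpha) * var + alpha * (score - mean) ** 2
    return alarms
```

A sudden burst in all features then stands out against the learned baseline, while slow drift is absorbed into the threshold.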

    Information fusion architectures for security and resource management in cyber physical systems

    Get PDF
    Data acquisition through sensors is crucial in determining the operability of the observed physical entity. Cyber Physical Systems (CPSs) are an example of distributed systems where sensors embedded into the physical system are used for sensing and data acquisition. CPSs are a collaboration between physical and computational cyber components. The control decisions sent back to the actuators on the physical components from the computational cyber components close the feedback loop of the CPS. Since this feedback is based solely on the data collected through the embedded sensors, information acquisition from the data plays an extremely vital role in determining the operational stability of the CPS. The data collection process may be hindered by disturbances such as system faults, noise, and security attacks. Hence, simple data acquisition techniques will not suffice, as an accurate system representation cannot be obtained. Therefore, more powerful methods of inferring information from collected data, such as information fusion, have to be used. Information fusion is analogous to the cognitive process used by humans to continuously integrate data from their senses to make inferences about their environment. Data from the sensors is combined using techniques drawn from several disciplines, such as adaptive filtering, machine learning, and pattern recognition. Decisions made from such combination of data form the crux of information fusion and differentiate it from flat-structured data aggregation. In this dissertation, multi-layered information fusion models are used to develop automated decision-making architectures to serve security and resource management requirements in Cyber Physical Systems --Abstract, page iv
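As a minimal, generic illustration of the fusion idea (not the dissertation's multi-layered architecture), inverse-variance weighting is one of the simplest ways to combine readings from several sensors so that more reliable sensors count for more:

```python
def fuse(readings):
    """Inverse-variance weighted fusion of independent sensor estimates.

    Each reading is a (value, variance) pair; the fused estimate weights
    lower-variance (more reliable) sensors more heavily, and the fused
    variance is smaller than any single sensor's -- a basic building
    block behind many multi-sensor information-fusion schemes.
    """
    weights = [1.0 / var for _, var in readings]
    total = sum(weights)
    value = sum(w * v for w, (v, _) in zip(weights, readings)) / total
    return value, 1.0 / total
```

Two equally reliable sensors reading 10.0 and 12.0 fuse to 11.0 with half the variance of either one alone.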

    Intrusion Detection from Heterogenous Sensors

    Get PDF
    Nowadays, protecting computer systems and networks against various distributed and multi-step attacks has been a vital challenge for their owners. One of the essential threats to the security of such computer infrastructures is attacks by malicious individuals, from inside and outside of the system environment, who aim to abuse available services or reveal confidential information. Consequently, managing and supervising computer systems is a considerable challenge, as new threats and attacks are discovered on a daily basis. Intrusion Detection Systems (IDSs) play a key role in the surveillance and monitoring of computer network infrastructures. These systems inspect events occurring in computer systems and networks and, in the case of malicious behavior, generate alerts describing the attacks' details. However, a number of shortcomings need to be addressed to make them reliable enough for real-world situations. One of the fundamental challenges in real-world IDSs is the large number of redundant, non-relevant, and false-positive alerts that they generate, making it a difficult task for security administrators to determine and identify the alerts that are real and important. 
Part of the problem is that most IDSs do not take into account contextual information (type of systems, applications, users, networks, etc.), and therefore a large portion of the alerts are non-relevant in that, even though they correctly recognize an intrusion, the intrusion fails to reach its objectives. Additionally, relying on only one type of detection sensor is not adequate for detecting newer and more complicated attacks, and as a result many current IDSs are unable to detect them. This is especially important with respect to targeted attacks that try to avoid detection by conventional IDSs and by other security products. While many system administrators are known to successfully incorporate context information and many different types of sensors and logs into their analysis, an important problem with this approach is the lack of automation in both storage and analysis. In order to address these problems in IDS applicability, various IDS types have been proposed in recent years, and commercial off-the-shelf (COTS) IDS products have found their way into the Security Operations Centers (SOCs) of many large organizations. From a general perspective, these works can be categorized into: machine learning based approaches, including Bayesian networks, data mining methods, decision trees, neural networks, etc.; alert correlation and alert fusion based approaches; context-aware intrusion detection systems; distributed intrusion detection systems; and ontology-based intrusion detection systems. To the best of our knowledge, since these works only focus on one or a few of the IDS challenges, the problem as a whole has not been resolved. Hence, there is no comprehensive work addressing all the mentioned challenges of modern intrusion detection systems. 
For example, works that utilize machine learning approaches only classify events based on some features depending on the behavior observed for one type of event, and they do not take into account contextual information and event interrelationships. Most of the proposed alert correlation techniques consider correlation only across multiple sensors of the same type having a common event and alert semantics (homogeneous correlation), leaving it to security administrators to perform correlation across heterogeneous types of sensors. Context-aware approaches only employ limited aspects of the underlying context. The lack of accurate evaluation based on data sets that encompass modern complex attack scenarios is another major shortcoming of most of the proposed approaches. The goal of this thesis is to design an event correlation system that can correlate across several heterogeneous types of sensors and logs (e.g. IDS/IPS, firewall, database, operating system, anti-virus, web proxy, routers, etc.) in order to detect complex attacks that leave traces in various systems, and to incorporate context information into the analysis in order to reduce false positives. To this end, our contributions can be split into four main parts: 1) we propose Pasargadae, a comprehensive context-aware and ontology-based event correlation framework that automatically performs event correlation by reasoning on the information collected from various information resources. Pasargadae uses ontologies to represent and store information on events, context, vulnerabilities, and attack scenarios, and uses simple ontology logic rules written in the Semantic Query-Enhanced Web Rule Language (SQWRL) to correlate various information and filter out non-relevant alerts, duplicate alerts, and false positives. 
2) We propose a meta-event based, topological-sort based, and semantic-based event correlation approach that employs Pasargadae to perform event correlation across events collected from several sensors distributed in a computer network. 3) We propose a semantic-based, context-aware alert fusion approach that relies on some of the subcomponents of Pasargadae to perform fusion of heterogeneous alerts collected from heterogeneous IDSs. 4) In order to show the level of flexibility of Pasargadae, we use it to implement some other proposed alert and event correlation approaches. The sum of these contributions represents a significant improvement in the applicability and reliability of IDSs in real-world situations. In order to test the performance and flexibility of the proposed event correlation approach, we need to address the lack of experimental infrastructure suitable for network security. A study of the literature shows that current experimental approaches are not appropriate for generating high-fidelity network data. Consequently, in order to accomplish a comprehensive evaluation, we first conduct our experiments on two separate case study scenarios, inspired by the DARPA 2000 and UNB ISCX IDS evaluation data sets. Next, as a complete field study, we employ Pasargadae in a real computer network for a two-week period to inspect its detection capabilities on real-world network traffic with ground truth. The results obtained show that, compared to other existing IDS improvements, the proposed contributions significantly improve IDS performance (detection rate) while reducing false-positive, non-relevant, and duplicate alerts.
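Pasargadae's ontology and SQWRL rules are not reproduced in this abstract, but the underlying idea of correlating events that heterogeneous sensors record about the same activity can be sketched in plain Python (the event schema, the shared-source-IP key, and the time window below are illustrative assumptions, not the thesis's design):

```python
from collections import defaultdict

def correlate(events, window=60):
    """Group normalized events from heterogeneous sensors (IDS, firewall,
    proxy, ...) that share a source IP and occur within `window` seconds
    of each other. Groups touching more than one sensor type become
    meta-events, analogous to correlating the traces an attack leaves
    across several logs.
    """
    by_ip = defaultdict(list)
    for ev in sorted(events, key=lambda e: e["time"]):
        by_ip[ev["src"]].append(ev)
    meta = []
    for src, evs in by_ip.items():
        group = [evs[0]]
        for ev in evs[1:]:
            if ev["time"] - group[-1]["time"] <= window:
                group.append(ev)
            else:
                if len({e["sensor"] for e in group}) > 1:
                    meta.append((src, group))
                group = [ev]
        if len({e["sensor"] for e in group}) > 1:
            meta.append((src, group))
    return meta
```

An IDS alert and a firewall log entry for the same source within the window would merge into one meta-event, while an isolated single-sensor event would not.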

    Enhancing data privacy and security in Internet of Things through decentralized models and services

    Get PDF
    exploits a Byzantine Fault Tolerant (BFT) blockchain, in order to perform collaborative and dynamic botnet detection by collecting and auditing IoT devices’ network traffic flows as blockchain transactions. Secondly, we take the challenge to decentralize IoT, and design a hybrid blockchain architecture for IoT, by proposing Hybrid-IoT. In Hybrid-IoT, subgroups of IoT devices form PoW blockchains, referred to as PoW sub-blockchains. Connection among the PoW sub-blockchains employs a BFT inter-connector framework. We focus on the PoW sub-blockchains formation, guided by a set of guidelines based on a set of dimensions, metrics and bounds

    Towards adaptive anomaly detection systems using boolean combination of hidden Markov models

    Get PDF
    Anomaly detection monitors for significant deviations from normal system behavior. Hidden Markov Models (HMMs) have been successfully applied in many intrusion detection applications, including anomaly detection from sequences of operating system calls. In practice, anomaly detection systems (ADSs) based on HMMs typically generate false alarms because they are designed using limited representative training data and prior knowledge. However, since new data may become available over time, an important feature of an ADS is the ability to accommodate newly-acquired data incrementally, after it has originally been trained and deployed for operations. Incremental re-estimation of HMM parameters raises several challenges. HMM parameters should be updated from new data without requiring access to the previously-learned training data, and without corrupting previously-learned models of normal behavior. Standard techniques for training HMM parameters involve iterative batch learning, and hence must observe the entire training data prior to updating HMM parameters. Given new training data, these techniques must restart the training procedure using all (new and previously-accumulated) data. Moreover, a single HMM system for incremental learning may not adequately approximate the underlying data distribution of the normal process, due to the many local maxima in the solution space. Ensemble methods have been shown to alleviate knowledge corruption by combining the outputs of classifiers trained independently on successive blocks of data. This thesis makes contributions at the HMM and decision levels towards improved accuracy, efficiency, and adaptability of HMM-based ADSs. It first presents a survey of techniques found in the literature that may be suitable for incremental learning of HMM parameters, and assesses the challenges faced when these techniques are applied to incremental learning scenarios in which the new training data is either limited or abundant. 
Consequently, an efficient alternative to the Forward-Backward algorithm is first proposed to reduce the memory complexity, without increasing the computational overhead, of HMM parameter estimation from fixed-size abundant data. Improved techniques for incremental learning of HMM parameters are then proposed to accommodate new data over time, while maintaining a high level of performance. However, knowledge corruption caused by a single HMM with a fixed number of states remains an issue. To overcome such limitations, this thesis presents an efficient system to accommodate new data using a learn-and-combine approach at the decision level. When a new block of training data becomes available, a new pool of base HMMs is generated from the data using different numbers of HMM states and random initializations. The responses from the newly-trained HMMs are then combined with those of the previously-trained HMMs in receiver operating characteristic (ROC) space using novel Boolean combination (BC) techniques. The learn-and-combine approach selects a diversified ensemble of HMMs (EoHMMs) from the pool and adapts the Boolean fusion functions and thresholds for improved performance, while pruning redundant base HMMs. The proposed system is capable of changing its desired operating point during operations, and this point can be adjusted to changes in prior probabilities and costs of errors. During simulations conducted for incremental learning from successive data blocks, using both synthetic and real-world system call data sets, the proposed learn-and-combine approach has been shown to achieve a higher level of accuracy than all related techniques. In particular, it can sustain a significantly higher level of accuracy than when the parameters of a single best HMM are re-estimated for each new block of data using the reference batch learning and the proposed incremental learning techniques. 
It also outperforms static fusion techniques such as majority voting for combining the responses of new and previously-generated pools of HMMs. Ensemble selection techniques have been shown to form compact EoHMMs for operations, by selecting diverse and accurate base HMMs from the pool while maintaining or improving the overall system accuracy. Pruning has been shown to prevent pool sizes from increasing indefinitely with the number of data blocks acquired over time. Therefore, the storage space for accommodating HMM parameters and the computational costs of the selection techniques are reduced, without negatively affecting the overall system performance. The proposed techniques are general in that they can be employed to adapt HMM-based systems to new data within a wide range of application domains. More importantly, the proposed Boolean combination techniques can be employed to combine diverse responses from any set of crisp or soft one- or two-class classifiers trained on different data or features, trained according to different parameters, or from different detectors trained on the same data. In particular, they can be effectively applied when training data is limited and test data is imbalanced.
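As a toy illustration of decision-level Boolean combination (a drastic simplification of the thesis's BC techniques), the sketch below searches AND/OR fusions of two soft detectors over a threshold grid and keeps the rule with the best Youden index (TPR minus FPR) on labeled validation data; the selection criterion and all names are assumptions, not the thesis's algorithm:

```python
import itertools

def boolean_combination(scores_a, scores_b, labels, thresholds):
    """Exhaustively search threshold pairs and AND/OR fusions of two
    soft detectors, keeping the Boolean rule with the best Youden index
    on labeled validation data -- a small-scale version of combining
    detector responses in ROC space."""
    best = None
    pos = sum(labels)
    neg = len(labels) - pos
    for ta, tb, op in itertools.product(thresholds, thresholds, ("and", "or")):
        fused = [
            (a >= ta and b >= tb) if op == "and" else (a >= ta or b >= tb)
            for a, b in zip(scores_a, scores_b)
        ]
        tp = sum(f and y for f, y in zip(fused, labels))
        fp = sum(f and not y for f, y in zip(fused, labels))
        youden = tp / pos - fp / neg
        if best is None or youden > best[0]:
            best = (youden, ta, tb, op)
    return best
```

When each detector catches a different subset of attacks, the OR fusion can reach an operating point that neither detector achieves alone, which is the essence of combining responses in ROC space.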

    NLP-Based Techniques for Cyber Threat Intelligence

    Full text link
    In the digital era, threat actors employ sophisticated techniques for which digital traces, often in the form of textual data, are available. Cyber Threat Intelligence (CTI) covers the solutions for data collection, processing, and analysis that are useful for understanding a threat actor's targets and attack behavior. Currently, CTI is assuming an increasingly crucial role in identifying and mitigating threats and enabling proactive defense strategies. In this context, Natural Language Processing (NLP), a branch of artificial intelligence, has emerged as a powerful tool for enhancing threat intelligence capabilities. This survey paper provides a comprehensive overview of NLP-based techniques applied in the context of threat intelligence. It begins by describing the foundational definitions and principles of CTI as a major tool for safeguarding digital assets. It then undertakes a thorough examination of NLP-based techniques for CTI data crawling from Web sources, CTI data analysis, relation extraction from cybersecurity data, CTI sharing and collaboration, and security threats to CTI. Finally, the challenges and limitations of NLP in threat intelligence are exhaustively examined, including data quality issues and ethical considerations. This survey draws a complete framework and serves as a valuable resource for security professionals and researchers seeking to understand state-of-the-art NLP-based threat intelligence techniques and their potential impact on cybersecurity.

    Applied Metaheuristic Computing

    Get PDF
    For decades, Applied Metaheuristic Computing (AMC) has been a prevailing optimization technique for tackling perplexing engineering and business problems, such as scheduling, routing, ordering, bin packing, assignment, and facility layout planning. This is partly because classic exact methods are constrained by prior assumptions, and partly because heuristics are problem-dependent and lack generalization. AMC, on the contrary, guides the course of low-level heuristics to search beyond local optimality, surpassing the capability of traditional computational methods. This topic series has collected quality papers proposing cutting-edge methodologies and innovative applications that drive the advances of AMC.
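As a concrete, if minimal, example of the principle described above, simulated annealing occasionally accepts worse neighbors so the search can escape local optima; this generic sketch is not tied to any particular paper in the series:

```python
import math
import random

def simulated_annealing(cost, neighbor, x0, t0=1.0, cooling=0.995,
                        steps=2000, seed=0):
    """Generic simulated annealing: always accept improving moves, and
    accept worsening moves with probability exp(-delta/T), where the
    temperature T decays geometrically -- the hallmark of metaheuristics
    over greedy local search."""
    rng = random.Random(seed)
    x, fx = x0, cost(x0)
    best, fbest = x, fx
    t = t0
    for _ in range(steps):
        y = neighbor(x, rng)
        fy = cost(y)
        if fy < fx or rng.random() < math.exp((fx - fy) / t):
            x, fx = y, fy
            if fx < fbest:
                best, fbest = x, fx
        t *= cooling
    return best, fbest
```

For instance, minimizing `lambda v: v * v` from a starting point of `10.0` with a `v + r.uniform(-1, 1)` neighbor drives the cost close to zero, and the same skeleton applies to scheduling or routing once `cost` and `neighbor` encode the problem.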

    Optimization Modeling and Machine Learning Techniques Towards Smarter Systems and Processes

    Get PDF
    The continued penetration of technology into our daily lives has led to the emergence of the concept of Internet-of-Things (IoT) systems and networks. An increasing number of enterprises and businesses are adopting IoT-based initiatives, expecting a higher return on investment (ROI) [1]. However, adopting such technologies poses many challenges. One challenge is improving the performance and efficiency of such systems by properly allocating the available and scarce resources [2, 3]. A second challenge is making use of the massive amount of data generated to help make smarter and more informed decisions [4]. A third challenge is protecting such devices and systems, given the surge in security breaches and attacks in recent times [5]. To that end, this thesis proposes the use of various optimization modeling and machine learning techniques in three different systems, namely wireless communication systems, learning management systems (LMSs), and computer network systems. In particular, the first part of the thesis posits optimization modeling techniques to improve the aggregate throughput and power efficiency of a wireless communication network. The second part of the thesis proposes the use of unsupervised machine learning clustering techniques, integrated into LMSs, to identify unengaged students based on their engagement with material in an e-learning environment. Lastly, the third part of the thesis suggests the use of exploratory data analytics, unsupervised machine learning clustering, and supervised machine learning classification techniques to identify malicious/suspicious domain names in a computer network setting. The main contributions of this thesis can be divided into three broad parts. 
The first is developing optimal and heuristic scheduling algorithms that improve the performance of wireless systems in terms of throughput and power by combining wireless resource virtualization with device-to-device and machine-to-machine communications. The second is using unsupervised machine learning clustering and association algorithms to determine an appropriate engagement-level model for blended e-learning environments and to study the relationship between engagement and academic performance in such environments. The third is developing a supervised ensemble learning classifier that detects malicious/suspicious domain names with high accuracy and precision.
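The thesis's actual classifier and feature set are not detailed in this abstract; the sketch below shows the general flavor of lexical-feature-based suspicious-domain detection, with a hand-rolled majority vote standing in for a trained ensemble (all thresholds and features are illustrative assumptions):

```python
import math

def features(domain):
    """Simple lexical features often used for suspicious-domain
    detection: name length, digit ratio, and character entropy of the
    registered name (the part before the first dot)."""
    name = domain.split(".")[0]
    digits = sum(c.isdigit() for c in name)
    counts = {c: name.count(c) for c in set(name)}
    entropy = -sum(n / len(name) * math.log2(n / len(name))
                   for n in counts.values())
    return len(name), digits / len(name), entropy

def ensemble_predict(domain):
    """Majority vote over three threshold 'classifiers' (a stand-in for
    a trained ensemble such as a random forest): flag long, digit-heavy,
    high-entropy names typical of algorithmically generated domains."""
    length, digit_ratio, entropy = features(domain)
    votes = [length > 15, digit_ratio > 0.2, entropy > 3.5]
    return sum(votes) >= 2  # majority => suspicious
```

A familiar name like `google.com` trips none of the three voters, while a random-looking string of letters and digits trips all of them; a real ensemble would learn such decision boundaries from labeled data instead of using fixed thresholds.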