86 research outputs found

    INSecS: An Intelligent Network Security System

    Modern networks such as IoT systems, Cloud systems, and other distributed systems introduce new challenges in network security. Pressing issues include system resource limitations in IoT, delays in processing the large data streams produced by Cloud and distributed systems, the inability to handle multi-step attacks because rule updates arrive too late, and the limited datasets used for Intrusion Detection System (IDS) training, which degrade system performance. To address these challenges, the author proposes the Intelligent Network Security System, a framework that handles these issues while remaining as accurate as a commercial-grade IDS. The proposed framework consists of three components: a Dataset Creation Software (DCS), an Intrusion Detection System and a Learning module. This thesis presents implementation details and validation results for the DCS and IDS. The first component is a highly customizable software framework capable of generating labeled network intrusion datasets on demand. It can collect data from a live network as well as from a pre-recorded packet capture file. The output can be either raw packet captures (PCAP) with selected attributes per packet or a processed dataset with customized attributes covering both individual packet features and overall traffic behavior within a time window. The capabilities of this component are compared against a state-of-the-art dataset creation system through a feature comparison. The proposed Intrusion Detection System is a novel, distributed IDS able to operate in real time in a distributed system. Hierarchical decision making is used to reduce traffic overhead on the IDS and allow faster intrusion detection. The IDS also detects multi-step attacks faster by updating the system rules when a reconnaissance attack is detected, without any human intervention. Internal attacks are also detected easily because of the distributed nature of the IDS. Performance tests show that, with the hierarchical decision-making structure, the IDS performs 8 times faster on average while maintaining the same level of accuracy as Snort.
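
    As an illustration of the hierarchical decision making described above, the sketch below escalates only flows that fail a cheap edge-level check to a heavier central analysis stage, which is what reduces the traffic overhead on the IDS. It is a minimal sketch in Python; the flow attributes, thresholds and escalation rule are illustrative assumptions, not the thesis' actual design.

        from dataclasses import dataclass

        @dataclass
        class FlowSummary:
            src_ip: str
            dst_port: int
            packets_per_sec: float
            distinct_dst_ports: int

        def edge_prefilter(flow: FlowSummary) -> bool:
            """First tier: cheap check deciding which flows to escalate (assumed heuristics)."""
            return flow.packets_per_sec > 100 or flow.distinct_dst_ports > 20

        def central_analysis(flow: FlowSummary) -> str:
            """Second tier: placeholder for the full signature/behaviour analysis."""
            if flow.distinct_dst_ports > 20:
                return "possible port scan (reconnaissance)"
            return "benign"

        def hierarchical_ids(flows):
            escalated = [f for f in flows if edge_prefilter(f)]   # most traffic never reaches tier 2
            return {f.src_ip: central_analysis(f) for f in escalated}

        traffic = [
            FlowSummary("10.0.0.5", 80, 12.0, 1),                 # ordinary flow, handled at the edge
            FlowSummary("10.0.0.9", 22, 350.0, 45),               # noisy, scanner-like host
        ]
        print(hierarchical_ids(traffic))                          # only the suspicious flow is escalated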

    Declarative domain-specific languages and applications to network monitoring

    Network Intrusion Detection Systems (NIDSs) have probably been in use for as long as computer networks have existed. Their purpose is to monitor network traffic looking for anomalies, undesired behaviors or traces of known intrusions in order to keep users, data, hosts and services safe, ensuring that computer networks remain a secure place to work. In this work, we developed a Network Intrusion Detection System (NIDS) called NeMODe (NEtwork MOnitoring DEclarative approach), which provides a detection mechanism based on Constraint Programming (CP) together with a Domain Specific Language (DSL) crafted to model specific intrusions using declarative methodologies, able to relate several network packets and look for intrusions which span several packets. The main contributions of the work described in this thesis are: a declarative approach to Network Intrusion Detection Systems, including detection mechanisms based on several Constraint Programming approaches, allowing the detection of network intrusions which span several network packets and spread over time; a Domain Specific Language (DSL) based on Constraint Programming methodologies, used to describe the network intrusions which we are interested in finding in the network traffic; and a compiler for the DSL able to generate multiple detection mechanisms based on Gecode, Adaptive Search and MiniSat.
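
    To make the constraint-based view concrete, the following hand-rolled sketch expresses a small multi-packet signature (a SYN port scan) as a set of constraints relating several packets and searches for any combination of packets that satisfies them. It is plain Python for illustration only, with an assumed packet layout and thresholds; NeMODe's actual DSL compiles such descriptions to Gecode, Adaptive Search and MiniSat back-ends, which are not reproduced here.

        from itertools import combinations

        # (timestamp, src_ip, dst_ip, dst_port, tcp_flags) -- assumed packet layout
        packets = [
            (0.0, "1.2.3.4", "10.0.0.1", 22,  "S"),
            (0.4, "1.2.3.4", "10.0.0.1", 23,  "S"),
            (0.9, "1.2.3.4", "10.0.0.1", 80,  "S"),
            (1.1, "1.2.3.4", "10.0.0.1", 443, "S"),
            (5.0, "9.9.9.9", "10.0.0.2", 80,  "S"),
        ]

        def portscan_constraints(window):
            """Constraints relating several packets: same source, distinct destination
            ports, SYN-only, all within 2 seconds of each other (illustrative values)."""
            times = [p[0] for p in window]
            return (len({p[1] for p in window}) == 1                # same source address
                    and len({p[3] for p in window}) == len(window)  # distinct ports
                    and all(p[4] == "S" for p in window)            # SYN packets only
                    and max(times) - min(times) <= 2.0)             # bounded time span

        def find_intrusion(packets, arity=4):
            """Naive search for any combination of packets satisfying the constraints."""
            for window in combinations(packets, arity):
                if portscan_constraints(window):
                    return window
            return None

        print(find_intrusion(packets))   # prints the four packets forming the scan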

    On the Adaptive Real-Time Detection of Fast-Propagating Network Worms

    We present two light-weight worm detection algorithms that offer significant advantages over fixed-threshold methods. The first algorithm, RBS (rate-based sequential hypothesis testing), aims at the large class of worms that attempt to propagate quickly, thus exhibiting abnormal levels of the rate at which hosts initiate connections to new destinations. The foundation of RBS derives from the theory of sequential hypothesis testing, the use of which for detecting randomly scanning hosts was first introduced by our previous work with the TRW (Threshold Random Walk) scan detection algorithm. The sequential hypothesis testing methodology enables engineering the detectors to meet false positive and false negative targets, rather than triggering when fixed thresholds are crossed. In this sense, the detectors that we introduce are truly adaptive. We then introduce RBS+TRW, an algorithm that combines the fan-out rate (RBS) and the probability of failure (TRW) of connections to new destinations. RBS+TRW provides a unified framework that at one end acts as pure RBS and at the other end as pure TRW, and extends RBS's power in detecting worms that scan randomly selected IP addresses.
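
    A minimal sketch of the sequential hypothesis testing idea follows: every first-contact connection outcome updates a log-likelihood ratio, and Wald's two thresholds, derived from the target false positive and false negative rates, decide between "scanner" and "benign". The success probabilities and error targets below are assumed values, not the ones estimated in the paper.

        import math

        THETA_BENIGN = 0.8     # assumed P(connection succeeds | benign host)
        THETA_SCANNER = 0.2    # assumed P(connection succeeds | scanner)
        ALPHA, BETA = 0.01, 0.01   # target false positive / false negative rates

        UPPER = math.log((1 - BETA) / ALPHA)   # Wald's decision thresholds
        LOWER = math.log(BETA / (1 - ALPHA))

        def classify(outcomes):
            """outcomes: iterable of booleans, True meaning the first-contact connection succeeded."""
            llr = 0.0
            for ok in outcomes:
                if ok:
                    llr += math.log(THETA_SCANNER / THETA_BENIGN)
                else:
                    llr += math.log((1 - THETA_SCANNER) / (1 - THETA_BENIGN))
                if llr >= UPPER:
                    return "scanner"
                if llr <= LOWER:
                    return "benign"
            return "undecided"

        print(classify([False, False, False, False, False]))   # mostly failed connections -> scanner
        print(classify([True, True, True, True]))              # successful connections -> benign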

    A Few-Shot Learning-Based Siamese Capsule Network for Intrusion Detection with Imbalanced Training Data

    Network intrusion detection remains one of the major challenges in cybersecurity. In recent years, many machine-learning-based methods have been designed to capture the dynamic and complex intrusion patterns and so improve the performance of intrusion detection systems. However, two issues, imbalanced training data and new unknown attacks, still hinder the development of a reliable network intrusion detection system. In this paper, we propose a novel few-shot learning-based Siamese capsule network to tackle the scarcity of abnormal network traffic training data and enhance the detection of unknown attacks. Specifically, the well-designed deep learning network excels at capturing dynamic relationships across traffic features. In addition, an unsupervised subtype sampling scheme is seamlessly integrated with the Siamese network to improve the detection of network intrusion attacks under imbalanced training data. Experimental results demonstrate that, after the sampling scheme, the metric learning framework is better suited than other supervised learning methods to extracting the subtle and distinctive features that identify both known and unknown attacks. Compared to state-of-the-art methods, our proposed method achieves superior performance in effectively detecting both types of attacks.
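
    The metric-learning idea can be sketched as follows: traffic records are mapped into an embedding space and a query is labelled by its distance to prototypes built from a handful of labelled support examples (few-shot). In the sketch, a random projection stands in for the trained Siamese capsule encoder, and the data and class names are made up.

        import numpy as np

        rng = np.random.default_rng(0)
        EMBED = rng.normal(size=(10, 4))      # random projection standing in for the trained encoder

        def embed(x):
            return x @ EMBED                   # assumed: 10-dim traffic features -> 4-dim embedding

        def few_shot_classify(query, support):
            """support: dict label -> array holding a few example feature vectors per class."""
            q = embed(query)
            dists = {label: np.linalg.norm(embed(examples).mean(axis=0) - q)
                     for label, examples in support.items()}
            return min(dists, key=dists.get)   # nearest class prototype in embedding space

        support = {
            "benign": rng.normal(0.0, 1.0, size=(5, 10)),
            "attack": rng.normal(3.0, 1.0, size=(5, 10)),   # deliberately scarce attack samples
        }
        query = rng.normal(3.0, 1.0, size=10)
        print(few_shot_classify(query, support))            # expected to print "attack" for this toy data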

    Anomaly-based Correlation of IDS Alarms

    An Intrusion Detection System (IDS) is one of the major techniques for securing information systems and keeping pace with current and potential threats and vulnerabilities in computing systems. It is an indisputable fact that the art of detecting intrusions is still far from perfect, and IDSs tend to generate a large number of false alarms. Hence a human analyst inevitably has to validate those alarms before any action can be taken. As IT infrastructures become larger and more complicated, the number of alarms that need to be reviewed can escalate rapidly, making this task very difficult to manage. The need for an automated correlation and reduction system is therefore very much evident. In addition, alarm correlation is valuable in providing the operators with a more condensed view of potential security issues within the network infrastructure. The thesis embraces a comprehensive evaluation of the problem of false alarms and a proposal for an automated alarm correlation system. A critical analysis of existing alarm correlation systems is presented along with a description of the need for an enhanced correlation system. The study concludes that whilst a large number of works have been carried out on improving correlation techniques, none of them is perfect. They either require an extensive level of domain knowledge from the human experts to effectively run the system or are unable to provide high-level information about the false alerts for future tuning. The overall objective of the research has therefore been to establish an alarm correlation framework and system which enables the administrator to effectively group alerts from the same attack instance and subsequently reduce the volume of false alarms without the need for domain knowledge. The achievement of this aim has comprised the proposal of an attribute-based approach, which is used as a foundation to systematically develop an unsupervised two-stage correlation technique. From this formation, a novel SOM K-Means Alarm Reduction Tool (SMART) architecture has been modelled as the framework from which a time- and attribute-based aggregation technique is offered. The thesis describes the design and features of the proposed architecture, focusing upon the key components forming the underlying architecture, the alert attributes and the way they are processed and applied to correlate alerts. The architecture is strengthened by the development of a statistical tool, which offers a means to perform result or alert analysis and comparison. The main concepts of the novel architecture are validated through the implementation of a prototype system. A series of experiments were conducted to assess the effectiveness of SMART in reducing false alarms. This aimed to prove the viability of implementing the system in a practical environment and to show that the study has provided an appropriate contribution to knowledge in this field.
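
    The two-stage correlation can be illustrated with a toy sketch: alerts are first aggregated by time window and shared attributes, and the aggregated feature vectors are then clustered, with a tiny k-means standing in for the SOM/K-means combination used by SMART. The alert fields, window size and features are assumptions made for the example.

        import numpy as np

        alerts = [  # (timestamp, src_ip, signature_id) -- assumed alert fields
            (1.0, "1.2.3.4", 2003), (1.2, "1.2.3.4", 2003), (1.4, "1.2.3.4", 2003),
            (50.0, "5.6.7.8", 1201), (50.5, "5.6.7.8", 1201),
            (90.0, "9.9.9.9", 3999),
        ]

        def aggregate(alerts, window=5.0):
            """Stage 1: merge alerts sharing source and signature within a time window."""
            groups = {}
            for ts, src, sig in alerts:
                key = (src, sig, int(ts // window))
                groups.setdefault(key, []).append(ts)
            # feature vector per group: (alert count, duration of the burst)
            return {k: np.array([len(v), max(v) - min(v)]) for k, v in groups.items()}

        def kmeans(points, k=2, iters=20, seed=0):
            """Stage 2: cluster the aggregated groups (tiny k-means for illustration)."""
            rng = np.random.default_rng(seed)
            centers = points[rng.choice(len(points), k, replace=False)]
            for _ in range(iters):
                labels = np.argmin(((points[:, None] - centers[None]) ** 2).sum(-1), axis=1)
                centers = np.array([points[labels == j].mean(axis=0) if np.any(labels == j)
                                    else centers[j] for j in range(k)])
            return labels

        groups = aggregate(alerts)
        labels = kmeans(np.array(list(groups.values())), k=2)
        for (key, feats), cluster in zip(groups.items(), labels):
            print(key, feats, "-> cluster", cluster)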

    Real-time detection of malicious network activity using stochastic models

    Thesis (Ph.D.) -- Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006. Includes bibliographical references (p. 115-122). This dissertation develops approaches to rapidly detect malicious network traffic, including packets sent by portscanners and network worms. The main hypothesis is that stochastic models capturing a host's particular connection-level behavior provide a good foundation for identifying malicious network activity in real-time. Using the models, the dissertation shows that a detection problem can be formulated as one of observing a particular "trajectory" of arriving packets and inferring from it the most likely classification for the given host's behavior. This stochastic approach enables us not only to estimate an algorithm's performance based on the measurable statistics of a host's traffic but also to balance the goals of promptness and accuracy in detecting malicious network activity. This dissertation presents three detection algorithms based on Wald's mathematical framework of sequential analysis. First, Threshold Random Walk (TRW) rapidly detects remote hosts performing a portscan to a target network. TRW is motivated by the empirically observed disparity between the frequency with which connections to newly visited local addresses are successful for benign hosts vs. for portscanners. Second, it presents a hybrid approach that accurately detects scanning worm infections quickly after the infected local host begins to engage in worm propagation. Finally, it presents a targeting worm detection algorithm, Rate-Based Sequential Hypothesis Testing (RBS), that promptly identifies high-fan-out behavior by hosts (e.g., targeting worms) based on the rate at which the hosts initiate connections to new destinations. RBS is built on an empirically-driven probability model that captures benign network characteristics. It then presents RBS+TRW, a unified framework for detecting fast-propagating worms independently of their target discovery strategy. All these schemes have been implemented and evaluated using real packet traces collected from multiple network vantage points. by Jaeyeon Jung. Ph.D.
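
    The rate-based detector (RBS) can be sketched in the same sequential-analysis style, but over the inter-arrival times of connections to new destinations rather than their success or failure: the gaps are modelled as exponential under a benign rate and a much higher worm-like rate, and Wald's test decides between the two. The rates and error targets below are assumptions, not the dissertation's fitted values.

        import math

        LAMBDA_BENIGN = 0.5    # assumed benign first-contact rate (connections per second)
        LAMBDA_WORM = 20.0     # assumed worm-like fan-out rate
        ALPHA, BETA = 0.001, 0.001   # target error rates

        UPPER = math.log((1 - BETA) / ALPHA)
        LOWER = math.log(BETA / (1 - ALPHA))

        def rbs(interarrival_times):
            """Sequential test over the gaps between connections to new destinations."""
            llr = 0.0
            for gap in interarrival_times:
                # log-likelihood ratio of an exponential gap under the two rates
                llr += math.log(LAMBDA_WORM / LAMBDA_BENIGN) - (LAMBDA_WORM - LAMBDA_BENIGN) * gap
                if llr >= UPPER:
                    return "high fan-out (worm-like)"
                if llr <= LOWER:
                    return "benign"
            return "undecided"

        print(rbs([0.01, 0.02, 0.01, 0.03, 0.02]))   # rapid first contacts -> worm-like
        print(rbs([2.5, 3.0, 1.8, 2.2]))             # slow, benign-looking host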

    Timely processing of big data in collaborative large-scale distributed systems

    Today’s Big Data phenomenon, characterized by huge volumes of data produced at very high rates by heterogeneous and geographically dispersed sources, is fostering the employment of large-scale distributed systems in order to leverage parallelism, fault tolerance and locality awareness with the aim of delivering suitable performance. Among the several areas where Big Data is gaining increasing significance, the protection of Critical Infrastructure is one of the most strategic, since it impacts the stability and safety of entire countries. Intrusion detection mechanisms can benefit greatly from novel Big Data technologies because these allow much more information to be exploited in order to sharpen the accuracy of threat discovery. A key aspect for further increasing the amount of data available for detection purposes is collaboration (meant as information sharing) among distinct actors that share the common goal of maximizing the chances of recognizing malicious activities earlier. Indeed, if an agreement can be found to share their data, they all have the possibility to improve their cyber defenses considerably. The abstraction of Semantic Room (SR) allows interested parties to form trusted and contractually regulated federations, the Semantic Rooms, for the sake of secure information sharing and processing. Another crucial point for the effectiveness of cyber protection mechanisms is the timeliness of the detection, because the sooner a threat is identified, the faster proper countermeasures can be put in place so as to confine any damage. Within this context, the contributions reported in this thesis are threefold (a sketch of the first contribution follows the list):
    * As a case study to show how collaboration can enhance the efficacy of security tools, we developed a novel algorithm for the detection of stealthy port scans, named R-SYN (Ranked SYN port scan detection). We implemented it in three distinct technologies, all of them integrated within an SR-compliant architecture that allows for collaboration through information sharing: (i) in a centralized Complex Event Processing (CEP) engine (Esper), (ii) in a framework for distributed event processing (Storm) and (iii) in Agilis, a novel platform for batch-oriented processing which leverages the Hadoop framework and a RAM-based storage for fast data access. Regardless of the employed technology, all the evaluations have shown that increasing the number of participants (that is, increasing the amount of input data available) improves the detection accuracy. The experiments made clear that a distributed approach allows for lower detection latency and for keeping up with higher input throughput, compared with a centralized one.
    * Distributing the computation over a set of physical nodes introduces the issue of improving the way available resources are assigned to the elaboration tasks to execute, with the aim of minimizing the time the computation takes to complete. We investigated this aspect in Storm by developing two distinct scheduling algorithms, both aimed at decreasing the average elaboration time of a single input event by decreasing the inter-node traffic. Experimental evaluations showed that these two algorithms can improve the performance by up to 30%.
    * Computations in online processing platforms (like Esper and Storm) run continuously, and the need to refine running computations or add new ones, together with the need to cope with the variability of the input, requires the possibility of adapting the resource allocation at runtime, which entails a set of additional problems. Among them, the most relevant concern how to cope with incoming data and processing state while the topology is being reconfigured, and the issue of temporarily reduced performance. To this end, we also explored the alternative approach of running the computation periodically on batches of input data: although it involves a performance penalty on the elaboration latency, it eliminates the great complexity of dynamic reconfigurations. We chose Hadoop as the batch-oriented processing framework and developed strategies specific to computations based on time windows, which are very likely to be used for pattern recognition purposes, as in the case of intrusion detection. Our evaluations provided a comparison of these strategies and made evident the kind of performance that this approach can provide.
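
    The collaborative effect behind R-SYN can be sketched as follows: each participant observes only part of a scanner's half-open SYN attempts, so ranking sources by incomplete handshakes over the merged views crosses an alarm threshold that no isolated view reaches. The event layout, scores and threshold are illustrative assumptions, not the actual R-SYN algorithm.

        from collections import Counter

        # (src_ip, dst_ip, handshake_completed) events seen in one time window at each site
        site_a = [("6.6.6.6", f"10.0.0.{i}", False) for i in range(6)] + \
                 [("10.0.0.2", "10.0.0.3", True)] * 4
        site_b = [("6.6.6.6", f"10.1.0.{i}", False) for i in range(7)]

        def rank_half_open(events):
            """Score each source by the number of SYNs never followed by a completed handshake."""
            score = Counter(src for src, _dst, done in events if not done)
            return score.most_common()

        THRESHOLD = 10   # assumed alarm threshold for one window

        for name, view in [("site A alone", site_a),
                           ("site B alone", site_b),
                           ("collaborative", site_a + site_b)]:
            ranking = rank_half_open(view)
            flagged = [src for src, s in ranking if s >= THRESHOLD]
            print(name, "->", ranking[:1], "flagged:", flagged)   # only the merged view flags 6.6.6.6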

    Addressing practical challenges for anomaly detection in backbone networks

    Network monitoring has always been a topic of foremost importance for both network operators and researchers, for multiple reasons ranging from anomaly detection to traffic classification or capacity planning. Nowadays, as networks become more and more complex, traffic increases and security threats multiply, achieving a deeper understanding of what is happening in the network has become an essential necessity. In particular, due to the considerable growth of cybercrime, research in the field of anomaly detection has drawn significant attention in recent years and tons of proposals have been made. All the same, when it comes to deploying solutions in real environments, some of them fail to meet some crucial requirements. Taking this into account, this thesis focuses on filling this gap between the research and the non-research world. Prior to the start of this work, we identified several problems. First, there is a clear lack of detailed and updated information on the most common anomalies and their characteristics. Second, unawareness of sampled data is still common, although the performance of anomaly detection algorithms is severely affected. Third, operators currently need to invest many work-hours to manually inspect and classify detected anomalies in order to act accordingly and take the appropriate mitigation measures. This is further exacerbated by the high number of false positives and false negatives and because anomaly detection systems are often perceived as extremely complex black boxes. Analysing an issue is essential to fully comprehend the problem space and to be able to tackle it properly. Accordingly, the first block of this thesis seeks to obtain detailed and updated real-world information on the most frequent anomalies occurring in backbone networks. It first reports on the performance of different commercial systems for anomaly detection and analyses the types of network anomalies detected. Afterwards, it focuses on further investigating the characteristics of the anomalies found in a backbone network using one of the tools for more than half a year. Among other results, this block confirms the need for applying sampling in an operational environment as well as the unacceptably high number of false positives and false negatives still reported by current commercial tools. On the whole, the presence of sampling in large networks for monitoring purposes has become almost mandatory and, therefore, all anomaly detection algorithms that do not take it into account might report incorrect results. In the second block of this thesis, the dramatic impact of sampling on the performance of well-known anomaly detection techniques is analysed and confirmed. However, we show that the results change significantly depending on the sampling technique used and also on the metric selected to perform the comparison. In particular, we show that Packet Sampling outperforms Flow Sampling, unlike previously reported. Furthermore, we observe that Selective Sampling (SES), a sampling technique that focuses on small flows, obtains much better results than traditional sampling techniques for scan detection. Consequently, we propose Online Selective Sampling, a sampling technique that obtains the same good performance for scan detection as SES but works on a per-packet basis instead of keeping all flows in memory. We validate and evaluate our proposal and show that it can operate online and uses far fewer resources than SES.
    Although the literature is full of techniques for detecting anomalous events, research on anomaly classification and extraction (e.g., to further investigate what happened or to share evidence with third parties involved) is rather marginal. This makes it harder for network operators to analyse reported anomalies, because they depend solely on their experience to do the job. Furthermore, this task is an extremely time-consuming and error-prone process. The third block of this thesis targets this issue and brings it together with the knowledge acquired in the previous blocks. In particular, it presents a system for automatic anomaly detection, extraction and classification with high accuracy and very low false positives. We deploy the system in an operational environment and show its usefulness in practice. The fourth and last block of this thesis presents a generalisation of our system that focuses on analysing all the traffic, not only network anomalies. This new system seeks to further help network operators by summarising the most significant traffic patterns in their network. In particular, we generalise our system to deal with big network traffic data: it handles src/dst IPs, src/dst ports, protocol, src/dst Autonomous Systems, layer 7 application and src/dst geolocation. We first deploy a prototype in the European backbone network of GÉANT and show that it can process large amounts of data quickly and build highly informative and compact reports that are very useful for comprehending what is happening in the network. Second, we deploy it in a completely different scenario and show how it can also be successfully used in a real-world use case where we analyse the behaviour of highly distributed devices related to a critical infrastructure sector.
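
    The bias that makes selective sampling useful for scan detection can be illustrated per packet: packets belonging to flows that are still small are kept with high probability, while packets of already-large flows are mostly dropped, so scan traffic (many tiny flows) survives sampling. The decay function and the per-flow counter below are assumptions for illustration and do not reproduce the thesis' Online Selective Sampling algorithm.

        import random
        from collections import Counter

        random.seed(1)
        counts = Counter()   # per-flow packet counter (a simplification; the thesis avoids storing all flows)
        CAP = 64             # bound on the stored per-flow count

        def keep_packet(flow_id):
            seen = counts[flow_id]
            counts[flow_id] = min(seen + 1, CAP)
            return random.random() < 1.0 / (1 + seen)   # assumed decay: later packets of a flow are rarely kept

        # toy stream: one elephant flow plus many single-packet scan probes
        stream = ["elephant"] * 1000 + [f"probe-{i}" for i in range(200)]
        random.shuffle(stream)

        kept = [f for f in stream if keep_packet(f)]
        print("kept probe packets:", sum(1 for f in kept if f.startswith("probe")), "of 200")
        print("kept elephant packets:", sum(1 for f in kept if f == "elephant"), "of 1000")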

    Anomaly detection for resilience in cloud computing infrastructures

    Cloud computing is a relatively recent model where scalable and elastic resources are provided as optimized, cost-effective and on-demand utility-like services to customers. As one of the major trends in the IT industry in recent years, cloud computing has gained momentum and started to revolutionise the way enterprises create and deliver IT solutions. Motivated primarily by cost reduction, these cloud environments are also being used by Information and Communication Technologies (ICT) operating Critical Infrastructures (CI). However, due to the complex nature of the underlying infrastructures, these environments are subject to a large number of challenges, including mis-configurations, cyber attacks and malware instances, which manifest themselves as anomalies. These challenges clearly reduce the overall reliability and availability of the cloud, i.e., it is less resilient to challenges. Resilience is intended to be a fundamental property of cloud service provisioning platforms. However, a number of significant challenges in the past demonstrated that cloud environments are not as resilient as one would hope. There is also limited understanding of how to provide resilience in the cloud that can address such challenges. This implies that it is of utmost importance to clearly understand and define what constitutes correct, normal behaviour so that deviations from it can be detected as anomalies and consequently higher resilience can be achieved. Also, for characterising and identifying challenges, anomaly detection techniques can be used, because the statistical models embodied in these techniques allow the robust characterisation of normal behaviour, taking into account various monitoring metrics to detect known and unknown patterns. These anomaly detection techniques can also be applied within a resilience framework in order to promptly provide indications and warnings about adverse events or conditions that may occur. However, due to the scale and complexity of the cloud, detection based on continuous real-time infrastructure monitoring becomes challenging. Because monitoring leads to an overwhelming volume of data, this adversely affects the ability of the underlying detection mechanisms to analyse the data. The increasing volume of metrics, compounded with the complexity of the infrastructure, may also cause low detection accuracy. In this thesis, a comprehensive evaluation of anomaly detection techniques in cloud infrastructures is presented under typical elastic behaviour. More specifically, an investigation of the impact of live virtual machine migration on state-of-the-art anomaly detection techniques is carried out, by evaluating live migration under various attack types and intensities. An initial comparison concludes that, whilst many detection techniques have been proposed, none of them is suited to work within a cloud operational context. The results suggest that in some configurations anomalies are missed and in others anomalies are wrongly classified. Moreover, some of these approaches have been shown to be sensitive to parameters of the datasets, such as the level of traffic aggregation, and they suffer from other robustness problems. In general, anomaly detection techniques are founded on specific assumptions about the data, for example the statistical distributions of events. If these assumptions do not hold, an outcome can be high false positive rates.
    Based on this initial study, the objective of this work is to establish a light-weight, real-time anomaly detection technique that is better suited to a cloud operational context, keeping false positive rates low without the need for prior knowledge and thus enabling the administrator to respond to threats effectively. Furthermore, a technique is needed that is robust to the properties of cloud infrastructures, such as elasticity and limited knowledge of the services, and that can support other resilience mechanisms. From this formulation, a cloud resilience management framework is proposed which incorporates the anomaly detection and other supporting mechanisms that collectively address challenges that manifest themselves as anomalies. The framework is a holistic end-to-end framework for resilience that considers both networking and system issues, and spans the various stages of an existing resilience strategy, called D²R² + DR. In regard to the operational applicability of detection mechanisms, a novel Anomaly Detection-as-a-Service (ADaaS) architecture has been modelled as the means to implement the detection technique. A series of experiments was conducted to assess the effectiveness of the proposed technique for ADaaS, aiming to demonstrate the viability of implementing the system in an operational context. Finally, the proposed model is deployed in a European Critical Infrastructure provider's network running various critical services, the results are validated in real-time scenarios using various test cases, and the advantages of such a model in an operational context are demonstrated. The obtained results show that anomalies are detectable with high accuracy with no prior knowledge, and it can be concluded that ADaaS is applicable to cloud scenarios for flexible multi-tenant detection systems, clearly establishing its effectiveness for cloud infrastructure resilience.
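
    A light-weight online detector of the kind argued for above can be sketched as an exponentially weighted baseline per monitored metric with a z-score style alarm, plus an explicit baseline reset around planned live-migration events so that elasticity is not reported as an anomaly. All parameters are assumptions; this is not the thesis' actual detection technique.

        class EwmaDetector:
            def __init__(self, alpha=0.1, threshold=4.0, warmup=5, min_std=0.5):
                self.alpha, self.threshold = alpha, threshold
                self.warmup, self.min_std = warmup, min_std
                self.n, self.mean, self.var = 0, 0.0, 0.0

            def reset(self):
                """Forget the baseline, e.g. when a planned live migration starts."""
                self.n, self.mean, self.var = 0, 0.0, 0.0

            def update(self, value):
                """Return True if value deviates strongly from the learned baseline."""
                self.n += 1
                if self.n == 1:
                    self.mean = value
                    return False
                std = max(self.var ** 0.5, self.min_std)
                anomalous = self.n > self.warmup and abs(value - self.mean) / std > self.threshold
                diff = value - self.mean                     # exponentially weighted mean/variance update
                self.var = (1 - self.alpha) * (self.var + self.alpha * diff * diff)
                self.mean += self.alpha * diff
                return anomalous

        detector = EwmaDetector()
        cpu_load = [20, 21, 19, 22, 20, 21, 20, 95, 22, 20]   # toy metric stream with a spike
        for t, value in enumerate(cpu_load):
            if detector.update(value):
                print(f"t={t}: anomalous CPU load {value}")   # flags only the spike at t=7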