88 research outputs found

    Grammatical Evolution for Detecting Cyberattacks in Internet of Things Environments

    The Internet of Things (IoT) is revolutionising nearly every aspect of modern life, playing an ever greater role in both industrial and domestic sectors. The increasing frequency of cyber-incidents is a consequence of the pervasiveness of IoT. Threats are becoming more sophisticated, with attackers using new attacks or modifying existing ones. Security teams must deal with a diverse and complex threat landscape that is constantly evolving. Traditional security solutions cannot protect such systems adequately, and so researchers have begun to use Machine Learning algorithms to discover effective defence systems. In this paper, we investigate how one approach from the domain of evolutionary computation, grammatical evolution, can be used to identify cyberattacks in IoT environments. The experiments were conducted on up-to-date datasets and compared with state-of-the-art algorithms. The potential application of evolutionary computation-based approaches to detect unknown attacks is also examined and discussed.
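
For readers unfamiliar with grammatical evolution, the following is a minimal sketch of its core step, the mapping from an integer genome to a program through a grammar. It is not taken from the paper: the grammar, the feature names (`pkt_rate`, `mean_pkt_size`), the thresholds, and the function `map_genome` are all hypothetical, and for simplicity every expansion consumes one codon.

```python
# Hypothetical toy grammar for detection rules; not the grammar used in the paper.
GRAMMAR = {
    "<rule>": [["<cond>"], ["<cond>", " and ", "<rule>"]],
    "<cond>": [["<feat>", " > ", "<thr>"], ["<feat>", " <= ", "<thr>"]],
    "<feat>": [["pkt_rate"], ["mean_pkt_size"]],
    "<thr>": [["10"], ["100"], ["1000"]],
}


def map_genome(genome, start="<rule>", max_wraps=2):
    """Map a list of integer codons to a rule string.

    Each non-terminal is expanded with production number
    (codon mod number_of_productions). The genome is reused
    ("wrapped") at most max_wraps times; if the derivation is
    still unfinished after that, None is returned (invalid individual).
    """
    symbols = [start]          # leftmost-first derivation queue
    output = []
    used = 0                   # codons consumed so far
    limit = len(genome) * (max_wraps + 1)
    while symbols:
        sym = symbols.pop(0)
        if sym not in GRAMMAR:  # terminal: emit as-is
            output.append(sym)
            continue
        if used >= limit:       # ran out of codons, give up
            return None
        choices = GRAMMAR[sym]
        codon = genome[used % len(genome)]
        used += 1
        symbols = list(choices[codon % len(choices)]) + symbols
    return "".join(output)


print(map_genome([0, 1, 1, 2]))  # mean_pkt_size <= 1000
```

An evolutionary loop would then mutate and recombine the integer genomes and score each mapped rule on labelled traffic; that part is omitted here.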

    Network Traffic Analysis Using Stochastic Grammars

    Network traffic analysis is widely used to infer information from Internet traffic. This is possible even if the traffic is encrypted. Previous work uses traffic characteristics, such as port numbers, packet sizes, and frequency, without looking for more subtle patterns in the network traffic. In this work, we use stochastic grammars, namely hidden Markov models (HMMs) and probabilistic context-free grammars (PCFGs), as pattern recognition tools for traffic analysis. HMMs are widely used for pattern recognition and detection. We use an HMM inference approach. With inferred HMMs, we use confidence intervals (CIs) to detect whether a data sequence matches the HMM. To compare HMMs, we define a normalized Markov metric. A statistical test is used to determine model equivalence. Our metric systematically removes the least likely events from both HMMs until the remaining models are statistically equivalent. This defines the distance between models. We extend the use of HMMs to PCFGs, which have more expressive power. We estimate PCFG production probabilities from data. A statistical test is used for detection. We present three applications of HMM and PCFG detection to network traffic analysis. First, we infer the presence of protocol tunneling through the Tor (The Onion Router) anonymization network. The Markov metric quantifies the similarity of network traffic HMMs in Tor to identify the protocol. It also measures communication noise in the Tor network. Second, we use HMMs to detect centralized botnet traffic. We infer HMMs from botnet traffic data and detect botnet infections. Experimental results show that HMMs can accurately detect Zeus botnet traffic. To hide their locations better, newer botnets have P2P control structures. Hierarchical P2P botnets contain recursive and hierarchical patterns. Third, we use PCFGs to detect P2P botnet traffic. Experimentation on real-world traffic data shows that PCFGs can accurately differentiate between P2P botnet traffic and normal Internet traffic.
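
As a rough illustration of how a symbol sequence can be scored against an HMM, the sketch below implements the standard scaled forward algorithm. It is not the confidence-interval test or the Markov metric described in the abstract; the two-state model and its probabilities are made-up values chosen only to show that a sequence resembling the model scores higher than one that does not.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Return log P(obs | HMM) using the scaled forward algorithm.

    obs   : list of observation symbol indices
    start : start[i]    = P(first state is i)
    trans : trans[i][j] = P(next state is j | current state is i)
    emit  : emit[i][k]  = P(symbol k | state i)
    """
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:            # sequence impossible under this model
            return float("-inf")
        log_lik += math.log(scale)  # accumulate log of the scaling factors
        alpha = [a / scale for a in alpha]
    return log_lik


# Made-up model that tends to alternate between two states/symbols.
start = [1.0, 0.0]
trans = [[0.1, 0.9], [0.9, 0.1]]
emit = [[0.9, 0.1], [0.1, 0.9]]

matching = forward_log_likelihood([0, 1, 0, 1, 0, 1], start, trans, emit)
deviating = forward_log_likelihood([0, 0, 0, 0, 0, 0], start, trans, emit)
print(matching > deviating)  # True
```

A detector built on this idea would compare the score of an observed window against a threshold derived from known-good traffic; how that threshold is set is where an approach such as the abstract's confidence intervals would come in.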

    A review of spam email detection: analysis of spammer strategies and the dataset shift problem

    Spam emails have traditionally been seen as merely annoying, unsolicited emails containing advertisements, but they increasingly include scams, malware or phishing. To ensure security and integrity for users, organisations and researchers aim to develop robust filters for spam email detection. Most recent spam filters based on machine learning algorithms and published in academic journals report very high performance, yet users are still reporting a rising number of frauds and attacks via spam emails. Two main challenges can be found in this field: (a) it is a very dynamic environment prone to the dataset shift problem and (b) it suffers from the presence of an adversarial figure, i.e. the spammer. Unlike classical spam email reviews, this one focuses particularly on the problems that this constantly changing environment poses. Moreover, we analyse the different spammer strategies used for contaminating the emails, and we review the state-of-the-art techniques to develop filters based on machine learning. Finally, we empirically evaluate and present the consequences of ignoring the matter of dataset shift in this practical field. Experimental results show that this shift may lead to severe degradation in the estimated generalisation performance, with error rates reaching values up to 48.81%. Open-access publication funded by the Consorcio de Bibliotecas Universitarias de Castilla y León (BUCLE), under Operational Programme 2014ES16RFOP009 FEDER 2014-2020 DE CASTILLA Y LEÓN, Action: 20007-CL - Apoyo Consorcio BUCLE.

    Cybercrime precursors: towards a model of offender resources

    This thesis applies Ekblom and Tilley’s concept of offender resources to the study of criminal behaviour on the Internet. Offender predispositions are influenced by situational factors, that is, the environmental incentives to commit crime. This thesis employs non-participant observation of online communities involved in activities linked to malicious forms of software. Actual online conversations are reproduced, providing rich ethnographic detail of activities that took place between 2008 and 2012 in eight discussion forums where malicious software and cases of hacking are openly discussed among actors. A purposeful sample of key frontline cybercrime responders (N=12) was interviewed about crimeware and their views of the activity observed in the discussion forums. Based on the empirical data, this thesis tests a number of criminological theories and assesses their relative compatibility with social interactions occurring in various online forum sites frequented by persons interested in the formation and use of malicious code. The thesis illustrates three conceptual frameworks of offender resources, based on different criminological theories. The first model ties ‘offender resources’ to the actual offender, suggesting that certain malicious software and its associated activities derive from the decisions, knowledge and abilities of the individual agent. The second model submits that ‘offender resources’ should be viewed more as a pathway leading to offending behaviour that must be instilled and then indoctrinated over a length of time through social interaction with other offenders. The third model emphasises the complex relationships that constitute or interconnect with ‘offender resources’, such as the nexus of relevant social groups and institutions in society. These include the Internet security industry, the law, and organised crime.
    Cybercrime is facilitated by crimeware, a specific type of computer software, and a focus on this element can help us better understand how cybercrime evolves.

    Cyber Security and Critical Infrastructures

    This book contains the manuscripts that were accepted for publication in the MDPI Special Topic "Cyber Security and Critical Infrastructure" after a rigorous peer-review process. Authors from academia, government and industry contributed their innovative solutions, consistent with the interdisciplinary nature of cybersecurity. The book contains 16 articles: an editorial explaining current challenges, innovative solutions, real-world experiences including critical infrastructure, 15 original papers that present state-of-the-art innovative solutions to attacks on critical systems, and a review of security and privacy issues in cloud, edge, and fog computing.

    Traffic microstructures and network anomaly detection

    Much hope has been put in the modelling of network traffic with machine learning methods to detect previously unseen attacks. Many methods rely on microscopic features such as packet sizes or interarrival times to identify recurring patterns and detect deviations from them. However, the success of these methods depends both on the quality of the corresponding training and evaluation data and on an understanding of the structures that the methods learn. Currently, the academic community is lacking both: widely used synthetic datasets face serious problems, and the disconnect between methods and data has been named the "semantic gap". This thesis provides extensive examinations of the requirements on traffic generation and microscopic traffic structures needed to enable the effective training and improvement of anomaly detection models. We first present and examine DetGen, a container-based traffic generation paradigm that enables precise control over, and ground truth information about, the factors that shape traffic microstructures. The goal of DetGen is to provide researchers with extensive ground truth information and enable the generation of customisable datasets that provide realistic structural diversity. DetGen was designed according to four specific traffic requirements that dataset generation needs to fulfil to enable machine-learning models to learn accurate and generalisable traffic representations. Current network intrusion datasets fail to meet these requirements, which we believe is one of the reasons for the limited success of anomaly-based detection methods. We demonstrate the significance of these requirements experimentally by examining how model performance decreases when they are not met. We then focus on the control and information over traffic microstructures that DetGen provides, and the corresponding benefits when examining and improving model failures for overall model development.
    We use three metrics to demonstrate that DetGen is able to provide more control and isolation over the generated traffic. The ground truth information DetGen provides enables us to probe two state-of-the-art traffic classifiers for failures on certain traffic structures, and the corresponding fixes in the model design almost halve the number of misclassifications. Drawing on these results, we propose CBAM, an anomaly detection model that detects network access attacks through deviations from recurring flow sequence patterns. CBAM is inspired by the design of self-supervised language models, and improves on the AUC of the current state of the art by up to 140%. By understanding why several flow sequence structures present difficulties to our model, we make targeted design decisions that address these difficulties and ultimately boost the performance of our model. Lastly, we examine how the control and adversarial perturbation of traffic microstructures can be used by an attacker to evade detection. We show that in a stepping-stone attack, an attacker can evade every current detection model by mimicking the patterns observed in streaming services.

    Developing Efficient and Effective Intrusion Detection System using Evolutionary Computation

    The internet and computer networks have become an essential tool in distributed computing organisations, especially because they enable collaboration between components of heterogeneous systems. The efficiency and flexibility of online services have attracted many applications, but as they have grown in popularity so have the numbers of attacks on them. Thus, security teams must deal with numerous threats in a threat landscape that is continuously evolving. Traditional security solutions are by no means enough to create a secure environment, so intrusion detection systems (IDSs), which observe system activity and detect intrusions, are usually utilised to complement other defence techniques. However, threats are becoming more sophisticated, with attackers using new attack methods or modifying existing ones. Furthermore, building an effective and efficient IDS is a challenging research problem due to the resource restrictions of the environment and its constant evolution. To mitigate these problems, we propose to use machine learning techniques to assist with the IDS building effort. In this thesis, Evolutionary Computation (EC) algorithms are empirically investigated for synthesising intrusion detection programs. EC can construct programs for raising intrusion alerts automatically. One novel proposed approach, i.e. Cartesian Genetic Programming, has proved particularly effective. We also used an ensemble-learning paradigm, in which EC algorithms were used as a meta-learning method to produce detectors. The latter is more fully worked out than the former and has proved a significant success. An efficient IDS should always take into account the resource restrictions of the deployed systems. Memory usage and processing speed are critical requirements. We apply a multi-objective approach to find trade-offs between the intrusion detection capability and the resource consumption of programs, and optimise these objectives simultaneously.
    High complexity and the large size of detectors are identified as general issues with the current approaches. The multi-objective approach is used to evolve Pareto fronts for detectors that aim to maintain the simplicity of the generated patterns. We also investigate the potential application of these algorithms to detect unknown attacks.
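
To make the idea of a Pareto front concrete, here is a minimal sketch that filters candidate detectors scored on two objectives, both to be minimised. The objective pair (detection error, memory footprint in KB) and the sample values are assumptions for illustration only; the thesis does not state its objectives in this form, and a real multi-objective evolutionary algorithm would evolve the population rather than just filter it.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one.

    All objectives are assumed to be minimised.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Return the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical detectors as (detection error, memory footprint in KB).
detectors = [(0.10, 50), (0.05, 120), (0.10, 80), (0.20, 30), (0.05, 200)]
print(pareto_front(detectors))  # [(0.1, 50), (0.05, 120), (0.2, 30)]
```

Each surviving point represents a different trade-off: none of them can be improved on one objective without getting worse on the other.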