10 research outputs found

    Cloud Computing Security, An Intrusion Detection System for Cloud Computing Systems

    Get PDF
    Cloud computing is widely considered as an attractive service model because it minimizes investment since its costs are in direct relation to usage and demand. However, the distributed nature of cloud computing environments, their massive resource aggregation, wide user access and efficient and automated sharing of resources enable intruders to exploit clouds for their advantage. To combat intruders, several security solutions for cloud environments adopt Intrusion Detection Systems. However, most IDS solutions are not suitable for cloud environments, because of problems such as single point of failure, centralized load, high false positive alarms, insufficient coverage for attacks, and inflexible design. The thesis defines a framework for a cloud based IDS to face the deficiencies of current IDS technology. This framework deals with threats that exploit vulnerabilities to attack the various service models of a cloud system. The framework integrates behaviour based and knowledge based techniques to detect masquerade, host, and network attacks and provides efficient deployments to detect DDoS attacks. This thesis has three main contributions. The first is a Cloud Intrusion Detection Dataset (CIDD) to train and test an IDS. The second is the Data-Driven Semi-Global Alignment, DDSGA, approach and three behavior based strategies to detect masquerades in cloud systems. The third and final contribution is signature based detection. We introduce two deployments, a distributed and a centralized one to detect host, network, and DDoS attacks. Furthermore, we discuss the integration and correlation of alerts from any component to build a summarized attack report. The thesis describes in details and experimentally evaluates the proposed IDS and alternative deployments. Acknowledgment: =============== • This PH.D. is achieved through an international joint program with a collaboration between University of Pisa in Italy (Department of Computer Science, Galileo Galilei PH.D. School) and University of Arizona in USA (College of Electrical and Computer Engineering). • The PHD topic is categorized in both Computer Engineering and Information Engineering topics. • The thesis author is also known as "Hisham A. Kholidy"

    Machine learning for network based intrusion detection: an investigation into discrepancies in findings with the KDD cup '99 data set and multi-objective evolution of neural network classifier ensembles from imbalanced data.

    Get PDF
    For the last decade it has become commonplace to evaluate machine learning techniques for network based intrusion detection on the KDD Cup '99 data set. This data set has served well to demonstrate that machine learning can be useful in intrusion detection. However, it has undergone some criticism in the literature, and it is out of date. Therefore, some researchers question the validity of the findings reported based on this data set. Furthermore, as identified in this thesis, there are also discrepancies in the findings reported in the literature. In some cases the results are contradictory. Consequently, it is difficult to analyse the current body of research to determine the value in the findings. This thesis reports on an empirical investigation to determine the underlying causes of the discrepancies. Several methodological factors, such as choice of data subset, validation method and data preprocessing, are identified and are found to affect the results significantly. These findings have also enabled a better interpretation of the current body of research. Furthermore, the criticisms in the literature are addressed and future use of the data set is discussed, which is important since researchers continue to use it due to a lack of better publicly available alternatives. Due to the nature of the intrusion detection domain, there is an extreme imbalance among the classes in the KDD Cup '99 data set, which poses a significant challenge to machine learning. In other domains, researchers have demonstrated that well known techniques such as Artificial Neural Networks (ANNs) and Decision Trees (DTs) often fail to learn the minor class(es) due to class imbalance. However, this has not been recognized as an issue in intrusion detection previously. This thesis reports on an empirical investigation that demonstrates that it is the class imbalance that causes the poor detection of some classes of intrusion reported in the literature. An alternative approach to training ANNs is proposed in this thesis, using Genetic Algorithms (GAs) to evolve the weights of the ANNs, referred to as an Evolutionary Neural Network (ENN). When employing evaluation functions that calculate the fitness proportionally to the instances of each class, thereby avoiding a bias towards the major class(es) in the data set, significantly improved true positive rates are obtained whilst maintaining a low false positive rate. These findings demonstrate that the issues of learning from imbalanced data are not due to limitations of the ANNs; rather the training algorithm. Moreover, the ENN is capable of detecting a class of intrusion that has been reported in the literature to be undetectable by ANNs. One limitation of the ENN is a lack of control of the classification trade-off the ANNs obtain. This is identified as a general issue with current approaches to creating classifiers. Striving to create a single best classifier that obtains the highest accuracy may give an unfruitful classification trade-off, which is demonstrated clearly in this thesis. Therefore, an extension of the ENN is proposed, using a Multi-Objective GA (MOGA), which treats the classification rate on each class as a separate objective. This approach produces a Pareto front of non-dominated solutions that exhibit different classification trade-offs, from which the user can select one with the desired properties. The multi-objective approach is also utilised to evolve classifier ensembles, which yields an improved Pareto front of solutions. Furthermore, the selection of classifier members for the ensembles is investigated, demonstrating how this affects the performance of the resultant ensembles. This is a key to explaining why some classifier combinations fail to give fruitful solutions

    Forensic identification and detection of hidden and obfuscated malware

    Get PDF
    The revolution in online criminal activities and malicious software (malware) has posed a serious challenge in malware forensics. Malicious attacks have become more organized and purposefully directed. With cybercrimes escalating to great heights in quantity as well as in sophistication and stealth, the main challenge is to detect hidden and obfuscated malware. Malware authors use a variety of obfuscation methods and specialized stealth techniques of information hiding to embed malicious code, to infect systems and to thwart any attempt to detect them, specifically with the use of commercially available anti-malware engines. This has led to the situation of zero-day attacks, where malware inflict systems even with existing security measures. The aim of this thesis is to address this situation by proposing a variety of novel digital forensic and data mining techniques to automatically detect hidden and obfuscated malware. Anti-malware engines use signature matching to detect malware where signatures are generated by human experts by disassembling the file and selecting pieces of unique code. Such signature based detection works effectively with known malware but performs poorly with hidden or unknown malware. Code obfuscation techniques, such as packers, polymorphism and metamorphism, are able to fool current detection techniques by modifying the parent code to produce offspring copies resulting in malware that has the same functionality, but with a different structure. These evasion techniques exploit the drawbacks of traditional malware detection methods, which take current malware structure and create a signature for detecting this malware in the future. However, obfuscation techniques aim to reduce vulnerability to any kind of static analysis to the determent of any reverse engineering process. Furthermore, malware can be hidden in file system slack space, inherent in NTFS file system based partitions, resulting in malware detection that even more difficult.Doctor of Philosoph

    Machine learning for network based intrusion detection : an investigation into discrepancies in findings with the KDD cup '99 data set and multi-objective evolution of neural network classifier ensembles from imbalanced data

    Get PDF
    For the last decade it has become commonplace to evaluate machine learning techniques for network based intrusion detection on the KDD Cup '99 data set. This data set has served well to demonstrate that machine learning can be useful in intrusion detection. However, it has undergone some criticism in the literature, and it is out of date. Therefore, some researchers question the validity of the findings reported based on this data set. Furthermore, as identified in this thesis, there are also discrepancies in the findings reported in the literature. In some cases the results are contradictory. Consequently, it is difficult to analyse the current body of research to determine the value in the findings. This thesis reports on an empirical investigation to determine the underlying causes of the discrepancies. Several methodological factors, such as choice of data subset, validation method and data preprocessing, are identified and are found to affect the results significantly. These findings have also enabled a better interpretation of the current body of research. Furthermore, the criticisms in the literature are addressed and future use of the data set is discussed, which is important since researchers continue to use it due to a lack of better publicly available alternatives. Due to the nature of the intrusion detection domain, there is an extreme imbalance among the classes in the KDD Cup '99 data set, which poses a significant challenge to machine learning. In other domains, researchers have demonstrated that well known techniques such as Artificial Neural Networks (ANNs) and Decision Trees (DTs) often fail to learn the minor class(es) due to class imbalance. However, this has not been recognized as an issue in intrusion detection previously. This thesis reports on an empirical investigation that demonstrates that it is the class imbalance that causes the poor detection of some classes of intrusion reported in the literature. An alternative approach to training ANNs is proposed in this thesis, using Genetic Algorithms (GAs) to evolve the weights of the ANNs, referred to as an Evolutionary Neural Network (ENN). When employing evaluation functions that calculate the fitness proportionally to the instances of each class, thereby avoiding a bias towards the major class(es) in the data set, significantly improved true positive rates are obtained whilst maintaining a low false positive rate. These findings demonstrate that the issues of learning from imbalanced data are not due to limitations of the ANNs; rather the training algorithm. Moreover, the ENN is capable of detecting a class of intrusion that has been reported in the literature to be undetectable by ANNs. One limitation of the ENN is a lack of control of the classification trade-off the ANNs obtain. This is identified as a general issue with current approaches to creating classifiers. Striving to create a single best classifier that obtains the highest accuracy may give an unfruitful classification trade-off, which is demonstrated clearly in this thesis. Therefore, an extension of the ENN is proposed, using a Multi-Objective GA (MOGA), which treats the classification rate on each class as a separate objective. This approach produces a Pareto front of non-dominated solutions that exhibit different classification trade-offs, from which the user can select one with the desired properties. The multi-objective approach is also utilised to evolve classifier ensembles, which yields an improved Pareto front of solutions. Furthermore, the selection of classifier members for the ensembles is investigated, demonstrating how this affects the performance of the resultant ensembles. This is a key to explaining why some classifier combinations fail to give fruitful solutions.EThOS - Electronic Theses Online ServiceGBUnited Kingdo

    Signal processing for malware analysis

    Get PDF
    This Project is an experimental analysis of Android malware through images. The analysis is based on classifying the malware into families or differentiating between goodware and malware. This analysis has been done considering two approaches. These two approaches have a common starting point, which is the transformation of Android applications into PNG images. After this conversion, the first approach was subtracting each image from the testing set with the images of the training set, in order to establish which unknown malware belongs to a specific family or to distinguish between goodware and malware. Although the accuracy was higher than the one defined in the requirements, this approach was a time consuming task, so we consider another approach to reduce the time and get the same or better accuracy. The second approach was extracting features from all the images and then using a machine learning classifier to get a precise differentiation. After this second approach, the resulting time for 100,000 samples was less than 4 hours and the accuracy 83.04%, which fulfill the requirements specified. To perform the analysis, we have used two heterogeneous datasets. The Malgenome dataset which contains 49 kinds of malware Android applications (49 malware families). It was used to perform the measurements and the different tests. The M0droid dataset, which contains goodware and malware Android applications. It was used to corroborate the previous analysis.Este proyecto es un análisis experimental de aplicaciones de Android mediante imágenes. Este análisis se basa en clasificar las imágenes en familias o en diferenciarlas entre goodware o malware. Para ello, se han considerado dos enfoques. Estas dos aproximaciones tienen como punto en común la transformación de las aplicaciones de Android en imágenes de tipo PNG. Después de este proceso de transformación a imágenes, la primera aproximación se basó en restar cada imagen perteneciente al grupo de pruebas con las imágenes del grupo de entrenamiento, de esta forma se pudo saber la familia a la que pertenecía cada malware desconocido o distinguir entre aplicaciones goodware y malware. Sin embargo, a pesar de que la precisión de acierto era más alta que la definida en los requisitos, este enfoque era una tarea que consumía mucho tiempo, así que consideramos otra aproximación para reducir el tiempo y conseguir una precisión parecida o mejor que la anterior. Este segundo enfoque fue extraer las características de las imágenes para después usar un clasificador y así obtener una diferenciación precisa. Con esta segunda aproximación, conseguimos un tiempo total menor a las 4 horas para 100000 muestras con una precisión del 83.04%, cumpliendo y superando de esta forma los requisitos que habían sido especificados. Este análisis se ha llevado a cabo usando dos sets de datos heterogéneos. Uno de ellos fue el perteneciente a un proyecto llamado Malgenome, éste contiene 49 tipos de familias de malware en Android. El set de datos de Malgenome se usó para realizar los diferentes ensayos o pruebas y sobre el que se realizaron las medidas de tiempo y precisión. El set de datos de M0droid se usó para corroborar el análisis previo y así establecer una clasificación final.Ingeniería Informátic

    Behavioural Monitoring via Network Communications

    Get PDF
    It is commonly acknowledged that using Internet applications is an integral part of an individual’s everyday life, with more than three billion users now using Internet services across the world; and this number is growing every year. Unfortunately, with this rise in Internet use comes an increasing rise in cyber-related crime. Whilst significant effort has been expended on protecting systems from outside attack, only more recently have researchers sought to develop countermeasures against insider attack. However, for an organisation, the detection of an attack is merely the start of a process that requires them to investigate and attribute the attack to an individual (or group of individuals). The investigation of an attack typically revolves around the analysis of network traffic, in order to better understand the nature of the traffic flows and importantly resolves this to an IP address of the insider. However, with mobile computing and Dynamic Host Control Protocol (DHCP), which results in Internet Protocol (IP) addresses changing frequently, it is particularly challenging to resolve the traffic back to a specific individual. The thesis explores the feasibility of profiling network traffic in a biometric-manner in order to be able to identify users independently of the IP address. In order to maintain privacy and the issue of encryption (which exists on an increasing volume of network traffic), the proposed approach utilises data derived only from the metadata of packets, not the payload. The research proposed a novel feature extraction approach focussed upon extracting user-oriented application-level features from the wider network traffic. An investigation across nine of the most common web applications (Facebook, Twitter, YouTube, Dropbox, Google, Outlook, Skype, BBC and Wikipedia) was undertaken to determine whether such high-level features could be derived from the low-level network signals. The results showed that whilst some user interactions were not possible to extract due to the complexities of the resulting web application, a majority of them were. Having developed a feature extraction process that focussed more upon the user, rather than machine-to-machine traffic, the research sought to use this information to determine whether a behavioural profile could be developed to enable identification of the users. Network traffic of 27 users over 2 months was collected and processed using the aforementioned feature extraction process. Over 140 million packets were collected and processed into 45 user-level interactions across the nine applications. The results from behavioural profiling showed that the system is capable of identifying users, with an average True Positive Identification Rate (TPIR) in the top three applications of 87.4%, 75% and 61.9% respectively. Whilst the initial study provided some encouraging results, the research continued to develop further refinements which could improve the performance. Two techniques were applied, fusion and timeline analysis techniques. The former approach sought to fuse the output of the classification stage to better incorporate and manage the variability of the classification and resulting decision phases of the biometric system. The latter approach sought to capitalise on the fact that whilst the IP address is not reliable over a period of time due to reallocation, over shorter timeframes (e.g. a few minutes) it is likely to reliable and map to the same user. The results for fusion across the top three applications were 93.3%, 82.5% and 68.9%. The overall performance adding in the timeline analysis (with a 240 second time window) on average across all applications was 72.1%. Whilst in terms of biometric identification in the normal sense, 72.1% is not outstanding, its use within this problem of attributing misuse to an individual provides the investigator with an enormous advantage over existing approaches. At best, it will provide him with a user’s specific traffic and at worst allow them to significantly reduce the volume of traffic to be analysed

    Advanced Machine Learning Techniques and Meta-Heuristic Optimization for the Detection of Masquerading Attacks in Social Networks

    Get PDF
    According to the report published by the online protection firm Iovation in 2012, cyber fraud ranged from 1 percent of the Internet transactions in North America Africa to a 7 percent in Africa, most of them involving credit card fraud, identity theft, and account takeover or h¼acking attempts. This kind of crime is still growing due to the advantages offered by a non face-to-face channel where a increasing number of unsuspecting victims divulges sensitive information. Interpol classifies these illegal activities into 3 types: • Attacks against computer hardware and software. • Financial crimes and corruption. • Abuse, in the form of grooming or “sexploitation”. Most research efforts have been focused on the target of the crime developing different strategies depending on the casuistic. Thus, for the well-known phising, stored blacklist or crime signals through the text are employed eventually designing adhoc detectors hardly conveyed to other scenarios even if the background is widely shared. Identity theft or masquerading can be described as a criminal activity oriented towards the misuse of those stolen credentials to obtain goods or services by deception. On March 4, 2005, a million of personal and sensitive information such as credit card and social security numbers was collected by White Hat hackers at Seattle University who just surfed the Web for less than 60 minutes by means of the Google search engine. As a consequence they proved the vulnerability and lack of protection with a mere group of sophisticated search terms typed in the engine whose large data warehouse still allowed showing company or government websites data temporarily cached. As aforementioned, platforms to connect distant people in which the interaction is undirected pose a forcible entry for unauthorized thirds who impersonate the licit user in a attempt to go unnoticed with some malicious, not necessarily economic, interests. In fact, the last point in the list above regarding abuses has become a major and a terrible risk along with the bullying being both by means of threats, harassment or even self-incrimination likely to drive someone to suicide, depression or helplessness. California Penal Code Section 528.5 states: “Notwithstanding any other provision of law, any person who knowingly and without consent credibly impersonates another actual person through or on an Internet Web site or by other electronic means for purposes of harming, intimidating, threatening, or defrauding another person is guilty of a public offense punishable pursuant to subdivision [...]”. IV Therefore, impersonation consists of any criminal activity in which someone assumes a false identity and acts as his or her assumed character with intent to get a pecuniary benefit or cause some harm. User profiling, in turn, is the process of harvesting user information in order to construct a rich template with all the advantageous attributes in the field at hand and with specific purposes. User profiling is often employed as a mechanism for recommendation of items or useful information which has not yet considered by the client. Nevertheless, deriving user tendency or preferences can be also exploited to define the inherent behavior and address the problem of impersonation by detecting outliers or strange deviations prone to entail a potential attack. This dissertation is meant to elaborate on impersonation attacks from a profiling perspective, eventually developing a 2-stage environment which consequently embraces 2 levels of privacy intrusion, thus providing the following contributions: • The inference of behavioral patterns from the connection time traces aiming at avoiding the usurpation of more confidential information. When compared to previous approaches, this procedure abstains from impinging on the user privacy by taking over the messages content, since it only relies on time statistics of the user sessions rather than on their content. • The application and subsequent discussion of two selected algorithms for the previous point resolution: – A commonly employed supervised algorithm executed as a binary classifier which thereafter has forced us to figure out a method to deal with the absence of labeled instances representing an identity theft. – And a meta-heuristic algorithm in the search for the most convenient parameters to array the instances within a high dimensional space into properly delimited clusters so as to finally apply an unsupervised clustering algorithm. • The analysis of message content encroaching on more private information but easing the user identification by mining discriminative features by Natural Language Processing (NLP) techniques. As a consequence, the development of a new feature extraction algorithm based on linguistic theories motivated by the massive quantity of features often gathered when it comes to texts. In summary, this dissertation means to go beyond typical, ad-hoc approaches adopted by previous identity theft and authorship attribution research. Specifically it proposes tailored solutions to this particular and extensively studied paradigm with the aim at introducing a generic approach from a profiling view, not tightly bound to a unique application field. In addition technical contributions have been made in the course of the solution formulation intending to optimize familiar methods for a better versatility towards the problem at hand. In summary: this Thesis establishes an encouraging research basis towards unveiling subtle impersonation attacks in Social Networks by means of intelligent learning techniques

    Detecting masquerades using a combination of Naïve Bayes and weighted RBF approach

    No full text
    Masquerade detection by automated means is gaining widespread interest due to the serious impact of masquerades on computer system or network. Several techniques have been introduced in an effort to minimize up to some extent the risk associated with masquerade attack. In this respect, we have developed a novel technique which comprises of Naïve Bayes approach and weighted radial basis function similarity approach. The proposed scheme exhibits very promising results in comparison with many earlier techniques while experimenting on SEA dataset in detecting masquerades

    ORIGINAL PAPER Detecting masquerades using a combination of Naïve Bayes and weighted RBF approach

    No full text
    Abstract Masquerade detection by automated means is gaining widespread interest due to the serious impact of masquerades on computer system or network. Several techniques have been introduced in an effort to minimize up to some extent the risk associated with masquerade attack. In this respect, we have developed a novel technique which comprises of Naïve Bayes approach and weighted radial basis function similarity approach. The proposed scheme exhibits very promising results in comparison with many earlier techniques while experimenting on SEA dataset in detecting masquerades.
    corecore