10 research outputs found
Cloud Computing Security, An Intrusion Detection System for Cloud Computing Systems
Cloud computing is widely considered as an attractive service model because it minimizes investment since its costs are in direct relation to usage and demand. However, the distributed nature of cloud computing environments, their massive resource aggregation, wide user access and efficient and automated sharing of resources enable intruders to exploit clouds for their advantage. To combat intruders, several security solutions for cloud environments adopt Intrusion Detection Systems. However, most IDS solutions are not suitable for cloud environments, because of problems such as single point of failure, centralized load, high false positive alarms, insufficient coverage for attacks, and inflexible design. The thesis defines a framework for a cloud based IDS to face the deficiencies of current IDS technology. This framework deals with threats that exploit vulnerabilities to attack the various service models of a cloud system. The framework integrates behaviour based and knowledge based techniques to detect masquerade, host, and network attacks and provides efficient deployments to detect DDoS attacks.
This thesis has three main contributions. The first is a Cloud Intrusion Detection Dataset (CIDD) to train and test an IDS. The second is the Data-Driven Semi-Global Alignment, DDSGA, approach and three behavior based strategies to detect masquerades in cloud systems. The third and final contribution is signature based detection. We introduce two deployments, a distributed and a centralized one to detect host, network, and DDoS attacks. Furthermore, we discuss the integration and correlation of alerts from any component to build a summarized attack report. The thesis describes in details and experimentally evaluates the proposed IDS and alternative deployments.
Acknowledgment:
===============
• This PH.D. is achieved through an international joint program with a collaboration between University of Pisa in Italy (Department of Computer Science, Galileo Galilei PH.D. School) and University of Arizona in USA (College of Electrical and Computer Engineering).
• The PHD topic is categorized in both Computer Engineering and Information Engineering topics.
• The thesis author is also known as "Hisham A. Kholidy"
Machine learning for network based intrusion detection: an investigation into discrepancies in findings with the KDD cup '99 data set and multi-objective evolution of neural network classifier ensembles from imbalanced data.
For the last decade it has become commonplace to evaluate machine learning techniques for network based intrusion detection on the KDD Cup '99 data set. This data set has served well to demonstrate that machine learning can be useful in intrusion detection. However, it has undergone some criticism in the literature, and it is out of date. Therefore, some researchers question the validity of the findings reported based on this data set. Furthermore, as identified in this thesis, there are also discrepancies in the findings reported in the literature. In some cases the results are contradictory. Consequently, it is difficult to analyse the current body of research to determine the value in the findings. This thesis reports on an empirical investigation to determine the underlying causes of the discrepancies. Several methodological factors, such as choice of data subset, validation method and data preprocessing, are identified and are found to affect the results significantly. These findings have also enabled a better interpretation of the current body of research. Furthermore, the criticisms in the literature are addressed and future use of the data set is discussed, which is important since researchers continue to use it due to a lack
of better publicly available alternatives. Due to the nature of the intrusion detection domain, there is an extreme imbalance among the classes in the KDD Cup '99 data set, which poses a significant challenge to machine learning. In other domains, researchers have demonstrated that well known techniques such as Artificial Neural Networks (ANNs) and Decision Trees (DTs) often fail to learn the minor class(es) due to class imbalance. However, this has not been recognized as an issue in intrusion detection previously. This thesis reports on an empirical
investigation that demonstrates that it is the class imbalance that causes the poor detection of some classes
of intrusion reported in the literature. An alternative approach to training ANNs is proposed in this thesis, using Genetic Algorithms (GAs) to evolve the weights of the ANNs, referred to as an Evolutionary Neural Network (ENN). When employing evaluation functions that calculate the fitness proportionally to the instances of each class, thereby avoiding a bias towards the major class(es) in the data set, significantly improved true positive rates are obtained
whilst maintaining a low false positive rate. These findings demonstrate that the issues of learning from
imbalanced data are not due to limitations of the ANNs; rather the training algorithm. Moreover, the ENN is capable of detecting a class of intrusion that has been reported in the literature to be undetectable by ANNs. One limitation of the ENN is a lack of control of the classification trade-off the ANNs obtain. This is identified as a general issue with current approaches to creating classifiers. Striving to create a single best classifier that obtains the highest accuracy may give an unfruitful classification trade-off, which is demonstrated clearly in this thesis. Therefore, an extension of the ENN is proposed, using a Multi-Objective
GA (MOGA), which treats the classification rate on each class as a separate objective. This approach produces a Pareto front of non-dominated solutions that exhibit different classification trade-offs, from which the user can select one with the desired properties. The multi-objective approach is also utilised to evolve classifier ensembles, which yields an improved Pareto front of solutions. Furthermore, the selection of classifier members for the ensembles is investigated, demonstrating how this affects the performance of the resultant ensembles. This is a key to explaining why some classifier combinations fail to give fruitful solutions
Forensic identification and detection of hidden and obfuscated malware
The revolution in online criminal activities and malicious software (malware) has posed a serious challenge in malware forensics. Malicious attacks have become more organized and purposefully directed. With cybercrimes escalating to great heights in quantity as well as in sophistication and stealth, the main challenge is to detect hidden and obfuscated malware. Malware authors use a variety of obfuscation methods and specialized stealth techniques of information hiding to embed malicious code, to infect systems and to thwart any attempt to detect them, specifically with the use of commercially available anti-malware engines. This has led to the situation of zero-day attacks, where malware inflict systems even with existing security measures. The aim of this thesis is to address this situation by proposing a variety of novel digital forensic and data mining techniques to automatically detect hidden and obfuscated malware. Anti-malware engines use signature matching to detect malware where signatures are generated by human experts by disassembling the file and selecting pieces of unique code. Such signature based detection works effectively with known malware but performs poorly with hidden or unknown malware. Code obfuscation techniques, such as packers, polymorphism and metamorphism, are able to fool current detection techniques by modifying the parent code to produce offspring copies resulting in malware that has the same functionality, but with a different structure. These evasion techniques exploit the drawbacks of traditional malware detection methods, which take current malware structure and create a signature for detecting this malware in the future. However, obfuscation techniques aim to reduce vulnerability to any kind of static analysis to the determent of any reverse engineering process. Furthermore, malware can be hidden in file system slack space, inherent in NTFS file system based partitions, resulting in malware detection that even more difficult.Doctor of Philosoph
Machine learning for network based intrusion detection : an investigation into discrepancies in findings with the KDD cup '99 data set and multi-objective evolution of neural network classifier ensembles from imbalanced data
For the last decade it has become commonplace to evaluate machine learning techniques for network based intrusion detection on the KDD Cup '99 data set. This data set has served well to demonstrate that machine learning can be useful in intrusion detection. However, it has undergone some criticism in the literature, and it is out of date. Therefore, some researchers question the validity of the findings reported based on this data set. Furthermore, as identified in this thesis, there are also discrepancies in the findings reported in the literature. In some cases the results are contradictory. Consequently, it is difficult to analyse the current body of research to determine the value in the findings. This thesis reports on an empirical investigation to determine the underlying causes of the discrepancies. Several methodological factors, such as choice of data subset, validation method and data preprocessing, are identified and are found to affect the results significantly. These findings have also enabled a better interpretation of the current body of research. Furthermore, the criticisms in the literature are addressed and future use of the data set is discussed, which is important since researchers continue to use it due to a lack of better publicly available alternatives. Due to the nature of the intrusion detection domain, there is an extreme imbalance among the classes in the KDD Cup '99 data set, which poses a significant challenge to machine learning. In other domains, researchers have demonstrated that well known techniques such as Artificial Neural Networks (ANNs) and Decision Trees (DTs) often fail to learn the minor class(es) due to class imbalance. However, this has not been recognized as an issue in intrusion detection previously. This thesis reports on an empirical investigation that demonstrates that it is the class imbalance that causes the poor detection of some classes of intrusion reported in the literature. An alternative approach to training ANNs is proposed in this thesis, using Genetic Algorithms (GAs) to evolve the weights of the ANNs, referred to as an Evolutionary Neural Network (ENN). When employing evaluation functions that calculate the fitness proportionally to the instances of each class, thereby avoiding a bias towards the major class(es) in the data set, significantly improved true positive rates are obtained whilst maintaining a low false positive rate. These findings demonstrate that the issues of learning from imbalanced data are not due to limitations of the ANNs; rather the training algorithm. Moreover, the ENN is capable of detecting a class of intrusion that has been reported in the literature to be undetectable by ANNs. One limitation of the ENN is a lack of control of the classification trade-off the ANNs obtain. This is identified as a general issue with current approaches to creating classifiers. Striving to create a single best classifier that obtains the highest accuracy may give an unfruitful classification trade-off, which is demonstrated clearly in this thesis. Therefore, an extension of the ENN is proposed, using a Multi-Objective GA (MOGA), which treats the classification rate on each class as a separate objective. This approach produces a Pareto front of non-dominated solutions that exhibit different classification trade-offs, from which the user can select one with the desired properties. The multi-objective approach is also utilised to evolve classifier ensembles, which yields an improved Pareto front of solutions. Furthermore, the selection of classifier members for the ensembles is investigated, demonstrating how this affects the performance of the resultant ensembles. This is a key to explaining why some classifier combinations fail to give fruitful solutions.EThOS - Electronic Theses Online ServiceGBUnited Kingdo
Signal processing for malware analysis
This Project is an experimental analysis of Android malware through images. The analysis
is based on classifying the malware into families or differentiating between goodware and
malware. This analysis has been done considering two approaches. These two
approaches have a common starting point, which is the transformation of Android
applications into PNG images. After this conversion, the first approach was subtracting
each image from the testing set with the images of the training set, in order to establish
which unknown malware belongs to a specific family or to distinguish between goodware
and malware. Although the accuracy was higher than the one defined in the
requirements, this approach was a time consuming task, so we consider another
approach to reduce the time and get the same or better accuracy. The second approach
was extracting features from all the images and then using a machine learning classifier
to get a precise differentiation. After this second approach, the resulting time for 100,000
samples was less than 4 hours and the accuracy 83.04%, which fulfill the requirements
specified.
To perform the analysis, we have used two heterogeneous datasets. The Malgenome
dataset which contains 49 kinds of malware Android applications (49 malware families). It
was used to perform the measurements and the different tests. The M0droid dataset,
which contains goodware and malware Android applications. It was used to corroborate
the previous analysis.Este proyecto es un análisis experimental de aplicaciones de Android mediante
imágenes. Este análisis se basa en clasificar las imágenes en familias o en diferenciarlas
entre goodware o malware. Para ello, se han considerado dos enfoques. Estas dos
aproximaciones tienen como punto en común la transformación de las aplicaciones de
Android en imágenes de tipo PNG. Después de este proceso de transformación a
imágenes, la primera aproximación se basó en restar cada imagen perteneciente al
grupo de pruebas con las imágenes del grupo de entrenamiento, de esta forma se pudo
saber la familia a la que pertenecía cada malware desconocido o distinguir entre
aplicaciones goodware y malware. Sin embargo, a pesar de que la precisión de acierto
era más alta que la definida en los requisitos, este enfoque era una tarea que consumía
mucho tiempo, así que consideramos otra aproximación para reducir el tiempo y
conseguir una precisión parecida o mejor que la anterior. Este segundo enfoque fue
extraer las características de las imágenes para después usar un clasificador y así
obtener una diferenciación precisa. Con esta segunda aproximación, conseguimos un
tiempo total menor a las 4 horas para 100000 muestras con una precisión del 83.04%,
cumpliendo y superando de esta forma los requisitos que habían sido especificados.
Este análisis se ha llevado a cabo usando dos sets de datos heterogéneos. Uno de ellos
fue el perteneciente a un proyecto llamado Malgenome, éste contiene 49 tipos de
familias de malware en Android. El set de datos de Malgenome se usó para realizar los
diferentes ensayos o pruebas y sobre el que se realizaron las medidas de tiempo y
precisión. El set de datos de M0droid se usó para corroborar el análisis previo y así
establecer una clasificación final.Ingeniería Informátic
Behavioural Monitoring via Network Communications
It is commonly acknowledged that using Internet applications is an integral part of an individual’s everyday life, with more than three billion users now using Internet services across the world; and this number is growing every year. Unfortunately, with this rise in Internet use comes an increasing rise in cyber-related crime. Whilst significant effort has been expended on protecting systems from outside attack, only more recently have researchers sought to develop countermeasures against insider attack. However, for an organisation, the detection of an attack is merely the start of a process that requires them to investigate and attribute the attack to an individual (or group of individuals).
The investigation of an attack typically revolves around the analysis of network traffic, in order to better understand the nature of the traffic flows and importantly resolves this to an IP address of the insider. However, with mobile computing and Dynamic Host Control Protocol (DHCP), which results in Internet Protocol (IP) addresses changing frequently, it is particularly challenging to resolve the traffic back to a specific individual.
The thesis explores the feasibility of profiling network traffic in a biometric-manner in order to be able to identify users independently of the IP address. In order to maintain privacy and the issue of encryption (which exists on an increasing volume of network traffic), the proposed approach utilises data derived only from the metadata of packets, not the payload. The research proposed a novel feature extraction approach focussed upon extracting user-oriented application-level features from the wider network traffic. An investigation across nine of the most common web applications (Facebook, Twitter, YouTube, Dropbox, Google, Outlook, Skype, BBC and Wikipedia) was undertaken to determine whether such high-level features could be derived from the low-level network signals. The results showed that whilst some user interactions were not possible to extract due to the complexities of the resulting web application, a majority of them were.
Having developed a feature extraction process that focussed more upon the user, rather than machine-to-machine traffic, the research sought to use this information to determine whether a behavioural profile could be developed to enable identification of the users. Network traffic of 27 users over 2 months was collected and processed using the aforementioned feature extraction process. Over 140 million packets were collected and processed into 45 user-level interactions across the nine applications. The results from behavioural profiling showed that the system is capable of identifying users, with an average True Positive Identification Rate (TPIR) in the top three applications of 87.4%, 75% and 61.9% respectively.
Whilst the initial study provided some encouraging results, the research continued to develop further refinements which could improve the performance. Two techniques were applied, fusion and timeline analysis techniques. The former approach sought to fuse the output of the classification stage to better incorporate and manage the variability of the classification and resulting decision phases of the biometric system. The latter approach sought to capitalise on the fact that whilst the IP address is not reliable over a period of time due to reallocation, over shorter timeframes (e.g. a few minutes) it is likely to reliable and map to the same user. The results for fusion across the top three applications were 93.3%, 82.5% and 68.9%. The overall performance adding in the timeline analysis (with a 240 second time window) on average across all applications was 72.1%.
Whilst in terms of biometric identification in the normal sense, 72.1% is not outstanding, its use within this problem of attributing misuse to an individual provides the investigator with an enormous advantage over existing approaches. At best, it will provide him with a user’s specific traffic and at worst allow them to significantly reduce the volume of traffic to be analysed
Advanced Machine Learning Techniques and Meta-Heuristic Optimization for the Detection of Masquerading Attacks in Social Networks
According to the report published by the online protection firm Iovation in 2012,
cyber fraud ranged from 1 percent of the Internet transactions in North America
Africa to a 7 percent in Africa, most of them involving credit card fraud, identity
theft, and account takeover or h¼acking attempts. This kind of crime is still growing
due to the advantages offered by a non face-to-face channel where a increasing
number of unsuspecting victims divulges sensitive information. Interpol classifies
these illegal activities into 3 types:
• Attacks against computer hardware and software.
• Financial crimes and corruption.
• Abuse, in the form of grooming or “sexploitation”.
Most research efforts have been focused on the target of the crime developing different
strategies depending on the casuistic. Thus, for the well-known phising, stored
blacklist or crime signals through the text are employed eventually designing adhoc
detectors hardly conveyed to other scenarios even if the background is widely
shared. Identity theft or masquerading can be described as a criminal activity oriented
towards the misuse of those stolen credentials to obtain goods or services by
deception. On March 4, 2005, a million of personal and sensitive information such
as credit card and social security numbers was collected by White Hat hackers at
Seattle University who just surfed the Web for less than 60 minutes by means of
the Google search engine. As a consequence they proved the vulnerability and lack
of protection with a mere group of sophisticated search terms typed in the engine
whose large data warehouse still allowed showing company or government websites
data temporarily cached.
As aforementioned, platforms to connect distant people in which the interaction is
undirected pose a forcible entry for unauthorized thirds who impersonate the licit
user in a attempt to go unnoticed with some malicious, not necessarily economic,
interests. In fact, the last point in the list above regarding abuses has become a
major and a terrible risk along with the bullying being both by means of threats,
harassment or even self-incrimination likely to drive someone to suicide, depression
or helplessness. California Penal Code Section 528.5 states:
“Notwithstanding any other provision of law, any person who knowingly
and without consent credibly impersonates another actual person through
or on an Internet Web site or by other electronic means for purposes of
harming, intimidating, threatening, or defrauding another person is guilty
of a public offense punishable pursuant to subdivision [...]”.
IV
Therefore, impersonation consists of any criminal activity in which someone assumes
a false identity and acts as his or her assumed character with intent to get
a pecuniary benefit or cause some harm. User profiling, in turn, is the process of
harvesting user information in order to construct a rich template with all the advantageous
attributes in the field at hand and with specific purposes. User profiling is
often employed as a mechanism for recommendation of items or useful information
which has not yet considered by the client. Nevertheless, deriving user tendency or
preferences can be also exploited to define the inherent behavior and address the
problem of impersonation by detecting outliers or strange deviations prone to entail
a potential attack.
This dissertation is meant to elaborate on impersonation attacks from a profiling
perspective, eventually developing a 2-stage environment which consequently embraces
2 levels of privacy intrusion, thus providing the following contributions:
• The inference of behavioral patterns from the connection time traces aiming at
avoiding the usurpation of more confidential information. When compared to
previous approaches, this procedure abstains from impinging on the user privacy
by taking over the messages content, since it only relies on time statistics
of the user sessions rather than on their content.
• The application and subsequent discussion of two selected algorithms for the
previous point resolution:
– A commonly employed supervised algorithm executed as a binary classifier
which thereafter has forced us to figure out a method to deal with the
absence of labeled instances representing an identity theft.
– And a meta-heuristic algorithm in the search for the most convenient parameters
to array the instances within a high dimensional space into properly
delimited clusters so as to finally apply an unsupervised clustering
algorithm.
• The analysis of message content encroaching on more private information but
easing the user identification by mining discriminative features by Natural
Language Processing (NLP) techniques. As a consequence, the development of
a new feature extraction algorithm based on linguistic theories motivated by
the massive quantity of features often gathered when it comes to texts.
In summary, this dissertation means to go beyond typical, ad-hoc approaches
adopted by previous identity theft and authorship attribution research. Specifically
it proposes tailored solutions to this particular and extensively studied paradigm
with the aim at introducing a generic approach from a profiling view, not tightly
bound to a unique application field. In addition technical contributions have been
made in the course of the solution formulation intending to optimize familiar methods
for a better versatility towards the problem at hand. In summary: this Thesis
establishes an encouraging research basis towards unveiling subtle impersonation
attacks in Social Networks by means of intelligent learning techniques
Detecting masquerades using a combination of Naïve Bayes and weighted RBF approach
Masquerade detection by automated means is gaining widespread interest due to the serious impact of masquerades on computer system or network. Several techniques have been introduced in an effort to minimize up to some extent the risk associated with masquerade attack. In this respect, we have developed a novel technique which comprises of Naïve Bayes approach and weighted radial basis function similarity approach. The proposed scheme exhibits very promising results in comparison with many earlier techniques while experimenting on SEA dataset in detecting masquerades
ORIGINAL PAPER Detecting masquerades using a combination of Naïve Bayes and weighted RBF approach
Abstract Masquerade detection by automated means is gaining widespread interest due to the serious impact of masquerades on computer system or network. Several techniques have been introduced in an effort to minimize up to some extent the risk associated with masquerade attack. In this respect, we have developed a novel technique which comprises of Naïve Bayes approach and weighted radial basis function similarity approach. The proposed scheme exhibits very promising results in comparison with many earlier techniques while experimenting on SEA dataset in detecting masquerades.