151 research outputs found
Comparing Anomaly-Based Network Intrusion Detection Approaches Under Practical Aspects
While many of the currently used network intrusion detection systems (NIDS) employ signature-based approaches, there is an increasing research interest in the examination of anomaly-based detection methods, which seem to be more suited for recognizing zero-day attacks. Nevertheless, requirements for their practical deployment, as well as objective and reproducible evaluation methods, are hereby often neglected. The following thesis defines aspects that are crucial for a practical evaluation of anomaly-based NIDS, such as the focus on modern attack types, the restriction to one-class classification methods, the exclusion of known attacks from the training phase, a low false detection rate, and consideration of the runtime efficiency. Based on those principles, a framework dedicated to developing, testing and evaluating models for the detection of network anomalies is proposed. It is applied to two datasets featuring modern traffic, namely the UNSW-NB15 and the CIC-IDS-2017 datasets, in order to compare and evaluate commonly-used network intrusion detection methods. The implemented approaches include, among others, a highly configurable network flow generator, a payload analyser, a one-hot encoder, a one-class support vector machine, and an autoencoder. The results show a significant difference between the two chosen datasets: While for the UNSW-NB15 dataset several reasonably well performing model combinations for both the autoencoder and the one-class SVM can be found, most of them yield unsatisfying results when the CIC-IDS-2017 dataset is used.Obwohl viele der derzeit genutzten Systeme zur Erkennung von Netzwerkangriffen (engl. NIDS) signaturbasierte Ansätze verwenden, gibt es ein wachsendes Forschungsinteresse an der Untersuchung von anomaliebasierten Erkennungsmethoden, welche zur Identifikation von Zero-Day-Angriffen geeigneter erscheinen. Gleichwohl werden hierbei Bedingungen für deren praktischen
Einsatz oft vernachlässigt, ebenso wie objektive und reproduzierbare Evaluationsmethoden. Die folgende Arbeit definiert Aspekte, die für eine praxisorientierte Evaluation unabdingbar sind. Dazu zählen ein Schwerpunkt auf modernen Angriffstypen, die Beschränkung auf One-Class Classification Methoden, der Ausschluss von bereits bekannten Angriffen aus dem Trainingsdatensatz,
niedrige Falscherkennungsraten sowie die Berücksichtigung der Laufzeiteffizienz. Basierend auf diesen Prinzipien wird ein Rahmenkonzept vorgeschlagen, das für das Entwickeln, Testen und Evaluieren von Modellen zur Erkennung von Netzwerkanomalien bestimmt ist. Dieses wird auf zwei Datensätze mit modernem Netzwerkverkehr, namentlich auf den UNSW-NB15 und den CIC-IDS-
2017 Datensatz, angewendet, um häufig genutzte NIDS-Methoden zu vergleichen und zu evaluieren.
Die für diese Arbeit implementierten Ansätze beinhalten, neben anderen, einen weit konfigurierbaren Netzwerkflussgenerator, einen Nutzdatenanalysierer, einen One-Hot-Encoder, eine One-Class Support Vector Machine sowie einen Autoencoder. Die Resultate zeigen einen großen Unterschied zwischen den beiden ausgewählten Datensätzen: Während für den UNSW-NB15 Datensatz verschiedene angemessen gut funktionierende Modellkombinationen, sowohl für den Autoencoder als
auch für die One-Class SVM, gefunden werden können, bringen diese für den CIC-IDS-2017 Datensatz meist unbefriedigende Ergebnisse
A Machine Learning Approach for Intrusion Detection
Master's thesis in Information- and communication technology (IKT590)Securing networks and their confidentiality from intrusions is crucial, and for this rea-son, Intrusion Detection Systems have to be employed. The main goal of this thesis is to achieve a proper detection performance of a Network Intrusion Detection System (NIDS). In this thesis, we have examined the detection efficiency of machine learning algorithms such as Neural Network, Convolutional Neural Network, Random Forestand Long Short-Term Memory. We have constructed our models so that they can detect different types of attacks utilizing the CICIDS2017 dataset. We have worked on identifying 15 various attacks present in CICIDS2017, instead of merely identifying normal-abnormal traffic. We have also discussed the reason why to use precisely this dataset, and why should one classify by attack to enhance the detection. Previous works based on benchmark datasets such as NSL-KDD and KDD99 are discussed. Also, how to address and solve these issues. The thesis also shows how the results are effected using different machine learning algorithms. As the research will demon-strate, the Neural Network, Convulotional Neural Network, Random Forest and Long Short-Term Memory are evaluated by conducting cross validation; the average score across five folds of each model is at 92.30%, 87.73%, 94.42% and 87.94%, respectively. Nevertheless, the confusion metrics was also a crucial measurement to evaluate the models, as we shall see. Keywords: Information security, NIDS, Machine Learning, Neural Network, Convolutional Neural Network, Random Forest, Long Short-Term Memory, CICIDS2017
Application-Layer DoS attacks detection in Data-Plane-Programmable Networks using Machine Learning and P4
Application layer DDoS attacks pose a significant threat to network infrastructures, requiring effective detection mechanisms to mitigate their impact. This thesis proposes a novel solution that leverages programmable switches in software-defined networks (SDNs) and the P4 language to detect various types of application layer DDoS attacks, including slow-rate and high-rate attacks. The proposed solution involves mapping trained Random Forest models to programmable switches, enabling packet-level feature extraction and classification in the data plane. By efficiently identifying packet-flows and further categorizing the flows as either malicious or benign, the system achieves low-latency, high-throughput inference at line rate. This approach is particularly advantageous in data plane environments, where resources are highly constrained, and programmable switches have limited memory and minimal support for mathematical operations or data types. The research focuses on analyzing and enhancing different implementations of Random Forest classifiers in P4, considering memory and computational trade-offs. Furthermore, different strategies to enable hit-less model updates of the RF classifiers are implemented
Computational models of gene expression regulation
Throughout the last several decades, many efforts have been put into elucidating the genetic or epigenetic defects that result in various diseases. Gene regulation, i.e., the process of how genes are turned on and off in the right place and at the right time, is a paramount and prevailing question for researchers. Thanks to the discoveries made by researchers in this field, our understanding of interactions between proteins and DNA or proteins with themselves, as well as the dynamics of chromatin structure under different conditions, have substantially advanced. Even though there has been a lot achieved through these discoveries, there are still many unknown aspects about gene regulation. For instance, proteins called transcription factors (TFs) recognize and bind to specific regions of DNA and recruit the transcriptional machinery, which is essential for gene regulation. As there have been more than 2000 TFs identified in the human genome, it is important to study where they bind to or which genes they target. Computational approaches are important, in particular, as the biological experiments are often very expensive and cannot be done for all TFs. In 2016, a competition named DREAM Challenge was held encouraging researchers to develop novel computational tools for predicting the binding sites of several TFs. The first chapter of this thesis describes our machine learning approach to address this challenge within the scope of the competition. Using ensembles of random forest classifiers, we formulated our framework such that it is able to benefit from the tissue specificity inherent in the data leading to better generalization. Also, our models were tailored for spotting cofactors involved in the binding of TFs of interest. Comparing the important TFs that our computational models suggested with protein-protein association networks revealed that the models preferentially select motifs of TFs that are potential interaction partners in those networks. Another important aspect beyond predicting TF binding is to link epigeneomics, such as histone modification (HM) data, with gene expression. We, particularly, concentrated on predicting expression in a subset of genes called bidirectional. Bidirectional genes are referred to as pairs of genes that are located on opposite strands of DNA close to each other. As the sequencing technologies advance, more such bidirectional configurations are being detected. This indicates that in order to understand the gene regulatory mechanisms, it would be beneficial to account for such promoter architectures. In the second and third chapters, we focused on genes having bidirectional promoter architectures utilizing high resolution epigenomic signatures and single cell RNA-seq data to dissect the complex epigenetic architecture at these promoters. Using single-cell RNA-seq data as the estimate of gene expression, we were able to generate a hypothetical model for gene regulation in bidirectional promoters. We showed that bidirectional promoters can be categorized into three architecture types with distinct characteristics. Each of these categories corresponds to a unique gene expression profile at single cell level. The single cell RNA-seq data proved to be a powerful means for studying gene regulation. Therefore, in the last chapter, we proposed a novel approach for predicting gene expression at the single cell level using cis-regulatory motifs as well as epigenetic features. To achieve this, we designed a tree-guided multi-task learning framework that considers each cell as a task. Through this framework we were able to explain the single cell gene expression values using either TF binding affinities or TF ChIP-seq data measured at specific genomic regions. This allowed us to identify distinct TFs that show cell-type specific regulation in induced pluripotent stem cells. Our approach does not only limit to TFs, rather it can take any type of data that can potentially be used in explaining gene expression at single cell level. We believe that our findings can be used in drug discovery and development that can regulate the presence of TFs or other regulatory factors, which lead the cell fate into abnormal states, to prevent or cure diseases.In den letzten Jahrzehnten wurden große Anstrengungen unternommen, um die genetischen oder epigenetischen Defekte aufzuklären, die zu verschiedenen Krankheiten führen. Die Genregulation, d.h. der Prozess der Ein- und Abschaltung der Gene am richtigen Ort und zur richtigen Zeit reguliert, ist für die Forscher eine Frage von zentraler Bedeutung. Dank der Entdeckungen von Forschern auf diesem Gebiet ist unser Verständnis der Wechselwirkungen zwischen zwischen den Proteinen und der DNA oder der Proteine untereinander sowie der Dynamik der Chromatinstruktur unter verschiedenen Bedingungen wesentlich fortgeschritten. Obwohl durch diese Entdeckungen viel erreicht wurde, gibt es noch viele unbekannte Aspekte der Genregulation. Beispielsweise erkennen Proteine, sogenannte Transkriptionsfaktoren (Transcription Factors, TFs), bestimmte Bereiche der DNA und binden an diese und rekrutieren die Transkriptionsmaschinerie, die für die Genregulation erforderlich ist. Da mehr als 2000 TFs im menschlichen Genom identifiziert wurden, ist es wichtig zu untersuchen, wo sie binden oder auf welche Gene sie abzielen. Rechnerische Ansätze sind insbesondere wichtig, da die biologischen Experimente oft sehr teuer sind und nicht für alle TFs durchgeführt werden können. Im Jahr 2016 fand ein Wettbewerb namens DREAM Challenge statt, bei dem Forscher aufgefordert wurden, neuartige Rechenwerkzeuge zur Vorhersage der Bindungsstellen mehrerer TFs zu entwickeln. Das erste Kapitel dieser Arbeit beschreibt unseren Ansatz des maschinellen Lernens, um diese Herausforderung im Rahmen des Wettbewerbs anzugehen. Unter Verwendung von Ensembles von Random Forest Klassifikatoren haben wir unser Framework so formuliert, dass es von der Gewebespezifität der Daten profitiert und damit zu einer besseren Generalisierung führt. Außerdem wurden unsere Modelle auf das Erkennen von Kofaktoren angepasst, die an der Bindung von TFs beteiligt sind, die für uns von Interesse sind. Der Vergleich der wichtigen TFs, die unsere Computermodelle mit Protein-Protein-Assoziationsnetzwerken vorschlugen, ergab, dass die Modelle bevorzugt Motive von TFs auswählen, die potenzielle Interaktionspartner in diesen Netzwerken sind. Ein weiterer wichtiger Aspekt, der über die Vorhersage der TF-Bindung hinausgeht, besteht darin, epigeneomische Faktoren wie Histonmodifikationsdaten (HM-Daten) mit der Genexpression zu verknüpfen. Wir konzentrierten uns insbesondere auf die Vorhersage der Expression in einer Untergruppe von Genen, die als bidirektional bezeichnet werden. Bidirektionale Gene werden als Paare von Genen bezeichnet, die sich auf gegenüberliegenden DNA-Strängen befinden und nahe beieinander liegen. Mit dem Fortschritt der Sequenzierungstechnologien werden immer mehr solche bidirektionalen Konfigurationen erkannt. Dies weist darauf hin, dass es zum Verständnis der Genregulationsmechanismen vorteilhaft wäre, solche Promotorarchitekturen zu berücksichtigen. Im zweiten und dritten Kapitel konzentrierten wir uns auf Gene mit bidirektionalen Promotorarchitekturen, um mit Hilfe von epigenomischen Signaturen und Einzelzell-RNA-Sequenzdaten die komplexe epigenetische Architektur an diesen Promotoren zu analysieren. Unter Verwendung von Einzelzell-RNA-Sequenzdaten als Schätzung der Genexpression konnten wir ein hypothetisches Modell für die Genregulation in bidirektionalen Promotoren aufstellen. Wir haben gezeigt, dass bidirektionale Promotoren in drei Architekturtypen mit unterschiedlichen Merkmalen eingeteilt werden können. Jede dieser Kategorien entspricht einem eindeutigen Genexpressionsprofil auf Einzelzellebene. Die Einzelzell-RNA-Sequenzdaten erwiesen sich als leistungsstarkes Mittel zur Untersuchung der Genregulation. Daher haben wir im letzten Kapitel einen neuen Ansatz zur Vorhersage der Genexpression auf Einzelzellebene unter Verwendung von cis-regulatorischen Motiven sowie epigenetischen Merkmalen vorgeschlagen. Um dies zu erreichen, haben wir ein baumgesteuertes Multitasking-Lernsystem entwickelt, das jede Zelle als eine Aufgabe betrachtet. Durch dieses Gerüst konnten wir die Einzelzellgenexpressionswerte entweder mit TF-Bindungsaffinitäten oder mit TF-ChIP-Sequenzdaten erklären, die in bestimmten Genomregionen gemessen wurden. Dies ermöglichte es uns, verschiedene TFs zu identifizieren, die eine zelltypspezifische Regulation in induzierten pluripotenten Stammzellen zeigen. Unser Ansatz beschränkt sich nicht nur auf TFs, sondern kann jede Art von Daten verwenden, die potentiell zur Erklärung der Genexpression auf Einzelzellebene verwendet werden können. Wir glauben, dass unsere Erkenntnisse für die Entdeckung und Entwicklung von Arzneimitteln verwendet werden können, die das Vorhandensein von TFs oder anderen regulatorischen Faktoren regulieren können, die die Zellen abnormal werden lassen, um Krankheiten zu verhindern oder zu heilen
Performance Comparison of the Cogent Confabulation Classifier with Other Commonly Used Supervised Machine Learning Algorithms for Bathing Water Quality Assessment
The purpose of this study was to implement a reliable model for bathing water quality prediction using the Cogent Confabulation classifier and to compare it with other well-known classifiers. This study is a continuation of a previously published work and focuses on the areas of Kaštela Bay and the Brač Channel, located in the Republic of Croatia. The Cogent Confabulation classifier is a thorough and simple method for data classification based on the cogency measure for observed classes. To implement the model, we used data sets constructed of remote sensing data (band values) and in situ measurements presenting ground-truth bathing water quality. Satellite data was retrieved from the Sentinel-3 OLCI satellite and it was atmospherically corrected based on the characteristics and specifications of band wavelengths. The results showed that the Random Forest, K-Nearest Neighbour, and Decision Tree classifiers outperformed the Cogent Confabulation classifier. However, results showed that the Cogent Confabulation classifier achieved better results compared to classifiers based on Bayesian theory. Additionally, a qualitative analysis of the four best classifiers was conducted using spatial maps created in the QGIS tool
Towards control relevant system design and constrained order controller synthesis:with a thermal control application in immersion lithography
Intrusion detection system for IoT networks for detection of DDoS attacks
PhD ThesisIn this thesis, a novel Intrusion Detection System (IDS) based on the hybridization of the
Deep Learning (DL) technique and the Multi-objective Optimization method for the detection
of Distributed Denial of Service (DDoS) attacks in Internet of Things (IoT) networks is
proposed. IoT networks consist of different devices with unique hardware and software
configurations communicating over different communication protocols, which produce huge
multidimensional data that make IoT networks susceptible to cyber-attacks. The network IDS
is a vital tool for protecting networks against threats and malicious attacks. Existing systems
face significant challenges due to the continuous emergence of new and more sophisticated
cyber threats that are not recognized by them, and therefore advanced IDS is required.
This thesis focusses especially on the DDoS attack that is one of the cyber-attacks that has
affected many IoT networks in recent times and had resulted in substantial devastating losses.
A thorough literature review is conducted on DDoS attacks in the context of IoT networks,
IDSs available especially for the IoT networks and the scope and applicability of DL
methodology for the detection of cyber-attacks. This thesis includes three main contributions
for 1) developing a feature selection algorithm for an IoT network fulfilling six important
objectives, 2) designing four DL models for the detection of DDoS attacks and 3) proposing a
novel IDS for IoT networks. In the proposed work, for developing advanced IDS, a Jumping
Gene adapted NSGA-II multi-objective optimization algorithm for reducing the dimensionality
of massive IoT data and Deep Learning model consisting of a Convolutional Neural Network
(CNN) combined with Long Short-Term Memory (LSTM) for classification are employed. The
experimentation is conducted using a High-Performance Computer (HPC) on the latest
CISIDS2017 datasets for DDoS attacks and achieved an accuracy of 99.03 % with a 5-fold
reduction in training time. The proposed method is compared with machine learning (ML)
algorithms and other state-of-the-art methods, which confirms that the proposed method
outperforms other approaches.Government of Indi
Towards control relevant system design and constrained order controller synthesis:with a thermal control application in immersion lithography
- …
