
    Machine learning for the sustainable energy transition: a data-driven perspective along the value chain from manufacturing to energy conversion

    According to the IPCC special report Global Warming of 1.5 °C, climate action is not only necessary but more urgent than ever. The world is witnessing rising sea levels, heat waves, flooding, droughts, and desertification, resulting in the loss of lives and damage to livelihoods, especially in countries of the Global South. To mitigate climate change and honour the Paris Agreement, it is of the utmost importance to reduce greenhouse gas emissions from the most emitting sector, namely the energy sector. To this end, large-scale penetration of renewable energy systems into the energy market is crucial for the energy transition toward a sustainable future, replacing fossil fuels and improving access to energy with socio-economic benefits. With the advent of Industry 4.0, Internet of Things technologies have been increasingly applied to the energy sector, introducing the concepts of the smart grid and, more generally, the Internet of Energy. These paradigms are steering the energy sector towards more efficient, reliable, flexible, resilient, safe, and sustainable solutions with huge potential environmental and social benefits. To realize these concepts, new information technologies are required, and among the most promising are Artificial Intelligence and Machine Learning, which in many countries have already revolutionized the energy industry. This thesis presents different Machine Learning algorithms and methods for implementing new strategies to make renewable energy systems more efficient and reliable. It presents various learning algorithms, highlighting their advantages and limits, and evaluating their application to different tasks in the energy context. In addition, different techniques are presented for preprocessing and cleaning the time series nowadays collected by sensor networks mounted on every renewable energy system. With the possibility of installing large numbers of sensors that collect vast amounts of time series, it is vital to detect and remove irrelevant, redundant, or noisy features and to alleviate the curse of dimensionality, thus improving the interpretability of predictive models, speeding up their learning process, and enhancing their generalization properties. Therefore, this thesis discusses the importance of dimensionality reduction in sensor networks mounted on renewable energy systems and, to this end, presents two novel unsupervised algorithms. The first approach maps time series into the network domain through visibility graphs and uses a community detection algorithm to identify clusters of similar time series and select representative parameters (illustrated in the sketch below). This method can group both homogeneous and heterogeneous physical parameters, even when related to different functional areas of a system. The second approach proposes the Combined Predictive Power Score, a feature selection method with a multivariate formulation that explores multiple expanding subsets of variables and identifies the combination of features with the highest predictive power over specified target variables. The method's selection algorithm converges to the smallest set of predictors with the highest predictive power. Once this combination is identified, the most relevant parameters in a sensor network can be selected to perform dimensionality reduction.
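
    A minimal sketch of the visibility-graph clustering idea described above, assuming the natural visibility criterion, a crude degree-distribution similarity, and networkx community detection; the thesis's actual graph construction, similarity measure, thresholds, and community detection algorithm are not specified in this abstract:

    # Each time series is mapped to a natural visibility graph, pairwise graph
    # similarity builds a network over the sensors, and community detection
    # groups similar series. Illustration only, under the assumptions above.
    import numpy as np
    import networkx as nx
    from networkx.algorithms.community import greedy_modularity_communities

    def natural_visibility_graph(y):
        """Map a 1-D time series to its natural visibility graph."""
        n = len(y)
        g = nx.Graph()
        g.add_nodes_from(range(n))
        for i in range(n):
            for j in range(i + 1, n):
                # (i, j) connect if no intermediate sample blocks the line of sight
                if all(y[k] < y[j] + (y[i] - y[j]) * (j - k) / (j - i)
                       for k in range(i + 1, j)):
                    g.add_edge(i, j)
        return g

    def degree_similarity(g1, g2, bins=20):
        """Crude graph similarity via degree distributions (an assumption here)."""
        h1, _ = np.histogram([d for _, d in g1.degree()], bins=bins, range=(0, bins), density=True)
        h2, _ = np.histogram([d for _, d in g2.degree()], bins=bins, range=(0, bins), density=True)
        return 1.0 / (1.0 + np.linalg.norm(h1 - h2))

    series  = [np.sin(np.linspace(0, 8 * np.pi, 120)) + 0.1 * np.random.randn(120)
               for _ in range(4)]                         # smooth periodic sensors
    series += [np.cumsum(np.random.randn(120)) for _ in range(4)]  # drifting sensors
    graphs = [natural_visibility_graph(s) for s in series]

    sim_net = nx.Graph()
    sim_net.add_nodes_from(range(len(series)))
    for a in range(len(series)):
        for b in range(a + 1, len(series)):
            w = degree_similarity(graphs[a], graphs[b])
            if w > 0.5:  # arbitrary illustrative threshold
                sim_net.add_edge(a, b, weight=w)

    # Communities in the similarity network = clusters of similar time series.
    for c, community in enumerate(greedy_modularity_communities(sim_net, weight="weight")):
        print(f"cluster {c}: sensors {sorted(community)}")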
Data-driven methods open the possibility of supporting strategic decision-making, reducing Operation & Maintenance costs, machine faults, repair stops, and spare parts inventory size. Therefore, this thesis presents two approaches in the context of predictive maintenance, based on anomaly detection algorithms, to improve the lifetime and efficiency of equipment. The first approach proposes an anomaly detection model based on Principal Component Analysis that is robust to false alarms, can isolate anomalous conditions, and can anticipate equipment failures (a sketch of this style of detector follows below). The second approach has at its core a neural architecture, namely a Graph Convolutional Autoencoder, which models the sensor network as a dynamical functional graph by simultaneously considering the information content of individual sensor measurements (graph node features) and the nonlinear correlations existing between all pairs of sensors (graph edges). The proposed neural architecture can capture hidden anomalies even when the turbine continues to deliver the power requested by the grid and can anticipate equipment failures. Since the model is unsupervised and completely data-driven, this approach can be applied to any wind turbine equipped with a SCADA system. When it comes to renewable energies, the unschedulable uncertainty due to their intermittent nature is an obstacle to the reliability and stability of energy grids, especially at large-scale integration. These challenges can be alleviated, however, if the natural sources or the power output of renewable energy systems can be forecasted accurately, allowing power system operators to plan optimal power management strategies that balance the dispatch between intermittent power generation and the load demand. To this end, this thesis proposes a multi-modal spatio-temporal neural network for multi-horizon wind power forecasting. In particular, the model combines high-resolution Numerical Weather Prediction forecast maps with turbine-level SCADA data and explores how meteorological variables on different spatial scales, together with the turbines' internal operating conditions, impact wind power forecasts. The world is undergoing a third energy transition whose main goal is to tackle global climate change through decarbonization of energy supply and consumption patterns. This transition is made possible by global cooperation and agreements between parties, advances in power generation systems, and Internet of Things and Artificial Intelligence technologies, and it is necessary to prevent the severe and irreversible consequences of climate change that threaten life on the planet as we know it. This thesis is intended as a reference for researchers who want to contribute to the sustainable energy transition and are approaching the field of Artificial Intelligence in the context of renewable energy systems.
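
    A minimal sketch of a PCA-based anomaly detector of the kind described above: fit PCA on healthy operating data, score new samples by their squared reconstruction error (the classical SPE/Q-statistic style), and alarm above an empirical threshold. The thesis's actual model, threshold rule, and fault-isolation logic are not detailed in this abstract; everything below is an assumed illustration:

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    healthy = rng.normal(size=(1000, 8)) @ rng.normal(size=(8, 8))  # correlated sensors
    faulty = healthy[:50] + np.array([0, 0, 10, 0, 0, 0, 0, 0.0])   # drift on sensor 2

    scaler = StandardScaler().fit(healthy)
    pca = PCA(n_components=3).fit(scaler.transform(healthy))

    def spe(x):
        """Squared prediction error: distance to the principal subspace."""
        z = scaler.transform(x)
        recon = pca.inverse_transform(pca.transform(z))
        return ((z - recon) ** 2).sum(axis=1)

    threshold = np.percentile(spe(healthy), 99)  # simple empirical control limit
    alarms = spe(faulty) > threshold
    print(f"{alarms.mean():.0%} of faulty samples flagged (threshold={threshold:.2f})")
    # Inspecting per-variable residuals can point to the anomalous sensor,
    # echoing the isolation capability mentioned in the abstract.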

    Enhancing Computer Network Security through Improved Outlier Detection for Data Streams

    Over the past couple of years, machine learning methods - especially Outlier Detection (OD) methods - have become anchored in the cyber security field to detect network-based anomalies rooted in novel attack patterns. Due to the steady increase of high-volume, high-speed, and high-dimensional Streaming Data (SD), for which ground-truth information is not available, detecting anomalies in real-world computer networks has become an increasingly challenging task. Efficient detection schemes for networked, embedded devices need to be fast and memory-efficient, and must be capable of dealing with concept drifts when they occur. The aim of this thesis is to enhance computer network security through improved OD for data streams, in particular SD, to achieve cyber resilience, which ranges from the detection and analysis of security-relevant incidents, e.g., novel malicious activity, to the reaction to them. To this end, four major contributions are proposed, which have been published in or submitted to journals. First, a research gap in unsupervised Feature Selection (FS) for the improvement of off-the-shelf OD methods in data streams is filled by proposing Unsupervised Feature Selection for Streaming Outlier Detection, denoted as UFSSOD. A generic concept is then derived that shows two application scenarios of UFSSOD in conjunction with online OD algorithms. Extensive experiments have shown that UFSSOD, as an online-capable algorithm, achieves results comparable to a competitor trimmed for OD. Second, a novel unsupervised online OD framework called Performance Counter-Based iForest (PCB-iForest) is introduced which, in its generalized form, can incorporate any ensemble-based online OD method to function on SD (a simplified sketch of ensemble-based streaming detection follows this abstract). Two variants based on the classic iForest are integrated. Extensive experiments performed on 23 multi-disciplinary, security-related real-world data sets revealed that PCB-iForest clearly outperforms state-of-the-art competitors in 61% of cases and achieves even more promising results in terms of the tradeoff between classification performance and computational cost. Third, a framework called Streaming Outlier Analysis and Attack Pattern Recognition, denoted as SOAAPR, is introduced that, in contrast to the state of the art, is able to process the output of various online unsupervised OD methods in a streaming fashion to extract information about novel attack patterns. From the clustered set of correlated alerts, SOAAPR computes three different privacy-preserving, fingerprint-like signatures that characterize and represent potential attack scenarios with respect to their communication relations, their manifestation in the data's features, and their temporal behavior. The evaluation on two popular data sets shows that SOAAPR can compete with an offline competitor in terms of alert correlation and outperforms it significantly in terms of processing time.
Moreover, in most cases all three types of signatures seem to reliably characterize attack scenarios to the effect that similar ones are grouped together. Fourth, an Uncoupled Message Authentication Code algorithm - Uncoupled MAC - is presented which bridges cryptographic protection and Intrusion Detection Systems (IDSs) for network security. It secures network communication (authenticity and integrity) through a cryptographic scheme with layer-2 support via uncoupled message authentication codes and, as a side effect, also provides IDS functionality, raising alarms based on the violation of Uncoupled MAC values. Through a novel self-regulation extension, the algorithm adapts its sampling parameters based on the detection of malicious actions. The evaluation in a virtualized environment clearly shows that the detection rate increases over runtime for different attack scenarios, even covering scenarios in which intelligent attackers try to exploit the downsides of sampling.
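
    A sketch of ensemble-based outlier detection on streaming data, in the spirit of (but much simpler than) PCB-iForest: keep a sliding window of recent samples, periodically refit an iForest ensemble, and score each arriving point before it enters the window. PCB-iForest's performance-counter-based model replacement is not reproduced here; this is an assumed illustration of the general setting:

    from collections import deque
    import numpy as np
    from sklearn.ensemble import IsolationForest

    class SlidingWindowIForest:
        def __init__(self, window=500, refit_every=100, **forest_kwargs):
            self.buffer = deque(maxlen=window)
            self.refit_every = refit_every
            self.forest_kwargs = forest_kwargs
            self.model, self.seen = None, 0

        def score_and_update(self, x):
            """Score one sample (higher = more anomalous), then absorb it."""
            score = -self.model.score_samples([x])[0] if self.model else 0.0
            self.buffer.append(x)
            self.seen += 1
            if self.seen % self.refit_every == 0 and len(self.buffer) >= 50:
                # Refit the ensemble on the current window (handles drift crudely).
                self.model = IsolationForest(**self.forest_kwargs).fit(np.array(self.buffer))
            return score

    rng = np.random.default_rng(1)
    detector = SlidingWindowIForest(n_estimators=50, random_state=0)
    stream = list(rng.normal(size=(1000, 4))) + list(rng.normal(6, 1, size=(5, 4)))
    scores = [detector.score_and_update(x) for x in stream]
    print("mean score, normal vs. injected outliers:",
          np.mean(scores[200:1000]), np.mean(scores[1000:]))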

    Estimation of the Free Rotating Hook Load for WOB Estimation

    Today's drilling industry emphasizes safety and deeper drilling while reducing drilling costs. Low rate of penetration (ROP) and non-productive time are two main reasons for reduced drilling efficiency. Machine Learning (ML) technology has been increasingly used in the Oil and Gas industry for a variety of problems, including drilling parameter estimation and prediction, drilling incident detection, and optimal well planning. With the abundance of field data available, a number of detailed research studies have been conducted to define the relationship between the ROP and drilling parameters. However, developing a purely data-driven ROP model and optimization remains very challenging. The main reasons for this are the number of parameters that affect its estimation and the manner in which different variables are correlated, e.g., the downhole weight on bit (DWOB), rotary speed (RPM), standpipe pressure (SPP), and formation/bit properties [1]. Thus, we suggest breaking down the data-driven ROP problem into its primary parameters and focusing on them separately. The purpose of our study is to analyze hook load time series when drilling with connections. In drilling operations, the hook load is used to control the weight on bit (WOB). Since the WOB cannot be measured directly during drilling, the tension at the top of the drill string is used to determine it. As of now, the industry lets the driller manually select a free-rotating hook load value according to their judgment and experience. This task of manual inspection and selection is tedious and, according to experts, often neglected given the number of simultaneous tasks the driller has to deal with. The recorded value is then used to calculate the WOB indirectly while drilling. In an industry that is more autonomous than ever before, this is not a viable option, and a different approach is proposed in this research. In essence, this research is an attempt to estimate the free-rotating hook load automatically and more accurately, rather than relying on the experience and judgment of the driller. This will be accomplished by developing a hybrid system combining ML algorithms with statistical analysis and physics principles. By embedding this rig-state identification engine, we will be able to classify and analyze different real-time data points (e.g., out-of-slips, pick-up, rotating off-bottom, and drilling), obtain the free-rotating hook load, and also utilize it in other applications, such as T&D calibration.
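
    A toy sketch of the estimation idea described above, assuming rig states have already been classified (out-of-slips, pick-up, rotating off-bottom, drilling): the free-rotating hook load is taken as a robust statistic over rotating-off-bottom intervals, and the surface WOB follows as the drop in hook load while on bottom. The column names and the median choice are illustrative assumptions, not the hybrid ML/statistics/physics system developed in the thesis:

    import pandas as pd

    def estimate_wob(df: pd.DataFrame) -> pd.Series:
        """df needs columns: 'hookload' (surface tension) and 'rig_state'."""
        rotating = df.loc[df["rig_state"] == "rotating_off_bottom", "hookload"]
        # Robust free-rotating hook load: the median resists spikes and noise.
        free_rotating_hkld = rotating.median()
        # Surface WOB estimate: free-rotating hook load minus hook load on bottom.
        wob = (free_rotating_hkld - df["hookload"]).where(df["rig_state"] == "drilling")
        return wob.clip(lower=0)

    df = pd.DataFrame({
        "hookload":  [210, 208, 209, 185, 180, 182],
        "rig_state": ["rotating_off_bottom"] * 3 + ["drilling"] * 3,
    })
    print(estimate_wob(df))  # roughly 24-29 units of WOB on the drilling rows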

    Managing Networked IoT Assets Using Practical and Scalable Traffic Inference

    The Internet has recently witnessed unprecedented growth of a class of connected assets called the Internet of Things (IoT). Due to relatively immature manufacturing processes and limited computing resources, IoT devices have inadequate device-level security measures, exposing the Internet to various cyber risks. Therefore, network-level security has been considered a practical and scalable approach for securing IoT devices, but it cannot be employed without discovering the connected devices and characterizing their behavior. Prior research leveraged predictable patterns in IoT network traffic to develop inference models. However, these models fall short of expectations in addressing practical challenges, preventing them from being deployed in production settings. This thesis identifies four practical challenges and develops techniques to address them, which can help secure businesses and protect user privacy against growing cyber threats. My first contribution balances prediction gains against the computing costs of traffic features for IoT traffic classification and monitoring. I develop a method to find the best set of specialized models for multi-view classification that reaches an average accuracy of 99%, i.e., accuracy similar to existing works while reducing the cost by a factor of six. I also develop a hierarchy of one-class models per asset class, each at a certain granularity, to progressively monitor IoT traffic (a sketch of this per-class idea follows below). My second contribution addresses the challenges of measurement costs and data quality. I develop an inference method that uses stochastic and deterministic modeling to predict IoT devices in home networks from opaque and coarse-grained IPFIX flow data. Evaluations show that false positive rates can be reduced by 75% compared to related work without significantly affecting true positives. My third contribution focuses on the challenge of concept drift by analyzing over six million flow records collected from 12 real home networks. I develop several inference strategies and compare their performance under concept drift, particularly when labeled data is unavailable in the testing phase. Finally, my fourth contribution studies the resilience of machine learning models against adversarial attacks, with a specific focus on decision-tree-based models. I develop methods to quantify the vulnerability of a given decision-tree-based model against data-driven adversarial attacks and to refine vulnerable decision trees, making them robust against 92% of adversarial attacks.
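
    A sketch of the per-asset-class one-class monitoring idea mentioned above: train one one-class model per device class on flow features of known-benign traffic, then flag traffic that no class model accepts. The feature names and the model choice (IsolationForest) are illustrative assumptions; the thesis's hierarchical, multi-granularity design is not reproduced here:

    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.default_rng(2)
    # Hypothetical flow features: [mean packet size, flows/min, mean inter-arrival]
    benign = {
        "camera":  rng.normal([900, 20, 0.5], [50, 3, 0.1], size=(500, 3)),
        "speaker": rng.normal([300, 5, 2.0], [30, 1, 0.3], size=(500, 3)),
    }
    models = {cls: IsolationForest(random_state=0).fit(x) for cls, x in benign.items()}

    def classify_or_alert(flow_features):
        """Return an accepting device class, or None (candidate anomaly)."""
        accepted = [cls for cls, m in models.items() if m.predict([flow_features])[0] == 1]
        return accepted[0] if accepted else None

    print(classify_or_alert([890, 21, 0.48]))    # likely 'camera'
    print(classify_or_alert([5000, 200, 0.01]))  # likely None -> raise an alert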

    Time series novelty detection with application to production sensor systems

    Modern fiber manufacturing plants rely heavily on automation. Automated facilities use sensors to measure fiber state and react to data patterns that correspond to physical events. Many patterns can be predefined either by careful analysis or by domain experts, and instances of these patterns can then be discovered through techniques such as pattern recognition. However, pattern recognition will fail to detect events that have not been predefined, potentially causing expensive production errors. A solution to this dilemma, novelty detection, allows for the identification of interesting data patterns embedded in otherwise normal data. In this thesis we investigate some of the aspects of implementing novelty detection in a fiber manufacturing system. Specifically, we empirically evaluate the effectiveness of currently available feature extraction and novelty detection techniques on data from a real fiber manufacturing system.
Our results show that piecewise linear approximation (PLA) methods produce the highest-quality features for fiber property datasets. Motivated by this fact, we introduce a new PLA algorithm called improved bottom-up segmentation (IBUS). This new algorithm produced the highest-quality features and considerably more data reduction than all currently available feature extraction techniques for our application.
Further empirical results from several leading time series novelty detection techniques revealed two conclusions: a simple Euclidean-distance-based technique is the best overall when no feature extraction is used, but when feature extraction is used, the Tarzan technique performs best.
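
    A sketch of classic bottom-up piecewise linear approximation (PLA), the family the IBUS algorithm above improves on: start from the finest segmentation and greedily merge the adjacent pair of segments whose merged linear fit costs least, until the cheapest merge would exceed an error threshold. IBUS itself is not reproduced; this is the standard baseline for context:

    import numpy as np

    def fit_cost(y, lo, hi):
        """Sum of squared residuals of a least-squares line over y[lo:hi]."""
        x = np.arange(lo, hi)
        coeffs = np.polyfit(x, y[lo:hi], 1)
        return float(((np.polyval(coeffs, x) - y[lo:hi]) ** 2).sum())

    def bottom_up_pla(y, max_error, init_len=2):
        # Segments as [lo, hi) index pairs, finest granularity first.
        segs = [[i, min(i + init_len, len(y))] for i in range(0, len(y), init_len)]
        while len(segs) > 1:
            costs = [fit_cost(y, segs[i][0], segs[i + 1][1]) for i in range(len(segs) - 1)]
            i = int(np.argmin(costs))
            if costs[i] > max_error:
                break  # the cheapest remaining merge is already too lossy
            segs[i][1] = segs[i + 1][1]  # merge segment i+1 into segment i
            del segs[i + 1]
        return segs

    y = np.concatenate([np.linspace(0, 5, 40), np.linspace(5, 1, 30), np.full(30, 1.0)])
    y += 0.05 * np.random.default_rng(3).normal(size=y.size)
    print(bottom_up_pla(y, max_error=0.5))  # roughly three segments, matching the ramps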

    Progress in Landslide Research and Technology, Volume 1 Issue 1, 2022

    This open access book provides an overview of the progress in landslide research and technology and is part of a book series of the International Consortium on Landslides (ICL). The book provides a common platform for the publication of recent progress in landslide research and technology for practical applications and the benefit of society, contributing to the Kyoto Landslide Commitment 2020, which is expected to continue up to 2030 and beyond, globally promoting the understanding and reduction of landslide disaster risk and addressing the Sustainable Development Goals of the 2030 Agenda.

    Sustainable Agriculture and Advances of Remote Sensing (Volume 2)

    Agriculture, as the main source of alimentation and the most important economic activity globally, is being affected by the impacts of climate change. To maintain and increase the production of our global food system, to reduce biodiversity loss, and to preserve our natural ecosystems, new practices and technologies are required. This book focuses on the latest advances in remote sensing technology and agricultural engineering leading to sustainable agriculture practices. Earth observation data and in situ and proxy remote sensing data are the main sources of information for monitoring and analyzing agricultural activities. Particular attention is given to Earth observation satellites and the Internet of Things for data collection, to multispectral and hyperspectral data analysis using machine learning and deep learning, and to WebGIS and the Internet of Things for sharing and publishing the results, among others.

    A characterization of landslide occurrence in the Kigezi Highlands of South Western Uganda

    The frequency and magnitude of landslide occurrence in the Kigezi highlands of South Western Uganda have increased, but the key underpinnings of the occurrences are yet to be understood. The overall aim of this study was to characterize the parameters underpinning landslide occurrence in the Kigezi highlands. This information is important for predicting or identifying actual and potential landslide sites, and should inform policy, particularly the development of early warning systems for landslide hazards in these highlands. The present study analysed the area's topography, soil properties, and land use and cover changes underpinning the spatial-temporal distribution of landslide occurrence in the region. It focussed on selected topographic parameters including slope gradient, profile curvature, Topographic Wetness Index (TWI), Stream Power Index (SPI), and Topographic Position Index (TPI); the standard TWI/SPI formulation is sketched after this paragraph. These factors were parameterized in the field and in a GIS environment using a 10 m Digital Elevation Model. Sixty-five landslide features were surveyed and mapped. Soil properties were characterised in relation to slope position. On-site soil property analysis was conducted within the landslide scars, auger holes, and full-profile representative sites. Furthermore, soil infiltration and strength tests, as well as clay mineralogy analyses, were also conducted. An analysis of the spatial-temporal land use and cover changes was undertaken using satellite imagery spanning the period between 1985 and 2015. Landslides were noted to concentrate along topographic hollows in the landscape. Their occurrence is dominant where the slope gradient is between 25˚ and 35˚, profile curvature between 0.1 and 5, TWI between 8 and 18, SPI above 10, and TPI between -1 and 1; landslides are less pronounced on slope zones where these parameters fall outside these ranges, for example where the slope gradient exceeds 45˚, the TWI exceeds 18, or the TPI exceeds 1. Deep soil profiles ranging between 2.5 and 7 meters are a major characteristic of the study area. Soils are characterized by clay pans at depths ranging between 0.75 and 3 meters within the profiles. The study area is dominated by clay texture, except for the uppermost surface horizons, which are loamy sand. All surface horizons analysed had percentages of sand, silt, and clay ranging from 33 to 55%, 22 to 40%, and 10 to 30%, respectively. In the deeper horizons, sand was observed to reduce drastically to less than 23%, while clay increased to greater than 50%. The clay content of the deeper horizons is thus very high, exceeding 35%. By implication, such soils, with a very high clay content and plasticity index, are considered Vertisols, with a profound influence on the occurrence of landslides. The topsoil predominantly contains more quartz, while subsurface horizons have considerable amounts of illite/muscovite as the dominant clay minerals, ranging from 43% to 47%. The liquid limit, plasticity index, computed weighted plasticity index (PIw), expansiveness (ɛex), and dispersion, ranging from 50, 22, 17, 10, and 23 to 66, 44, 34, 54, and 64, respectively, also have strong implications for landslide occurrence. Landslides are not normally experienced during or immediately after extreme rainfall events but occur later in the rainfall season. By implication, this time lag between rainfall and landslide occurrence is due to initial infiltration through the quartz-dominated upper soil layers before the illite/muscovite clays in the lower soil horizons become saturated.
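
    A sketch of how the terrain indices named above are commonly derived from a DEM, assuming a precomputed flow-accumulation grid: TWI = ln(a / tan(beta)) and SPI = a * tan(beta), where a is the specific catchment area and beta the local slope. The study's exact GIS workflow and parameterization are not reproduced; this is the textbook formulation on a toy grid:

    import numpy as np

    def slope_radians(dem, cell=10.0):
        """Slope from DEM gradients (cell = 10 m, matching the study's DEM)."""
        dzdy, dzdx = np.gradient(dem, cell)
        return np.arctan(np.hypot(dzdx, dzdy))

    def twi_spi(dem, flow_acc, cell=10.0):
        beta = slope_radians(dem, cell)
        a = (flow_acc + 1) * cell                 # specific catchment area per unit width
        tan_b = np.maximum(np.tan(beta), 1e-6)    # avoid division by zero on flats
        return np.log(a / tan_b), a * tan_b       # (TWI, SPI)

    rng = np.random.default_rng(4)
    dem = np.add.outer(np.linspace(100, 60, 50), np.linspace(0, 5, 50))  # tilted hillslope
    flow_acc = rng.integers(0, 200, size=dem.shape).astype(float)        # placeholder grid
    twi, spi = twi_spi(dem, flow_acc)
    print("TWI range:", twi.min().round(1), "to", twi.max().round(1))
    # Cells with TWI ~8-18 and SPI >10 would be flagged as landslide-prone here.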
Whereas forest cover decreased from 40% in 1985 to 8% in 2015, cultivated land and settlements increased from 16% and 11% to 52% and 25%, respectively, during the same period. The distribution of cultivated land decreased by 59% in lower slope sections within the <15˚ gradient group, but increased by over 85% in upper sections within the 25˚ to 35˚ gradient cluster during the study period. There has thus been a shift of cultivated land to the steeper, sensitive upper slope elements associated with landslides in the study area. More than 50% of the landslides occur on cultivated land and 20% on settlements, while less than 15% and 10% occur on grassland and on forests with degraded areas, respectively. Landslides in the Kigezi highlands are triggered by a complex interaction of multiple factors, including dynamic triggers and ground-condition variables. Topographic hollows are convergence zones within the landscape where all the parameters interact to cause landslides; they are therefore both potential and actual landslide sites in the study area. Characterized by deep soil horizons with high clay content dominated by illite/muscovite minerals in the subsoils, and by concave profile forms with moderately steep slopes, topographic hollows are the slope elements most vulnerable to landslide occurrence. The spatial-temporal patterns of landslide occurrence in the study area have changed due to increased cultivation of steep middle and upper slopes, and a close spatial and temporal correlation between land use/cover changes and landslide occurrence is discernible. Understanding these topographical, pedological, and land use/cover parameters and their influence on landslide occurrence is important for land management. It is now possible to identify and predict actual and potential landslide zones, and also to demarcate safer zones for community activities. The information generated about the area's topographic, pedological, and land cover characteristics should help in vulnerability mitigation and enhance community resilience to landslide hazards in this fragile highland ecosystem. This can be done by designating zones for community activities while avoiding potential landslide zones. It is also recommended that tree cover be restored in the highlands and that farmers be encouraged to re-establish terrace farming while avoiding cultivation of sensitive steep middle and upper slope sections.

    Challenges and Open Questions of Machine Learning in Computer Security

    This habilitation thesis presents advancements in machine learning for computer security, arising from problems in network intrusion detection and steganography. The thesis puts an emphasis on explaining the traits shared by steganalysis, network intrusion detection, and other security domains, which make these domains different from computer vision, speech recognition, and other fields where machine learning is typically studied. The thesis then presents methods developed to at least partially solve the identified problems, with the overall goal of making machine-learning-based intrusion detection systems viable. Most of the methods are general in the sense that they can be used outside intrusion detection and steganalysis on problems with similar constraints. A common feature of all the methods is that they are generally simple yet surprisingly effective. According to large-scale experiments, they almost always improve on the prior art, which is likely because they are tailored to security problems and designed for large volumes of data. Specifically, the thesis addresses the following problems: anomaly detection with low computational and memory complexity, enabling efficient processing of large data; multiple-instance anomaly detection, improving the signal-to-noise ratio by classifying larger groups of samples (a sketch of this idea follows below); supervised classification of tree-structured data, simplifying its encoding in neural networks; clustering of structured data; supervised training with an emphasis on precision in the top p% of returned data; and finally, explanation of anomalies to help humans understand the nature of an anomaly and speed up their decisions. Many algorithms and methods presented in this thesis are deployed in a real intrusion detection system protecting millions of computers around the globe.
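
    A sketch of the multiple-instance idea mentioned above: instead of alerting on single samples, aggregate per-sample anomaly scores over a bag (e.g., all flows of one host) so that weak but consistent signals rise above the noise. The mean aggregation and the detector below are illustrative assumptions, not the thesis's specific algorithms:

    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.default_rng(5)
    train = rng.normal(size=(2000, 6))           # benign reference data
    detector = IsolationForest(random_state=0).fit(train)

    def bag_score(samples):
        """Mean per-sample anomaly score over a bag (higher = more anomalous)."""
        return float(-detector.score_samples(samples).mean())

    normal_bag = rng.normal(size=(50, 6))
    subtle_bag = rng.normal(0.8, 1.0, size=(50, 6))  # weak per-sample shift
    print("per-sample scores overlap, but bag-level scores separate:")
    print("normal bag:", round(bag_score(normal_bag), 3),
          "| shifted bag:", round(bag_score(subtle_bag), 3))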