
    A Machine Learning Enhanced Scheme for Intelligent Network Management

    Versatile networking services strongly influence daily life, while the number and diversity of those services make network systems highly complex. Network scale and complexity grow with additional infrastructure equipment, network functions, network slices, and the evolution of the underlying architecture. The conventional approach of manual administration makes effective, insightful management of such large and complex platforms difficult. A feasible and promising alternative is to extract insight from the large volumes of data the network produces. The goal of this thesis is to apply learning-based algorithms from the machine learning community to discover valuable knowledge in network data and thereby support intelligent management and maintenance. The thesis focuses on two schemes: detecting network anomalies and localizing their root causes, and controlling and optimizing critical traffic resources. First, network data carry informative messages, but their heterogeneity and complexity make diagnosis challenging. For unstructured logs, abstract, formatted log templates are extracted to regularize the records. An in-depth analysis framework based on heterogeneous data is proposed to detect faults and anomalies: it employs representation learning to map unstructured data to numerical features and fuses the extracted features for anomaly and fault detection, using word2vec-based embeddings for semantic representation. Next, since fault and anomaly detection only reveals that an event occurred without identifying its cause, fault localization is introduced to narrow down the source of systematic anomalies. The extracted features are turned into anomaly degrees and coupled with an importance-ranking method to highlight the likely locations of anomalies in the network; two ranking modes, based on PageRank and on operation errors, jointly highlight the locations of latent issues. Beyond fault and anomaly detection, network traffic engineering manages communication and computation resources to optimize the efficiency of data transfer. In particular, when traffic is constrained by communication conditions, proactive path planning enables efficient traffic-control actions, so a learning-based traffic planning algorithm built on a sequence-to-sequence model is proposed to discover reasonable hidden paths from abundant traffic history over a Software Defined Network architecture. Finally, traffic engineering based purely on empirical data is likely to yield stale, sub-optimal solutions, or even make the situation worse; a resilient mechanism is needed to adapt flows to a dynamic environment based on context. A reinforcement learning-based scheme is therefore put forward for dynamic data forwarding that takes network resource status into account and shows a promising performance improvement. In summary, the proposed anomaly-processing framework strengthens analysis and diagnosis for network system administrators by combining fault detection with root cause localization, and the learning-based traffic engineering improves flow management from experience data and points toward flexible traffic adjustment in ever-changing environments.
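
    The front end of the pipeline described above can be pictured with a minimal sketch: log templates are embedded with a word2vec-style model and the resulting feature vectors feed an unsupervised anomaly detector. This is an illustrative assumption of how such a stage might look, not the thesis's implementation; the example templates, model sizes, and the use of gensim and scikit-learn are placeholders.

    # Illustrative only: embed log templates with word2vec and score them with an
    # unsupervised detector. Data, model sizes and parameters are assumptions.
    import numpy as np
    from gensim.models import Word2Vec
    from sklearn.ensemble import IsolationForest

    # Each raw log line is first reduced to an abstract template, then tokenized.
    templates = [
        "interface <*> changed state to up".split(),
        "interface <*> changed state to down".split(),
        "user <*> logged in from <*>".split(),
        "user <*> logged out".split(),
        "kernel panic on cpu <*>".split(),
    ]

    # A word2vec-style embedding gives each token a semantic vector.
    w2v = Word2Vec(sentences=templates, vector_size=32, window=3, min_count=1, epochs=100)

    def embed(template):
        """Average the token vectors to get one numerical feature vector per template."""
        vecs = [w2v.wv[tok] for tok in template if tok in w2v.wv]
        return np.mean(vecs, axis=0) if vecs else np.zeros(w2v.wv.vector_size)

    X = np.vstack([embed(t) for t in templates])

    # The embedded features are fused into an unsupervised anomaly detector.
    detector = IsolationForest(contamination=0.2, random_state=0).fit(X)
    print(detector.score_samples(X))  # lower score = more anomalous template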

    AI for IT Operations (AIOps) on Cloud Platforms: Reviews, Opportunities and Challenges

    Artificial Intelligence for IT Operations (AIOps) aims to combine the power of AI with the big data generated by IT operations processes, particularly in cloud infrastructures, to provide actionable insights with the primary goal of maximizing availability. There is a wide variety of problems to address, and multiple use cases where AI capabilities can be leveraged to enhance operational efficiency. Here we provide a review of the AIOps vision, trends, challenges and opportunities, focusing specifically on the underlying AI techniques. We discuss in depth the key types of data emitted by IT operations activities, the scale of and challenges in analyzing them, and where they can be helpful. We categorize the key AIOps tasks as incident detection, failure prediction, root cause analysis and automated actions. We discuss the problem formulation for each task and then present a taxonomy of techniques to solve them. We also identify relatively underexplored topics, especially those that could benefit significantly from advances in the AI literature, and provide insights into trends in this field and the key investment opportunities.
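
    As an illustration of the incident detection task the review categorizes (not a technique taken from the paper itself), a minimal sketch of flagging a latency spike in a monitoring stream with a robust z-score rule might look as follows; the metric values and threshold are assumed for the example.

    # Illustrative sketch of one AIOps task (incident detection) on a metric stream:
    # a simple modified z-score rule based on the median. Values are assumptions.
    import numpy as np

    def detect_incidents(latency_ms, threshold=3.5):
        """Flag points whose modified z-score (median-based) exceeds the threshold."""
        x = np.asarray(latency_ms, dtype=float)
        median = np.median(x)
        mad = np.median(np.abs(x - median)) or 1e-9  # avoid division by zero
        modified_z = 0.6745 * (x - median) / mad
        return np.flatnonzero(np.abs(modified_z) > threshold)

    stream = [120, 118, 125, 119, 121, 940, 122, 117]  # one obvious latency spike
    print(detect_incidents(stream))  # -> index 5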

    High-Performance Modelling and Simulation for Big Data Applications

    This open access book was prepared as the final publication of the COST Action IC1406 “High-Performance Modelling and Simulation for Big Data Applications (cHiPSet)” project. Long considered important pillars of the scientific method, Modelling and Simulation have evolved from traditional discrete numerical methods to complex, data-intensive, continuous analytical optimisations. Resolution, scale, and accuracy have become essential to predicting and analysing natural and complex systems in science and engineering. As their level of abstraction rises to afford a better discernment of the domain at hand, their representation becomes increasingly demanding in computational and data resources. High Performance Computing, on the other hand, typically entails the effective use of parallel and distributed processing units coupled with efficient storage, communication and visualisation systems to underpin complex data-intensive applications in distinct scientific and technical domains. A seamless interaction between High Performance Computing and Modelling and Simulation is therefore arguably required to store, compute, analyse, and visualise large data sets in science and engineering. Funded by the European Commission, cHiPSet has provided a dynamic trans-European forum for its members and distinguished guests to openly discuss novel perspectives and topics of interest for these two communities. This cHiPSet compendium presents a set of selected case studies related to healthcare, biological data, computational advertising, multimedia, finance, bioinformatics, and telecommunications.

    Failure avoidance techniques for HPC systems based on failure prediction

    An increasingly large percentage of computing capacity in today's large high-performance computing systems is wasted on failures and recoveries. Moreover, high-performance computing is expected to reach exascale within a decade, shrinking the mean time between failures to a day or even a few hours and making fault tolerance a major challenge for the HPC community. As a consequence, current research focuses on fault tolerance strategies that aim to minimize the effects of faults on applications. By far the most popular and widely used techniques in this field are rollback-recovery protocols. However, existing rollback-recovery techniques have severe scalability limitations, and without further optimization the use of current protocols is seriously in question for future exascale systems. One way of reducing the overhead these strategies induce is to combine them with failure avoidance methods. Failure avoidance is based on a prediction model that detects fault occurrences ahead of time, allowing preventive measures to be taken, such as task migration or checkpointing the application before the failure. The same methodology can be generalized to anomaly avoidance, where an anomaly can be anything from a system failure to performance degradation at the application level. For this, monitoring systems require a reliable prediction system that indicates when failures will occur and at what location. Thus far, research in this field has used ideal predictors with no implementation in real HPC systems. This thesis focuses on analyzing and characterizing anomaly patterns at both the application and system levels and on offering solutions that prevent anomalies from affecting applications running in the system. Currently, there is no good characterization of normal behavior for system state data or of how different components react to failures within HPC systems. For example, if a node experiences a network failure and cannot generate log messages, the failure is announced in the log files by an absence of messages; conversely, some component failures cause large numbers of notifications to be logged, as when memory failures lead a single faulty component to generate hundreds or thousands of messages in less than a day. It is important to capture the behavior of each event type and to understand what normal behavior is and how each failure type affects it. This idea is the building block of a novel way of characterizing the state of the system over time by analyzing the properties of each event described in different system metrics, considering its own trend and behavior. The method integrates signal processing concepts with data mining techniques in the context of analysis for large-scale systems. By modelling the normal and faulty behavior of each event and of the whole system, appropriate models and methods for descriptive and forecasting purposes are proposed. After obtaining an accurate overview of the whole system, the thesis analyzes how the prediction model affects current fault tolerance techniques and finally integrates it into a fault avoidance solution. This hybrid protocol reduces the overhead that current fault tolerance strategies impose on applications and presents a viable solution for future large-scale systems.
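
    A minimal sketch of the failure avoidance idea, assuming a predictor that reports a lead time and a confidence per node, might look as follows. The action costs, threshold, and decision rule are illustrative assumptions, not the hybrid protocol developed in the thesis.

    # Hypothetical sketch of prediction-driven failure avoidance: if a failure is
    # predicted far enough ahead, take the cheapest preventive action that still
    # fits within the lead time. All numbers are assumptions for illustration.
    from dataclasses import dataclass

    @dataclass
    class Prediction:
        node: str
        seconds_until_failure: float
        confidence: float

    CHECKPOINT_COST_S = 60   # time to write a checkpoint (assumed)
    MIGRATION_COST_S = 180   # time to migrate the task elsewhere (assumed)
    MIN_CONFIDENCE = 0.7     # ignore low-confidence predictions

    def preventive_action(pred: Prediction) -> str:
        """Pick the cheapest action that still completes before the predicted failure."""
        if pred.confidence < MIN_CONFIDENCE:
            return "none"
        if pred.seconds_until_failure > MIGRATION_COST_S:
            return "migrate"      # enough lead time to move the work entirely
        if pred.seconds_until_failure > CHECKPOINT_COST_S:
            return "checkpoint"   # at least save state before the failure
        return "none"             # too late; fall back to rollback-recovery

    print(preventive_action(Prediction("node42", seconds_until_failure=240, confidence=0.9)))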

    Intelligent zero-day intrusion detection framework for internet of things

    Zero-day intrusion detection faces serious challenges, as hundreds of thousands of new malware instances are created every day to harm or damage computer systems. Cyber-attacks are becoming more sophisticated, making intrusion detection harder. Many Intrusion Detection Systems (IDSs) have been proposed to identify abnormal activities, but most produce a large number of false positives and have low detection accuracy; a significant quantity of false positives can generate a high level of alerts in a short period of time when normal activities are classified as intrusions. This thesis proposes a novel hybrid intrusion detection framework that integrates a Signature Intrusion Detection System (SIDS) with an Anomaly Intrusion Detection System (AIDS) to detect zero-day attacks with high accuracy. SIDS is used to identify previously known intrusions, and AIDS is applied to detect unknown, zero-day intrusions. The goal of this research is to combine the strengths of each technique into a hybrid framework for efficient intrusion detection. A number of performance measures, including accuracy, F-measure and area under the ROC curve, are used to evaluate the proposed models and to compare them with existing approaches. Extensive simulation results show that the proposed framework yields excellent detection performance on a number of widely used benchmark intrusion detection datasets, and experiments show that the proposed hybrid IDS provides a higher detection rate and a lower false-positive rate than the SIDS and AIDS techniques individually.
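
    A minimal sketch of the SIDS/AIDS combination, under simplifying assumptions, is shown below: known signatures are matched first, and a simple distance-based anomaly check stands in for the thesis's AIDS models. The signatures, feature vectors, and threshold are placeholders for illustration only.

    # Illustrative hybrid IDS: signature matching (SIDS) first, then a simple
    # distance-from-normal check standing in for AIDS. All data are assumptions.
    import re
    import numpy as np

    SIGNATURES = [re.compile(p) for p in [r"DROP TABLE", r"\.\./", r"cmd\.exe"]]

    # Toy "normal" feature vectors (e.g. request size, URL depth, error count).
    normal = np.array([[60, 1, 0], [58, 1, 0], [61, 2, 0], [59, 1, 1]], dtype=float)
    centroid = normal.mean(axis=0)
    radius = np.linalg.norm(normal - centroid, axis=1).max()  # largest normal deviation

    def classify(payload: str, features) -> str:
        """Check known signatures first; otherwise flag behaviour far from normal."""
        if any(sig.search(payload) for sig in SIGNATURES):
            return "known attack (SIDS)"
        distance = np.linalg.norm(np.asarray(features, dtype=float) - centroid)
        if distance > 3 * radius:            # far outside observed normal behaviour
            return "suspected zero-day (AIDS)"
        return "benign"

    print(classify("GET /index.html", [60, 1, 0]))        # benign
    print(classify("GET /../../etc/passwd", [60, 1, 0]))  # known attack (SIDS)
    print(classify("GET /index.html", [900, 40, 7]))      # suspected zero-day (AIDS)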

    Detecting anomalies in modern IT systems through the inference of structure and the detection of novelties in system logs

    Anomalies in the logs of information systems are often a sign of faults or vulnerabilities. Detecting them automatically is challenging because of the lack of structure in logs and the complexity of the anomalies. Existing structure inference methods are not very flexible: they are not parametric, or they rely on strong syntactic assumptions that sometimes prove inadequate. Anomaly detection methods, for their part, adopt a data representation that neglects the time elapsed between log entries and are therefore unsuited to detecting temporal anomalies. The contribution of this thesis is twofold. We first propose METING, a parametric and modular structure inference method. METING does not rely on any strong syntactic assumption but instead mines frequent patterns by studying the n-grams of the logs. We show experimentally that METING outperforms existing methods, with substantial improvements on some datasets. We also show that the sensitivity of our method to its hyper-parameters allows the exploration of many configurations and adaptation to the heterogeneity of datasets. Finally, we extend METING to stemming in natural language processing and show that our approach provides a multilingual, rule-free stemming solution that is more effective than Porter's method, the state-of-the-art reference. We also present NoTIL, a deep learning-based novelty detection method. NoTIL uses a data representation capable of capturing temporal irregularities in the logs; it learns an intermediate prediction task to model the nominal behavior of the logs. We compare our method with the state of the art and conclude that NoTIL handles the greatest variety of anomalies, thanks to its choice of data representation.
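
    A simplified sketch of frequent-pattern template inference, in the spirit of what the abstract describes (this is not METING itself, whose exact algorithm is not reproduced here), keeps tokens that recur at a position across similar log lines and replaces the variable ones with a wildcard. The example logs and support threshold are assumptions.

    # Simplified illustration of frequent-pattern log template inference: tokens
    # frequent at a position are kept, rare ones become a wildcard. Not METING.
    from collections import Counter

    def infer_template(lines, min_support=0.6):
        """Keep per-position tokens seen in at least `min_support` of the lines."""
        tokenized = [line.split() for line in lines]
        length = min(len(t) for t in tokenized)
        template = []
        for i in range(length):
            counts = Counter(t[i] for t in tokenized)
            token, freq = counts.most_common(1)[0]
            template.append(token if freq / len(tokenized) >= min_support else "<*>")
        return " ".join(template)

    logs = [
        "Connection from 10.0.0.1 closed by peer",
        "Connection from 10.0.0.7 closed by peer",
        "Connection from 192.168.1.2 closed by peer",
    ]
    print(infer_template(logs))  # Connection from <*> closed by peer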
