
6 research outputs found

Artificial intelligence driven anomaly detection for big data systems

Author: Alnafessah Ahmad
Publication venue: Computing, Imperial College London
Publication date: 01/06/2022

The main goal of this thesis is to contribute to the research on automated performance anomaly detection and interference prediction by implementing Artificial Intelligence (AI) solutions for complex distributed systems, especially for Big Data platforms within cloud computing environments. Late detection and manual resolution of performance anomalies and system interference in Big Data systems may lead to performance violations and financial penalties. Motivated by this issue, we propose AI-based methodologies for anomaly detection and interference prediction tailored to Big Data and containerized batch platforms to better analyze system performance and effectively utilize computing resources within cloud environments. New, precise, and efficient performance management methods are therefore the key to handling performance anomalies and interference impacts and to improving the efficiency of data center resources. The first part of this thesis contributes to performance anomaly detection for in-memory Big Data platforms. We examine the performance of Big Data platforms and justify our choice of the in-memory Apache Spark platform. An artificial neural network-driven methodology is proposed to detect and classify performance anomalies for batch workloads based on the RDD characteristics and operating system monitoring metrics. Our method is evaluated against other popular machine learning (ML) algorithms, as well as against four different monitoring datasets. The results prove that our proposed method outperforms other ML methods, typically achieving 98–99% F-scores. Moreover, we prove that a random start instant, a random duration, and overlapped anomalies do not significantly impact the performance of our proposed methodology. The second contribution addresses the challenge of anomaly identification within an in-memory streaming Big Data platform by investigating agile hybrid learning techniques.
We develop TRACK (neural neTwoRk Anomaly deteCtion in sparK) and TRACK-Plus, two methods to efficiently train a class of machine learning models for performance anomaly detection using a fixed number of experiments. Our model revolves around using artificial neural networks with Bayesian Optimization (BO) to find the optimal training dataset size and configuration parameters to efficiently train the anomaly detection model to achieve high accuracy. The objective is to accelerate the search process for finding the size of the training dataset, optimizing neural network configurations, and improving the performance of anomaly classification. A validation based on several datasets from a real Apache Spark Streaming system is performed, demonstrating that the proposed methodology can efficiently identify performance anomalies, near-optimal configuration parameters, and a near-optimal training dataset size while reducing the number of experiments by up to 75% compared with naïve anomaly detection training. The last contribution overcomes the challenges of predicting the completion time of containerized batch jobs and proactively avoiding performance interference by introducing an automated prediction solution to estimate interference among colocated batch jobs within the same computing environment. An AI-driven model is implemented to predict the interference among batch jobs before it occurs within the system. Our interference detection model can estimate and alleviate the task slowdown caused by the interference. This model assists system operators in making accurate decisions to optimize job placement. Our model is agnostic to the business logic internal to each job. Instead, it is learned from system performance data by applying artificial neural networks to establish the completion time prediction of batch jobs within cloud environments.
We compare our model with three baseline models (a queueing-theoretic model, operational analysis, and an empirical method) on historical measurements of job completion time and CPU run-queue size (i.e., the number of active threads in the system). The proposed model captures multithreading, operating system scheduling, sleeping time, and job priorities. A validation of 4500 experiments based on the DaCapo benchmarking suite was carried out, confirming the predictive efficiency and capabilities of the proposed model by achieving up to 10% MAPE compared with the other models.

Spiral - Imperial College Digital Repository

On the Existence of Characterization Logics and Fundamental Properties of Argumentation Semantics

Author: Baumann Ringo
Publication date: 18/12/2019

Given the large variety of existing logical formalisms, it is of utmost importance to select the most adequate one for a specific purpose, e.g. for representing the knowledge relevant for a particular application or for using the formalism as a modeling tool for problem solving. Awareness of the nature of a logical formalism, in other words, of its fundamental intrinsic properties, is indispensable and provides the basis of an informed choice. One such intrinsic property of logic-based knowledge representation languages is the context-dependency of pieces of knowledge. In classical propositional logic, for example, there is no such context-dependence: whenever two sets of formulas are equivalent in the sense of having the same models (ordinary equivalence), then they are mutually replaceable in arbitrary contexts (strong equivalence). However, a large number of commonly used formalisms are not like classical logic, which has led to a series of interesting developments. It turned out that sometimes, to characterize strong equivalence in formalism L, we can use ordinary equivalence in formalism L′: for example, strong equivalence in normal logic programs under stable models can be characterized by the standard semantics of the logic of here-and-there. Such results about the existence of characterizing logics have rightly been recognized as important for the study of concrete knowledge representation formalisms and raise a fundamental question: Does every formalism have one? In this thesis, we answer this question with a qualified “yes”. More precisely, we show that the important case of considering only finite knowledge bases guarantees the existence of a canonical characterizing formalism. Furthermore, we argue that those characterizing formalisms can be seen as classical, monotonic logics which are uniquely determined (up to isomorphism) regarding their model theory.
The other main part of this thesis is devoted to argumentation semantics, which play the flagship role in Dung’s abstract argumentation theory. Almost all of them are motivated by an easily understandable intuition of what should be acceptable in the light of conflicts. However, although these intuitions equip us with short and comprehensible formal definitions, it turned out that their intrinsic properties, such as existence and uniqueness, expressibility, replaceability and verifiability, are not that easily accessible. We review the mentioned properties for almost all semantics available in the literature. In doing so, we include two main axes: first, the distinction between extension-based and labelling-based versions and, second, the distinction between different kinds of argumentation frameworks, such as finite or unrestricted ones.

arXiv.org e-Print Archive

Qucosa

Quality of distance e-learning at Saudi universities : students' perceptions

Author: Alhathlol Ali
Publication venue: Newcastle University
Publication date: 01/01/2017

Ph.D. Thesis. One key tool for promoting social justice in the Kingdom of Saudi Arabia (SA) is to ensure the growth and improvement of Distance e-Learning (DeL). This research study investigates DeL from the perspective of one key group of stakeholders, the students who are currently enrolled in DeL. Their views are presented on the importance and application of a set of standards regarding quality, while exploration of the study setting and context highlights the specificity of the education system in SA. A conception of quality in DeL is then explicated through a reading of the history of Distance Education (DE), the usage of quality in education today and the most significant current models of pedagogy and culture. This research hence provides the basis for a pragmatic methodology to analyse the perceptions of students regarding selected standards of quality. A total of 591 students were surveyed in a mixed methods approach comprising a questionnaire and a focus group. The data gathered from surveying perceptions of students is also used to construct a picture of the strengths and weaknesses of DeL in SA, as well as the barriers and enhancements to learning resulting from its introduction. Here, culture is found to be a major influence on the perceptions of the students, while DeL exists within a wider, behaviourist educational tradition. If it is to be effective, the introduction of Western DeL practices should therefore serve to negotiate the gap between the need for globalised skills and the local culture and traditions. This thesis identifies manifold issues arising from the students’ experiences that contribute to the obstruction of their expectations about quality; notably, a lack of staff training, large class sizes and a failure to employ technology (including Web 2.0) adequately. Many of the problems raised in this study reflect the rapid pace and unplanned nature of DeL’s introduction in SA.
The recommendations subsequently made about strategic and institutional improvement suggest that quality is created through both progressive and planned change.

Newcastle University eTheses

Minority target class detection for short text classification

Author: Chiroma Fatima
Publication date: 01/01/2021