Search CORE


Measurement-based reliability prediction methodology

Author: Linn Linda Shen

In the past, analytical and measurement-based models were developed to characterize computer system behavior. An open issue is whether, and if so how, these models can be used to improve system design. This issue is addressed here. A combined statistical/analytical approach is proposed that uses measurements from one environment to model the system failure behavior in a new environment. A comparison of the predicted results with actual data from the new environment shows a close correspondence.

NASA Technical Reports Server

A Survey of Prediction and Classification Techniques in Multicore Processor Systems

Authors: Ababei Cristinel; Moghaddam Milad Ghorbani
Publication venue: e-Publications@Marquette
Publication date: 01/05/2019

In multicore processor systems, the ability to accurately predict the future provides new optimization opportunities that could not otherwise be exploited. For example, an oracle able to predict a certain application's behavior on a smartphone could direct the power manager to switch to appropriate dynamic voltage and frequency scaling (DVFS) modes that guarantee minimum levels of desired performance while reducing energy consumption and thereby prolonging battery life. Using predictions enables systems to become proactive rather than continuing to operate reactively. This prediction-based proactive approach has become increasingly popular in the design and optimization of integrated circuits and multicore processor systems. Prediction has evolved from simple forecasting to sophisticated machine learning-based prediction and classification that learns from existing data, employs data mining, and predicts future behavior. This can be exploited by novel optimization techniques that span all layers of the computing stack. In this survey paper, we discuss the most popular prediction and classification techniques in the general context of computing systems, with emphasis on multicore processors. The paper is far from comprehensive, but it will help readers interested in employing prediction to optimize multicore processor systems.
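The smartphone example pairs a predictor with a dynamic voltage and frequency scaling (DVFS) power manager. A minimal sketch of that loop follows, assuming an exponential moving average (EMA) as the predictor (one of the simplest forecasting techniques, chosen for illustration rather than taken from the survey), hypothetical frequency levels in `DVFS_LEVELS_MHZ`, and a hypothetical 20% `headroom` margin so the chosen level still meets the performance target if demand is slightly under-predicted:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical DVFS operating points in MHz (illustrative, not from the survey).
DVFS_LEVELS_MHZ = [600, 1000, 1400, 1800]


@dataclass
class EmaPredictor:
    """Forecasts next-interval CPU demand (in megacycles) with an exponential moving average."""
    alpha: float = 0.5                # weight given to the newest observation
    estimate: Optional[float] = None

    def update(self, observed: float) -> float:
        # The first observation seeds the estimate; later ones are blended in.
        if self.estimate is None:
            self.estimate = observed
        else:
            self.estimate = self.alpha * observed + (1 - self.alpha) * self.estimate
        return self.estimate


def pick_frequency(predicted_mcycles: float, interval_ms: float, headroom: float = 1.2) -> int:
    """Return the lowest DVFS level whose capacity covers predicted demand times headroom."""
    needed = predicted_mcycles * headroom
    for freq_mhz in DVFS_LEVELS_MHZ:
        # MHz * seconds = megacycles available in one interval at this level.
        if freq_mhz * interval_ms / 1000 >= needed:
            return freq_mhz
    return DVFS_LEVELS_MHZ[-1]  # demand exceeds every level: run at the maximum


if __name__ == "__main__":
    predictor = EmaPredictor()
    for demand in [4.0, 8.0, 12.0]:  # observed megacycles per 10 ms interval
        forecast = predictor.update(demand)
        print(demand, forecast, pick_frequency(forecast, interval_ms=10))
```

Lowering `alpha` smooths out short spikes at the cost of reacting more slowly to phase changes. A more sophisticated predictor, such as the machine-learning models the survey discusses, could replace `EmaPredictor` while leaving the frequency-selection step unchanged.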

epublications@Marquette

Model-driven Scheduling for Distributed Stream Processing Systems

Authors: Shukla Anshu; Simmhan Yogesh
Publication venue: Elsevier BV
Publication date: 06/02/2017

Distributed stream processing frameworks have become commonly used with the evolution of the Internet of Things (IoT). These frameworks are designed to adapt to a dynamic input message rate by scaling in and out. Apache Storm, originally developed by Twitter, is a widely used stream processing engine; others include Apache Flink and Spark Streaming. To run streaming applications successfully, one needs to know their optimal resource requirements, since over-estimating resources adds extra cost. A strategy is therefore needed to determine the optimal resource requirement for a given streaming application. In this article, we propose a model-driven approach for scheduling streaming applications that effectively utilizes a priori knowledge of the applications to provide predictable scheduling behavior. Specifically, we use application performance models to offer reliable estimates of the resource allocation required. Further, this intuition also drives resource mapping and helps narrow the gap between estimated and actual dataflow performance and resource utilization. Together, this model-driven scheduling approach gives predictable application performance and resource utilization behavior for executing a given DSPS application at a target input stream rate on distributed resources.
Comment: 54 pages
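The core idea, using per-task performance models to estimate the resources needed for a target input rate, can be illustrated with a deliberately simplified sketch. It assumes a linear pipeline, that each task's model reduces to a single hypothetical `peak_rate_per_slot` (messages per second one resource slot sustains, e.g. measured by offline benchmarking) plus a `selectivity` (output messages per input message), and that capacity scales linearly with slots. This is an illustration of the principle, not the scheduling algorithm from the paper:

```python
import math
from dataclasses import dataclass


@dataclass
class TaskModel:
    name: str
    peak_rate_per_slot: float  # msg/s one slot can sustain (hypothetical benchmark result)
    selectivity: float = 1.0   # output messages emitted per input message


def estimate_slots(pipeline: list[TaskModel], input_rate: float) -> dict[str, int]:
    """Estimate slots per task for a linear pipeline fed at input_rate msg/s."""
    allocation: dict[str, int] = {}
    rate = input_rate
    for task in pipeline:
        # Enough slots that their combined peak rate covers the incoming rate.
        allocation[task.name] = math.ceil(rate / task.peak_rate_per_slot)
        # This task's output rate becomes the next task's input rate.
        rate *= task.selectivity
    return allocation


if __name__ == "__main__":
    pipeline = [
        TaskModel("parse", peak_rate_per_slot=2000.0),
        TaskModel("filter", peak_rate_per_slot=5000.0, selectivity=0.25),
        TaskModel("aggregate", peak_rate_per_slot=300.0),
    ]
    plan = estimate_slots(pipeline, input_rate=8000.0)
    print(plan, "total slots:", sum(plan.values()))
```

Because every count comes from the model rather than from trial deployments, the allocation for a new target rate can be recomputed immediately, which is the kind of predictability the abstract emphasizes.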

arXiv.org e-Print Archive

Open Access Repository of IISc Research Publications

Artificial intelligence driven anomaly detection for big data systems

Author: Alnafessah Ahmad
Publication venue: Computing, Imperial College London
Publication date: 01/06/2022

The main goal of this thesis is to contribute to research on automated performance anomaly detection and interference prediction by implementing Artificial Intelligence (AI) solutions for complex distributed systems, especially Big Data platforms within cloud computing environments. Late detection and manual resolution of performance anomalies and system interference in Big Data systems may lead to performance violations and financial penalties. Motivated by this issue, we propose AI-based methodologies for anomaly detection and interference prediction tailored to Big Data and containerized batch platforms, to better analyze system performance and effectively utilize computing resources within cloud environments. New, precise, and efficient performance management methods are therefore key to handling performance anomalies and interference and to improving the efficiency of data center resources. The first part of this thesis contributes to performance anomaly detection for in-memory Big Data platforms. We examine the performance of Big Data platforms and justify our choice of the in-memory Apache Spark platform. An artificial neural network-driven methodology is proposed to detect and classify performance anomalies for batch workloads based on RDD characteristics and operating system monitoring metrics. Our method is compared with other popular machine learning (ML) algorithms and evaluated on four different monitoring datasets. The results show that our proposed method outperforms the other ML methods, typically achieving 98–99% F-scores. Moreover, we show that a random start instant, a random duration, and overlapping anomalies do not significantly affect the performance of our proposed methodology. The second contribution addresses the challenge of anomaly identification within an in-memory streaming Big Data platform by investigating agile hybrid learning techniques. 
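The reported F-scores are presumably the common F1 measure, the harmonic mean of precision P and recall R; per class, F1 = 2PR / (P + R) = 2TP / (2TP + FP + FN). A small sketch of that computation for a multi-class anomaly classifier follows, assuming string class labels. The anomaly names in the example are hypothetical, and the thesis may aggregate classes differently (for example, weighted rather than macro averaging):

```python
from collections import Counter


def f1_per_class(y_true: list[str], y_pred: list[str]) -> dict[str, float]:
    """Per-class F1 = 2TP / (2TP + FP + FN), computed from parallel label lists."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1
        else:
            fp[guess] += 1  # predicted this class, but wrongly
            fn[truth] += 1  # missed the true class
    scores = {}
    for label in set(y_true) | set(y_pred):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    # Hypothetical labels: "normal" plus two injected anomaly types.
    y_true = ["normal", "normal", "cpu", "mem", "cpu"]
    y_pred = ["normal", "cpu", "cpu", "mem", "cpu"]
    print(f1_per_class(y_true, y_pred), macro_f1(y_true, y_pred))
```

Using the 2TP / (2TP + FP + FN) form avoids dividing by zero when a class has no predictions (undefined precision) but does appear in the ground truth.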
We develop TRACK (neural neTwoRk Anomaly deteCtion in sparK) and TRACK-Plus, two methods to efficiently train a class of machine learning models for performance anomaly detection using a fixed number of experiments. Our model uses artificial neural networks with Bayesian Optimization (BO) to find the optimal training dataset size and configuration parameters, so that the anomaly detection model can be trained efficiently to high accuracy. The objective is to accelerate the search for the training dataset size, optimize the neural network configuration, and improve anomaly classification performance. A validation on several datasets from a real Apache Spark Streaming system demonstrates that the proposed methodology can efficiently identify performance anomalies, near-optimal configuration parameters, and a near-optimal training dataset size while reducing the number of experiments by up to 75% compared with naïve anomaly detection training. The last contribution addresses the challenges of predicting the completion time of containerized batch jobs and proactively avoiding performance interference, by introducing an automated prediction solution that estimates interference among colocated batch jobs within the same computing environment. An AI-driven model is implemented to predict the interference among batch jobs before it occurs within the system. Our interference detection model can estimate, and help alleviate, the task slowdown caused by the interference. This model assists system operators in making accurate job placement decisions. Our model is agnostic to the business logic internal to each job; instead, it is learned from system performance data, applying artificial neural networks to predict the completion time of batch jobs within cloud environments. 
We compare our model with three baseline models (a queueing-theoretic model, operational analysis, and an empirical method) on historical measurements of job completion time and CPU run-queue size (i.e., the number of active threads in the system). The proposed model captures multithreading, operating system scheduling, sleeping time, and job priorities. A validation based on 4,500 experiments using the DaCapo benchmark suite confirms the predictive efficiency and capabilities of the proposed model, which achieves up to 10% mean absolute percentage error (MAPE) compared with the other models.
Open Access
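The completion-time models are scored with the mean absolute percentage error, MAPE = (100% / n) × Σ |A_i − F_i| / |A_i|, over n jobs with actual completion times A_i and predicted times F_i. A minimal sketch of the metric and of ranking competing predictors by it follows; the model names and completion times in the example are hypothetical, not results from the thesis:

```python
def mape(actual: list[float], predicted: list[float]) -> float:
    """Mean absolute percentage error, in percent; every actual value must be non-zero."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def rank_models(actual: list[float], predictions: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (model name, MAPE) pairs sorted from most to least accurate."""
    return sorted(((name, mape(actual, pred)) for name, pred in predictions.items()),
                  key=lambda item: item[1])


if __name__ == "__main__":
    actual_s = [100.0, 200.0, 400.0]  # hypothetical measured completion times (seconds)
    predictions = {
        "neural_net": [110.0, 190.0, 400.0],
        "queueing_baseline": [80.0, 260.0, 300.0],
    }
    for name, err in rank_models(actual_s, predictions):
        print(f"{name}: {err:.1f}% MAPE")
```

Because each error is divided by the actual value, MAPE weights a 10-second miss on a short job more heavily than the same miss on a long job, which suits comparisons across jobs of very different lengths.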

Spiral - Imperial College Digital Repository

Adaptive learning-based resource management strategy in fog-to-cloud

Author: Sengupta Souvik
Publication venue: Universitat Politècnica de Catalunya
Publication date: 20/10/2020

Technology in the twenty-first century is developing rapidly, driving us into a new smart computing world and giving rise to many new computing architectures. Fog-to-Cloud (F2C) is one of them; it aims to bring higher-level computing facilities closer to the edge of the network and to help large-scale computing systems become more intelligent. As F2C is still in its infancy, one of the biggest challenges for this computing paradigm is managing computing resources efficiently. To address this challenge, this work focuses on designing an initial architectural framework for a proper, adaptive, and efficient resource management mechanism in F2C. F2C has been proposed as a combined, coordinated, and hierarchical computing platform in which a vast number of heterogeneous computing devices participate. Their versatility makes handling them effectively a major challenge. Any large-scale smart computing system offers various kinds of services for different purposes, and every service consists of various tasks with different resource requirements. Knowing the characteristics of the participating devices and of the services the system offers therefore helps in building an effective resource management mechanism in an F2C-enabled system. With this in mind, we first focus on identifying and defining a taxonomic model for all participating devices and for the services and tasks in the system. Any F2C-enabled system includes a large number of small Internet of Things (IoT) devices that continuously generate a colossal amount of sensing data by capturing various environmental events. This sensing data is one of the key ingredients of the smart services offered by the F2C-enabled system. 
Resource statistical information also plays a crucial role in efficiently providing services to the system's consumers. Continuous monitoring of the participating devices generates a massive amount of resource statistical information in the F2C-enabled system. With this information, it becomes much easier to determine a device's availability and suitability for executing the tasks behind particular services. Therefore, to provide better service for latency-sensitive applications, it is essential to distribute the sensing data and resource statistical information securely over the network. For this reason, we have also proposed and designed a secure, distributed database framework to distribute the data over the network effectively and securely. Building an advanced, smarter system requires an effective mechanism for utilizing system resources. Typically, resource utilization and handling depend mainly on the resource selection and allocation mechanism. Predicting resource usage (e.g., RAM, CPU, disk) and performance (i.e., task execution time) helps the selection and allocation process. Machine learning (ML) techniques are therefore very useful for designing an advanced and sophisticated resource allocation mechanism in the F2C-enabled system. However, applying ML techniques in an F2C-enabled system is challenging: the overall diversity of the system, among many other issues, makes it hard to apply them successfully. We have therefore proposed and designed two possible architectural schemas for applying ML techniques in the F2C-enabled system to achieve an adaptive, advanced, and sophisticated resource management mechanism. 
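The abstract states that predicting resource usage and task execution time helps resource selection and allocation. A minimal sketch of that idea follows, assuming a per-device linear model of execution time versus task size, fitted by ordinary least squares to monitoring history, plus a simple free-RAM filter. The `Device` fields, node names, and numbers are hypothetical; the thesis proposes architectural schemas for applying ML and does not prescribe this particular model:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Device:
    name: str
    free_ram_mb: float
    # (task size, observed execution time in s) pairs from resource monitoring.
    history: list[tuple[float, float]] = field(default_factory=list)

    def predict_time(self, task_size: float) -> float:
        """Ordinary least-squares line through the device's past observations."""
        n = len(self.history)
        mean_x = sum(x for x, _ in self.history) / n
        mean_y = sum(y for _, y in self.history) / n
        sxx = sum((x - mean_x) ** 2 for x, _ in self.history)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in self.history)
        slope = sxy / sxx if sxx else 0.0  # flat model if all sizes were identical
        return mean_y + slope * (task_size - mean_x)


def select_device(devices: list[Device], task_size: float, ram_needed_mb: float) -> Optional[str]:
    """Pick the device with enough free RAM and the lowest predicted execution time."""
    candidates = [d for d in devices if d.free_ram_mb >= ram_needed_mb and d.history]
    if not candidates:
        return None
    return min(candidates, key=lambda d: d.predict_time(task_size)).name


if __name__ == "__main__":
    edge = Device("edge-node", 512, [(1, 2.0), (2, 4.0), (3, 6.0)])
    fog = Device("fog-node", 2048, [(1, 1.0), (2, 1.5), (3, 2.0)])
    cloud = Device("cloud-vm", 8192)  # no monitoring history yet, so it is skipped
    print(select_device([edge, fog, cloud], task_size=4, ram_needed_mb=256))
```

Devices without monitoring history are skipped rather than guessed at, which illustrates why continuous monitoring of participating devices matters for this kind of allocation.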
Our proposals are the first steps toward designing the overall architectural framework for the resource management mechanism in F2C-enabled systems.

Twenty-first-century technology is advancing rapidly and leading us toward a new smart world, creating new models of computing architectures. Fog-to-Cloud (F2C) is one of them; it emerges to fulfill the commitment of bringing computing facilities close to the network and also to help large-scale computing systems become smarter. Because F2C is at a preliminary stage, one of the greatest challenges of this technological paradigm is managing computing resources efficiently. To address this challenge, in this work we have focused on designing an architectural framework for building a suitable, adaptive, and efficient resource management mechanism in F2C. F2C has been conceived as a combined, coordinated, and hierarchical computing platform in which a large number of heterogeneous devices participate. Their versatility poses a major challenge to managing them effectively. The services running on it consist of various tasks with different resource requirements. Therefore, knowing the characteristics of the participating devices and of the services offered by the system is a requirement for designing effective resource management mechanisms in an F2C-enabled system. With these facts in mind, we initially focused on identifying and defining a taxonomic model for all the devices and systems involved in executing service tasks. Any F2C-enabled system includes a large number of small, connected devices (known as the Internet of Things, or IoT) that generate a continuous and colossal amount of sensing data by capturing various environmental events. These data are one of the key ingredients for the various smart services that F2C offers.
In addition, the continuous monitoring of the participating devices likewise generates a large amount of statistical information. In particular, this information makes it much easier to know the availability and suitability of devices for executing certain tasks and offering certain services. Therefore, to guarantee better latency-sensitive services, it is essential to distribute the statistical information across the network in a balanced and secure way. With these issues in mind, we have also proposed and designed a secure, distributed database environment to manage data on the network effectively and securely. Building an advanced, intelligent system requires an effective mechanism for managing the use of system resources. Normally, the process of using and handling resources depends mainly on the resource selection and allocation mechanism. Predicting resource usage and performance (for example, RAM, CPU, disk) in terms of task execution time aids the selection and allocation process. Adopting machine learning (ML) techniques is very useful for designing an advanced and sophisticated resource allocation mechanism in the F2C-enabled system. Adopting and applying ML techniques in an F2C system is a complex task. In particular, the general diversification and many other problems pose a major challenge to applying ML techniques successfully. Therefore, in this research we have proposed and designed two different possible architectural schemas for applying ML techniques in the F2C-enabled system to achieve an adaptive, advanced, and sophisticated resource management mechanism in an F2C system. Our proposals are the first steps toward designing a general architectural framework for the resource management mechanism in an F2C-enabled system.
Postprint (published version)

UPCommons. Portal del coneixement obert de la UPC