
    An Artificial Intelligence Framework for Supporting Coarse-Grained Workload Classification in Complex Virtual Environments

    Cloud-based machine learning tools for enhanced Big Data applications, where the main idea is to predict the "next" workload occurring against the target Cloud infrastructure via an innovative ensemble-based approach that combines the effectiveness of different well-known classifiers in order to improve the overall accuracy of the final classification, which is currently very relevant in the specific context of Big Data. The so-called workload categorization problem plays a critical role in improving the efficiency and reliability of Cloud-based big data applications. Implementation-wise, our method proposes deploying the Cloud entities that participate in the distributed classification approach on top of virtual machines, which represent classical "commodity" settings for Cloud-based big data applications. Given a number of known reference workloads and an unknown workload, in this paper we deal with the problem of finding the reference workload that is most similar to the unknown one. The depicted scenario turns out to be useful in a plethora of modern information system applications. We call this problem coarse-grained workload classification because, instead of characterizing the unknown workload in terms of finer behaviors, such as CPU-, memory-, disk-, or network-intensive patterns, we classify the whole unknown workload as one of the (possible) reference workloads. Reference workloads represent a category of workloads that are relevant in a given application environment. In particular, we focus on the classification problem described above in the special case of virtualized environments. Today, Virtual Machines (VMs) have become very popular because they offer important advantages to modern computing environments such as cloud computing or server farms.
In virtualization frameworks, workload classification is very useful for accounting, security, or user profiling. Hence, our research is particularly relevant to such environments, and especially to Cloud Computing, which is currently emerging. In this respect, our approach consists of running several machine learning-based classifiers of different workload models and then deriving the best classifier produced by Dempster-Shafer fusion, in order to increase the accuracy of the final classification. Experimental assessment and analysis clearly confirm the benefits derived from our classification framework. The running programs that produce unknown workloads to be classified are treated in a similar way. A fundamental aspect of this paper concerns the successful use of data fusion in workload classification. Different types of metrics are in fact fused together using the Dempster-Shafer theory of evidence combination, giving a classification accuracy of slightly less than 80%. The acquisition of data from the running process, the pre-processing algorithms, and the workload classification are described in detail. Various classical algorithms have been used to classify the workloads, and their results are compared.
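The abstract fuses the outputs of several classifiers with Dempster-Shafer theory but does not spell out the fusion step. Below is a minimal Python sketch of Dempster's rule of combination. It assumes each classifier's output has already been turned into a mass function over the reference workloads. The workload labels (`web`, `db`, `batch`) and the mass values are hypothetical.

```python
from functools import reduce
from itertools import product
from typing import Dict, FrozenSet, List

# A mass function maps subsets of the frame of discernment (the set of
# reference workloads) to belief mass; masses sum to 1.
MassFunction = Dict[FrozenSet[str], float]


def combine(m1: MassFunction, m2: MassFunction) -> MassFunction:
    """Dempster's rule of combination for two mass functions."""
    fused: MassFunction = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            fused[common] = fused.get(common, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b  # mass on contradictory pairs
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; cannot combine")
    # Renormalize so the non-conflicting mass sums to 1.
    return {s: v / (1.0 - conflict) for s, v in fused.items()}


def fuse_all(sources: List[MassFunction]) -> MassFunction:
    """Fold Dempster's rule over any number of sources (the rule is associative)."""
    return reduce(combine, sources)


def best_singleton(m: MassFunction) -> str:
    """Pick the single reference workload with the highest fused mass."""
    singletons = {next(iter(s)): v for s, v in m.items() if len(s) == 1}
    return max(singletons, key=singletons.get)


# Hypothetical outputs of two classifiers trained on different metric types.
frame = frozenset({"web", "db", "batch"})
m_cpu = {frozenset({"web"}): 0.6, frozenset({"db"}): 0.1, frame: 0.3}
m_disk = {frozenset({"web"}): 0.5, frozenset({"db"}): 0.3, frame: 0.2}

fused = fuse_all([m_cpu, m_disk])
label = best_singleton(fused)
```

Mass left on the whole frame expresses a classifier's uncertainty. When the sources disagree, the conflicting mass is discarded and the rest is renormalized. This is why the fused belief in `web` ends up higher than either source's individual value.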

    Towards Accurate Run-Time Hardware-Assisted Stealthy Malware Detection: A Lightweight, yet Effective Time Series CNN-Based Approach

    According to recent security analysis reports, malicious software (a.k.a. malware) is rising at an alarming rate in number, complexity, and harmful purpose, compromising the security of modern computer systems. Recently, malware detection based on low-level hardware features (e.g., Hardware Performance Counter (HPC) information) has emerged as an effective alternative that addresses the complexity and performance overheads of traditional software-based detection methods. Hardware-assisted Malware Detection (HMD) techniques depend on standard Machine Learning (ML) classifiers to detect signatures of malicious applications by monitoring built-in HPC registers during execution at run-time. Prior HMD methods, though effective, have limited their study to detecting malicious applications that are spawned as a separate thread during application execution; hence, detecting stealthy malware patterns at run-time remains a critical challenge. Stealthy malware refers to harmful cyber attacks in which malicious code is hidden within benign applications and remains undetected by traditional malware detection approaches. In this paper, we first present a comprehensive review of recent advances in hardware-assisted malware detection studies that have used standard ML techniques to detect malware signatures. Next, to address the challenge of stealthy malware detection at the processor's hardware level, we propose StealthMiner, a novel specialized time series machine learning-based approach that accurately detects stealthy malware traces at run-time using branch instructions, the most prominent HPC feature. StealthMiner is based on a lightweight time series Fully Convolutional Neural Network (FCN) model that automatically identifies potentially contaminated samples in HPC-based time series data and uses them to accurately recognize the trace of stealthy malware.
Our analysis demonstrates that state-of-the-art ML-based malware detection methods are not effective in detecting stealthy malware samples, since the captured HPC data represents not only the malware but also the benign application's microarchitectural behavior. The experimental results demonstrate that, with the aid of our novel intelligent approach, stealthy malware can be detected at run-time with 94% detection performance on average using only one HPC feature, outperforming state-of-the-art HMD and general time series classification methods by up to 42% and 36%, respectively.
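To make the FCN idea concrete, here is a deliberately tiny forward pass in plain Python over a window of branch-instruction counts. It has a single convolution block, then ReLU, global average pooling, and a logistic output. It is a sketch under stated assumptions, not StealthMiner. The abstract gives neither the layer configuration nor trained weights, so the kernels, biases, head weights, and the `branch_counts` window are all hypothetical. No training is shown.

```python
import math
from typing import List

Series = List[float]


def conv1d(x: Series, kernels: List[Series], biases: List[float]) -> List[Series]:
    """'Valid' 1-D convolution (cross-correlation, as in deep-learning
    frameworks): one output channel per kernel."""
    channels = []
    for kernel, bias in zip(kernels, biases):
        width = len(kernel)
        channels.append([
            sum(kernel[j] * x[i + j] for j in range(width)) + bias
            for i in range(len(x) - width + 1)
        ])
    return channels


def relu(channels: List[Series]) -> List[Series]:
    return [[max(0.0, v) for v in ch] for ch in channels]


def global_average_pool(channels: List[Series]) -> Series:
    # One scalar per channel, so the head does not depend on window length.
    return [sum(ch) / len(ch) for ch in channels]


def fcn_malware_score(window: Series, kernels: List[Series], biases: List[float],
                      head_w: List[float], head_b: float) -> float:
    """Conv -> ReLU -> global average pooling -> logistic output.
    Returns a score in (0, 1); higher means 'more likely contaminated'."""
    features = global_average_pool(relu(conv1d(window, kernels, biases)))
    z = sum(w * f for w, f in zip(head_w, features)) + head_b
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical, untrained parameters: two kernels of width 3.
KERNELS = [[1.0, -2.0, 1.0], [0.5, 0.5, 0.5]]
BIASES = [0.0, -1.0]
HEAD_W = [0.8, 0.1]
HEAD_B = -2.0

# Hypothetical per-interval branch-instruction counts from one HPC.
branch_counts = [3.0, 9.0, 2.0, 8.0, 3.0, 9.0, 2.0]
score = fcn_malware_score(branch_counts, KERNELS, BIASES, HEAD_W, HEAD_B)
```

Global average pooling reduces each channel to a single number. This keeps the classifier head small and lets the same network score HPC windows of different lengths.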

    Analysing Edge Computing Devices for the Deployment of Embedded AI

    In recent years, more and more devices have been connected to the network, generating an overwhelming amount of data. This booming phenomenon is known as the Internet of Things. To process these data close to their source, the concept of Edge Computing has emerged. The main objective is to address the limitations of cloud processing and to satisfy the growing demand for applications and services that require low latency, greater efficiency, and real-time response capabilities. Furthermore, it is essential to underscore the intrinsic connection between artificial intelligence and edge computing within the context of our study. This integral relationship not only addresses the challenges posed by data proliferation but also propels a transformative wave of innovation, shaping a new era of data processing capabilities at the network's edge. Edge devices can perform real-time data analysis and make autonomous decisions without relying on constant connectivity to the cloud. This article aims to analyse and compare Edge Computing devices when artificial intelligence algorithms are deployed on them. To this end, a detailed experiment involving various edge devices, models, and metrics is conducted. In addition, we observe how artificial intelligence accelerators such as the Tensor Processing Unit (TPU) behave. This analysis seeks to guide the choice of the device that best suits the necessary AI requirements. In summary, the Jetson Nano generally provides the best performance when only the CPU is used; nevertheless, using a TPU drastically improves the results. This work was partially financed by the Basque Government through their Elkartek program (SONETO project, ref. KK-2023/00038).
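The device comparison above rests on measuring inference on each device. Below is a minimal latency harness, assuming the model is exposed as a Python callable. The abstract does not list its exact metrics, so mean, median, and 95th-percentile latency plus throughput are illustrative choices. `dummy_model` is a hypothetical stand-in for a model running on the CPU or on an accelerator.

```python
import statistics
import time
from typing import Callable, Dict, Sequence


def measure_latency(infer: Callable[[object], object],
                    inputs: Sequence[object],
                    warmup: int = 5) -> Dict[str, float]:
    """Time single-sample inference calls and summarize latency in ms."""
    # Warm-up calls absorb one-off costs (lazy initialization, caches,
    # runtime or accelerator setup) so they do not skew the measurements.
    for x in inputs[:warmup]:
        infer(x)

    samples = []
    for x in inputs:
        start = time.perf_counter()
        infer(x)
        samples.append((time.perf_counter() - start) * 1000.0)

    samples.sort()
    # Simple nearest-rank approximation of the 95th percentile.
    p95 = samples[min(len(samples) - 1, int(0.95 * len(samples)))]
    mean = statistics.fmean(samples)
    return {
        "mean_ms": mean,
        "median_ms": statistics.median(samples),
        "p95_ms": p95,
        "throughput_per_s": 1000.0 / mean if mean > 0 else float("inf"),
    }


def dummy_model(x: object) -> int:
    """Hypothetical stand-in for an on-device model: burns a little CPU."""
    return sum(i * i for i in range(2000))


stats = measure_latency(dummy_model, list(range(100)))
```

Run the same harness with identical inputs on each device, and with and without the accelerator, so that the comparison stays like for like.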

    Architecture for Enabling Edge Inference via Model Transfer from Cloud Domain in a Kubernetes Environment

    The current approaches for energy consumption optimisation in buildings are mainly reactive or focus on scheduling daily/weekly operation modes in heating. Machine Learning (ML)-based advanced control methods have been demonstrated to improve energy efficiency when compared to these traditional methods. However, placing ML-based models close to the buildings is not straightforward. Firstly, edge devices typically have lower capabilities in terms of processing power, memory, and storage, which may limit the execution of ML-based inference at the edge. Secondly, the associated building information should be kept private. Thirdly, network access may be limited when serving a large number of edge devices. The contribution of this paper is an architecture that enables training of ML-based models for energy consumption prediction in a private cloud domain and transfer of the models to edge nodes for prediction in a Kubernetes environment. Additionally, predictors at the edge nodes can be updated automatically without interrupting operation. Performance results with sensor-based devices (Raspberry Pi 4 and Jetson Nano) indicated that a satisfactory prediction latency (~7–9 s) can be achieved within the research context. However, model switching led to an increase in prediction latency (~9–13 s). A partial evaluation of a Reference Architecture for edge computing systems, which was used as a starting point for the architecture design, may be considered an additional contribution of the paper.
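The abstract states that edge predictors can be updated without interrupting operation but does not describe the mechanism. One common in-process pattern is to load the new model completely and then atomically swap the reference that request handlers use. It is shown here as an assumption, not as the paper's design. The class and model names are hypothetical.

```python
import threading
from typing import Callable, List

Model = Callable[[List[float]], float]


class HotSwappablePredictor:
    """Serves predictions while allowing the model to be replaced at runtime."""

    def __init__(self, model: Model, version: str) -> None:
        self._lock = threading.Lock()
        self._model = model
        self._version = version

    def predict(self, features: List[float]) -> float:
        # Take a reference to the current model under the lock, then run
        # inference outside it so a swap never waits for a slow prediction.
        with self._lock:
            model = self._model
        return model(features)

    def swap(self, new_model: Model, new_version: str) -> None:
        # The caller is expected to have fully loaded (and ideally validated)
        # new_model beforehand; the switch itself is one reference update.
        with self._lock:
            self._model = new_model
            self._version = new_version

    @property
    def version(self) -> str:
        with self._lock:
            return self._version


# Hypothetical energy-consumption predictors standing in for trained models.
def model_v1(features: List[float]) -> float:
    return sum(features)


def model_v2(features: List[float]) -> float:
    return 0.5 * sum(features) + 1.0


predictor = HotSwappablePredictor(model_v1, "v1")
before = predictor.predict([2.0, 4.0])
predictor.swap(model_v2, "v2")
after = predictor.predict([2.0, 4.0])
```

Requests that already took a reference to the old model finish with it, and every later request uses the new one, so no request ever sees a half-loaded model. The paper operates in a Kubernetes environment, so its actual switching mechanism may differ from this in-process sketch.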

    Statically analyzing the energy efficiency of software product lines

    Optimizing software to become (more) energy efficient is an important concern for the software industry. Although several techniques have been proposed to measure energy consumption within software engineering, little work has specifically addressed Software Product Lines (SPLs). SPLs are a widely used software development approach whose core concept is the systematic development of products that can be deployed in a variable way, e.g., to include different features for different clients. The traditional approach for measuring energy consumption in SPLs is to generate and individually measure all products, which, given their large number, is impractical. We present a technique, implemented in a tool, to statically estimate the worst-case energy consumption for SPLs. The goal is to reason about energy consumption in all products of an SPL without having to analyze each product individually. Our technique combines static analysis and worst-case prediction with energy consumption analysis in order to analyze products in a feature-sensitive manner: a feature that is used in several products is analyzed only once, while the energy consumption is estimated once per product. For comprehensibility, this paper describes not only our previous work on worst-case prediction but also a significant extension of that work. This extension has been realized along two different axes: firstly, we incorporated a simulated annealing algorithm into our methodology to improve our worst-case energy consumption estimation. Secondly, we evaluated our new approach on four real-world SPLs containing a total of 99 software products. Our new results show that our technique is able to estimate the worst-case energy consumption with a mean error percentage of 17.3% and a standard deviation of 11.2%. This paper acknowledges the support of the Erasmus+ Key Action 2 (Strategic partnership for higher education) project No. 2020-1-PT01-KA203-078646: SusTrainable - Promoting Sustainability as a Fundamental Driver in Software Development Training and Education.
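The feature-sensitive idea (analyze each feature once, estimate once per product) can be illustrated with memoization. For simplicity, the sketch below assumes that a product's worst-case energy is a base cost plus the sum of its features' worst-case costs. It does not reproduce the paper's combination of static analysis, worst-case prediction, and simulated annealing. The feature names and joule values are hypothetical. The sketch also includes one common definition of the mean error percentage used to report accuracy.

```python
from typing import Callable, Dict, Iterable, List


class FeatureSensitiveEstimator:
    """Estimates per-product worst-case energy, analyzing each feature once."""

    def __init__(self, analyze_feature: Callable[[str], float], base_cost: float) -> None:
        self._analyze = analyze_feature  # expensive per-feature analysis (hypothetical)
        self._base = base_cost           # energy of code common to all products
        self._cache: Dict[str, float] = {}
        self.analyses_run = 0

    def feature_cost(self, feature: str) -> float:
        # Memoize: a feature shared by many products is analyzed only once.
        if feature not in self._cache:
            self._cache[feature] = self._analyze(feature)
            self.analyses_run += 1
        return self._cache[feature]

    def product_estimate(self, features: Iterable[str]) -> float:
        # Simplifying assumption: feature costs compose additively.
        return self._base + sum(self.feature_cost(f) for f in sorted(features))


def mean_error_percentage(estimates: List[float], measured: List[float]) -> float:
    """Mean of |estimate - measured| / measured, expressed in percent."""
    errors = [100.0 * abs(e - m) / m for e, m in zip(estimates, measured)]
    return sum(errors) / len(errors)


# Hypothetical worst-case energy per feature, in joules.
FEATURE_WCEC = {"logging": 2.0, "encryption": 3.0, "compression": 5.0}
estimator = FeatureSensitiveEstimator(FEATURE_WCEC.__getitem__, base_cost=1.0)

products = [{"logging"},
            {"logging", "encryption"},
            {"logging", "encryption", "compression"}]
estimates = [estimator.product_estimate(p) for p in products]
```

The three products contain six feature occurrences in total, but only three per-feature analyses run. That saving is what avoids generating and measuring every product individually.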

    Vectorization system for unstructured codes with a Data-parallel Compiler IR

    With Dennard scaling coming to an end, Single Instruction Multiple Data (SIMD) offers itself as a way to improve the compute throughput of CPUs. One fundamental technique in SIMD code generators is the vectorization of data-parallel code regions. This has applications in outer-loop vectorization, whole-function vectorization, and vectorization of explicitly data-parallel languages. This thesis makes contributions to the reliable vectorization of data-parallel code regions with unstructured, reducible control flow. Control flow is reducible when all control-flow loops have exactly one entry point, which is the case in practice. We present P-LLVM, a novel, full-featured intermediate representation for vectorizers that provides a semantics for the code region at every stage of the vectorization pipeline. Partial control-flow linearization is a novel partial if-conversion scheme, an essential technique for vectorizing divergent control flow. Unlike prior techniques, partial linearization has linear running time, does not insert additional branches or blocks, and gives proven guarantees on the control flow retained. Divergence of control induces value divergence at join points in the control-flow graph (CFG). We present a novel control-divergence analysis for directed acyclic graphs with optimal running time and prove that it is correct and precise under common static assumptions. We extend this technique to obtain a quadratic-time control-divergence analysis for arbitrary reducible CFGs. For this analysis, we show on a range of realistic examples how earlier approaches are either less precise or incorrect. We present a feature-complete divergence analysis for P-LLVM programs. The analysis is the first to analyze stack-allocated objects in an unstructured control setting. Finally, we generalize single-dimensional vectorization of outer loops to multi-dimensional tensorization of loop nests.
SIMD targets beneïŹt from tensorization through more opportunities for re-use of loaded values and more eïŹƒcient memory access behavior. The techniques were implemented in the Region Vectorizer (RV) for vectorization and TensorRV for loop-nest tensorization. Our evaluation validates that the general-purpose RV vectorization system matches the performance of more specialized approaches. RV performs on par with the ISPC compiler, which only supports its structured domain-speciïŹc language, on a range of tree traversal codes with complex control ïŹ‚ow. RV is able to outperform the loop vectorizers of state-of-the-art compilers, as we show for the SPEC2017 nab_s benchmark and the XSBench proxy application.Mit dem Ausreizen des Dennard Scalings erreichen die gewohnten ZuwĂ€chse in der skalaren Rechenleistung zusehends ihr Ende. Moderne Prozessoren setzen verstĂ€rkt auf parallele Berechnung, um den Rechendurchsatz zu erhöhen. Hierbei spielen SIMD Instruktionen (Single Instruction Multiple Data), die eine Operation gleichzeitig auf mehrere Eingaben anwenden, eine zentrale Rolle. Eine fundamentale Technik, um SIMD Programmcode zu erzeugen, ist der Einsatz datenparalleler Vektorisierung. Diese unterliegt populĂ€ren Verfahren, wie der Vektorisierung Ă€ußerer Schleifen, der Vektorisierung gesamter Funktionen bis hin zu explizit datenparallelen Programmiersprachen. Der Beitrag der vorliegenden Arbeit besteht darin, ein zuverlĂ€ssiges Vektorisierungssystem fĂŒr datenparallelen Code mit reduziblem SteuerïŹ‚uss zu entwickeln. Diese Anforderung ist fĂŒr alle SteuerïŹ‚ussgraphen erfĂŒllt, deren Schleifen nur einen Eingang haben, was in der Praxis der Fall ist. Wir prĂ€sentieren P-LLVM, eine ausdrucksstarke Zwischendarstellung fĂŒr Vektorisierer, welche dem Programm in jedem Stadium der Transformation von datenparallelem Code zu SIMD Code eine deïŹnierte Semantik verleiht. 
Partielle SteuerïŹ‚uss-Linearisierung ist ein neuer Algorithmus zur If-Conversion, welcher SprĂŒnge erhalten kann. Anders als existierende Verfahren hat Partielle Linearisierung eine lineare Laufzeit und fĂŒgt keine neuen SprĂŒnge oder Blöcke ein. Wir zeigen Kriterien, unter denen der Algorithmus SteuerïŹ‚uss erhĂ€lt, und beweisen diese. SteuerïŹ‚ussdivergenz induziert Divergenz an Punkten zusammenïŹ‚ießenden SteuerïŹ‚usses. Wir stellen eine neue SteuerïŹ‚ussdivergenzanalyse fĂŒr azyklische Graphen mit optimaler Laufzeit vor und beweisen deren Korrektheit und PrĂ€zision. Wir verallgemeinern die Technik zu einem Algorithmus mit quadratischer Laufzeit fĂŒr beliebiege, reduzible SteuerïŹ‚ussgraphen. Eine Studie auf realistischen Beispielgraphen zeigt, dass vergleichbare Techniken entweder weniger prĂ€size sind oder falsche Ergebnisse liefern. Ebenfalls prĂ€sentieren wir eine Divergenzanalyse fĂŒr P-LLVM Programme. Diese Analyse ist die erste Divergenzanalyse, welche Divergenz in stapelallokierten Objekten unter unstrukturiertem SteuerïŹ‚uss analysiert. Schließlich generalisieren wir die eindimensionale Vektorisierung von Ă€ußeren Schleifen zur multidimensionalen Tensorisierung von Schleifennestern. Tensorisierung erĂ¶ïŹ€net fĂŒr SIMD Prozessoren mehr Möglichkeiten, bereits geladene Werte wiederzuverwenden und das SpeicherzugriïŹ€sverhalten des Programms zu optimieren, als dies mit Vektorisierung der Fall ist. Die vorgestellten Techniken wurden in den Region Vectorizer (RV) fĂŒr Vektorisierung und TensorRV fĂŒr die Tensorisierung von Schleifennestern implementiert. Wir zeigen auf einer Reihe von steuerïŹ‚usslastigen Programmen fĂŒr die Traversierung von Baumdatenstrukturen, dass RV das gleiche Niveau erreicht wie der ISPC Compiler, welcher nur seine strukturierte Eingabesprache verarbeiten kann. RV kann schnellere SIMD-Programme erzeugen als die Schleifenvektorisierer in aktuellen Industriecompilern. 
Dies demonstrieren wir mit dem nab_s benchmark aus der SPEC2017 Benchmarksuite und der XSBench Proxy-Anwendung
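If-conversion, which partial linearization refines, can be illustrated on a simulated SIMD vector. A divergent branch is replaced by computing both sides for every lane and blending the results under a mask. The sketch below is plain Python that emulates lanes with lists. It shows full if-conversion plus the common refinement of keeping the branch when the mask is uniform. It is not P-LLVM's partial linearization algorithm or RV's implementation, and all function names are hypothetical.

```python
from typing import Callable, List

Vec = List[float]
Mask = List[bool]


def select(mask: Mask, a: Vec, b: Vec) -> Vec:
    """Lane-wise blend: take a where the mask is true, else b (like a SIMD blend)."""
    return [x if m else y for m, x, y in zip(mask, a, b)]


def vectorized_branch(x: Vec,
                      cond: Callable[[float], bool],
                      then_fn: Callable[[float], float],
                      else_fn: Callable[[float], float]) -> Vec:
    """Execute `if cond(v): then_fn(v) else: else_fn(v)` for all lanes of x."""
    mask = [cond(v) for v in x]
    if all(mask):
        # Uniform branch: every lane takes the then-side, so keep the branch.
        return [then_fn(v) for v in x]
    if not any(mask):
        # Uniform branch: every lane takes the else-side.
        return [else_fn(v) for v in x]
    # Divergent branch: linearize by running both sides on every lane,
    # then blend the results under the mask.
    then_vals = [then_fn(v) for v in x]
    else_vals = [else_fn(v) for v in x]
    return select(mask, then_vals, else_vals)


lanes = [1.0, -2.0, 3.0, -4.0]
result = vectorized_branch(lanes, lambda v: v > 0, lambda v: 2 * v, lambda v: -v)
```

Evaluating both sides on every lane is only safe when the work done for inactive lanes has no observable effect. Real vectorizers mask stores, loads, and faulting operations such as division for exactly this reason.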

    Extensible Performance-Aware Runtime Integrity Measurement

    Today's interconnected world consists of a broad set of online activities, including banking, shopping, managing health records, and social media, while relying heavily on servers to manage extensive sets of data. However, stealthy rootkit attacks on this infrastructure have placed these servers at risk. Security researchers have proposed using an existing x86 CPU mode called System Management Mode (SMM) to search for rootkits from a hardware-protected, isolated, and privileged location. SMM has broad visibility into operating system resources, including memory regions and CPU registers. However, using SMM for runtime integrity measurement mechanisms (SMM-RIMMs) would significantly increase the amount of CPU time spent outside the control of the operating system and hypervisor (host software), potentially causing serious system impacts. To be a candidate for production use, SMM-RIMMs would need to be resilient, performant, and extensible. We developed the EPA-RIMM architecture guided by the principles of extensibility, performance awareness, and effectiveness. EPA-RIMM incorporates a security check description mechanism that allows dynamic changes to the set of resources to be monitored. It minimizes system performance impacts by decomposing security checks into shorter tasks that can be independently scheduled over time. We present a performance methodology for SMM to quantify system impacts, as well as a simulator that allows different methods of scheduling security inspections to be evaluated. Our SMM-based EPA-RIMM prototype leverages insights from the performance methodology to detect host software rootkits with reduced system impacts. EPA-RIMM demonstrates that SMM-based rootkit detection can be made performance-efficient and effective, providing a new tool for defense.
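Decomposing security checks into shorter, independently scheduled tasks can be sketched with resumable generators. Each step hashes one bounded chunk of a monitored region, and a scheduler runs a fixed number of steps per window, round-robin across checks. This Python code models only the scheduling idea and is not EPA-RIMM's SMM code. The step budget stands in for a time budget per SMM entry. SHA-256 as the measurement, the region names, the chunk size, and the baseline digests are all assumptions.

```python
import hashlib
from collections import deque
from typing import Deque, Iterator, List, Optional, Tuple

Result = Tuple[str, bool]


def region_check(name: str, region: bytes, expected_digest: str,
                 chunk_size: int) -> Iterator[Optional[Result]]:
    """Integrity check split into bounded steps.

    Each step hashes at most `chunk_size` bytes and yields None; the final
    step only compares digests and yields (name, ok)."""
    h = hashlib.sha256()
    for offset in range(0, len(region), chunk_size):
        h.update(region[offset:offset + chunk_size])
        yield None  # one bounded unit of work done; give control back
    yield (name, h.hexdigest() == expected_digest)


def run_window(tasks: Deque[Iterator[Optional[Result]]], max_steps: int) -> List[Result]:
    """Run at most `max_steps` task steps (a stand-in for a per-entry time
    budget), round-robin across pending checks; unfinished checks are requeued."""
    finished: List[Result] = []
    steps = 0
    while tasks and steps < max_steps:
        task = tasks.popleft()
        outcome = next(task)
        steps += 1
        if outcome is None:
            tasks.append(task)  # more work left: resume in a later window
        else:
            finished.append(outcome)
    return finished


# Hypothetical monitored regions and their known-good baseline digests.
kernel_text = bytes(range(256)) * 16  # 4 KiB
syscall_table = bytes(4096)           # 4 KiB of zeros
baseline = {name: hashlib.sha256(data).hexdigest()
            for name, data in [("kernel_text", kernel_text),
                               ("syscall_table", syscall_table)]}

pending = deque([
    region_check("kernel_text", kernel_text, baseline["kernel_text"], 1024),
    region_check("syscall_table", syscall_table, baseline["syscall_table"], 1024),
])
results: List[Result] = []
windows = 0
while pending:
    results += run_window(pending, max_steps=3)
    windows += 1
```

Smaller chunks or budgets shorten each period spent away from host software, at the cost of more windows to finish a full inspection. This is the kind of trade-off the paper's scheduling simulator is meant to evaluate.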