Scheduling, Characterization and Prediction of HPC Workloads for Distributed Computing Environments

Author: Naghshnejad Mina
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

As High Performance Computing (HPC) has grown considerably and is expected to grow even more, effective resource management for distributed computing systems is needed more than ever. As computational workloads grow in quantity, it is becoming more crucial to apply efficient resource management and workload scheduling to use resources efficiently while keeping computational performance reasonably good. The problem of efficiently scheduling workloads on resources while meeting performance standards is hard. Additionally, non-clairvoyance of job dimensions makes resource management even harder in real-world scenarios. Our research methodology investigates the scheduling problem for HPC and the challenges of deploying the scheduling in real-world scenarios using state-of-the-art machine learning and data science techniques. To this end, this Ph.D. dissertation makes the following core contributions: a) We perform a theoretical analysis of space-sharing, non-preemptive scheduling: we studied this scheduling problem and proposed scheduling algorithms with polynomial computation time. We also proved constant upper bounds for the performance of these algorithms. b) We studied the sensitivity of scheduling algorithms to the accuracy of runtime predictions and devised a meta-learning approach to estimate prediction accuracy for jobs newly submitted to the HPC system. c) We studied the runtime prediction problem for HPC applications. For this purpose, we studied the distribution of available public workloads and proposed two different solutions that can predict multi-modal distributions: switching state-space models and Mixture Density Networks. d) We studied the effectiveness of recent recurrent neural network models for CPU usage trace prediction for individual VM traces as well as aggregate CPU usage traces.
In this dissertation, we explore solutions to improve the performance of scheduling workloads on distributed systems. We begin by looking at the problem from the theoretical perspective. Modeling the problem mathematically, we first propose a scheduling algorithm that finds a constant approximation of the optimal solution for the problem in polynomial time. We prove that the performance of the algorithm (average completion time) is a constant approximation of the performance of the optimal schedule. We next look at the problem in real-world scenarios. Considering High-Performance Computing (HPC) workload computing environments as the closest real-world equivalent of our mathematical model, we explore the problem of predicting application runtime. We propose an algorithm to handle the uncertainties that exist in the real world and demonstrate its effectiveness in terms of response time and resource utilization. After looking at the uncertainty problem, we focus on improving the accuracy of existing prediction approaches for HPC application runtime. We propose two solutions, one based on Kalman filters and one based on deep mixture density networks. We showcase the effectiveness of our prediction approaches by comparing them with previous prediction approaches in terms of prediction accuracy and impact on scheduling performance. Finally, we focus on predicting resource usage for individual applications during their execution. We explore the application of recurrent neural networks for predicting resource usage of applications deployed on individual virtual machines. To validate our proposed models and solutions, we performed extensive trace-driven simulation and measured the effectiveness of our approaches.
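The abstract evaluates schedules by average completion time. As a rough illustration of that metric only, and not of the dissertation's algorithm, here is a minimal sketch under simplifying assumptions: every job occupies exactly one machine (the space-sharing setting, where a job may need several processors, is ignored), runtimes are known in advance, and jobs are placed greedily in shortest-job-first order. The function name and its inputs are hypothetical.

```python
import heapq


def average_completion_time(runtimes, machines):
    """Average completion time of a non-preemptive greedy schedule.

    Assumption (not from the abstract): each job uses one machine and
    jobs are started in shortest-job-first order on whichever machine
    becomes free earliest.
    """
    if not runtimes or machines < 1:
        raise ValueError("need at least one job and one machine")

    # Min-heap of the times at which each machine becomes free.
    free_at = [0.0] * machines
    heapq.heapify(free_at)

    total_completion = 0.0
    for runtime in sorted(runtimes):
        start = heapq.heappop(free_at)   # earliest available machine
        finish = start + runtime         # non-preemptive: runs to completion
        total_completion += finish
        heapq.heappush(free_at, finish)

    return total_completion / len(runtimes)


# Hypothetical runtimes, two machines.
print(average_completion_time([3, 1, 2], machines=2))
```

Comparing this value for a candidate schedule against the value for an optimal schedule is what an approximation bound on average completion time refers to.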

eScholarship - University of California

Robust Multimodal Failure Detection for Microservice Systems

Author: Feng Jiayi
Lin Qingwei
Ma Minghua
Pei Dan
Sun Yongqian
Tan Zhiyuan
Xiong Xiao
Yu LuLu
Zhang Dongmei
Zhang Shenglin
Zhang Yuzhi
Zhao Chenyu
Zhong Zhenyu
Publication date: 30/05/2023

Proactive failure detection of instances is essential to microservice systems because an instance failure can propagate to the whole system and degrade the system's performance. Over the years, many anomaly detection methods based on single-modal data (i.e., metrics, logs, or traces) have been proposed. However, they tend to miss a large number of failures and generate numerous false alarms because they ignore the correlation of multimodal data. In this work, we propose AnoFusion, an unsupervised failure detection approach, to proactively detect instance failures through multimodal data for microservice systems. It applies a Graph Transformer Network (GTN) to learn the correlation of the heterogeneous multimodal data and integrates a Graph Attention Network (GAT) with a Gated Recurrent Unit (GRU) to address the challenges introduced by dynamically changing multimodal data. We evaluate the performance of AnoFusion on two datasets, demonstrating that it achieves F1-scores of 0.857 and 0.922, respectively, outperforming the state-of-the-art failure detection approaches.
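For readers unfamiliar with the metric reported above, the F1-score is the harmonic mean of precision and recall. The sketch below computes it from detection counts; the function name and the example counts are hypothetical and are not taken from the paper's datasets.

```python
def f1_score(true_pos, false_pos, false_neg):
    """F1 = 2 * precision * recall / (precision + recall).

    Returns 0.0 when precision or recall is undefined (no detections
    or no actual failures), a common convention assumed here.
    """
    detected = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / detected if detected else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 6 failures caught, 1 false alarm, 1 missed failure.
print(round(f1_score(6, 1, 1), 3))
```

Missed failures lower recall and false alarms lower precision, so a detector that does either often scores a low F1.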

arXiv.org e-Print Archive

The 11th Conference of PhD Students in Computer Science

Publication date: 01/01/2018

University of Szeged

Network-aware video streaming for future media Internet

Author: Viola Roberto
Publication date: 26/10/2021

272 p

Archivo Digital para la Docencia y la Investigación

Adaptation-Aware Architecture Modeling and Analysis of Energy Efficiency for Software Systems

Author: Stier Christian
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2018

This thesis presents an approach for the design time analysis of energy efficiency for static and self-adaptive software systems. The quality characteristics of a software system, such as performance and operating costs, strongly depend upon its architecture. Software architecture is a high-level view on software artifacts that reflects essential quality characteristics of a system under design. Design decisions made on an architectural level have a decisive impact on the quality of a system. Revising architectural design decisions late into development requires significant effort. Architectural analyses allow software architects to reason about the impact of design decisions on quality, based on an architectural description of the system. An essential quality goal is the reduction of cost while maintaining other quality goals. Power consumption accounts for a significant part of the Total Cost of Ownership (TCO) of data centers. In 2010, data centers contributed 1.3% of the world-wide power consumption. However, reasoning on the energy efficiency of software systems is excluded from the systematic analysis of software architectures at design time. Energy efficiency can only be evaluated once the system is deployed and operational. One approach to reduce power consumption or cost is the introduction of self-adaptivity to a software system. Self-adaptive software systems execute adaptations to provision costly resources dependent on user load. The execution of reconfigurations can increase energy efficiency and reduce cost. If performed improperly, however, the additional resources required to execute a reconfiguration may exceed their positive effect. Existing architecture-level energy analysis approaches offer limited accuracy or only consider a limited set of system features, e.g., the used communication style. Predictive approaches from the embedded systems and Cloud Computing domain operate on an abstraction that is not suited for architectural analysis. 
The execution of adaptations can consume additional resources. The additional consumption can reduce performance and energy efficiency. Design time quality analyses for self-adaptive software systems ignore this transient effect of adaptations. This thesis makes the following contributions to enable the systematic consideration of energy efficiency in the architectural design of self-adaptive software systems: First, it presents a modeling language that captures power consumption characteristics on an architectural abstraction level. Second, it introduces an energy efficiency analysis approach that uses instances of our power consumption modeling language in combination with existing performance analyses for architecture models. The developed analysis supports reasoning on energy efficiency for static and self-adaptive software systems. Third, to ease the specification of power consumption characteristics, we provide a method for extracting power models for server environments. The method encompasses an automated profiling of servers based on a set of restrictions defined by the user. A model training framework extracts a set of power models specified in our modeling language from the resulting profile. The method ranks the trained power models based on their predicted accuracy. Lastly, this thesis introduces a systematic modeling and analysis approach for considering transient effects in design time quality analyses. The approach explicitly models inter-dependencies between reconfigurations, performance and power consumption. We provide a formalization of the execution semantics of the model. Additionally, we discuss how our approach can be integrated with existing quality analyses of self-adaptive software systems. We validated the accuracy, applicability, and appropriateness of our approach in a variety of case studies. The first two case studies investigated the accuracy and appropriateness of our modeling and analysis approach. 
The first study evaluated the impact of design decisions on the energy efficiency of a media hosting application. The energy consumption predictions achieved an absolute error lower than 5.5% across different user loads. Our approach predicted the relative impact of the design decision on energy efficiency with an error of less than 18.94%. The second case study used two variants of the Spring-based community case study system PetClinic. The case study complements the accuracy and appropriateness evaluation of our modeling and analysis approach. We were able to predict the energy consumption of both variants with an absolute error of no more than 2.38%. In contrast to the first case study, we derived all models automatically, using our power model extraction framework, as well as an extraction framework for performance models. The third case study applied our model-based prediction to evaluate the effect of different self-adaptation algorithms on energy efficiency. It involved scientific workloads executed in a virtualized environment. Our approach predicted the energy consumption with an error below 7.1%, even though we used coarse grained measurement data of low accuracy to train the input models. The fourth case study evaluated the appropriateness and accuracy of the automated model extraction method using a set of Big Data and enterprise workloads. Our method produced power models with prediction errors below 5.9%. A secondary study evaluated the accuracy of extracted power models for different Virtual Machine (VM) migration scenarios. The results of the fifth case study showed that our approach for modeling transient effects improved the prediction accuracy for a horizontally scaling application. Leveraging the improved accuracy, we were able to identify design deficiencies of the application that otherwise would have remained unnoticed

KITopen

Towards edge robotics: the progress from cloud-based robotic systems to intelligent and context-aware robotic services

Author: Groshev Milan
Publication venue: Elsevier BV
Publication date: 10/10/2022

Current robotic systems handle a diverse range of applications such as video surveillance, delivery of goods, cleaning, material handling, assembly, painting, or pick and place services. These systems have been embraced not only by the general population but also by the vertical industries to help them in performing daily activities. Traditionally, the robotic systems have been deployed in standalone robots that were exclusively dedicated to performing a specific task such as cleaning the floor in indoor environments. In recent years, cloud providers started to offer their infrastructures to robotic systems for offloading some of the robot’s functions. This ultimate form of the distributed robotic system was first introduced 10 years ago as cloud robotics and nowadays a lot of robotic solutions are appearing in this form. As a result, standalone robots became software-enhanced objects with increased reconfigurability as well as decreased complexity and cost. Moreover, by offloading the heavy processing from the robot to the cloud, it is easier to share services and information from various robots or agents to achieve better cooperation and coordination. Cloud robotics is suitable for human-scale responsive and delay-tolerant robotic functionalities (e.g., monitoring, predictive maintenance). However, there is a whole set of real-time robotic applications (e.g., remote control, motion planning, autonomous navigation) that cannot be executed with cloud robotics solutions, mainly because cloud facilities traditionally reside far away from the robots. While the cloud providers can ensure certain performance in their infrastructure, very little can be ensured in the network between the robots and the cloud, especially in the last hop where wireless radio access networks are involved.
Over the last years, advances in edge computing, fog computing, 5G NR, network slicing, Network Function Virtualization (NFV), and network orchestration have been stimulating the interest of the industrial sector in satisfying the stringent and real-time requirements of their applications. Robotic systems are a key piece in the industrial digital transformation and their benefits are very well studied in the literature. However, designing and implementing a robotic system that integrates all the emerging technologies and meets the connectivity requirements (e.g., latency, reliability) is an ambitious task. This thesis studies the integration of modern Information and Communication Technologies (ICTs) in robotic systems and proposes some robotic enhancements that tackle the real-time constraints of robotic services. To evaluate the performance of the proposed enhancements, this thesis departs from the design and prototype implementation of an edge native robotic system that embodies the concepts of edge computing, fog computing, orchestration, and virtualization. The proposed edge robotics system serves to represent two exemplary robotic applications: autonomous navigation of mobile robots and remote control of a robot manipulator, where the end-to-end robotic system is distributed between the robots and the edge server. The open-source prototype implementation of the designed edge native robotic system resulted in the creation of two real-world testbeds that are used in this thesis as a baseline scenario for the evaluation of new innovative solutions in robotic systems. After detailing the design and prototype implementation of the end-to-end edge native robotic system, this thesis proposes several enhancements that can be offered to robotic systems by adapting the concept of edge computing via the Multi-Access Edge Computing (MEC) framework.
First, it proposes exemplary network context-aware enhancements in which the real-time information about robot connectivity and location can be used to dynamically adapt the end-to-end system behavior to the actual status of the communication (e.g., radio channel). Three different exemplary context-aware enhancements are proposed that aim to optimize the end-to-end edge native robotic system. Later, the thesis studies the capability of the edge native robotic system to offer potential savings by means of computation offloading for robot manipulators in different deployment configurations. Further, the impact of different wireless channels (e.g., 5G, 4G and Wi-Fi) on the data exchange between a robot manipulator and its remote controller is assessed. In the following part of the thesis, the focus is set on how orchestration solutions can support mobile robot systems to make high quality decisions. The application of OKpi as an orchestration algorithm and DLT-based federation are studied to meet the KPIs that autonomously controlled mobile robots have in order to provide uninterrupted connectivity over the radio access network. The elaborated solutions present high compatibility with the designed edge robotics system where the robot driving range is extended without any interruption of the end-to-end edge robotics service. While the DLT-based federation extends the robot driving range by deploying access point extension on top of external domain infrastructure, OKpi selects the most suitable access point and computing resource in the cloud-to-thing continuum in order to fulfill the latency requirements of autonomously controlled mobile robots. To conclude the thesis, the focus is set on how robotic systems can improve their performance by leveraging Artificial Intelligence (AI) and Machine Learning (ML) algorithms to generate smart decisions.
To do so, the edge native robotic system is presented as a true embodiment of a Cyber-Physical System (CPS) in Industry 4.0, showing the role of AI in such a concept. It presents the key enabling technologies of the edge robotic system such as edge, fog, and 5G, where the physical processes are integrated with computing and network domains. The role of AI in each technology domain is identified by analyzing a set of AI agents at the application and infrastructure level. In the last part of the thesis, movement prediction is selected to study the feasibility of applying a forecast-based recovery mechanism for real-time remote control of robotic manipulators (FoReCo) that uses ML to infer commands lost to interference in the wireless channel. The obtained results showcase its potential in simulation and real-world experimentation.
Doctoral Programme in Telematic Engineering, Universidad Carlos III de Madrid. President: Karl Holger. Secretary: Joerg Widmer. Member: Claudio Cicconett

Universidad Carlos III de Madrid e-Archivo

Prediction of big five personality traits from mobile application usage

Author: Sharmila P. (Parsa)
Publication venue: University of Oulu
Publication date: 23/06/2020

Abstract. Smartphones have become an integral part of our daily lives in recent years. Studies show that smartphone usage is correlated with user personality traits. This critical ecosystem depends on several variables such as geographic location, demographic traits, and ethnic or cultural influence. While a significant number of demographic, environmental and medical analyses have been based on smartphone usage, too few studies have been carried out to analyse human personality. All of this information provides pivotal insights for improving user experience, creating recommendations, identifying marketing strategies and improving overall usage. This study uses application usage data collected over 6 months from 739 Android smartphone users, along with a 50-item Big Five Personality Trait questionnaire. The analysis shows that category-level aggregated application usage is enough for predicting Big Five personality traits, achieving 9–14% error, which corresponds to 86–91% accuracy on average. This study concludes that user personality has a fundamental impact on smartphone application and application category usage. This work points to possible personality-driven research in the future and shows the significance and involvement of application categories in achieving adequate accuracy for general traits in personality studies.

University of Oulu Repository - Jultika

Managing Distributed Cloud Applications and Infrastructure

Publication venue: Springer Science and Business Media LLC
Publication date: 10/02/2021

The emergence of the Internet of Things (IoT), combined with greater heterogeneity not only online in cloud computing architectures but across the cloud-to-edge continuum, is introducing new challenges for managing applications and infrastructure across this continuum. The scale and complexity are now such that it is no longer realistic for IT teams to manually foresee the potential issues and manage the dynamism and dependencies across an increasingly interdependent chain of service provision. This Open Access Pivot explores these challenges and offers a solution for the intelligent and reliable management of physical infrastructure and the optimal placement of applications for the provision of services on distributed clouds. This book provides a conceptual reference model for reliable capacity provisioning for distributed clouds and discusses how data analytics and machine learning, application and infrastructure optimization, and simulation can deliver quality of service requirements cost-efficiently in this complex feature space. These are illustrated through a series of case studies in cloud computing, telecommunications, big data analytics, and smart cities.

Directory of Open Access Books (DOAB)