Scheduling, Characterization and Prediction of HPC Workloads for Distributed Computing Environments
As High Performance Computing (HPC) has grown considerably and is expected to grow even more, effective resource management for distributed computing systems is needed more than ever. As computational workloads grow in quantity, it is becoming more crucial to apply efficient resource management and workload scheduling to use resources efficiently while keeping computational performance reasonably good. The problem of efficiently scheduling workloads on resources while meeting performance standards is hard. Additionally, non-clairvoyance of job dimensions makes resource management even harder in real-world scenarios. Our research investigates the scheduling problem for HPC and examines the challenges of deploying scheduling in real-world scenarios using state-of-the-art machine learning and data science techniques. To this end, this Ph.D. dissertation makes the following core contributions: a) We perform a theoretical analysis of space-sharing, non-preemptive scheduling: we study this scheduling problem and propose scheduling algorithms with polynomial computation time. We also prove constant upper bounds for the performance of these algorithms. b) We study the sensitivity of scheduling algorithms to the accuracy of runtime predictions and devise a meta-learning approach to estimate prediction accuracy for jobs newly submitted to the HPC system. c) We study the runtime prediction problem for HPC applications. For this purpose, we study the distribution of available public workloads and propose two solutions that can predict multi-modal distributions: switching state-space models and Mixture Density Networks. d) We study the effectiveness of recent recurrent neural network models for CPU usage trace prediction, for individual VM traces as well as aggregate CPU usage traces.
In this dissertation, we explore solutions to improve the performance of scheduling workloads on distributed systems. We begin by looking at the problem from the theoretical perspective. Modeling the problem mathematically, we first propose a scheduling algorithm that finds a constant approximation of the optimal solution in polynomial time. We prove that the performance of the algorithm (average completion time) is a constant approximation of the performance of the optimal schedule. We next look at the problem in real-world scenarios. Considering High-Performance Computing (HPC) environments as the closest real-world equivalent of our mathematical model, we explore the problem of predicting application runtime. We propose an algorithm to handle the uncertainties that exist in the real world and demonstrate its effectiveness in terms of response time and resource utilization. After looking at the uncertainty problem, we focus on improving the accuracy of existing prediction approaches for HPC application runtime. We propose two solutions, one based on Kalman filters and one based on deep mixture density networks. We showcase the effectiveness of our prediction approaches by comparing them with previous approaches in terms of prediction accuracy and impact on scheduling performance. Finally, we focus on predicting resource usage for individual applications during their execution. We explore the application of recurrent neural networks for predicting the resource usage of applications deployed on individual virtual machines. To validate our proposed models and solutions, we performed extensive trace-driven simulations and measured the effectiveness of our approaches.
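The average-completion-time objective named in the abstract above can be illustrated with a small sketch. It assumes a deliberately simplified setting: a single machine running jobs back to back without preemption, with runtimes known in advance. This is not the dissertation's space-sharing algorithm. The function names and the sample runtimes are hypothetical.

```python
from typing import Sequence


def average_completion_time(runtimes: Sequence[float]) -> float:
    """Average completion time when jobs run back to back in the given order.

    Simplifying assumption: one machine, no preemption, all jobs available
    at time zero.
    """
    if not runtimes:
        raise ValueError("at least one job is required")
    clock = 0.0   # time at which the current job finishes
    total = 0.0   # running sum of completion times
    for runtime in runtimes:
        clock += runtime
        total += clock
    return total / len(runtimes)


def shortest_first(runtimes: Sequence[float]) -> list:
    """Order jobs by increasing runtime (shortest-processing-time rule)."""
    return sorted(runtimes)


# Hypothetical runtimes, in arbitrary time units.
submitted_order = [3.0, 1.0, 2.0]
print(average_completion_time(submitted_order))
print(average_completion_time(shortest_first(submitted_order)))
```

In this toy setting, running short jobs first lowers the average completion time, which also hints at why inaccurate runtime estimates can hurt a scheduler that relies on them.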
Robust Multimodal Failure Detection for Microservice Systems
Proactive failure detection of instances is essential to microservice
systems because an instance failure can propagate to the whole system and
degrade the system's performance. Over the years, many single-modal (i.e.,
metrics, logs, or traces) data-based anomaly detection methods have been
proposed. However, they tend to miss a large number of failures and generate
numerous false alarms because they ignore the correlation of multimodal data.
In this work, we propose AnoFusion, an unsupervised failure detection approach,
to proactively detect instance failures through multimodal data for
microservice systems. It applies a Graph Transformer Network (GTN) to learn the
correlation of the heterogeneous multimodal data and integrates a Graph
Attention Network (GAT) with Gated Recurrent Unit (GRU) to address the
challenges introduced by dynamically changing multimodal data. We evaluate the
performance of AnoFusion on two datasets, demonstrating that it achieves
F1-scores of 0.857 and 0.922, respectively, outperforming
state-of-the-art failure detection approaches.
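For readers unfamiliar with the metric reported above, the following is a minimal sketch of how an F1-score is computed from detection counts. The function name and the example counts are hypothetical and are not taken from the AnoFusion evaluation.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """F1-score: the harmonic mean of precision and recall."""
    if true_pos == 0:
        # No correct detections: precision and recall are both zero.
        return 0.0
    precision = true_pos / (true_pos + false_pos)  # share of alarms that were real failures
    recall = true_pos / (true_pos + false_neg)     # share of real failures that were detected
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 6 failures detected, 1 false alarm, 1 missed failure.
print(round(f1_score(6, 1, 1), 3))
```

A detector that misses many failures has low recall, and one that raises many false alarms has low precision; F1 penalizes both.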
Adaptation-Aware Architecture Modeling and Analysis of Energy Efficiency for Software Systems
This thesis presents an approach for the design time analysis of energy efficiency for static and self-adaptive software systems.
The quality characteristics of a software system, such as performance and operating costs, strongly depend upon its architecture. Software architecture is a high-level view on software artifacts that reflects essential quality characteristics of a system under design. Design decisions made on an architectural level have a decisive impact on the quality of a system. Revising architectural design decisions late into development requires significant effort. Architectural analyses allow software architects to reason about the impact of design decisions on quality, based on an architectural description of the system. An essential quality goal is the reduction of cost while maintaining other quality goals. Power consumption accounts for a significant part of the Total Cost of Ownership (TCO) of data centers. In 2010, data centers contributed 1.3% of the world-wide power consumption. However, reasoning on the energy efficiency of software systems is excluded from the systematic analysis of software architectures at design time. Energy efficiency can only be evaluated once the system is deployed and operational. One approach to reduce power consumption or cost is the introduction of self-adaptivity to a software system. Self-adaptive software systems execute adaptations to provision costly resources dependent on user load. The execution of reconfigurations can increase energy efficiency and reduce cost. If performed improperly, however, the additional resources required to execute a reconfiguration may exceed their positive effect.
Existing architecture-level energy analysis approaches offer limited accuracy or only consider a limited set of system features, e.g., the communication style used. Predictive approaches from the embedded systems and Cloud Computing domains operate on an abstraction that is not suited for architectural analysis. The execution of adaptations can consume additional resources. The additional consumption can reduce performance and energy efficiency. Design time quality analyses for self-adaptive software systems ignore this transient effect of adaptations.
This thesis makes the following contributions to enable the systematic consideration of energy efficiency in the architectural design of self-adaptive software systems: First, it presents a modeling language that captures power consumption characteristics on an architectural abstraction level. Second, it introduces an energy efficiency analysis approach that uses instances of our power consumption modeling language in combination with existing performance analyses for architecture models. The developed analysis supports reasoning on energy efficiency for static and self-adaptive software systems. Third, to ease the specification of power consumption characteristics, we provide a method for extracting power models for server environments. The method encompasses an automated profiling of servers based on a set of restrictions defined by the user. A model training framework extracts a set of power models specified in our modeling language from the resulting profile. The method ranks the trained power models based on their predicted accuracy. Lastly, this thesis introduces a systematic modeling and analysis approach for considering transient effects in design time quality analyses. The approach explicitly models inter-dependencies between reconfigurations, performance and power consumption. We provide a formalization of the execution semantics of the model. Additionally, we discuss how our approach can be integrated with existing quality analyses of self-adaptive software systems.
We validated the accuracy, applicability, and appropriateness of our approach in a variety of case studies. The first two case studies investigated the accuracy and appropriateness of our modeling and analysis approach. The first study evaluated the impact of design decisions on the energy efficiency of a media hosting application. The energy consumption predictions achieved an absolute error lower than 5.5% across different user loads. Our approach predicted the relative impact of the design decision on energy efficiency with an error of less than 18.94%. The second case study used two variants of the Spring-based community case study system PetClinic. The case study complements the accuracy and appropriateness evaluation of our modeling and analysis approach. We were able to predict the energy consumption of both variants with an absolute error of no more than 2.38%. In contrast to the first case study, we derived all models automatically, using our power model extraction framework, as well as an extraction framework for performance models. The third case study applied our model-based prediction to evaluate the effect of different self-adaptation algorithms on energy efficiency. It involved scientific workloads executed in a virtualized environment. Our approach predicted the energy consumption with an error below 7.1%, even though we used coarse grained measurement data of low accuracy to train the input models. The fourth case study evaluated the appropriateness and accuracy of the automated model extraction method using a set of Big Data and enterprise workloads. Our method produced power models with prediction errors below 5.9%. A secondary study evaluated the accuracy of extracted power models for different Virtual Machine (VM) migration scenarios. The results of the fifth case study showed that our approach for modeling transient effects improved the prediction accuracy for a horizontally scaling application. 
Leveraging the improved accuracy, we were able to identify design deficiencies of the application that otherwise would have remained unnoticed.
Towards edge robotics: the progress from cloud-based robotic systems to intelligent and context-aware robotic services
Current robotic systems handle a wide range of applications such as video surveillance, delivery
of goods, cleaning, material handling, assembly, painting, or pick and place services. These systems
have been embraced not only by the general population but also by the vertical industries to
help them in performing daily activities. Traditionally, the robotic systems have been deployed in
standalone robots that were exclusively dedicated to performing a specific task such as cleaning the
floor in indoor environments. In recent years, cloud providers started to offer their infrastructures
to robotic systems for offloading some of the robot’s functions. This ultimate form of the distributed
robotic system was first introduced 10 years ago as cloud robotics and nowadays a lot of robotic solutions
are appearing in this form. As a result, standalone robots became software-enhanced objects
with increased reconfigurability as well as decreased complexity and cost. Moreover, by offloading
the heavy processing from the robot to the cloud, it is easier to share services and information from
various robots or agents to achieve better cooperation and coordination.
Cloud robotics is suitable for human-scale responsive and delay-tolerant robotic functionalities
(e.g., monitoring, predictive maintenance). However, there is a whole set of real-time robotic applications
(e.g., remote control, motion planning, autonomous navigation) that cannot be executed with
cloud robotics solutions, mainly because cloud facilities traditionally reside far away from the robots.
While the cloud providers can ensure certain performance in their infrastructure, very little can be
ensured in the network between the robots and the cloud, especially in the last hop where wireless
radio access networks are involved. Over the last years, advances in edge computing, fog computing,
5G NR, network slicing, Network Function Virtualization (NFV), and network orchestration have stimulated
the interest of the industrial sector in satisfying the stringent and real-time requirements of their
applications. Robotic systems are a key piece in the industrial digital transformation and their benefits
are very well studied in the literature. However, designing and implementing a robotic system
that integrates all the emerging technologies and meets the connectivity requirements (e.g., latency,
reliability) is an ambitious task.
This thesis studies the integration of modern Information and Communication Technologies (ICTs)
in robotic systems and proposes some robotic enhancements that tackle the real-time constraints of
robotic services. To evaluate the performance of the proposed enhancements, this thesis departs
from the design and prototype implementation of an edge native robotic system that embodies the
concepts of edge computing, fog computing, orchestration, and virtualization. The proposed edge
robotics system serves to represent two exemplary robotic applications: autonomous
navigation of mobile robots and remote control of robot manipulators, where the end-to-end robotic
system is distributed between the robots and the edge server. The open-source prototype implementation
of the designed edge native robotic system resulted in the creation of two real-world testbeds
that are used in this thesis as a baseline scenario for the evaluation of new innovative solutions in
robotic systems.
After detailing the design and prototype implementation of the end-to-end edge native robotic
system, this thesis proposes several enhancements that can be offered to robotic systems by adapting
the concept of edge computing via the Multi-Access Edge Computing (MEC) framework. First, it
proposes exemplary network context-aware enhancements in which the real-time information about
robot connectivity and location can be used to dynamically adapt the end-to-end system behavior to
the actual status of the communication (e.g., radio channel). Three different exemplary context-aware
enhancements are proposed that aim to optimize the end-to-end edge native robotic system. Later,
the thesis studies the capability of the edge native robotic system to offer potential savings by means of
computation offloading for robot manipulators in different deployment configurations. Further, the
impact of different wireless channels (e.g., 5G, 4G and Wi-Fi) on the data exchange between a
robot manipulator and its remote controller is assessed.
In the following part of the thesis, the focus is set on how orchestration solutions can support
mobile robot systems to make high quality decisions. The application of OKpi as an orchestration algorithm
and DLT-based federation are studied to meet the KPIs that autonomously controlled mobile
robots have in order to provide uninterrupted connectivity over the radio access network. The elaborated
solutions present high compatibility with the designed edge robotics system where the robot
driving range is extended without any interruption of the end-to-end edge robotics service. While the
DLT-based federation extends the robot driving range by deploying access point extension on top of
external domain infrastructure, OKpi selects the most suitable access point and computing resource
in the cloud-to-thing continuum in order to fulfill the latency requirements of autonomously controlled
mobile robots.
To conclude the thesis the focus is set on how robotic systems can improve their performance by
leveraging Artificial Intelligence (AI) and Machine Learning (ML) algorithms to generate smart decisions.
To do so, the edge native robotic system is presented as a true embodiment of a Cyber-Physical
System (CPS) in Industry 4.0, showing the role of AI in such a concept. It presents the key enabling
technologies of the edge robotic system such as edge, fog, and 5G, where the physical processes are
integrated with computing and network domains. The role of AI in each technology domain is identified
by analyzing a set of AI agents at the application and infrastructure level. In the last part of the
thesis, movement prediction is selected to study the feasibility of applying a forecast-based recovery
mechanism for real-time remote control of robotic manipulators (FoReCo) that uses ML to infer
lost commands caused by interference in the wireless channel. The obtained results showcase
its potential in simulation and real-world experimentation.
Doctoral Program in Telematics Engineering, Universidad Carlos III de Madrid. Chair: Karl Holger. Secretary: Joerg Widmer. Member: Claudio Cicconett
Prediction of big five personality traits from mobile application usage
Abstract. Smartphones have become an integral part of our daily lives in recent years. Studies show that smartphone usage is correlated with user personality traits. This ecosystem depends on several variables such as geographic location, demographic traits, ethnic impact, cultural influence and so on. While a significant number of demographic, environmental and medical analyses have been based on smartphone usage, too few studies have analysed human personality. All of this information provides pivotal insights for improving user experience, creating recommendations, identifying marketing strategies and improving overall usage. This study uses application usage data collected over 6 months from 739 Android smartphone users, along with a 50-item Big Five Personality Trait questionnaire. The analysis shows that category-level aggregated application usage is enough to predict Big Five personality traits, achieving 9–14% error, which is 86–91% accuracy on average. The study concludes that user personality has a fundamental impact on the usage of smartphone applications and application categories. This work points to possible personality-driven research in the future and shows the significance of application categories in achieving adequate accuracy for general traits in personality studies.
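The accuracy range quoted above follows from the error range by simple subtraction. A minimal sketch, assuming the reported error is a percentage on a 0 to 100 scale (the abstract does not name the error metric, and the function name is hypothetical):

```python
def accuracy_from_error(error_percent: float) -> float:
    """Convert a percentage error into the complementary accuracy."""
    if not 0.0 <= error_percent <= 100.0:
        raise ValueError("error_percent must lie between 0 and 100")
    return 100.0 - error_percent


# The two ends of the error range quoted in the abstract.
for error in (9.0, 14.0):
    print(f"{error:.0f}% error corresponds to {accuracy_from_error(error):.0f}% accuracy")
```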
Managing Distributed Cloud Applications and Infrastructure
The emergence of the Internet of Things (IoT), combined with greater heterogeneity not only in cloud computing architectures but across the cloud-to-edge continuum, is introducing new challenges for managing applications and infrastructure across this continuum. The scale and complexity are now so great that it is no longer realistic for IT teams to manually foresee potential issues and manage the dynamism and dependencies across an increasingly interdependent chain of service provision. This Open Access Pivot explores these challenges and offers a solution for the intelligent and reliable management of physical infrastructure and the optimal placement of applications for the provision of services on distributed clouds. This book provides a conceptual reference model for reliable capacity provisioning for distributed clouds and discusses how data analytics and machine learning, application and infrastructure optimization, and simulation can deliver quality of service requirements cost-efficiently in this complex feature space. These are illustrated through a series of case studies in cloud computing, telecommunications, big data analytics, and smart cities.