
    Performance-Aware Management of Cloud Resources: A Taxonomy and Future Directions

    The dynamic nature of the cloud environment has made distributed resource management a challenge for cloud service providers. The importance of maintaining quality of service in accordance with customer expectations, together with the highly dynamic nature of cloud-hosted applications, adds new levels of complexity to the process. Advances in big data learning approaches have shifted conventional static capacity planning solutions towards complex performance-aware resource management methods. It has been shown that decision making for resource adjustment is closely related to the behaviour of the system, including the utilization of resources and application components. Therefore, continuous monitoring of system attributes and performance metrics provides the raw data for analysing problems that affect the performance of the application. Data analytic methods such as statistical and machine learning approaches offer the concepts, models and tools required to dig into the data and find the general rules, patterns and characteristics that define the functionality of the system. Knowledge obtained from the data analysis process helps to identify changes in the workloads, faulty components or problems that can cause system performance to degrade. A timely reaction to performance degradations can avoid violations of service level agreements by performing proper corrective actions, including auto-scaling or other resource adjustment solutions. In this paper, we investigate the main requirements and limitations in cloud resource management, including a study of the approaches to workload and anomaly analysis in the context of performance management in the cloud. A taxonomy of the works on this problem is presented, which identifies the main approaches in existing research, from data analysis to resource adjustment techniques.

    Performance-oriented DevOps: A Research Agenda

    DevOps is a trend towards a tighter integration between development (Dev) and operations (Ops) teams. The need for such an integration is driven by the requirement to continuously adapt enterprise applications (EAs) to changes in the business environment. As of today, DevOps concepts have been primarily introduced to ensure a constant flow of features and bug fixes into new releases from a functional perspective. In order to integrate a non-functional perspective into these DevOps concepts this report focuses on tools, activities, and processes to ensure one of the most important quality attributes of a software system, namely performance. Performance describes system properties concerning its timeliness and use of resources. Common metrics are response time, throughput, and resource utilization. Performance goals for EAs are typically defined by setting upper and/or lower bounds for these metrics and specific business transactions. In order to ensure that such performance goals can be met, several activities are required during development and operation of these systems as well as during the transition from Dev to Ops. Activities during development are typically summarized by the term Software Performance Engineering (SPE), whereas activities during operations are called Application Performance Management (APM). SPE and APM were historically tackled independently from each other, but the newly emerging DevOps concepts require and enable a tighter integration between both activity streams. This report presents existing solutions to support this integration as well as open research challenges in this area
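    The abstract above describes performance goals as upper and/or lower bounds on metrics such as response time and throughput. A minimal sketch of that idea follows; the `Goal` class, the `violated_goals` helper, and the metric names and thresholds are hypothetical illustrations, not taken from the report.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Goal:
    """A performance goal: optional lower and/or upper bound on one metric."""
    metric: str                    # hypothetical metric name, e.g. "response_time_ms"
    lower: Optional[float] = None  # value must be >= lower when set
    upper: Optional[float] = None  # value must be <= upper when set

    def is_met(self, value: float) -> bool:
        if self.lower is not None and value < self.lower:
            return False
        if self.upper is not None and value > self.upper:
            return False
        return True


def violated_goals(goals: List[Goal], measurements: Dict[str, float]) -> List[str]:
    """Return the names of measured metrics whose value breaks a goal."""
    return [
        g.metric
        for g in goals
        if g.metric in measurements and not g.is_met(measurements[g.metric])
    ]


# Assumed example thresholds, for illustration only.
goals = [
    Goal("response_time_ms", upper=200.0),
    Goal("throughput_rps", lower=50.0),
]
measurements = {"response_time_ms": 250.0, "throughput_rps": 80.0}
print(violated_goals(goals, measurements))
```

    The same check could in principle run against load-test results during development and against monitoring data in operations, which is one way to read the SPE/APM integration the report argues for.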

    Configuration Testing: Testing Configuration Values as Code and with Code

    This paper proposes configuration testing--evaluating configuration values (to be deployed) by exercising the code that uses the values and assessing the corresponding program behavior. We advocate that configuration values should be systematically tested like software code and that configuration testing should be a key reliability engineering practice for preventing misconfigurations from production deployment. The essential advantage of configuration testing is to put the configuration values (to be deployed) in the context of the target software program under test. In this way, the dynamic effects of configuration values and the impact of configuration changes can be observed during testing. Configuration testing overcomes the fundamental limitations of de facto approaches to combatting misconfigurations, namely configuration validation and software testing--the former is disconnected from code logic and semantics, while the latter can hardly cover all possible configuration values and their combinations. Our preliminary results show the effectiveness of configuration testing in capturing real-world misconfigurations. We present the principles of writing new configuration tests and the promises of retrofitting existing software tests to be configuration tests. We discuss new adequacy and quality metrics for configuration testing. We also explore regression testing techniques to enable incremental configuration testing during continuous integration and deployment in modern software systems
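    The core idea in the abstract above, exercising the code that consumes configuration values rather than validating the values in isolation, can be sketched as follows. This is an illustrative sketch only: the function names, the configuration keys, and the database connection limit are assumptions, not the paper's implementation.

```python
import json


def connection_pool_size(config: dict) -> int:
    """Code under test: derives the pool size from configuration values."""
    workers = int(config["workers"])
    per_worker = int(config["connections_per_worker"])
    if workers <= 0 or per_worker <= 0:
        raise ValueError("workers and connections_per_worker must be positive")
    return workers * per_worker


def config_test(config: dict, max_db_connections: int = 100) -> list:
    """Run the consuming code with the to-be-deployed values; collect failures."""
    try:
        size = connection_pool_size(config)
    except (KeyError, ValueError) as exc:
        # The program itself rejects the values.
        return [f"code rejected config: {exc!r}"]
    failures = []
    if size > max_db_connections:
        # Each value is valid alone; only the combination misbehaves.
        failures.append(
            f"pool size {size} exceeds database limit {max_db_connections}"
        )
    return failures


# Hypothetical configurations standing in for files about to be deployed.
good = json.loads('{"workers": 4, "connections_per_worker": 10}')
bad = json.loads('{"workers": 16, "connections_per_worker": 10}')

print(config_test(good))
print(config_test(bad))
```

    In the `bad` case each value would pass a per-key range check, but the combination exceeds the assumed limit once the program logic multiplies them. That is the kind of dynamic effect the abstract says validation alone cannot observe.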

    From 4G to 5G: Self-organized Network Management meets Machine Learning

    In this paper, we provide an analysis of self-organized network management, with an end-to-end perspective of the network. Self-organization as applied to cellular networks is usually referred to as Self-organizing Networks (SONs), and it is a key driver for improving Operations, Administration, and Maintenance (OAM) activities. SON aims at reducing the cost of installation and management of 4G and future 5G networks by simplifying operational tasks through the network's capability to configure, optimize and heal itself. To satisfy 5G network management requirements, this autonomous management vision has to be extended to the end-to-end network. In the literature, and in some products available on the market, Machine Learning (ML) has been identified as the key tool to implement autonomous adaptability and take advantage of experience when making decisions. In this paper, we survey how network management can significantly benefit from ML solutions. We review and provide the basic concepts and taxonomy for SON, network management and ML. We analyse the available state of the art in the literature, in standardization, and in the market. We pay special attention to the evolution of the 3rd Generation Partnership Project (3GPP) in the area of network management and to the data that can be extracted from 3GPP networks, in order to gain knowledge and experience of how the network is working and to improve network performance proactively. Finally, we go through the main challenges associated with this line of research, in both 4G and the 5G systems currently being designed, while identifying new directions for research. Comment: 23 pages, 3 figures, Surve

    A Roadmap Towards Resilient Internet of Things for Cyber-Physical Systems

    The Internet of Things (IoT) is a ubiquitous system connecting many different devices - the things - which can be accessed from a distance. Cyber-physical systems (CPS) monitor and control the things from a distance. As a result, the concepts of dependability and security become deeply intertwined. The increasing level of dynamicity, heterogeneity, and complexity adds to the system's vulnerability and challenges its ability to react to faults. This paper summarizes the state of the art in anomaly detection, fault-tolerance and self-healing, and adds a number of other methods applicable to achieving resilience in an IoT. We particularly focus on non-intrusive methods ensuring data integrity in the network. Furthermore, this paper presents the main challenges in building a resilient IoT for CPS, which is crucial in the era of smart CPS with enhanced connectivity (an excellent example of such a system is connected autonomous vehicles). It further summarizes our solutions, work-in-progress and future work on this topic to enable "Trustworthy IoT for CPS". Finally, this framework is illustrated on a selected use case: a smart sensor infrastructure in the transport domain. Comment: preprint (2018-10-29

    An HCI View of Configuration Problems

    In recent years, configuration problems have drawn tremendous attention because of their increasing prevalence and their large impact on system availability. We believe that many of these problems are attributable to today's configuration interfaces, which have not evolved to accommodate the enormous shift in the system administrator group. Plain text files, as the de facto configuration interfaces, assume that administrators understand the system under configuration. They ask administrators to directly edit the corresponding entries with little guidance or assistance. However, this assumption no longer holds for today's administrator group, which has expanded greatly to include non- and semi-professional administrators. In this paper, we provide an HCI view of today's configuration problems, and articulate system configuration as a new HCI problem. Moreover, we present the top obstacles to correctly and efficiently configuring software systems and, most importantly, their implications for the design and implementation of new-generation configuration interfaces. Comment: 9 pages of exploratory research on understanding system configuration problems using Human-Computer Interaction principle

    Data Management in Industry 4.0: State of the Art and Open Challenges

    Information and communication technologies are permeating all aspects of industrial and manufacturing systems, expediting the generation of large volumes of industrial data. This article surveys the recent literature on data management as it applies to networked industrial environments and identifies several open research challenges for the future. As a first step, we extract important data properties (volume, variety, traffic, criticality) and identify the corresponding data enabling technologies of diverse fundamental industrial use cases, based on practical applications. Secondly, we provide a detailed outline of recent industrial architectural designs with respect to their data management philosophy (data presence, data coordination, data computation) and the extent of their distributiveness. Then, we conduct a holistic survey of the recent literature from which we derive a taxonomy of the latest advances on industrial data enabling technologies and data centric services, spanning all the way from the field level deep in the physical deployments, up to the cloud and applications level. Finally, motivated by the rich conclusions of this critical analysis, we identify interesting open challenges for future research. The concepts presented in this article thematically cover the largest part of the industrial automation pyramid layers. Our approach is multidisciplinary, as the selected publications were drawn from two fields; the communications, networking and computation field as well as the industrial, manufacturing and automation field. The article can help the readers to deeply understand how data management is currently applied in networked industrial environments, and select interesting open research opportunities to pursue

    Assuring the Machine Learning Lifecycle: Desiderata, Methods, and Challenges

    Machine learning has evolved into an enabling technology for a wide range of highly successful applications. The potential for this success to continue and accelerate has placed machine learning (ML) at the top of research, economic and political agendas. Such unprecedented interest is fuelled by a vision of ML applicability extending to healthcare, transportation, defence and other domains of great societal importance. Achieving this vision requires the use of ML in safety-critical applications that demand levels of assurance beyond those needed for current ML applications. Our paper provides a comprehensive survey of the state-of-the-art in the assurance of ML, i.e. in the generation of evidence that ML is sufficiently safe for its intended use. The survey covers the methods capable of providing such evidence at different stages of the machine learning lifecycle, i.e. of the complex, iterative process that starts with the collection of the data used to train an ML component for a system, and ends with the deployment of that component within the system. The paper begins with a systematic presentation of the ML lifecycle and its stages. We then define assurance desiderata for each stage, review existing methods that contribute to achieving these desiderata, and identify open challenges that require further research

    Mobile Cloud Business Process Management System for the Internet of Things: A Survey

    The Internet of Things (IoT) represents a comprehensive environment that consists of a large number of smart devices interconnecting heterogeneous physical objects to the Internet. Many domains such as logistics, manufacturing, agriculture, urban computing, home automation, ambient assisted living and various ubiquitous computing applications have utilised IoT technologies. Meanwhile, Business Process Management Systems (BPMS) have become a successful and efficient solution for coordinated management and optimised utilisation of resources/entities. However, past BPMS have not considered many of the issues they will face in managing large-scale connected heterogeneous IoT entities. Without fully understanding the behaviour, capability and state of the IoT entities, a BPMS can fail to manage IoT-integrated information systems. In this paper, we analyse existing BPMS for IoT and identify their limitations and drawbacks from a Mobile Cloud Computing perspective. We then discuss a number of open challenges in BPMS for IoT. Comment: 56 pages, 10 figures, 5 table

    Statically Verifying Continuous Integration Configurations

    Continuous Integration (CI) testing is a popular software development technique that allows developers to easily check that their code can build successfully and pass tests across various system environments. In order to use a CI platform, a developer must add a set of configuration files to a code repository to specify build conditions. Incorrect configuration settings lead to CI build failures; because builds can take hours to run, these failures waste valuable developer time and delay product release dates. Debugging CI configurations is challenging because users must manage configurations for the build across many system environments, to which they may not have local access. Thus, the only way to check a CI configuration is to push a commit and wait for the build result. To address this problem, we present the first approach, VeriCI, for statically checking for errors in a given CI configuration before the developer pushes a commit to build on the CI server. Our key insight is that the repositories in a CI environment contain lists of build histories which offer the time-aware repository build status. Driven by this insight, we introduce the Misclassification Guided Abstraction Refinement (MiGAR) loop that automates part of the learning process across the heterogeneous build environments in CI. We then use decision tree learning to generate constraints on the CI configuration that must hold for a build to succeed, training on a large history of continuous integration repository build results. We evaluate VeriCI on real-world data from GitHub and find that it predicts build failures with 83% accuracy.