Resource management in a containerized cloud : status and challenges
Cloud computing relies heavily on virtualization, because virtual resources are typically leased to the consumer, for example as virtual machines. Efficient management of these virtual resources is of great importance, as it has a direct impact on both the scalability and the operational costs of the cloud environment. Containers have recently been gaining popularity as a virtualization technology because of their minimal overhead compared to traditional virtual machines and the portability they offer. Traditional resource management strategies, however, are typically designed for the allocation and migration of virtual machines, so the question arises how these strategies can be adapted for the management of a containerized cloud. Apart from this, the cloud is also no longer limited to the centrally hosted data center infrastructure. New deployment models have gained maturity, such as fog and mobile edge computing, bringing the cloud closer to the end user. These models could also benefit from container technology, as the newly introduced devices often have limited hardware resources. In this survey, we provide an overview of the current state of the art regarding resource management within the broad sense of cloud computing, complementary to existing surveys in the literature. We investigate how research is adapting to the recent evolutions within the cloud, namely the adoption of container technology and the introduction of the fog computing conceptual model. Furthermore, we identify several challenges and possible opportunities for future research.
Context Aware Service Oriented Computing in Mobile Ad Hoc Networks
These days we witness a major shift towards small, mobile devices, capable of wireless communication. Their communication capabilities enable them to form mobile ad hoc networks and share resources and capabilities. Service Oriented Computing (SOC) is a new emerging paradigm for distributed computing that has evolved from object-oriented and component-oriented computing to enable applications distributed within and across organizational boundaries. Services are autonomous computational elements that can be described, published, discovered, and orchestrated for the purpose of developing applications. The application of the SOC model to mobile devices provides a loosely coupled model for distributed processing in a resource-poor and highly dynamic environment. Cooperation in a mobile ad hoc environment depends on the fundamental capability of hosts to communicate with each other. Peer-to-peer interactions among hosts within communication range allow such interactions but limit the scope of interactions to a local region. Routing algorithms for mobile ad hoc networks extend the scope of interactions to cover all hosts transitively connected over multi-hop routes. Additional contextual information, e.g., knowledge about the movement of hosts in physical space, can help extend the boundaries of interactions beyond the limits of an island of connectivity. To help separate concerns specific to different layers, a coordination model between the routing layer and the SOC layer provides abstractions that mask the details characteristic to the network layer from the distributed computing semantics above. This thesis explores some of the opportunities and challenges raised by applying the SOC paradigm to mobile computing in ad hoc networks. It investigates the implications of disconnections on service advertising and discovery mechanisms. It addresses issues related to code migration in addition to physical host movement. 
It also investigates some of the security concerns in ad hoc networking service provision. It presents a novel routing algorithm for mobile ad hoc networks and a novel coordination model that addresses space and time explicitly.
Reliable and energy efficient resource provisioning in cloud computing systems
Cloud Computing has revolutionized the Information Technology sector by giving computing a perspective of service. Users can access cloud computing services through easy-to-use portals without knowing about the underlying system. To provide such an abstract view, cloud computing systems have to perform many complex operations besides managing a large underlying infrastructure. Such complex operations confront service providers with many challenges such as security, sustainability, reliability, energy consumption and resource management. Among all the challenges, reliability and energy consumption are the two key challenges focused on in this thesis because of their conflicting nature. Current solutions focus on either reliability techniques or energy efficiency methods. But it has been observed that mechanisms providing reliability in cloud computing systems can increase energy consumption. Adding backup resources and running replicated systems provide strong fault tolerance but also increase energy consumption. Reducing energy consumption by running resources on low power scaling levels, or by reducing the number of active but idle resources such as backup resources, reduces the system reliability. This creates a critical trade-off between these two metrics that is investigated in this thesis. To address this problem, this thesis presents novel resource management policies which target the provisioning of the best resources in terms of reliability and energy efficiency and allocate them to suitable virtual machines. A mathematical framework showing the interplay between reliability and energy consumption is also proposed in this thesis. A formal method to calculate the finishing time of tasks running in a cloud computing environment affected by independent and correlated failures is also provided. The proposed policies adopt various fault tolerance mechanisms while satisfying constraints such as task deadlines and utility values. 
This thesis also provides a novel failure-aware VM consolidation method, which takes the failure characteristics of resources into consideration before performing VM consolidation. All the proposed resource management methods are evaluated by using real failure traces collected from various distributed computing sites. In order to perform the evaluation, a cloud computing framework, 'ReliableCloudSim', capable of simulating failure-prone cloud computing systems is developed. The key research findings and contributions of this thesis are: 1. If the emphasis is given only to energy optimization without considering reliability in a failure-prone cloud computing environment, the results can be contrary to the intuitive expectations. Rather than reducing energy consumption, a system ends up consuming more energy due to the energy losses incurred because of failure overheads. 2. While performing VM consolidation in a failure-prone cloud computing environment, a significant improvement in terms of energy efficiency and reliability can be achieved by considering failure characteristics of physical resources. 3. By considering correlated occurrence of failures during resource provisioning and VM allocation, the service downtime or interruption is reduced significantly, by 34%, in comparison to environments that assume independent occurrence of failures. Moreover, measured by our mathematical model, the ratio of reliability and energy consumption is improved by 14%.
Power Analysis and Optimization Techniques for Energy Efficient Computer Systems
Reducing power consumption has become a major challenge in the design and operation of today’s computer systems. This chapter describes different techniques addressing this challenge at different levels of system hardware, such as CPU, memory, and internal interconnection network, as well as at different levels of software components, such as compiler, operating system and user applications. These techniques can be broadly categorized into two types: design-time power analysis versus run-time dynamic power management. Mechanisms in the first category use analytical energy models that are integrated into existing simulators to measure the system’s power consumption and thus help engineers to test power-conscious hardware and software during design time. On the other hand, dynamic power management techniques are applied during run-time, and are used to monitor system workload and adapt the system’s behavior dynamically to save energy.
Resource management for extreme scale high performance computing systems in the presence of failures
2018 Summer. Includes bibliographical references. High performance computing (HPC) systems, such as data centers and supercomputers, coordinate the execution of large-scale computation of applications over tens or hundreds of thousands of multicore processors. Unfortunately, as the size of HPC systems continues to grow towards exascale complexities, these systems experience an exponential growth in the number of failures occurring in the system. These failures reduce performance and increase energy use, reducing the efficiency and effectiveness of emerging extreme-scale HPC systems. Applications executing in parallel on individual multicore processors also suffer from decreased performance and increased energy use as a result of being forced to share resources; in particular, contention from multiple application threads sharing the last-level cache causes performance degradation. These challenges make it increasingly important to characterize and optimize the performance and behavior of applications that execute in these systems. To address these challenges, in this dissertation we propose a framework for intelligently characterizing and managing extreme-scale HPC system resources. We devise various techniques to mitigate the negative effects of failures and resource contention in HPC systems. In particular, we develop new HPC resource management techniques for intelligently utilizing system resources through (a) the optimal scheduling of applications to HPC nodes and (b) the optimal configuration of fault resilience protocols. These resource management techniques employ information obtained from historical analysis as well as theoretical and machine learning methods for predictions. We use these data to characterize system performance, energy use, and application behavior when operating under the uncertainty of performance degradation from both system failures and resource contention. 
We investigate how to better characterize and model the negative effects of system failures as well as application co-location on large-scale HPC computing systems. Our analysis of application and system behavior also investigates: the interrelated effects of network usage of applications and fault resilience protocols; checkpoint interval selection and its sensitivity to system parameters for various checkpoint-based fault resilience protocols; and performance comparisons of various promising strategies for fault resilience in exascale-sized systems.
Mobile computing in a clouded environment
Cloud Computing has started to become a viable option for computing centers and mobile consumers seeking to reduce cost overhead and power consumption and to increase the software services available within their platform. For instance, distributed, memory-constrained mobile devices can expand their ability to share real-time data by utilizing virtual memory located within the cloud. Cloud memory services can be configured to restrict read and write access to the shared memory pool on a partner-by-partner basis. Utilization of such resources in turn reduces hardware requirements on mobile devices while lessening power consumption for each physical resource.
Within the Cloud Computing paradigm, computing resources are provisioned to consumers on demand and guaranteed through service level agreements. Although the idea of a computing utility is not new, its realization has come to pass as researchers and companies embark on a journey of implementing highly scalable cloud environments. As new solutions and architectures are proposed, additional use cases and consumer concerns have been revealed. These issues range from consumer security, adequate service level agreements and vendor interoperability to cloud technology standardization. Further, the current state of the art does not adequately address these needs for mobile consumers, where services need to be guaranteed even as consumers dynamically change locations. Due to the rapid adoption of virtualization stacks and the dramatic increase in mobile computing devices, cloud providers must be able to handle the logical and physical mobility of consumers. As consumers move throughout geographical regions, there is a possibility that a consumer’s new locale may hinder a producer’s ability to uphold service level agreements. This inability arises because a producer may not have physical resources located relatively close to a mobile consumer’s new locale. As a consequence, producers must either continue to provide degraded service or migrate workloads to third-party producers in order to ensure service level agreements are maintained. The goal of this report is to research existing architectures that provide the ability to adequately uphold service level agreements as mobile consumers move from locale to locale. Further, we propose an architecture that can be implemented along with existing solutions in order to ensure consumers receive adequate service levels regardless of locality. We believe this architecture will lead to increased cloud interoperability and decreased consumer-to-producer platform coupling. Electrical and Computer Engineerin
Autonomic Approach based on Semantics and Checkpointing for IoT System Management
The author did not provide a French abstract. The author did not provide an English abstract.