Energy and Performance: Management of Virtual Machines: Provisioning, Placement, and Consolidation
Cloud computing is a computing paradigm that offers scalable storage and compute resources to users on demand over the Internet. Public cloud providers operate large-scale data centers around the world to handle a large number of user requests. However, data centers consume an immense amount of electrical energy, which leads to high operating costs and carbon emissions. One of the most common and effective methods for reducing energy consumption is Dynamic Virtual Machine Consolidation (DVMC), enabled by virtualization technology. DVMC dynamically consolidates Virtual Machines (VMs) onto the minimum number of active servers and then switches the idle servers into a power-saving mode to save energy. However, maintaining the desired level of Quality of Service (QoS) between data centers and their users is critical for satisfying users' performance expectations. The main challenge is therefore to minimize data center energy consumption while maintaining the required QoS.
This thesis addresses this challenge by presenting novel DVMC approaches that reduce the energy consumption of data centers and improve resource utilization under workload-independent quality-of-service constraints. These approaches fall into three main categories: heuristic, meta-heuristic, and machine learning.
Our first contribution is a heuristic algorithm for solving the DVMC problem. The algorithm uses a linear regression-based prediction model to detect overloaded servers from historical utilization data. It then migrates some VMs away from the overloaded servers to avoid further performance degradation, and consolidates VMs onto fewer servers to save energy. The second and third contributions are two novel DVMC algorithms based on Reinforcement Learning (RL). RL is attractive for highly adaptive and autonomous management in dynamic environments, so we use it to solve two main sub-problems in VM consolidation: detecting the server power mode (sleep or active) and detecting the server status (overloaded or non-overloaded). The fourth contribution of this thesis is an online optimization meta-heuristic algorithm called Ant Colony System-based Placement Optimization (ACS-PO). ACS is well suited to VM consolidation because it is easy to parallelize, produces solutions close to the optimum, and has polynomial worst-case time complexity. The simulation results show that ACS-PO provides substantial improvements over other heuristic algorithms in reducing energy consumption, the number of VM migrations, and performance degradation.
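The abstract does not spell out the detector, but the idea of regression-based overload detection can be sketched briefly. In the sketch below, the window length, threshold, and function names are illustrative assumptions, not details taken from the thesis.

    # A minimal sketch of linear-regression overload detection, assuming a
    # fixed history window and a single CPU-utilization threshold; names and
    # parameters are illustrative, not taken from the thesis.
    import numpy as np

    def predict_next_utilization(history):
        """Fit a least-squares line to recent utilization samples and
        extrapolate one step ahead."""
        t = np.arange(len(history))
        slope, intercept = np.polyfit(t, history, 1)
        return slope * len(history) + intercept

    def is_overloaded(history, threshold=0.8):
        """Flag a server whose current or predicted utilization exceeds the
        threshold, so VMs can be migrated away pre-emptively."""
        return history[-1] > threshold or predict_next_utilization(history) > threshold

    # A rising trend triggers migration before the threshold is breached:
    print(is_overloaded([0.55, 0.62, 0.70, 0.78]))  # True (predicted ~0.86)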
Our fifth contribution is a Hierarchical VM management (HiVM) architecture based on the three-tier data center topology that is in common use in data centers. HiVM can scale energy-efficient management across many thousands of servers. Our sixth contribution is a Utilization Prediction-aware Best Fit Decreasing (UP-BFD) algorithm. UP-BFD avoids SLA violations and needless migrations by taking both the current and the predicted future resource requirements into account when allocating, consolidating, and placing VMs.
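UP-BFD's exact scoring is not given in the abstract; the following is a minimal sketch of a prediction-aware Best Fit Decreasing placement under assumed data structures (a current and a predicted demand per VM, free capacities per server), all of which are illustrative.

    # A sketch of Utilization Prediction-aware Best Fit Decreasing: place the
    # largest VMs first on the server that leaves the least spare capacity,
    # but reject placements that the *predicted* demand would overload even
    # when the current demand fits. Field names are assumptions.
    def up_bfd(vms, servers):
        placement = {}
        for vm in sorted(vms, key=lambda v: v["now"], reverse=True):
            feasible = [s for s in servers
                        if s["free_now"] >= vm["now"] and s["free_pred"] >= vm["pred"]]
            if not feasible:
                continue  # a real system would power on a new server here
            best = min(feasible, key=lambda s: s["free_now"] - vm["now"])
            best["free_now"] -= vm["now"]
            best["free_pred"] -= vm["pred"]
            placement[vm["id"]] = best["id"]
        return placement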
Finally, the seventh and last contribution is a novel Self-Adaptive Resource Management System (SARMS) for data centers. To achieve scalability, SARMS uses a hierarchical architecture that is partially inspired by HiVM. Moreover, SARMS provides self-adaptive resource management by dynamically adjusting the utilization thresholds of each server in the data center.
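The abstract does not give SARMS's update rule; as a rough illustration, a per-server threshold controller might tighten the overload threshold after SLA violations and relax it during stable periods. Everything in this sketch (step size, bounds, signature) is an assumption.

    # A minimal sketch of self-adaptive per-server thresholding: tighten the
    # overload threshold when violations occur (migrate earlier), relax it
    # otherwise (consolidate more aggressively). The rule is illustrative.
    def adapt_threshold(threshold, violations, step=0.05, lo=0.5, hi=0.95):
        threshold += -step * violations if violations > 0 else step
        return max(lo, min(hi, threshold))

    # Usage: the threshold drifts down under repeated violations.
    t = 0.9
    for v in [0, 2, 1, 0]:
        t = adapt_threshold(t, v)
    print(round(t, 2))  # 0.85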
A Process Framework for Managing Quality of Service in Private Cloud
As information systems leaders tap into the global market of cloud computing-based services, they struggle to maintain consistent application performance due to the lack of a process framework for managing quality of service (QoS) in the cloud. Guided by disruptive innovation theory, the purpose of this case study was to identify a process framework for meeting the QoS requirements of private cloud service users. Private cloud implementation was explored by selecting an organization in California through purposeful sampling. Information was gathered by interviewing 23 information technology (IT) professionals, a mix of frontline engineers, managers, and leaders involved in the implementation of the private cloud. Another source of data was documents such as standard operating procedures, policies, and guidelines related to the private cloud implementation. Interview transcripts and documents were coded and sequentially analyzed. Three prominent themes emerged from the analysis: (a) end-user expectations, (b) application architecture, and (c) trending analysis. The findings may help IT leaders manage QoS in cloud infrastructure effectively and deliver reliable application performance, which in turn may increase customer population and organizational profitability. This study may contribute to positive social change, as information systems managers and workers can learn and apply the process framework for delivering stable and reliable cloud-hosted computer applications.
HPC Cloud for Scientific and Business Applications: Taxonomy, Vision, and Research Challenges
High Performance Computing (HPC) clouds are becoming an alternative to on-premise clusters for executing scientific applications and business analytics services. Most research efforts in HPC cloud aim to understand the cost-benefit of moving resource-intensive applications from on-premise environments to public cloud platforms. Industry trends show that hybrid environments are the natural path to getting the best of on-premise and cloud resources: steady (and sensitive) workloads can run on on-premise resources, while peak demand can leverage remote resources in a pay-as-you-go manner. Nevertheless, there are plenty of questions to be answered in HPC cloud, ranging from how to extract the best performance from an unknown underlying platform to which services are essential to make its usage easier. Moreover, the discussion of the right pricing and contractual models to fit small and large users is relevant for the sustainability of HPC clouds. This paper presents a survey and taxonomy of efforts in HPC cloud and a vision of what we believe lies ahead, including a set of research challenges that, once tackled, can help advance business and scientific discoveries. This becomes particularly relevant due to the fast-growing wave of new HPC applications coming from big data and artificial intelligence.
Transiency-driven Resource Management for Cloud Computing Platforms
Modern distributed server applications are hosted on enterprise or cloud data centers that provide computing, storage, and networking capabilities to these applications. These applications are built on the implicit assumption that the underlying servers will be stable and normally available, barring occasional faults. In many emerging scenarios, however, data centers and clouds provide only transient, rather than continuous, availability of their servers. Transiency in modern distributed systems arises in many contexts, such as green data centers powered by intermittent renewable sources, and cloud platforms that offer lower-cost transient servers which can be unilaterally revoked by the cloud operator.
Transient computing resources are increasingly important, and existing fault-tolerance and resource management techniques are inadequate for transient servers because applications typically assume continuous resource availability. This thesis presents research in distributed systems design that treats transiency as a first-class design principle. I show that combining transiency-specific fault-tolerance mechanisms with resource management policies suited to application characteristics and requirements can yield significant cost and performance benefits. These mechanisms and policies have been implemented and prototyped as part of software systems that allow a wide range of applications, such as interactive services and distributed data processing, to be deployed on transient servers, and can reduce cloud computing costs by up to 90%.
This thesis makes contributions to four areas of computer systems research: transiency-specific fault-tolerance, resource allocation, abstractions, and resource reclamation. To reduce the impact of transient server revocations, I develop two fault-tolerance techniques tailored to transient server characteristics and application requirements. For interactive applications, I build a derivative cloud platform that masks revocations by transparently moving application state between servers of different types. Similarly, for distributed data processing applications, I investigate the use of application-level periodic checkpointing to reduce the performance impact of server revocations. To manage and reduce the risk of server revocations, I investigate the use of server portfolios that allow transient resource allocation to be tailored to application requirements.
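The abstract does not name the checkpointing policy; a classical starting point for choosing the interval is Young's approximation, T ≈ sqrt(2·C·MTBF), sketched below with an assumed checkpoint cost and mean time between revocations. This rule is a well-known baseline, not necessarily the one used in the thesis.

    # A sketch of checkpoint-interval selection via Young's approximation.
    # The figures below are illustrative assumptions.
    import math

    def young_interval(checkpoint_cost_s, mean_time_between_revocations_s):
        # Balances checkpoint overhead against the expected recomputation
        # after a revocation.
        return math.sqrt(2 * checkpoint_cost_s * mean_time_between_revocations_s)

    # A 30 s checkpoint and a 2 h mean revocation gap suggest checkpointing
    # roughly every 11 minutes.
    print(round(young_interval(30, 7200)))  # ~657 seconds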
Finally, I investigate how resource providers (such as cloud platforms) can provide transient resource availability without revocation by exploring alternative resource reclamation techniques. I develop resource deflation, wherein a server's resources are fractionally reclaimed, allowing the application to continue execution, albeit with fewer resources. Resource deflation generalizes revocation, and the deflation mechanisms and cluster-wide policies can yield both high cluster utilization and low application performance degradation.
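The deflation interfaces are not detailed in the abstract; the sketch below illustrates the core idea of fractional reclamation under an assumed per-application "minimum cores" attribute and a proportional policy, both hypothetical.

    # A minimal sketch of resource deflation: reclaim cores proportionally
    # from deflatable applications instead of revoking any one of them.
    # The data layout and proportional policy are assumptions.
    def deflate(apps, cores_needed):
        slack = sum(a["cores"] - a["min_cores"] for a in apps)
        if slack < cores_needed:
            raise RuntimeError("insufficient deflatable capacity; revocation needed")
        for a in apps:
            share = (a["cores"] - a["min_cores"]) / slack
            a["cores"] -= share * cores_needed
        return apps

    # Usage: reclaiming 4 cores from two apps with 6 cores of total slack.
    apps = [{"cores": 8, "min_cores": 4}, {"cores": 4, "min_cores": 2}]
    print(deflate(apps, 4))  # first app shrinks by ~2.7 cores, second by ~1.3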
Security risk assessment in cloud computing domains
Cyber security is a primary and persistent concern across any computing platform. While addressing apprehensions about security risks, organizations cannot invest unlimited resources in mitigation measures, since they operate under budgetary constraints. Security risk assessment is therefore imperative for designing optimal mitigation measures, as it provides insight into the strengths and weaknesses of the different assets affiliated with a computing platform.
The objective of the research presented in this dissertation is to improve upon existing risk assessment frameworks and guidelines associated with the key assets of cloud computing domains: infrastructure, applications, and users. The dissertation presents various informal approaches to security risk assessment that help identify the security risks confronting these assets, and uses the results to carry out the required cost-benefit tradeoff analyses. This will benefit organizations by helping them better comprehend the security risks their assets are exposed to, and thereafter secure those assets by designing cost-optimal mitigation measures.
Cloud-computing strategies for sustainable ICT utilization: a decision-making framework for non-expert Smart Building managers
Virtualization of processing power, storage, and networking applications via cloud computing allows Smart Buildings to run heavy-demand computing resources off-premises. While this approach reduces in-house costs and energy use, recent case studies have highlighted the complexity of the decision-making processes associated with implementing cloud computing. This complexity is due to the rapid evolution of these technologies without a standardized approach among the organizations offering cloud-computing provision as a commercial concern. This study defines the term Smart Building as an ICT environment in which a degree of system integration is accomplished. Non-expert managers are highlighted as key users of the outcomes of this project, given the diverse nature of Smart Buildings' operational objectives. This research evaluates different ICT management methods to effectively support non-expert clients' decisions to deploy different models of cloud-computing services in their Smart Buildings' ICT environments. The objective is to reduce the need for costly third-party ICT consultancy providers, so that non-experts can focus on their Smart Buildings' core competencies rather than the complex, expensive, and energy-consuming processes of ICT management.
The gap identified by this research represents a vulnerability for non-expert managers seeking to make effective decisions regarding cloud-computing cost estimation, deployment assessment, associated power consumption, and management flexibility in their Smart Buildings' ICT environments. The project analyses cloud-computing decision-making concepts with reference to different Smart Building ICT attributes. In particular, it follows a structured programme of data collection comprising semi-structured interviews, cost simulations, and risk-analysis surveys. The main output is a theoretical management framework for non-expert decision-makers across variously operated Smart Buildings. Furthermore, a decision-support tool is designed to enable non-expert managers to identify the extent of virtualization potential by evaluating different implementation options. This is presented in correlation with contract limitations, security challenges, system integration levels, sustainability, and long-term costs. These requirements are explored against cloud demand changes observed across specified periods, and the dependencies were found to vary greatly with organizational aspects such as performance, size, and workload.
The study argues that constructing long-term, sustainable, and cost-efficient strategies for any cloud deployment depends on the thorough identification of the services required off- and on-premises. It points out that most of today's heavily burdened Smart Buildings outsource these services to costly independent suppliers, which causes unnecessary management complexity, additional cost, and system incompatibility. The main conclusions argue that cloud-computing costs differ depending on Smart Building attributes and ICT requirements, and that although cloud services are usually more convenient and cost-effective in the early stages of deployment and migration, they can become costly later if not planned carefully using cost-estimation service patterns. The results of the study can be exploited to enhance core competencies within Smart Buildings in order to maximize growth and attract new business opportunities.