56 research outputs found

    An analysis of software aging in cloud environment

    Cloud computing is an environment in which several virtual machines (VMs) run concurrently on physical machines. The cloud infrastructure hosts multiple cloud service segments that communicate with each other through interfaces, creating a distributed computing environment. During operation, software systems accumulate errors or garbage that can lead to system failure and other hazardous consequences. This condition is called software aging. It is caused by memory fragmentation, large-scale resource consumption, and the accumulation of numerical errors. Software aging degrades performance and may result in system failure through premature resource exhaustion. The issue cannot be detected during the software testing phase because it depends on the dynamic nature of operation. The errors that cause software aging are of a special type: they do not disturb the software's functionality but affect its response time and its environment. The issue can therefore be resolved only at run time. To alleviate the impact of software aging, software rejuvenation is used. Rejuvenation reboots the system or re-initializes the software, which avoids faults or failures. It removes accumulated error conditions, frees up deadlocks, and defragments operating system resources such as memory, and so avoids future failures that software aging might cause. Because service availability is crucial, rejuvenation has to be carried out on a defined schedule without disrupting the service. Software rejuvenation techniques can make software systems more trustworthy, and software designers use the concept to improve software quality and reliability. Software aging and rejuvenation have generated a lot of research interest in recent years. This work reviews some of the research on detecting software aging and identifies research gaps.

    Fault Tolerant Approaches through Scheduling in Cloud Computing Environment - A State of Art

    Driven by its pay-per-use policy, cloud computing is widely used in scientific and commercial domains such as biomedicine, healthcare, and online financial applications. Fault tolerance is one of the biggest challenges in guaranteeing the reliability and availability of critical services: the system must remain available by minimizing the impact of failures. In this paper, we conduct a comparative analysis of approaches that tolerate faults through scheduling in cloud computing environments, grouped by their policies. The goal of this paper is not only to analyze the existing methods but also to identify areas that need future research.

    Towards UAV-based MEC service chain resilience evaluation: a quantitative modeling approach

    Unmanned aerial vehicles (UAVs) and network function virtualization (NFV) facilitate the deployment of multi-access edge computing (MEC). In a UAV-based MEC (UMEC) network, a virtualized network function (VNF) can be implemented as a lightweight container running on a UMEC host operating system (OS). However, a UMEC network is vulnerable to attacks, which can result in resource degradation and even UMEC service disruption. Rejuvenation techniques, such as failover and live container migration, can mitigate the impact of resource degradation, but their effectiveness in improving the resilience of UMEC services needs to be evaluated. This paper presents a quantitative modeling approach based on a semi-Markov process to investigate the resilience, in terms of availability and reliability, of a UMEC service chain consisting of any number of VNFs executed on any number of UMEC hosts. Unlike existing studies, the semi-Markov model constructed in this paper can capture the time-dependent behaviors between VNFs, between host OSes, and between VNFs and host OSes, under the condition that the holding times of the recovery and failure events may follow any kind of distribution. We perform a sensitivity analysis to identify potential resilience bottlenecks. The results highlight that migration time is the parameter that most significantly affects resilience, which offers insight into designing UMEC service chains with high resilience requirements. In addition, numerical experiments reveal that: (i) the type of failure time distribution has a significant effect on resilience; and (ii) resilience increases as the number of VNFs decreases, while availability increases and reliability decreases as the number of UMEC hosts increases, which can provide meaningful guidance for UAV placement optimization in the UMEC network.
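
    To make the semi-Markov idea concrete, the following is a minimal sketch, not the model from the paper. It uses the standard result that the long-run fraction of time a semi-Markov process spends in a state depends only on the stationary distribution of its embedded Markov chain and on the mean holding time of each state, whatever the holding-time distribution. The three states, the transition probabilities, and the mean holding times are hypothetical values chosen for illustration.

```python
# Steady-state availability of a small semi-Markov model (illustrative sketch).
# All states and numbers below are assumptions, not values from the paper.

def embedded_stationary(P, iters=200):
    """Stationary distribution of the embedded Markov chain with matrix P.

    Iterates on the 'lazy' chain (I + P) / 2, which has the same stationary
    distribution as P but is aperiodic, so the iteration converges.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        step = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        pi = [0.5 * pi[j] + 0.5 * step[j] for j in range(n)]
    return pi


def time_fractions(P, mean_holding):
    """Long-run fraction of time spent in each state of the semi-Markov process."""
    pi = embedded_stationary(P)
    weights = [p * m for p, m in zip(pi, mean_holding)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical states: a VNF is up, being migrated (rejuvenation), or down.
STATES = ["up", "migrating", "down"]
P = [
    [0.0, 0.9, 0.1],  # up -> migrating (planned) or down (failure)
    [1.0, 0.0, 0.0],  # migrating -> up
    [1.0, 0.0, 0.0],  # down -> up (recovery)
]
MEAN_HOLDING = [100.0, 2.0, 10.0]  # mean time in each state, arbitrary units

fractions = time_fractions(P, MEAN_HOLDING)
availability = fractions[STATES.index("up")]  # only "up" counts as available
print(f"steady-state availability: {availability:.4f}")
```

    With these assumed numbers the availability comes out at roughly 0.973. Raising the mean holding time of the hypothetical "migrating" state lowers it, which is the kind of dependence a sensitivity analysis would examine.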

    Envelhecimento e rejuvenescimento de software: 20 anos (1995-2014) - panorama e desafios

    Although software aging and rejuvenation (SAR) is a young research field, a lot of knowledge has been produced in its first 20 years. Nowadays, important scientific journals and conferences include SAR-related topics in their scope of interest. This fast growth and the wide range of dissemination venues make it challenging for researchers to keep track of new findings and trends in the area. In this work, we collected and analyzed SAR research data to detect trends, patterns, and thematic gaps, in order to provide a comprehensive view of this research field over its first 20 years. We adopted a systematic mapping approach to answer research questions such as: How have the main topics investigated in SAR evolved over time? Which aging effects are the most investigated? Which rejuvenation techniques and strategies are used most frequently? Master's dissertation, supported by CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior).

    Performance Evaluation of Scheduling Algorithms for Real Time Cloud Computing Systems

    Cloud computing shares data and offers services transparently among its users. As the number of cloud users increases, so does the number of tasks to be scheduled. The performance of a cloud depends on the task scheduling algorithms used in its scheduling or brokering components. Task scheduling on cloud computing systems is an open research problem in which both the matching of tasks to machines and the completion time of the tasks are considered. The matching problem can be stated as follows: if the number of active hosts is Y and the number of virtual machines (VMs) on each host is Z, then a single task can be scheduled on any of Y*Z VMs. If X tasks need to be scheduled, the number of possible assignments is (Y*Z)^X. Task scheduling is therefore treated as an NP-hard problem: no polynomial-time algorithm is known for it, although a proposed solution may be verifiable efficiently. Fault tolerance is a key requirement for establishing dependability in a cloud computing system. In task scheduling, a task that does not complete within its deadline is one type of fault, and this thesis addresses faults of that type. We present a non-preemptive scheduling algorithm that inserts idle time to postpone a task, ensuring that the other task completes its execution within its deadline. In simulation, the proposed algorithm improves profit by 25% and throughput by 25%, and reduces penalty by 20%, compared with EDF.
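
    The counting argument above can be checked on a tiny instance. The sketch below is only an illustration of the (Y*Z)^X count; the function names and the example sizes are made up for this purpose and do not come from the thesis.

```python
# Counting task-to-VM assignments: Y hosts, Z VMs per host, X tasks.
# Function names and example sizes are illustrative assumptions.
from itertools import product


def count_assignments(hosts, vms_per_host, tasks):
    """Closed-form count: each task independently picks one of Y*Z VMs."""
    return (hosts * vms_per_host) ** tasks


def enumerate_assignments(hosts, vms_per_host, tasks):
    """Brute-force enumeration; only feasible for very small instances."""
    vms = [(h, v) for h in range(hosts) for v in range(vms_per_host)]
    return list(product(vms, repeat=tasks))


# Small instance: enumeration and formula agree.
small = enumerate_assignments(hosts=2, vms_per_host=2, tasks=3)
print(len(small), count_assignments(2, 2, 3))

# A modest instance already has far too many assignments to enumerate.
print(count_assignments(hosts=10, vms_per_host=5, tasks=20))
```

    The exponential growth in the number of tasks is why exhaustive search is impractical and heuristic or deadline-driven schedulers are used instead.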

    Effective Scheduling of Grid Resources Using Failure Prediction

    In large-scale grid environments, accurate failure prediction is critical for allocating resources effectively while assuring specified QoS levels, such as reliability. Traditional methods, such as statistical estimation techniques, can be used to predict the reliability of resources. However, naive statistical methods often ignore critical characteristic behavior of the resources. In particular, the periodic behavior of grid resources is not captured well by statistical methods. In this paper, we present an alternative mechanism for failure prediction. In our approach, the periodic pattern of resource failures is determined and actively exploited for resource allocation with better QoS guarantees. The proposed scheme is evaluated in a realistic simulation environment of computational grids, in which the availability of computing resources is simulated according to a real trace collected from our large-scale monitoring experiment on campus computers. Our evaluation results show that the proposed approach achieves significantly higher resource scheduling effectiveness than baseline approaches under a variety of workloads.

    Improved self-management of datacenter systems applying machine learning

    Autonomic Computing is a research area of computer science and technology that originated in the mid-2000s. It focuses on optimizing and improving complex distributed computing systems through self-control and self-management. As distributed computing systems grow in complexity, as with multi-datacenter systems in cloud computing, system operators and architects need more help to understand, design, and optimize these systems manually, even more so when the systems are distributed around the world and belong to different entities and authorities. Self-management lets these distributed computing systems improve their resource and energy management, a very important issue when obtaining, running, or maintaining resources has a cost. Here we propose to improve Autonomic Computing techniques for resource management by applying modeling and prediction methods from Machine Learning and Artificial Intelligence. Machine Learning methods can find accurate models of system behavior, often with intelligible explanations, and can also predict and infer system states and values. Models obtained through automatic learning have the advantage of being easily updated after workload or configuration changes by collecting new examples and re-training the predictors. By employing automatic modeling and predictive abilities, we can find new methods for making "intelligent" decisions and for discovering new information and knowledge about systems. This thesis departs from the state of the art, in which management is based on administrators' expertise, well-known data, ad hoc algorithms and models, and elements studied from the point of view of a single computing machine, and moves to a novel state of the art in which management is driven by models learned from the system itself, providing useful feedback and making up for incomplete, missing, or uncertain data, from the point of view of a global network of datacenters.
    - First, we cover the scenario where the decision maker knows every piece of information about the system: how much each job will consume, what the desired quality of service is and will be, what the deadlines for the workload are, and so on. All of this focuses on each component and policy of each element involved in executing these jobs.
    - Then we focus on the scenario where, instead of fixed oracles that provide information from an expert formula or set of conditions, machine learning is used to create these oracles. Here we look at components and specific details while some of the information is not known and must be learned and predicted.
    - We reduce the problem of optimizing resource allocations and requirements for virtualized web services to a mathematical problem, indicating each factor, variable, and element involved, as well as all the constraints the scheduling process must satisfy. The scheduling problem can be modeled as a Mixed Integer Linear Program. Here we address the scenario of a full datacenter and introduce some information prediction.
    - We complement the model by expanding the predicted elements, studying the main resources (CPU, memory, and I/O) that can suffer from noise, inaccuracy, or unavailability. Once learned predictors for certain components let the decision making improve, the system can become more "expert-knowledge independent", and research can focus on a scenario where all the elements provide noisy, uncertain, or private information. We also introduce new factors into the management optimization, since context and costs may change for each datacenter, turning the model into a "multi-datacenter" one.
    - Finally, we review the cost of placing datacenters depending on green energy sources, and distribute the load according to green energy availability.

    Autonomous migration of virtual machines for maximizing resource utilization

    Virtualization of computing resources enables multiple virtual machines to run on a physical machine. When many virtual machines are deployed on a cluster of PCs, some physical machines will inevitably experience overload while others are under-utilized over time due to varying computational demands. This computational imbalance across the cluster undermines the very purpose of maximizing resource utilization through virtualization. To solve this imbalance problem, virtual machine migration has been introduced, where a virtual machine on a heavily loaded physical machine is selected and moved to a lightly loaded physical machine. The selection of the source virtual machine and the destination physical machine is based on a single fixed threshold value. Key to such threshold-based VM migration is to determine when to move which VM to what physical machine, since wrong or inadequate decisions can cause unnecessary migrations that would adversely affect the overall performance. The fixed threshold may not necessarily work for different computing infrastructures. Finding the optimal threshold is critical. In this research, a virtual machine migration framework is presented that autonomously finds and adjusts variable thresholds at runtime for different computing requirements to improve and maximize the utilization of computing resources. Central to this approach is the previous history of migrations and their effects before and after each migration in terms of standard deviation of utilization. To broaden this research, a proactive learning methodology is introduced that not only accumulates the past history of computing patterns and resulting migration decisions but more importantly searches all possibilities for the most suitable decisions. 
This research demonstrates through experimental results that the learning approach autonomously finds thresholds close to the optimal ones for different computing scenarios and that such varying thresholds yield an optimal number of VM migrations for maximizing resource utilization. The proposed framework is set up on a cluster of 8 and 16 PCs, each of which has multiple User-Mode Linux (UML)-based virtual machines. An extensive set of benchmark programs is deployed to closely resemble a real-world computing environment. Experimental results indicate that the proposed framework indeed autonomously finds thresholds close to the optimal ones for different computing scenarios, balances the load across the cluster through autonomous VM migration, and improves the overall performance of the dynamically changing computing environment
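
    The threshold-based migration step described above might look roughly like the sketch below. It is not the framework from this research: it shows only a single migration decision with one fixed threshold, and it uses the standard deviation of host utilization before and after the move as the measure of effect. The host names, VM names, loads, and the threshold value are all hypothetical.

```python
# One threshold-based VM migration step (illustrative sketch, fixed threshold).
# Host names, VM names, loads and the threshold are assumptions.
from statistics import pstdev


def host_loads(placement):
    """Total utilization per physical machine."""
    return {host: sum(vms.values()) for host, vms in placement.items()}


def migrate_once(placement, threshold):
    """Move one VM off the most loaded host if it exceeds the threshold.

    Returns (vm, source, target, std_before, std_after), or None when no
    migration is triggered or no move would reduce the imbalance.
    """
    loads = host_loads(placement)
    source = max(loads, key=loads.get)
    if loads[source] <= threshold:
        return None  # nothing is overloaded
    target = min(loads, key=loads.get)
    before = pstdev(list(loads.values()))

    best = None
    for vm, load in placement[source].items():
        trial = dict(loads)
        trial[source] -= load
        trial[target] += load
        after = pstdev(list(trial.values()))
        # Keep the move that lowers the standard deviation the most.
        if after < before and (best is None or after < best[1]):
            best = (vm, after)
    if best is None:
        return None

    vm, after = best
    placement[target][vm] = placement[source].pop(vm)
    return vm, source, target, before, after


placement = {
    "pc1": {"vm_a": 0.5, "vm_b": 0.4},
    "pc2": {"vm_c": 0.2},
    "pc3": {"vm_d": 0.3},
}
result = migrate_once(placement, threshold=0.8)
print(result)
```

    An adaptive variant in the spirit of the abstract would record each (threshold, std_before, std_after) outcome and adjust the threshold at runtime based on that history; that learning step is deliberately left out here.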