609 research outputs found

    Near-optimal scheduling and decision-making models for reactive and proactive fault tolerance mechanisms

    As High Performance Computing (HPC) systems increase in size to meet the demand for computational power, the chance of failure increases dramatically, resulting in potentially large amounts of lost computing time. Fault Tolerance (FT) mechanisms aim to mitigate the impact of failures on running applications. However, the overhead of FT mechanisms increases in proportion to the HPC system's size. The challenge is therefore to contain the expensive overhead of FT mechanisms while minimizing the computing time lost to failures. In this dissertation, a near-optimal scheduling model is built to determine when to invoke a hybrid checkpoint mechanism, by means of stochastic processes and calculus of variations. The resulting schedule minimizes the time wasted on checkpointing and on failures. In general, checkpoint/restart mechanisms periodically save the application state and reload the saved state when a failure occurs. Furthermore, to handle various FT mechanisms, an adaptive decision-making model has been developed to determine the best FT strategy to invoke at each decision point. The best mechanism at each decision point is selected from the FT mechanisms considered so as to globally minimize the total waste time of an application execution, by means of a dynamic programming approach. In addition, the model adapts to changes in the failure rate over time.
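The trade-off described above (checkpoint overhead versus work lost to failures) can be illustrated with a much simpler model than the dissertation's. The sketch below uses the classical first-order approximation for periodic checkpointing, often attributed to Young, in which the wasted fraction of time is roughly C/T + T/(2M) for checkpoint cost C, period T and mean time between failures M. It is not the hybrid, calculus-of-variations schedule from the abstract; the function names and the example values are assumptions chosen for illustration.

```python
import math


def waste_fraction(period, checkpoint_cost, mtbf):
    """First-order wasted-time fraction for periodic checkpointing.

    checkpoint_cost / period : overhead of writing checkpoints
    period / (2 * mtbf)      : expected rework after a failure
    Assumes checkpoint_cost << period << mtbf (illustrative model only).
    """
    return checkpoint_cost / period + period / (2.0 * mtbf)


def young_period(checkpoint_cost, mtbf):
    """Period that minimizes waste_fraction (set its derivative to zero)."""
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


def grid_search_period(checkpoint_cost, mtbf, step=1.0):
    """Brute-force check of the closed form on a grid of candidate periods."""
    best_period, best_waste = None, math.inf
    period = step
    while period <= mtbf:
        waste = waste_fraction(period, checkpoint_cost, mtbf)
        if waste < best_waste:
            best_period, best_waste = period, waste
        period += step
    return best_period


if __name__ == "__main__":
    cost, mtbf = 60.0, 86400.0  # hypothetical: 60 s checkpoint, 1 day MTBF
    print(young_period(cost, mtbf), grid_search_period(cost, mtbf))
```

Under these assumptions the closed form and the grid search agree to within the grid step, which shows why checkpointing too often and too rarely are both wasteful.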

    Modeling operating system crash behavior through multifractal analysis, long range dependence and mining of memory usage patterns

    Software aging is a phenomenon in which the state of an operating system degrades over time due to transient errors. These transient errors can result in resource exhaustion and operating system hangups or crashes. Three different techniques from fractal geometry are studied using the same datasets for operating system crash modeling and prediction. The Hölder exponent is an indicator of how chaotic a signal is. M5 Prime is a nominal classification algorithm that allows prediction of a numerical quantity, such as time to crash, based on current and previous data. The Hurst exponent measures the self-similarity and long-range dependence (memory) of a process or data set, and has been used to predict river flows and network usage. For each of these techniques, a thorough investigation was conducted using crash, hangup and nominal operating system monitoring data. All three approaches demonstrated a promising ability to identify software aging and predict upcoming operating system crashes. This thesis describes the experiments, reports the best candidate techniques and identifies topics for further investigation.
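As a rough illustration of the Hurst exponent mentioned above, the following sketch estimates it with classical rescaled-range (R/S) analysis. This is one common estimator, not necessarily the one used in the thesis; the function names, window sizes and synthetic test signals are assumptions for illustration, and no real monitoring data is used.

```python
import math
import random


def rescaled_range(window):
    """R/S statistic of one window: range of cumulative deviations / std."""
    n = len(window)
    mean = sum(window) / n
    cumulative = low = high = 0.0
    for value in window:
        cumulative += value - mean
        low, high = min(low, cumulative), max(high, cumulative)
    std = math.sqrt(sum((value - mean) ** 2 for value in window) / n)
    return (high - low) / std if std > 0 else None


def hurst_rs(series, min_window=8):
    """Estimate H as the slope of log(mean R/S) against log(window size)."""
    xs, ys = [], []
    size = min_window
    while size <= len(series) // 2:
        stats = [rescaled_range(series[i:i + size])
                 for i in range(0, len(series) - size + 1, size)]
        stats = [s for s in stats if s is not None]
        if stats:
            xs.append(math.log(size))
            ys.append(math.log(sum(stats) / len(stats)))
        size *= 2
    # Ordinary least-squares slope through the (log size, log R/S) points.
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    random.seed(0)
    noise = [random.gauss(0.0, 1.0) for _ in range(4096)]
    print(hurst_rs(noise))  # uncorrelated noise: expected to be near 0.5
```

Values well above 0.5 suggest persistent, long-memory behaviour, which is the property a crash-prediction study would look for in resource-usage traces.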

    Extended Abstracts: PMCCS3: Third International Workshop on Performability Modeling of Computer and Communication Systems

    Coordinated Science Laboratory was formerly known as Control Systems Laboratory. The pages of the front matter that are missing from the PDF were blank.

    Pull-Type Security Patch Management in Intrusion Tolerant Systems: Modeling and Analysis

    In this chapter, we introduce a stochastic framework to evaluate the system availability of an intrusion tolerant system (ITS), where the system undergoes patch management with a periodic vulnerability checking strategy, i.e., pull-type patch management. In particular, a composite stochastic reward net (SRN) is developed to capture the overall system behaviors, including vulnerability discovery, intrusion tolerance, and reactive maintenance operations. Furthermore, two kinds of availability criteria, the interval availability and the steady-state availability of the system, are formulated by applying the phase-type (PH) approximation to solve the Markov regenerative process (MRGP) model derived from the composite SRN. Numerical experiments are conducted to investigate the effects of the vulnerability checking interval on the system availability

    Mathematics in Software Reliability and Quality Assurance

    This monograph concerns the mathematical aspects of software reliability and quality assurance and consists of 11 technical papers in this emerging area. Included are the latest research results related to formal methods and design, automatic software testing, software verification and validation, coalgebra theory, automata theory, hybrid system and software reliability modeling and assessment

    Performance Evaluation of Scheduling Algorithms for Real Time Cloud Computing Systems

    Cloud computing shares data and offers services transparently among its users. As the number of cloud users grows, so does the number of tasks to be scheduled. The performance of a cloud depends on the task scheduling algorithms used in its scheduling or brokering components. Task scheduling on cloud computing systems is an open research problem in which both the matching of tasks to machines and the completion time of the tasks are considered. The matching problem can be stated as follows: with Y active hosts and Z VMs on each host, a single task can be placed on any of Y*Z virtual machines (VMs), so scheduling X tasks gives (Y*Z)^X possible assignments. The search space therefore grows exponentially in X, and task scheduling is an NP-hard problem: no polynomial-time algorithm is known for it, although a proposed solution may be verifiable efficiently. Fault tolerance is key to establishing dependability in a cloud computing system. In task scheduling, a task that does not complete by its deadline is one type of fault, and this thesis addresses faults of that type. We present a non-preemptive scheduling algorithm that inserts idle time to postpone a task so that another task can complete its execution within its deadline. In simulation, the proposed algorithm improves profit by 25% and throughput by 25%, and reduces the penalty by 20%, relative to EDF.
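The counting argument in this abstract can be checked directly on small inputs. The sketch below computes the (Y*Z)^X figure and confirms it by brute-force enumeration; the function names and the toy sizes are assumptions for illustration and do not come from the thesis.

```python
from itertools import product


def count_assignments(hosts, vms_per_host, tasks):
    """Number of ways to place each task on one VM: (Y * Z) ** X."""
    return (hosts * vms_per_host) ** tasks


def enumerate_assignments(hosts, vms_per_host, tasks):
    """List every task-to-VM assignment explicitly (tiny inputs only)."""
    vms = [(host, vm) for host in range(hosts) for vm in range(vms_per_host)]
    # Each assignment is a tuple whose i-th entry is the VM chosen for task i.
    return list(product(vms, repeat=tasks))


if __name__ == "__main__":
    # Hypothetical toy instance: 2 hosts, 3 VMs per host, 4 tasks.
    print(count_assignments(2, 3, 4))           # 6 ** 4 = 1296
    print(len(enumerate_assignments(2, 3, 4)))  # same count by enumeration
```

Even a modest instance such as 10 hosts, 10 VMs per host and 20 tasks already gives 100^20 candidate assignments, which is why exhaustive search is impractical and heuristics such as EDF variants are used instead.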

    Envelhecimento e rejuvenescimento de software: 20 anos (1995-2014) - panorama e desafios

    Although software aging and rejuvenation (SAR) is a young research field, a great deal of knowledge has been produced in its first 20 years. Today, important scientific journals and conferences include SAR-related topics in their scope of interest. This fast-growing and wide range of dissemination venues makes it challenging for researchers to keep track of new findings and trends in the area. In this work, we collected and analyzed SAR research data to detect trends, patterns, and thematic gaps, in order to provide a comprehensive view of this research field over its first 20 years. We adopted a systematic mapping approach to answer research questions such as: How have the main topics investigated in SAR evolved over time? Which aging effects are the most investigated? Which rejuvenation techniques and strategies are used most frequently? CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior. Master's dissertation.

    A Game-Theoretic Approach to Strategic Resource Allocation Mechanisms in Edge and Fog Computing

    With the rapid growth of the Internet of Things (IoT), cloud-centric application management raises questions about quality of service for real-time applications. Fog and edge computing (FEC) complement the cloud by filling the gap between cloud and IoT. Managing multiple resources from distributed and administrative FEC nodes is a key challenge in ensuring the quality of the end user's experience. To improve resource utilisation and system performance, researchers have proposed many fair allocation mechanisms for resource management. Dominant Resource Fairness (DRF), a resource allocation policy for multiple resource types, meets most of the required fair allocation characteristics. However, DRF is suited to centralised resource allocation and does not consider the effects (or feedback) of large-scale distributed environments such as multi-controller software defined networking (SDN). Nash bargaining from micro-economic theory and competitive equilibrium from equal incomes (CEEI) are well suited to solving dynamic optimisation problems that aim to share resources 'proportionately' among distributed participants. Although CEEI's decentralised policy guarantees load balancing for performance isolation, it is not fault-proof for computation offloading. The thesis aims to propose a hybrid and fair allocation mechanism for rejuvenation of decentralised SDN controller deployment. We apply multi-agent reinforcement learning (MARL) with robustness against adversarial controllers to enable efficient priority scheduling for FEC. Motivated by software cybernetics and homeostasis, weighted DRF is generalised by applying the principles of feedback (positive and/or negative network effects) in reverse game theory (GT) to design hybrid scheduling schemes for joint multi-resource and multitask offloading/forwarding in FEC environments. 
In the first study, monotonic scheduling for joint offloading at the federated edge is addressed by proposing a truthful (algorithmic) mechanism to neutralise harmful negative and positive distributive bargain externalities respectively. The IP-DRF scheme is a MARL approach that applies a partition form game (PFG) to guarantee second-best Pareto optimality (SBPO) in the allocation of multiple resources from a deterministic policy, in both population and resource non-monotonicity settings. In the second study, we propose the DFog-DRF scheme to address truthful fog scheduling with bottleneck fairness in fault-probable wireless hierarchical networks by applying constrained coalition formation (CCF) games to implement MARL. The multi-objective optimisation problem for fog throughput maximisation is solved via a constraint dimensionality reduction methodology that uses fairness constraints for efficient placement of gateways and low-level controllers. For evaluation, we develop an agent-based framework to implement fair allocation policies in distributed data centre environments. In empirical results, the deterministic policy of the IP-DRF scheme provides SBPO and reduces the average execution and turnaround times by 19% and 11.52% respectively, compared with the Nash bargaining or CEEI deterministic policy, for 57,445 cloudlets in population non-monotonic settings. The processing cost of tasks shows significant improvement (6.89% and 9.03% for fixed and variable pricing) in the resource non-monotonic setting using 38,000 cloudlets. The DFog-DRF scheme, when benchmarked against the asset fair (MIP) policy, shows superior performance (less than 1% in time complexity) for up to 30 FEC nodes. Furthermore, empirical results using 210 mobiles and 420 applications prove the efficacy of our hybrid scheduling scheme for hierarchical clustering, considering latency and network usage for throughput maximisation. Abubakar Tafawa Balewa University, Bauchi (Tetfund, Nigeria)
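For readers unfamiliar with Dominant Resource Fairness, the baseline policy this thesis builds on, the sketch below shows plain centralised DRF by progressive filling: the user with the smallest dominant share is repeatedly given one more task. It does not implement the IP-DRF or DFog-DRF schemes. The function name, the stopping rule (serve the lowest-share user whose task still fits) and the example numbers are assumptions for illustration; resource amounts are assumed to be positive integers.

```python
from fractions import Fraction


def drf_allocate(capacity, demands):
    """Progressive-filling DRF sketch.

    capacity : list of total amounts per resource, e.g. [cpus, memory_gb]
    demands  : dict user -> per-task demand vector (same length as capacity),
               each with at least one positive entry
    Returns a dict user -> number of tasks granted.
    """
    used = [0] * len(capacity)
    tasks = {user: 0 for user in demands}
    share = {user: Fraction(0) for user in demands}
    while True:
        # Users whose next task still fits in the remaining capacity.
        fitting = [user for user, demand in demands.items()
                   if all(used[i] + demand[i] <= capacity[i]
                          for i in range(len(capacity)))]
        if not fitting:
            return tasks
        # Serve the fitting user with the smallest dominant share.
        user = min(fitting, key=lambda u: share[u])
        for i, amount in enumerate(demands[user]):
            used[i] += amount
        tasks[user] += 1
        # Dominant share: largest fraction of any one resource held by the user.
        share[user] = max(Fraction(demands[user][i] * tasks[user], capacity[i])
                          for i in range(len(capacity)))


if __name__ == "__main__":
    # Hypothetical cluster: 9 CPUs and 18 GB; A is memory-heavy, B is CPU-heavy.
    print(drf_allocate([9, 18], {"A": [1, 4], "B": [3, 1]}))
```

In this toy instance both users end with an equal dominant share of 2/3 (A of memory, B of CPU), which is the equalisation that DRF aims for.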
