Near-optimal scheduling and decision-making models for reactive and proactive fault tolerance mechanisms
As High Performance Computing (HPC) systems grow in size to meet the demand for computational power, the chance of failure increases dramatically, potentially resulting in large amounts of lost computing time. Fault Tolerance (FT) mechanisms aim to mitigate the impact of failures on running applications. However, the overhead of FT mechanisms increases proportionally to the HPC systems' size. The challenge is therefore to manage the expensive overhead of FT mechanisms while minimizing the computing time lost to failures.
In this dissertation, a near-optimal scheduling model is built, by means of stochastic processes and the calculus of variations, to determine when to invoke a hybrid checkpoint mechanism. The resulting schedule minimizes the time wasted by the checkpoint mechanism and by failures. In general, checkpoint/restart mechanisms periodically save the application state and reload the saved state when a failure occurs. Furthermore, to handle multiple FT mechanisms, an adaptive decision-making model has been developed to determine the best FT strategy to invoke at each decision point. At each decision point, the best mechanism is selected from the FT mechanisms under consideration, using a dynamic programming approach, so as to globally minimize the total wasted time of an application execution. In addition, the model adapts to changes in the failure rate over time.
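For orientation only, the trade-off between checkpoint overhead and lost work can be illustrated with Young's classic first-order approximation. This is a simpler, well-known technique and not the dissertation's stochastic model. The function names and the sample values (a 60-second checkpoint, a one-day mean time between failures) are assumptions chosen for illustration.

```python
import math


def young_interval(checkpoint_cost: float, mtbf: float) -> float:
    """Young's first-order checkpoint interval: sqrt(2 * C * MTBF).

    checkpoint_cost (C) and mtbf use the same time unit (e.g. seconds).
    """
    if checkpoint_cost <= 0 or mtbf <= 0:
        raise ValueError("checkpoint_cost and mtbf must be positive")
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


def waste_fraction(interval: float, checkpoint_cost: float, mtbf: float) -> float:
    """First-order fraction of time wasted for a given checkpoint interval.

    The first term is the checkpoint overhead per interval; the second is
    the expected rework after a failure (half an interval on average).
    """
    return checkpoint_cost / interval + interval / (2.0 * mtbf)


if __name__ == "__main__":
    # Hypothetical values: 60 s per checkpoint, MTBF of one day.
    cost, mtbf = 60.0, 86400.0
    best = young_interval(cost, mtbf)
    print(f"interval: {best:.0f} s, waste: {waste_fraction(best, cost, mtbf):.2%}")
```

Checkpointing more often than this interval pays more overhead, and checkpointing less often loses more work per failure; under this approximation the interval above balances the two.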
Modeling operating system crash behavior through multifractal analysis, long range dependence and mining of memory usage patterns
Software aging is a phenomenon in which the state of an operating system degrades over time due to transient errors. These transient errors can result in resource exhaustion and operating system hangups or crashes. Three different techniques from fractal geometry are studied using the same datasets for operating system crash modeling and prediction. The Holder exponent is an indicator of how chaotic a signal is. M5 Prime is a nominal classification algorithm that allows prediction of a numerical quantity, such as time to crash, based on current and previous data. The Hurst exponent measures the self-similarity and long-range dependence, or memory, of a process or data set and has been used to predict river flows and network usage. For each of these techniques, a thorough investigation was conducted using crash, hangup and nominal operating system monitoring data. All three approaches demonstrated a promising ability to identify software aging and predict upcoming operating system crashes. This thesis describes the experiments, reports the best candidate techniques and identifies topics for further investigation.
Extended Abstracts: PMCCS3: Third International Workshop on Performability Modeling of Computer and Communication Systems
Coordinated Science Laboratory was formerly known as Control Systems Laboratory. The pages of the front matter that are missing from the PDF were blank.
Pull-Type Security Patch Management in Intrusion Tolerant Systems: Modeling and Analysis
In this chapter, we introduce a stochastic framework to evaluate the system availability of an intrusion tolerant system (ITS), where the system undergoes patch management with a periodic vulnerability checking strategy, i.e., pull-type patch management. In particular, a composite stochastic reward net (SRN) is developed to capture the overall system behaviors, including vulnerability discovery, intrusion tolerance, and reactive maintenance operations. Furthermore, two kinds of availability criteria, the interval availability and the steady-state availability of the system, are formulated by applying the phase-type (PH) approximation to solve the Markov regenerative process (MRGP) model derived from the composite SRN. Numerical experiments are conducted to investigate the effects of the vulnerability checking interval on the system availability
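The two availability criteria named above can be illustrated on a much simpler model than the chapter's SRN/MRGP framework: a two-state (up/down) Markov model with constant failure and repair rates, starting in the up state. This simplification, the function names and the sample rates are assumptions made for illustration only.

```python
import math


def steady_state_availability(failure_rate: float, repair_rate: float) -> float:
    """Long-run fraction of time the two-state system is up."""
    return repair_rate / (failure_rate + repair_rate)


def interval_availability(failure_rate: float, repair_rate: float, t: float) -> float:
    """Expected fraction of the interval [0, t] spent up, starting in the up state.

    This is the time average of the instantaneous availability
    A(u) = mu/s + (lam/s) * exp(-s*u), with s = lam + mu.
    """
    if t <= 0:
        raise ValueError("t must be positive")
    total = failure_rate + repair_rate
    decayed = -math.expm1(-total * t)  # 1 - exp(-s*t), computed stably
    transient = failure_rate * decayed / (total * total * t)
    return repair_rate / total + transient


if __name__ == "__main__":
    # Hypothetical rates per hour: one failure per 1000 h, one repair per 10 h.
    lam, mu = 0.001, 0.1
    print(steady_state_availability(lam, mu))
    print(interval_availability(lam, mu, 24.0))
```

In this simplified model the interval availability starts near 1 for short intervals and converges to the steady-state value as the interval grows.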
Mathematics in Software Reliability and Quality Assurance
This monograph concerns the mathematical aspects of software reliability and quality assurance and consists of 11 technical papers in this emerging area. Included are the latest research results related to formal methods and design, automatic software testing, software verification and validation, coalgebra theory, automata theory, hybrid system and software reliability modeling and assessment
Performance Evaluation of Scheduling Algorithms for Real Time Cloud Computing Systems
Cloud computing shares data and offers services transparently among its users. As the number of cloud users increases, so does the number of tasks to be scheduled. The performance of a cloud depends on the task scheduling algorithms used in its scheduling or brokering components. Task scheduling on cloud computing systems is an open research problem in which both the matching of tasks to machines and the completion time of the tasks are considered. The matching problem can be stated as follows: assume there are Y active hosts and Z VMs on each host. The number of Virtual Machines (VMs) on which a single task can be scheduled is then (Y*Z), and the number of ways to schedule X tasks is (Y*Z)^X. Task scheduling is therefore an NP-hard problem. NP-hard here means that no polynomial-time algorithm is known for scheduling the tasks on VMs, although an algorithm for verifying a solution may exist. Fault tolerance is key to establishing dependability in a cloud computing system. In task scheduling, a task that does not complete within its deadline is one type of fault. This thesis considers faults of this type and attempts to overcome them. We present a non-preemptive scheduling algorithm that inserts idle time to postpone a task while ensuring that the other task completes its execution within its deadline. In simulation, the proposed algorithm increases profit by 25% and throughput by 25%, and reduces the penalty by 20%, compared with EDF.
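The size of the matching search space described above can be checked with a short sketch. The function names and the small sample values are illustrative assumptions; the count itself follows the abstract's formula.

```python
from itertools import product


def placement_count(hosts: int, vms_per_host: int, tasks: int) -> int:
    """Number of ways to assign each task to one VM: (hosts * vms_per_host) ** tasks."""
    return (hosts * vms_per_host) ** tasks


def enumerate_placements(hosts: int, vms_per_host: int, tasks: int):
    """Brute-force list of every assignment; only feasible for tiny inputs."""
    vms = [(h, v) for h in range(hosts) for v in range(vms_per_host)]
    return list(product(vms, repeat=tasks))


if __name__ == "__main__":
    # Hypothetical sizes: 2 hosts, 3 VMs per host, 4 tasks.
    print(placement_count(2, 3, 4))
    print(len(enumerate_placements(2, 3, 4)))
    # The count grows exponentially with the number of tasks.
    print(placement_count(10, 10, 20))
```

Even modest inputs make exhaustive enumeration impractical, which is why heuristic schedulers are used in practice.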
Envelhecimento e rejuvenescimento de software: 20 anos (1995-2014) - panorama e desafios
Although software aging and rejuvenation (SAR) is a young research field, a great deal of knowledge has been produced in its first 20 years. Nowadays, important scientific journals and conferences include SAR-related topics in their scope of interest. This fast growth and the wide range of dissemination venues make it challenging for researchers to keep track of new findings and trends in the area. In this work, we collected and analyzed SAR research data to detect trends, patterns, and thematic gaps, in order to provide a comprehensive view of this research field over its first 20 years. We adopted the systematic mapping approach to answer research questions such as: How have the main topics investigated in SAR evolved over time? Which are the most investigated aging effects? Which rejuvenation techniques and strategies are most frequently used? CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior. Master's dissertation.
A Game-Theoretic Approach to Strategic Resource Allocation Mechanisms in Edge and Fog Computing
With the rapid growth of Internet of Things (IoT), cloud-centric application management raises
questions related to quality of service for real-time applications. Fog and edge computing
(FEC) provide a complement to the cloud by filling the gap between cloud and IoT. Resource
management on multiple resources from distributed and administrative FEC nodes is a key
challenge to ensure the quality of end-user’s experience. To improve resource utilisation and
system performance, researchers have proposed many fair allocation mechanisms for
resource management. Dominant Resource Fairness (DRF), a resource allocation policy for
multiple resource types, meets most of the required fair allocation characteristics. However,
DRF is suitable for centralised resource allocation without considering the effects (or
feedbacks) of large-scale distributed environments like multi-controller software defined
networking (SDN). Nash bargaining from micro-economic theory or competitive equilibrium
equal incomes (CEEI) are well suited to solving dynamic optimisation problems proposing to
‘proportionately’ share resources among distributed participants. Although CEEI’s
decentralised policy guarantees load balancing for performance isolation, it is not fault-proof
for computation offloading.
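As background for the DRF policy mentioned above, the following is a minimal sketch of centralised DRF by progressive filling: repeatedly give one more task to the user with the smallest dominant share. It is not the thesis's IP-DRF or DFog-DRF scheme. The function name is an assumption, and dropping a user whose next task no longer fits (rather than stopping outright) is a simplifying choice made here.

```python
from fractions import Fraction


def drf_allocate(capacity, demands):
    """Allocate whole tasks by Dominant Resource Fairness.

    capacity: list of integer totals, one per resource type.
    demands:  dict mapping user -> list of integer per-task demands.
    Returns a dict mapping user -> number of tasks allocated.
    """
    if any(not any(d > 0 for d in need) for need in demands.values()):
        raise ValueError("every user must demand at least one resource")

    used = [0] * len(capacity)
    tasks = {user: 0 for user in demands}
    active = set(demands)

    def dominant_share(user):
        # Largest fraction of any single resource held by this user.
        return max(Fraction(tasks[user] * d, c)
                   for d, c in zip(demands[user], capacity))

    while active:
        # Pick the user with the lowest dominant share (ties broken by name).
        user = min(sorted(active), key=dominant_share)
        need = demands[user]
        if all(u + n <= c for u, n, c in zip(used, need, capacity)):
            used = [u + n for u, n in zip(used, need)]
            tasks[user] += 1
        else:
            active.remove(user)  # this user's next task no longer fits
    return tasks


if __name__ == "__main__":
    # Textbook example: 9 CPUs and 18 GB; A needs <1 CPU, 4 GB>, B needs <3 CPU, 1 GB>.
    print(drf_allocate([9, 18], {"A": [1, 4], "B": [3, 1]}))  # {'A': 3, 'B': 2}
```

In the example, both users end with a dominant share of 2/3: A holds 12 of 18 GB and B holds 6 of 9 CPUs.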
The thesis aims to propose a hybrid and fair allocation mechanism for rejuvenation of
decentralised SDN controller deployment. We apply multi-agent reinforcement learning
(MARL) with robustness against adversarial controllers to enable efficient priority scheduling
for FEC. Motivated by software cybernetics and homeostasis, weighted DRF is generalised by
applying the principles of feedback (positive or/and negative network effects) in reverse game
theory (GT) to design hybrid scheduling schemes for joint multi-resource and multitask
offloading/forwarding in FEC environments.
In the first study, monotonic scheduling for joint offloading at the federated edge is
addressed by proposing a truthful (algorithmic) mechanism to neutralise harmful negative and
positive distributive bargain externalities respectively. The IP-DRF scheme is a MARL
approach applying partition form game (PFG) to guarantee second-best Pareto optimality
(SBPO) in allocation of multi-resources from deterministic policy in both population and
resource non-monotonicity settings. In the second study, we propose the DFog-DRF scheme to
address truthful fog scheduling with bottleneck fairness in fault-probable wireless hierarchical
networks by applying constrained coalition formation (CCF) games to implement MARL. The
multi-objective optimisation problem for fog throughput maximisation is solved via a
constraint dimensionality reduction methodology using fairness constraints for efficient
gateway and low-level controller’s placement.
For evaluation, we develop an agent-based framework to implement fair allocation policies in
distributed data centre environments. In empirical results, the deterministic policy of IP-DRF
scheme provides SBPO and reduces the average execution and turnaround time by 19% and
11.52% as compared to the Nash bargaining or CEEI deterministic policy for 57,445 cloudlets
in population non-monotonic settings. The processing cost of tasks shows significant
improvement (6.89% and 9.03% for fixed and variable pricing) for the resource non-monotonic
setting - using 38,000 cloudlets. The DFog-DRF scheme when benchmarked against asset fair
(MIP) policy shows superior performance (less than 1% in time complexity) for up to 30 FEC
nodes. Furthermore, empirical results using 210 mobiles and 420 applications prove the
efficacy of our hybrid scheduling scheme for hierarchical clustering considering latency and
network usage for throughput maximisation. Abubakar Tafawa Balewa University, Bauchi (Tetfund, Nigeria)