    OVERLAPPED CLUSTERING APPROACH FOR MAXIMIZING THE SERVICE RELIABILITY OF HETEROGENEOUS DISTRIBUTED COMPUTING SYSTEMS

    ABSTRACT For a distributed computing system (DCS) whose server nodes can fail permanently with nonzero probability, the reliability of the system can be defined as the probability that the system successfully runs all the tasks assigned to it before all the nodes fail. In a heterogeneous distributed system, where the nodes have different characteristics, the reliability of the system depends heavily on the task allocation strategy. This paper therefore presents a rigorous framework for efficient task allocation in a heterogeneous distributed environment, with the goal of maximizing system reliability. The reliability of the system is characterized in the presence of communication uncertainties and topological changes due to node failures. Node failure has an adverse effect on system reliability, so one possible way to improve reliability is to keep the communication among tasks as local as possible. For this purpose, an overlapped clustering approach is used. Further, we calculate the reliability of each node of the DCS to determine its actual capability, with the aim of assigning the more costly tasks to the more reliable nodes. We then use load balancing policies to handle the effect of node failures and to maximize the service reliability of the DCS. A numerical example is presented to illustrate the importance of incorporating overlapping clusters and load balancing in the reliability study.
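The idea of giving the more costly tasks to the more reliable nodes can be illustrated with a minimal sketch. This is not the paper's algorithm: the exponential failure law, the round-robin greedy rule, and every name and number below (`node_reliability`, `assign_costly_to_reliable`, the task costs and failure rates) are assumptions made purely for illustration, and the sketch ignores clustering and load balancing entirely.

```python
import math


def node_reliability(failure_rate, mission_time):
    """P(node survives mission_time), assuming an exponential failure law."""
    return math.exp(-failure_rate * mission_time)


def assign_costly_to_reliable(task_costs, failure_rates, mission_time):
    """Greedy round-robin: the costliest task goes to the most reliable node."""
    # Rank nodes from most to least reliable over the mission time.
    nodes = sorted(
        failure_rates,
        key=lambda n: node_reliability(failure_rates[n], mission_time),
        reverse=True,
    )
    # Rank tasks from most to least costly.
    tasks = sorted(task_costs, key=task_costs.get, reverse=True)
    # Walk both rankings together, wrapping around when tasks outnumber nodes.
    return {task: nodes[i % len(nodes)] for i, task in enumerate(tasks)}


# Hypothetical inputs: three tasks, two nodes (n2 fails less often than n1).
task_costs = {"t1": 5.0, "t2": 9.0, "t3": 2.0}
failure_rates = {"n1": 0.02, "n2": 0.005}
allocation = assign_costly_to_reliable(task_costs, failure_rates, mission_time=10.0)
```

With these made-up inputs, the costliest task `t2` lands on the more reliable node `n2`, `t1` goes to `n1`, and `t3` wraps around to `n2`.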

    Performance and Reliability of Non-Markovian Heterogeneous Distributed Computing Systems

    Average service time, quality-of-service (QoS), and service reliability associated with heterogeneous parallel and distributed computing systems (DCSs) are analytically characterized in a realistic setting for which tangible, stochastic communication delays are present with nonexponential distributions. The departure from the traditionally assumed exponential distributions for event times, such as task-execution times, communication arrival times and load-transfer delays, gives rise to a non-Markovian dynamical problem for which a novel age-dependent, renewal-based distributed queuing model is developed. Numerical examples offered by the model shed light on the operational and system settings for which the Markovian setting, resulting from employing an exponential-distribution assumption on the event times, yields inaccurate predictions. A key benefit of the model is that it offers a rigorous framework for devising optimal dynamic task reallocation (DTR) policies systematically in heterogeneous DCSs by optimally selecting the fraction of the excess loads that need to be exchanged among the servers, thereby controlling the degree of cooperative processing in a DCS. Key results on performance prediction and optimization of DCSs are validated using Monte-Carlo (MC) simulation as well as experiments on a distributed computing testbed. The scalability, in the number of servers, of the age-dependent model is studied and a linearly scalable analytical approximation is derived.
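A toy Monte-Carlo sketch can make the notion of service reliability concrete. It is not the paper's age-dependent model: it assumes a single server with no communication delays or task reallocation, an exponentially distributed time to failure, and it compares exponential task times with uniform task times of the same mean. The function name `estimate_service_reliability` and the parameter values are hypothetical.

```python
import random


def estimate_service_reliability(num_tasks, failure_rate, draw_service_time,
                                 trials=20000, seed=0):
    """Estimate P(server finishes num_tasks tasks before it fails)."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    successes = 0
    for _ in range(trials):
        # Assumed failure law: exponential time to permanent failure.
        time_to_failure = rng.expovariate(failure_rate)
        # Total time needed to serve every queued task back to back.
        workload = sum(draw_service_time(rng) for _ in range(num_tasks))
        if workload < time_to_failure:
            successes += 1
    return successes / trials


MU = 1.0      # assumed service rate (mean task time 1 / MU)
LAMBDA = 0.1  # assumed failure rate
M = 10        # assumed number of queued tasks

# Exponential task times (the Markovian assumption).
markovian = estimate_service_reliability(
    M, LAMBDA, lambda rng: rng.expovariate(MU))
# Uniform task times with the same mean (a nonexponential alternative).
non_markovian = estimate_service_reliability(
    M, LAMBDA, lambda rng: rng.uniform(0.0, 2.0 / MU))

# In the all-exponential case each task beats the failure clock with
# probability MU / (MU + LAMBDA), independently, by memorylessness.
closed_form = (MU / (MU + LAMBDA)) ** M
```

In the all-exponential case the estimate can be checked against the closed form `(MU / (MU + LAMBDA)) ** M`. No such shortcut exists in general for nonexponential times, which is where simulation or an analytical model of the kind the abstract describes is needed.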