
    Maximizing Service Reliability in Distributed Computing Systems with Random Node Failures: Theory and Implementation

    In distributed computing systems (DCSs) where server nodes can fail permanently with nonzero probability, system performance can be assessed by means of the service reliability, defined as the probability of serving all the tasks queued in the DCS before all the nodes fail. This paper presents a rigorous probabilistic framework to analytically characterize the service reliability of a DCS in the presence of communication uncertainties and stochastic topological changes due to node deletions. The framework considers a system composed of heterogeneous nodes with stochastic service and failure times and a communication network imposing random tangible delays. The framework also permits arbitrarily specified, distributed load-balancing actions to be taken by the individual nodes in order to improve the service reliability. The presented analysis is based upon a novel use of the concept of stochastic regeneration, which is exploited to derive a system of difference-differential equations characterizing the service reliability. The theory is further utilized to optimize certain load-balancing policies for maximal service reliability; the optimization is carried out by means of an algorithm that scales linearly with the number of nodes in the system. The analytical model is validated using both Monte Carlo simulations and experimental data collected from a DCS testbed.
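
    A minimal Monte Carlo sketch of the service-reliability definition above, restricted to the special case with no load balancing and no network. It assumes (illustrative assumptions, not the paper's model) independent nodes with exponential service and failure times, and that tasks still queued at a failed node are lost. Under those assumptions memorylessness gives a closed form, the product over nodes of (lambda_i / (lambda_i + f_i)) ** m_i, which the estimate can be checked against. All function and parameter names are hypothetical.

```python
import random


def node_serves_queue(tasks, service_rate, failure_rate, rng):
    """True if a node finishes its whole queue before its permanent failure."""
    failure_time = rng.expovariate(failure_rate)
    elapsed = 0.0
    for _ in range(tasks):
        elapsed += rng.expovariate(service_rate)
        if elapsed > failure_time:
            return False  # node failed with tasks still queued; they are lost
    return True


def estimate_service_reliability(queues, service_rates, failure_rates,
                                 trials=50_000, seed=0):
    """Fraction of trials in which every queued task in the system is served."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        if all(node_serves_queue(m, lam, f, rng)
               for m, lam, f in zip(queues, service_rates, failure_rates)):
            successes += 1
    return successes / trials


def exact_service_reliability(queues, service_rates, failure_rates):
    """Closed form for this no-transfer exponential model: each task beats the
    failure clock with probability lam / (lam + f), independently."""
    p = 1.0
    for m, lam, f in zip(queues, service_rates, failure_rates):
        p *= (lam / (lam + f)) ** m
    return p


if __name__ == "__main__":
    queues, mu, fail = [5, 3], [1.0, 2.0], [0.05, 0.1]
    print(estimate_service_reliability(queues, mu, fail))
    print(exact_service_reliability(queues, mu, fail))
```

    Adding load-balancing transfers and random network delays breaks the independence between nodes, which is where an analytical treatment such as the regeneration-based one described above becomes necessary.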

    Error Resilient Multipath Video Delivery on Wireless Overlay Networks

    Real-time applications that deliver multimedia data over wireless networks still pose many challenges because of their high throughput and stringent delay requirements. Overlay networks with multipath transmission are a promising solution to these problems. In wireless networks, however, maintaining the overlay network induces additional overhead that affects the delivery of bulky, delay-sensitive multimedia data. To minimize this overhead, this work introduces the Error Compensated Data Distribution Model (ECDD), which reduces end-to-end delay and the overhead arising from packet retransmissions. ECDD adopts the mTreebone algorithm to identify unstable wireless nodes and construct an overlay tree, which is then split to support multipath transmission. ECDD uses a sub-packetization mechanism for multipath video delivery, and its forward error correction and sub-packet retransmission techniques reduce both overhead and end-to-end delay. The simulation results presented in this paper show that the proposed ECDD model achieves lower end-to-end delay and outperforms existing models. Compared with Sub-Packet based Multipath Load Distribution, ECDD reduces retransmission requests by about 52.27% and bit errors by about 23.93%.
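
    The abstract does not specify ECDD's forward error correction code. As a hedged illustration of the general idea only, the sketch below uses the simplest FEC scheme, a single XOR parity sub-packet per group, which lets a receiver rebuild any one lost sub-packet without issuing a retransmission request. The function names and the zero-padding convention are assumptions made for this example.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length sub-packets."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_into_subpackets(payload: bytes, k: int) -> List[bytes]:
    """Split a payload into k equal-size sub-packets, zero-padding the tail."""
    size = -(-len(payload) // k)  # ceiling division
    padded = payload.ljust(size * k, b"\x00")
    return [padded[i * size:(i + 1) * size] for i in range(k)]


def add_parity(subpackets: List[bytes]) -> List[bytes]:
    """Append one parity sub-packet equal to the XOR of all data sub-packets."""
    return subpackets + [reduce(xor_bytes, subpackets)]


def recover(received: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing sub-packet (None) and return the data part."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        raise ValueError("single XOR parity can repair only one lost sub-packet")
    repaired = list(received)
    if missing:
        # XOR of all surviving sub-packets (data + parity) equals the lost one.
        repaired[missing[0]] = reduce(xor_bytes, [p for p in received if p is not None])
    return repaired[:-1]  # drop the parity sub-packet


if __name__ == "__main__":
    frame = b"multipath video frame"
    sent = add_parity(split_into_subpackets(frame, 4))
    sent[2] = None  # simulate a sub-packet lost on one path
    print(b"".join(recover(sent))[:len(frame)])
```

    Losing two sub-packets of the same group still requires a retransmission, which is the role the sub-packet retransmission mechanism plays alongside FEC.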

    On Load balancing in distributed systems with large time delays: Theory and experiment

    In a distributed computing environment with a high communication cost, limiting the number of balancing instants yields better performance than executing load balancing continuously. Finding the optimal number of balancing instants, and optimizing performance over the interbalancing time and the load-balancing gain, therefore becomes an important problem. In this paper we evaluate a previously reported, control-theoretically motivated single load-balancing strategy on a physical distributed system and compare its performance with our simulation predictions. Based on the concept of regeneration, we also present a mathematical model of a two-node distributed system in which one-shot balancing is performed. We obtain a system of four difference-differential equations characterizing the mean of the overall completion time and compare its predictive capabilities, via simulation, with the physical system.
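
    The paper derives the mean completion time analytically; a Monte Carlo sketch of the same two-node, one-shot setting is one way to sanity-check such predictions. The sketch assumes exponential service times, a deterministic transfer delay, and a hypothetical policy (not taken from the paper): at the balancing instant t_b, the relatively overloaded node sends round(gain * excess) tasks, where excess is measured against a share of the remaining tasks proportional to service rate. All names are illustrative.

```python
import random


def serve_until(queue, rate, horizon, rng):
    """Serve exponential tasks until the queue empties or `horizon` passes.

    Returns (tasks_left, finish_time); finish_time is None if tasks remain.
    By memorylessness, a partly served task can restart at the horizon."""
    t = 0.0
    while queue > 0:
        t += rng.expovariate(rate)
        if t > horizon:
            return queue, None
        queue -= 1
    return 0, t


def drain(queue, rate, start, rng):
    """Finish time of `queue` exponential tasks served from time `start`."""
    t = start
    for _ in range(queue):
        t += rng.expovariate(rate)
    return t


def one_shot_completion(queues, rates, t_b, gain, delay, rng):
    """Overall completion time of a two-node system with one balancing instant."""
    left, done = [], []
    for q, r in zip(queues, rates):
        q_left, finish = serve_until(q, r, t_b, rng)
        left.append(q_left)
        done.append(finish)
    total = sum(left)
    # Excess of each node over a share of remaining work proportional to its rate.
    excess = [left[i] - total * rates[i] / sum(rates) for i in range(2)]
    sender = 0 if excess[0] > 0 else 1
    receiver = 1 - sender
    transfer = min(left[sender], round(gain * excess[sender]))
    finish = [0.0, 0.0]
    finish[sender] = (done[sender] if done[sender] is not None
                      else drain(left[sender] - transfer, rates[sender], t_b, rng))
    own = (done[receiver] if done[receiver] is not None
           else drain(left[receiver], rates[receiver], t_b, rng))
    if transfer > 0:
        # Transferred tasks arrive after the delay and queue behind local work.
        own = drain(transfer, rates[receiver], max(own, t_b + delay), rng)
    finish[receiver] = own
    return max(finish)


def mean_completion_time(queues, rates, t_b, gain, delay, trials=20_000, seed=1):
    """Sample mean of the overall completion time over independent trials."""
    rng = random.Random(seed)
    return sum(one_shot_completion(queues, rates, t_b, gain, delay, rng)
               for _ in range(trials)) / trials
```

    Sweeping `t_b` and `gain` over a grid with `mean_completion_time` gives an empirical picture of the optimization over interbalancing time and gain that the abstract describes.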

    A Physical Particle and Plane Framework for Load Balancing in Multiprocessors

    Different models for load balancing have been proposed, each with its own features and advantages in a specific scenario. Yet nearly all existing techniques assume an oversimplified system model that often does not reflect real-world systems. In this paper, a new gradient-based algorithm for dynamic load balancing on multiprocessors is proposed. The algorithm is based on an analogy with a classical physical model, a Particle & Plane system, which operates according to the classical laws of physics.
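
    The abstract does not give the Particle & Plane update rules, so as a stand-in the sketch below implements the classic gradient model of Lin and Keller, a well-known gradient-based scheme: each node's proximity is its hop distance to the nearest lightly loaded node, and every heavily loaded node pushes one task per round to the neighbour with the smallest proximity. For brevity, proximities are computed centrally with a breadth-first search; the thresholds, graph representation, and function names are assumptions for illustration.

```python
from collections import deque
from typing import Dict, List

Graph = Dict[int, List[int]]


def proximities(adj: Graph, loads: Dict[int, int], low: int) -> Dict[int, float]:
    """Hop distance from each node to the nearest lightly loaded node (load < low).

    Computed by multi-source BFS; nodes with no reachable light node keep an
    infinite proximity."""
    dist = {n: float("inf") for n in adj}
    frontier = deque(n for n in adj if loads[n] < low)
    for n in frontier:
        dist[n] = 0
    while frontier:
        u = frontier.popleft()
        for v in adj[u]:
            if dist[v] == float("inf"):
                dist[v] = dist[u] + 1
                frontier.append(v)
    return dist


def gradient_step(adj: Graph, loads: Dict[int, int], low: int, high: int) -> Dict[int, int]:
    """One synchronous round: every node with load > high moves one task
    downhill, toward the neighbour closest to a lightly loaded node."""
    dist = proximities(adj, loads, low)
    new = dict(loads)
    for u, nbrs in adj.items():
        if loads[u] > high and nbrs:
            target = min(nbrs, key=lambda v: dist[v])
            if dist[target] < dist[u]:
                new[u] -= 1
                new[target] += 1
    return new


if __name__ == "__main__":
    line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    loads = {0: 8, 1: 4, 2: 4, 3: 0}
    for _ in range(10):
        loads = gradient_step(line, loads, low=2, high=5)
    print(loads)
```

    Moving only downhill (strictly decreasing proximity) guarantees that tasks flow toward underloaded regions and that total load is conserved in every round.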

    A Hierarchical Load Balancing Strategy Considering Communication Delay Overhead for Large Distributed Computing Systems

    Load balancing technology can effectively exploit the potentially enormous computing power available on distributed systems and achieve scalability. Communication delay overhead in distributed systems, which is time-varying and is usually ignored or assumed to be deterministic by traditional load balancing strategies, can greatly degrade load balancing performance. Taking communication delay overhead and its time-varying nature into account, a hierarchical load balancing strategy based on a generalized neural network (HLBSGNN) is presented for large distributed systems. The novelty of HLBSGNN is threefold: (1) a hierarchy with optimized communication is employed to reduce load balancing overhead in large distributed computing systems, (2) node computation rates and the communication delay randomness imposed by the communication medium are considered, and (3) communication and migration overheads are optimized by forecasting delay. Comparisons with traditional strategies, such as centralized, distributed, and random delay strategies, indicate that HLBSGNN is more effective and efficient.
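
    HLBSGNN forecasts communication delay with a generalized neural network whose architecture the abstract does not describe. The sketch below substitutes a much simpler forecaster, an exponentially weighted moving average (EWMA), to show how a delay forecast can gate a migration decision. The smoothing factor, the finish-time cost model, and all names are hypothetical.

```python
from typing import List


def ewma_forecasts(delays: List[float], alpha: float = 0.3) -> List[float]:
    """One-step-ahead EWMA delay forecasts, updated after each observed delay."""
    forecast = delays[0]
    out = [forecast]
    for d in delays[1:]:
        forecast = alpha * d + (1 - alpha) * forecast
        out.append(forecast)
    return out


def should_migrate(tasks: int, local_queue: int, local_rate: float,
                   remote_queue: int, remote_rate: float,
                   forecast_delay: float) -> bool:
    """Migrate `tasks` only if they would finish sooner remotely, counting the
    forecast transfer delay, than by staying in the local queue."""
    local_finish = local_queue / local_rate  # local queue already includes the tasks
    remote_finish = forecast_delay + (remote_queue + tasks) / remote_rate
    return remote_finish < local_finish


if __name__ == "__main__":
    observed = [0.8, 1.1, 3.5, 0.9, 1.0]  # hypothetical delay samples, seconds
    f = ewma_forecasts(observed)[-1]
    print(round(f, 3), should_migrate(5, 20, 1.0, 4, 2.0, f))
```

    A larger `alpha` tracks sudden delay changes faster but reacts more to one-off spikes; a learned forecaster such as the one in HLBSGNN aims to capture patterns that a single smoothing constant cannot.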

    High-performance cluster computing, algorithms, implementations and performance evaluation for computation-intensive applications to promote complex scientific research on turbulent flows

    Large-scale high-performance computing is a rapidly growing field of research that plays a vital role in the advance of science, engineering, and modern industrial technology. Increasingly sophisticated research has created a need for bigger and faster computers or computer clusters, and high-performance computer systems are themselves stimulating the redevelopment of computational methods. Computing is fast becoming the most frequently used technique for exploring new questions. We have developed a high-performance computer simulation modeling software system for turbulent flows. Five papers, selected from the dozens published during our work on complex software system development and knowledge discovery through computer simulation, are presented here. The first paper describes the end-to-end development of the computer simulation system and simulation results that help explain the nature of complex shelterbelt turbulent flows. The second paper deals specifically with the design and implementation of high-performance algorithms on a cluster of computers. The third paper discusses the twelve design processes of parallel algorithms and software systems, as well as theoretical performance modeling and characterization of cluster computing. The fourth paper presents the computing framework for drag and pressure coefficients. The fifth paper addresses simulated evapotranspiration and energy partition in inhomogeneous ecosystems. We discuss end-to-end development of the simulation system software, performance modeling of distributed parallel computing, and system performance characterization. We design and compare several parallel implementations of our computer simulation system and show that performance depends on algorithm design, communication channel pattern, and coding strategies, which significantly affect load balancing, speedup, and computing efficiency.
    For given cluster communication characteristics and a given problem complexity, there exists an optimal number of nodes. With this computer simulation system, we resolved many historically controversial issues and many other important problems.
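
    The existence of an optimal number of nodes can be illustrated with a deliberately simple, hypothetical cost model (not the authors' performance model): run time T(p) = W/p + c*p, where W is the total computation time on one node and c is a per-node communication cost. Setting dT/dp = -W/p**2 + c = 0 gives p* = sqrt(W/c); beyond p*, the added communication outweighs the reduced computation. Names are illustrative.

```python
import math


def run_time(p: int, work: float, comm_cost: float) -> float:
    """Hypothetical model: computation shrinks as work/p, communication grows as c*p."""
    return work / p + comm_cost * p


def continuous_optimum(work: float, comm_cost: float) -> float:
    """Stationary point of T(p): dT/dp = -work/p**2 + comm_cost = 0."""
    return math.sqrt(work / comm_cost)


def optimal_nodes(work: float, comm_cost: float, max_nodes: int) -> int:
    """Best integer node count in 1..max_nodes under the model above."""
    return min(range(1, max_nodes + 1), key=lambda p: run_time(p, work, comm_cost))


if __name__ == "__main__":
    print(optimal_nodes(10_000.0, 1.0, 1_000), continuous_optimum(10_000.0, 1.0))
```

    Real clusters often have communication terms that grow like log p or depend on message patterns, which shifts p* but, as long as communication cost grows with p, preserves the existence of an interior optimum.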

    An Improved dynamic Load Balancing Algorithm applied to a Cafeteria System in a University Campus

    Load-balancing algorithms play a key role in improving the performance of practical distributed systems that consist of heterogeneous nodes. The performance of any load-balancing algorithm, and its convergence rate, are affected by the structural factors of the network that executes it: performance deteriorates as the number of system nodes, the network diameter, and the communication overhead increase. Moreover, technical factors of the algorithm itself significantly affect how well it rebalances the load among nodes. This paper therefore proposes an approach that improves the performance of load-balancing algorithms by taking into account both the load-balancing technical factors and the structure of the network that executes the algorithm. We applied the proposed method to a cafeteria system on a university campus and compared our approach with two significant methods from the literature. The results indicate that our approach considerably outperformed the original neighborhood approach and the nearest neighbor approach in terms of response time, throughput, communication overhead, and movement cost.
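
    The dependence of convergence on node count and network diameter can be reproduced with a toy experiment. The sketch below is a generic neighbourhood-averaging scheme, not the paper's method or either baseline: on an n-node ring (diameter n // 2), each node repeatedly replaces its load with the mean of its own and its two neighbours' loads, and the function counts rounds until the max-min imbalance drops below a tolerance. The tolerance, initial load, and names are illustrative assumptions.

```python
from typing import List, Tuple


def ring_neighbours(n: int) -> List[Tuple[int, int]]:
    """Left and right neighbour of each node on an n-node ring (n >= 3)."""
    return [((i - 1) % n, (i + 1) % n) for i in range(n)]


def rounds_to_balance(n: int, initial_load: float = 100.0,
                      tol: float = 1.0, max_rounds: int = 100_000) -> int:
    """Rounds of synchronous neighbourhood averaging until max - min <= tol.

    All load starts on node 0. Every node averages exactly three values, so the
    update matrix is doubly stochastic and total load is conserved."""
    loads = [0.0] * n
    loads[0] = initial_load
    nbrs = ring_neighbours(n)
    for r in range(max_rounds):
        if max(loads) - min(loads) <= tol:
            return r
        loads = [(loads[i] + loads[a] + loads[b]) / 3 for i, (a, b) in enumerate(nbrs)]
    return max_rounds


if __name__ == "__main__":
    for n in (4, 8, 16, 32):
        print(n, n // 2, rounds_to_balance(n))  # nodes, diameter, rounds
```

    The slowdown follows from the ring's spectrum: the second-largest eigenvalue of this averaging matrix, (1 + 2 cos(2*pi/n)) / 3, approaches 1 as n grows, so each round removes a smaller fraction of the imbalance.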

    Decentralized load balancing in heterogeneous computational grids

    With the rapid development of high-speed wide-area networks and powerful yet low-cost computational resources, grid computing has emerged as an attractive computing paradigm. It overcomes the spatial limitations of conventional distributed systems, allowing under-utilised computing resources in every region of the world to be fully exploited for distributed jobs. Workload and resource management are key grid services at the service level of the grid software infrastructure, and load balancing is a common concern for most grid infrastructure developers. Although these are established research areas in parallel and distributed computing, grid computing environments present a number of new challenges, including large-scale computing resources, heterogeneous computing power, the autonomy of the organisations hosting the resources, uneven job-arrival patterns among grid sites, considerable job transfer costs, and considerable communication overhead in capturing the load information of sites. This dissertation focuses on designing load-balancing solutions for computational grids that cater for the unique characteristics of grid computing environments. To explore the solution space, we conducted a survey of load-balancing solutions, which enabled discussion and comparison of existing approaches and the delimitation and exploration of a portion of the solution space. A system model was developed to study load-balancing problems in computational grid environments. In particular, we developed three decentralised algorithms for job dispatching and load balancing that use only partial information: the desirability-aware load-balancing algorithm (DA), the performance-driven desirability-aware load-balancing algorithm (P-DA), and the performance-driven region-based load-balancing algorithm (P-RB). All three are scalable, dynamic, decentralised and sender-initiated.
    We conducted extensive simulation studies to analyse the performance of our load-balancing algorithms. Simulation results showed that the algorithms significantly outperform pre-existing decentralised algorithms relevant to this research.
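
    The abstract does not specify the DA, P-DA, or P-RB algorithms. To illustrate what sender-initiated dispatching with only partial information can look like, the sketch below uses a generic threshold-and-probe policy (a swapped-in textbook technique, not one of the dissertation's algorithms): a site whose queue exceeds a threshold probes a few randomly chosen sites and forwards the job to the least-loaded probed site if that site is below the threshold. The arrival skew, thresholds, and all names are hypothetical, and job service is ignored.

```python
import random
from typing import List


def dispatch(origin: int, loads: List[int], threshold: int, probes: int,
             rng: random.Random) -> int:
    """Return the site that should run a job arriving at `origin`."""
    if probes == 0 or len(loads) < 2 or loads[origin] <= threshold:
        return origin  # not overloaded, or probing impossible: keep the job
    others = [s for s in range(len(loads)) if s != origin]
    probed = rng.sample(others, min(probes, len(others)))  # partial information
    best = min(probed, key=lambda s: loads[s])
    return best if loads[best] < threshold else origin


def simulate(n_sites: int, jobs: int, threshold: int, probes: int,
             hot_fraction: float = 0.8, seed: int = 0) -> List[int]:
    """Queue lengths after `jobs` arrivals, most of them at a single hot site 0."""
    rng = random.Random(seed)
    loads = [0] * n_sites
    for _ in range(jobs):
        origin = 0 if rng.random() < hot_fraction else rng.randrange(n_sites)
        loads[dispatch(origin, loads, threshold, probes, rng)] += 1
    return loads


if __name__ == "__main__":
    print(max(simulate(10, 200, 25, probes=0)), max(simulate(10, 200, 25, probes=3)))
```

    Only the overloaded sender pays the probing cost, and only for a bounded number of sites, which is what keeps sender-initiated schemes of this kind scalable when collecting global load information is expensive.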