24,278 research outputs found

    Maximizing Service Reliability in Distributed Computing Systems with Random Node Failures: Theory and Implementation

    Get PDF
    In distributed computing systems (DCSs) where server nodes can fail permanently with nonzero probability, the system performance can be assessed by means of the service reliability, defined as the probability of serving all the tasks queued in the DCS before all the nodes fail. This paper presents a rigorous probabilistic framework to analytically characterize the service reliability of a DCS in the presence of communication uncertainties and stochastic topological changes due to node deletions. The framework considers a system composed of heterogeneous nodes with stochastic service and failure times and a communication network imposing random tangible delays. The framework also permits arbitrarily specified, distributed load-balancing actions to be taken by the individual nodes in order to improve the service reliability. The presented analysis is based upon a novel use of the concept of stochastic regeneration, which is exploited to derive a system of difference-differential equations characterizing the service reliability. The theory is further utilized to optimize certain load-balancing policies for maximal service reliability; the optimization is carried out by means of an algorithm that scales linearly with the number of nodes in the system. The analytical model is validated using both Monte Carlo simulations and experimental data collected from a DCS testbed

    Evaluating load balancing policies for performance and energy-efficiency

    Get PDF
    Nowadays, more and more increasingly hard computations are performed in challenging fields like weather forecasting, oil and gas exploration, and cryptanalysis. Many of such computations can be implemented using a computer cluster with a large number of servers. Incoming computation requests are then, via a so-called load balancing policy, distributed over the servers to ensure optimal performance. Additionally, being able to switch-off some servers during low period of workload, gives potential to reduced energy consumption. Therefore, load balancing forms, albeit indirectly, a trade-off between performance and energy consumption. In this paper, we introduce a syntax for load-balancing policies to dynamically select a server for each request based on relevant criteria, including the number of jobs queued in servers, power states of servers, and transition delays between power states of servers. To evaluate many policies, we implement two load balancers in: (i) iDSL, a language and tool-chain for evaluating service-oriented systems, and (ii) a simulation framework in AnyLogic. Both implementations are successfully validated by comparison of the results.Comment: In Proceedings QAPL'16, arXiv:1610.0769

    Dynamic resource allocation scheme for distributed heterogeneous computer systems

    Get PDF
    This invention relates to a resource allocation in computer systems, and more particularly, to a method and associated apparatus for shortening response time and improving efficiency of a heterogeneous distributed networked computer system by reallocating the jobs queued up for busy nodes to idle, or less-busy nodes. In accordance with the algorithm (SIDA for short), the load-sharing is initiated by the server device in a manner such that extra overhead in not imposed on the system during heavily-loaded conditions. The algorithm employed in the present invention uses a dual-mode, server-initiated approach. Jobs are transferred from heavily burdened nodes (i.e., over a high threshold limit) to low burdened nodes at the initiation of the receiving node when: (1) a job finishes at a node which is burdened below a pre-established threshold level, or (2) a node is idle for a period of time as established by a wakeup timer at the node. The invention uses a combination of the local queue length and the local service rate ratio at each node as the workload indicator
    corecore