
    Aggregate matrix-analytic techniques and their applications

    The complexity of computer systems affects the complexity of the modeling techniques that can be used for their performance analysis. In this dissertation, we develop a set of techniques that are based on tractable analytic models and enable efficient performance analysis of computer systems. Our approach is three-pronged: first, we propose new techniques to parameterize measurement data with Markovian-based stochastic processes that can be further used as input into queueing systems; second, we propose new methods to efficiently solve complex queueing models; and third, we use the proposed methods to evaluate the performance of clustered Web servers and propose new load balancing policies based on this analysis.

    We devise two new techniques for fitting measurement data that exhibit high variability into Phase-type (PH) distributions. These techniques apply known fitting algorithms in a divide-and-conquer fashion. We evaluate the accuracy of our methods from both the statistics and the queueing systems perspectives. In addition, we propose a new methodology for fitting measurement data that exhibit long-range dependence into Markovian Arrival Processes (MAPs).

    We propose a new methodology, ETAQA, for the exact solution of M/G/1-type processes, GI/M/1-type processes, and their intersection, i.e., quasi-birth-death (QBD) processes. ETAQA computes an aggregate steady-state probability distribution and a set of measures of interest. ETAQA is numerically stable and computationally superior to alternative solution methods. Apart from ETAQA, we propose a new methodology for the exact solution of a class of GI/G/1-type processes based on aggregation/decomposition.

    Finally, we demonstrate the applicability of the proposed techniques by evaluating load balancing policies in clustered Web servers. We address the high variability in the service process of Web servers by dedicating the servers of a cluster to requests of similar sizes, and we propose new, content-aware load balancing policies. Detailed analysis shows that the proposed policies achieve high user-perceived performance and, by continuously adapting their scheduling parameters to the current workload characteristics, provide good performance under conditions of transient overload.
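    To make the matrix-analytic setting concrete, here is a minimal NumPy sketch of the classical matrix-geometric solution of a QBD process. It is not ETAQA itself (which aggregates states rather than computing R); the block matrices A0, A1, A2 and the tolerance are illustrative assumptions.

```python
import numpy as np

# Illustrative QBD blocks (not from the dissertation): A0 = one level up,
# A2 = one level down, A1 = local transitions chosen so that each row of
# A0 + A1 + A2 sums to zero, as required of a CTMC generator.
A0 = np.array([[0.5, 0.0],
               [0.0, 0.5]])
A2 = np.array([[1.0, 0.0],
               [0.0, 1.0]])
A1 = -np.diag((A0 + A2).sum(axis=1))

# Fixed-point iteration for the minimal nonnegative solution R of
# A0 + R A1 + R^2 A2 = 0.
R = np.zeros_like(A1)
for _ in range(10_000):
    R_next = -(A0 + R @ R @ A2) @ np.linalg.inv(A1)
    if np.abs(R_next - R).max() < 1e-12:
        R = R_next
        break
    R = R_next

# The stationary vector is then matrix-geometric: pi_{n+1} = pi_n R,
# so the chain is stable iff the spectral radius of R is below 1.
print("spectral radius of R:", max(abs(np.linalg.eigvals(R))))
```

    For these illustrative blocks the iteration converges to a spectral radius of 0.5, matching the per-phase load 0.5/1.0.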

    Markov Chain Modeling for Multi-Server Clusters


    Runtime support for load balancing of parallel adaptive and irregular applications

    Applications critical to today's engineering research often must make use of the increased memory and processing power of a parallel machine. While advances in architecture design are leading to more and more powerful parallel systems, the software tools needed to realize their full potential are in a much less advanced state. In particular, efficient, robust, and high-performance runtime support software is critical in the area of dynamic load balancing. While the load balancing of loosely synchronous codes, such as field solvers, has been studied extensively for the past 15 years, there exists a class of problems, known as asynchronous and highly adaptive, for which the dynamic load balancing problem remains open. As we discuss, characteristics of this class of problems render compile-time or static analysis of little benefit and complicate the dynamic load balancing task immensely.

    We make two contributions to this area of research. The first is the design and development of a runtime software toolkit, known as the Parallel Runtime Environment for Multi-computer Applications, or PREMA, which provides interprocessor communication, a global namespace, a framework for the implementation of customized scheduling policies, and several such policies that are prevalent in the load balancing literature. The PREMA system is designed to support coarse-grained domain decompositions with the goals of portability, flexibility, and maintainability in mind, so that developers will quickly feel comfortable incorporating it into existing codes and developing new codes that make use of its functionality. We demonstrate that the programming model and implementation are efficient and lead to the development of robust and high-performance applications.

    Our second contribution is in the area of performance modeling. In order to make the most effective use of the PREMA runtime software, certain parameters governing its execution must be set off-line. Optimal values for these parameters may be determined through repeated executions of the target application; however, this is not always possible, particularly in large-scale environments and long-running applications. We present an analytic model that allows the user to quickly and inexpensively predict application performance and fine-tune applications built on the PREMA platform.
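    As an illustration of the kind of pluggable scheduling policy such a runtime can host, here is a small Python sketch of a receiver-initiated stealing policy over coarse-grained work units. The class names and hooks are hypothetical, not PREMA's actual API.

```python
import random
from collections import deque

class Worker:
    def __init__(self, wid):
        self.wid = wid
        self.queue = deque()          # coarse-grained mobile work units

class RandomStealPolicy:
    """Receiver-initiated: an idle worker steals from a random loaded peer."""
    def select_victim(self, thief, workers):
        loaded = [w for w in workers if w is not thief and len(w.queue) > 1]
        return random.choice(loaded) if loaded else None

def schedule_step(workers, policy):
    for w in workers:
        if not w.queue:                                  # idle: consult policy
            victim = policy.select_victim(w, workers)
            if victim is not None:
                w.queue.append(victim.queue.pop())       # migrate one unit

# Usage: 4 workers, all work initially placed on worker 0.
workers = [Worker(i) for i in range(4)]
workers[0].queue.extend(range(8))
schedule_step(workers, RandomStealPolicy())
print([len(w.queue) for w in workers])                   # -> [5, 1, 1, 1]
```

    Receiver-initiated stealing is a common baseline in the load balancing literature; a framework like the one described simply lets `select_victim` be swapped for more informed policies.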

    Markovian Workload Characterization for QoS Prediction in the Cloud.

    Resource allocation in the cloud is usually driven by performance predictions, such as estimates of the future incoming load to the servers or of the quality of service (QoS) offered by applications to end users. In this context, characterizing web workload fluctuations in an accurate way is fundamental to understanding how to provision cloud resources under time-varying traffic intensities. In this paper, we investigate Markovian Arrival Processes (MAPs) and the related MAP/MAP/1 queueing model as a tool for performance prediction of servers deployed in the cloud. MAPs are a special class of Markov models used as a compact description of the time-varying characteristics of workloads. In addition, MAPs can fit heavy-tailed distributions, which are common in HTTP traffic, and can be easily integrated within analytical queueing models to efficiently predict system performance without resorting to simulation. By comparison with trace-driven simulation, we observe that existing techniques for MAP parameterization from HTTP log files often lead to inaccurate performance predictions. We then define a maximum likelihood method for fitting MAP parameters based on data commonly available in Apache log files, and a new technique to cope with batch arrivals, which are notoriously difficult to model accurately. Numerical experiments demonstrate the accuracy of our approach for performance prediction of web systems. © 2011 IEEE
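    For intuition, the fundamental arrival rate of a MAP (D0, D1) is lambda = theta D1 1, where theta is the stationary vector of the generator D0 + D1. A minimal NumPy sketch with illustrative matrices (not fitted from any real Apache log):

```python
import numpy as np

def map_rate(D0, D1):
    """Fundamental arrival rate of a MAP: lambda = theta @ D1 @ 1."""
    Q = D0 + D1                          # generator of the phase process
    n = Q.shape[0]
    A = np.vstack([Q.T, np.ones(n)])     # encode theta Q = 0 and theta 1 = 1
    b = np.zeros(n + 1)
    b[-1] = 1.0
    theta, *_ = np.linalg.lstsq(A, b, rcond=None)
    return float(theta @ D1 @ np.ones(n))

# Bursty arrivals: a fast phase and a slow phase with rare switching.
D0 = np.array([[-10.2,   0.2],
               [  0.05, -1.05]])
D1 = np.array([[10.0, 0.0],
               [ 0.0, 1.0]])

lam = map_rate(D0, D1)          # ~2.8 arrivals per unit time
mu  = 12.0                      # exponential service: a one-state MAP
print(f"arrival rate = {lam:.3f}, utilization = {lam/mu:.3f}")
```

    In a MAP/MAP/1 model the same quantity computed for the service MAP gives the service rate, and the utilization lambda/mu must stay below 1 for stability.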

    Internet performance modeling: the state of the art at the turn of the century

    Seemingly overnight, the Internet has gone from an academic experiment to a worldwide information matrix. Along the way, computer scientists have come to realize that understanding the performance of the Internet is a remarkably challenging and subtle problem. This challenge is all the more important because of the increasingly significant role the Internet has come to play in society. To take stock of the field of Internet performance modeling, the authors organized a workshop at Schloß Dagstuhl. This paper summarizes the results of discussions, both plenary and in small groups, that took place during the four-day workshop. It identifies successes, points to areas where more work is needed, and poses “Grand Challenges” for the performance evaluation community with respect to the Internet.

    Effective task assignment strategies for distributed systems under highly variable workloads

    Heavy-tailed workload distributions are commonly experienced in many areas of distributed computing. Such workloads are highly variable: a small number of very large tasks make up a large proportion of the workload, making the load very hard to distribute effectively. Traditional task assignment policies are ineffective under these conditions, as they were formulated based on the assumption of an exponentially distributed workload. Size-based task assignment policies have been proposed to handle heavy-tailed workloads, but their applicability is limited by their static nature and their assumption of prior knowledge of a task's service requirement. This thesis analyses existing approaches to load distribution under heavy-tailed workloads and presents a new generalised task assignment policy that significantly improves performance for many distributed applications by intelligently addressing the negative performance effects that highly variable workloads cause. Many problems associated with the modelling and optimisation of systems under highly variable workloads were then addressed by a novel technique that approximated these workloads with simpler mathematical representations, without losing any of their pertinent original properties. Finally, we obtain advanced queueing metrics (such as the variance of key measures like waiting time and slowdown, which are difficult to obtain analytically) through rigorous simulation.
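    As a concrete example of the size-based policies the thesis generalises, here is a short Python sketch of size-interval task assignment (SITA) under a heavy-tailed (Pareto) workload. The cutoffs and tail index are illustrative assumptions.

```python
import random

def pareto(alpha=1.5, xmin=1.0):
    """Heavy-tailed task size via inverse-CDF sampling of a Pareto law."""
    return xmin / (1.0 - random.random()) ** (1.0 / alpha)

def sita_dispatch(size, cutoffs):
    """Return the index of the host whose size interval contains `size`."""
    for host, cutoff in enumerate(cutoffs):
        if size <= cutoff:
            return host
    return len(cutoffs)                          # largest jobs -> last host

cutoffs = [2.0, 10.0]                            # 3 hosts: (0,2], (2,10], (10,inf)
loads = [0.0, 0.0, 0.0]
for _ in range(100_000):
    s = pareto()
    loads[sita_dispatch(s, cutoffs)] += s

total = sum(loads)
print([round(l / total, 3) for l in loads])      # share of total work per host
```

    With a tail index below 2, the few largest jobs carry much of the total work, which is why routing them to dedicated hosts keeps short jobs from queueing behind them.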

    Job-Replication Trade-Offs: Performance Analysis of Redundancy Systems


    Theory of Resource Allocation for Robust Distributed Computing

    Lately, distributed computing (DC) has emerged in several application scenarios such as grid computing, high-performance and reconfigurable computing, wireless sensor networks, battle management systems, peer-to-peer networks, and donation grids. When DC is performed in these scenarios, the distributed computing system (DCS) supporting the applications not only exhibits heterogeneous computing resources and significant communication latency, but also becomes highly dynamic, because both the communication network and the computing servers are affected by a wide class of anomalies that change the topology of the system in a random fashion. These anomalies exhibit spatial and/or temporal correlation when they result, for instance, from wide-area power or network outages. Such correlated failures may not only inflict a large amount of damage on the system, but they may also induce further failures in other servers as a result of the lack of reliable communication between the components of the DCS. In order to provide a robust DC environment in the presence of component failures, it is key to develop a general framework for accurately modeling the complex dynamics of a DCS.

    In this dissertation, a novel approach has been undertaken for modeling a general class of DCSs and for analytically characterizing the performance and reliability of parallel applications executed on such systems. A general probabilistic model has been constructed by assuming that the random times governing the dynamics of the DCS follow arbitrary probability distributions with heterogeneous parameters. Auxiliary age variables have been introduced into the modeling of a DCS, and a hybrid continuous and discrete state-space model of the system has been constructed. This hybrid model has enabled the development of an age-dependent stochastic regeneration theory, which, in turn, has been employed to analytically characterize the average execution time, the quality of service, and the reliability in serving an application. These are three metrics of performance and reliability of practical interest in DC. Analytical approximations, as well as mathematical lower and upper bounds for these metrics, have also been derived in an attempt to reduce the amount of computational resources demanded by the exact characterizations.

    In order to systematically assess the reliability of DCSs in the presence of correlated component failures, a novel probabilistic model for spatially correlated failures has been developed. The model, based on graph theory and Markov random fields, captures both geographical and logical correlations induced by the arbitrary topology of the communication network of a DCS. The modeling framework, in conjunction with a general class of dynamic task reallocation (DTR) control policies, has been used to optimize the performance and reliability of applications in the presence of independent as well as spatially correlated anomalies. Theoretical predictions, Monte Carlo simulations, and experimental results have shown that optimizing these metrics can significantly impact the performance of a DCS.

    Moreover, the general setting developed here has shed insight on: (i) the effect of different stochastic models on the accuracy of the performance and reliability metrics; (ii) the dependence of the DTR policies on system parameters such as failure rates and task-processing rates; (iii) the severe impact of correlated failures on the reliability of DCSs; (iv) the dependence of the DTR policies on the degree of correlation in the failures; and (v) the fundamental trade-off between minimizing the execution time of an application and maximizing its reliability.
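    To illustrate the execution-time/reliability trade-off in item (v), here is a crude Monte Carlo sketch in Python of a dynamic task-reallocation (DTR) policy under independent server failures. All rates, the time discretization, and the random-survivor policy are illustrative assumptions; the dissertation's model also covers correlated failures, which this sketch does not.

```python
import random

def run_once(n_servers=4, n_tasks=8, work=1.0, fail_rate=0.05, dtr=True):
    """One simulated run; returns completion time, or None if the app fails."""
    tasks = [work] * n_tasks
    assign = [i % n_servers for i in range(n_tasks)]   # round-robin placement
    alive = [True] * n_servers
    t, dt = 0.0, 0.01
    while any(w > 0 for w in tasks):
        t += dt
        for s in range(n_servers):                     # processor sharing:
            if not alive[s]:                           # a server splits its
                continue                               # capacity among tasks
            mine = [i for i in range(n_tasks) if assign[i] == s and tasks[i] > 0]
            for i in mine:
                tasks[i] = max(0.0, tasks[i] - dt / len(mine))
        for s in range(n_servers):                     # independent failures
            if alive[s] and random.random() < fail_rate * dt:
                alive[s] = False
                if dtr:                                # move work to a survivor
                    survivors = [j for j in range(n_servers) if alive[j]]
                    for i in range(n_tasks):
                        if assign[i] == s and survivors:
                            assign[i] = random.choice(survivors)
        if any(tasks[i] > 0 and not alive[assign[i]] for i in range(n_tasks)):
            return None                                # stranded work: app lost
    return t

runs = [run_once(dtr=True) for _ in range(2000)]
done = [t for t in runs if t is not None]
print(f"reliability = {len(done)/len(runs):.3f}, "
      f"mean completion time = {sum(done)/len(done):.2f}")
```

    Disabling DTR in this sketch sharply raises the probability of losing the application, while enabling it lengthens completion times by crowding tasks onto the surviving servers, which is the direction of the trade-off discussed above.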