38 research outputs found

    Dependence-driven techniques in system design

    Get PDF
    Burstiness in workloads is often found in multi-tier architectures, storage systems, and communication networks. This feature is extremely important in system design because it can significantly degrade system performance and availability. This dissertation focuses on how to use knowledge of burstiness to develop new techniques and tools for performance prediction, scheduling, and resource allocation under bursty workload conditions.;For multi-tier enterprise systems, burstiness in the service times is catastrophic for performance. Via detailed experimentation, we identify the cause of performance degradation on the persistent bottleneck switch among various servers. This results in an unstable behavior that cannot be captured by existing capacity planning models. In this dissertation, beyond identifying the cause and effects of bottleneck switch in multi-tier systems, we also propose modifications to the classic TPC-W benchmark to emulate bursty arrivals in multi-tier systems.;This dissertation also demonstrates how burstiness can be used to improve system performance. Two dependence-driven scheduling policies, SWAP and ALoC, are developed. These general scheduling policies counteract burstiness in workloads and maintain high availability by delaying selected requests that contribute to burstiness. Extensive experiments show that both SWAP and ALoC achieve good estimates of service times based on the knowledge of burstiness in the service process. as a result, SWAP successfully approximates the shortest job first (SJF) scheduling without requiring a priori information of job service times. ALoC adaptively controls system load by infinitely delaying only a small fraction of the incoming requests.;The knowledge of burstiness can also be used to forecast the length of idle intervals in storage systems. In practice, background activities are scheduled during system idle times. The scheduling of background jobs is crucial in terms of the performance degradation of foreground jobs and the utilization of idle times. In this dissertation, new background scheduling schemes are designed to determine when and for how long idle times can be used for serving background jobs, without violating predefined performance targets of foreground jobs. Extensive trace-driven simulation results illustrate that the proposed schemes are effective and robust in a wide range of system conditions. Furthermore, if there is burstiness within idle times, then maintenance features like disk scrubbing and intra-disk data redundancy can be successfully scheduled as background activities during idle times

    Control and inference of structured Markov models

    Get PDF

    Markovian arrivals in stochastic modelling: a survey and some new results

    Get PDF
    This paper aims to provide a comprehensive review on Markovian arrival processes (MAPs), which constitute a rich class of point processes used extensively in stochastic modelling. Our starting point is the versatile process introduced by Neuts (1979) which, under some simplified notation, was coined as the batch Markovian arrival process (BMAP). On the one hand, a general point process can be approximated by appropriate MAPs and, on the other hand, the MAPs provide a versatile, yet tractable option for modelling a bursty flow by preserving the Markovian formalism. While a number of well-known arrival processes are subsumed under a BMAP as special cases, the literature also shows generalizations to model arrival streams with marks, nonhomogeneous settings or even spatial arrivals. We survey on the main aspects of the BMAP, discuss on some of its variants and generalizations, and give a few new results in the context of a recent state-dependent extension.Peer Reviewe

    Workload characterization, modeling, and prediction in grid Computing

    Get PDF
    Workloads play an important role in experimental performance studies of computer systems. This thesis presents a comprehensive characterization of real workloads on production clusters and Grids. A variety of correlation structures and rich scaling behavior are identified in workload attributes such as job arrivals and run times, including pseudo-periodicity, long range dependence, and strong temporal locality. Based on the analytic results workload models are developed to fit the real data. For job arrivals three different kinds of autocorrelations are investigated. For short to middle range dependent data, Markov modulated Poisson processes (MMPP) are good models because they can capture correlations between interarrival times while remaining analytically tractable. For long range dependent and multifractal processes, the multifractal wavelet model (MWM) is able to reconstruct the scaling behavior and it provides a coherent wavelet framework for analysis and synthesis. Pseudo-periodicity is a special kind of autocorrelation and it can be modeled by a matching pursuit approach. For workload attributes such as run time a new model is proposed that can fit not only the marginal distribution but also the second order statistics such as the autocorrelation function (ACF). The development of workload models enable the simulation studies of Grid scheduling strategies. By using the synthetic traces, the performance impacts of workload correlations in Grid scheduling is quantitatively evaluated. The results indicate that autocorrelations in workload attributes can cause performance degradation, in some situations the difference can be up to several orders of magnitude. The larger the autocorrelation, the worse the performance, it is proved both at the cluster and Grid level. This study shows the importance of realistic workload models in performance evaluation studies. Regarding performance predictions, this thesis treats the targeted resources as a ``black box'' and takes a statistical approach. It is shown that statistical learning based methods, after a well-thought and fine-tuned design, are able to deliver good accuracy and performance.UBL - phd migration 201

    Unreliable Retrial Queues in a Random Environment

    Get PDF
    This dissertation investigates stability conditions and approximate steady-state performance measures for unreliable, single-server retrial queues operating in a randomly evolving environment. In such systems, arriving customers that find the server busy or failed join a retrial queue from which they attempt to regain access to the server at random intervals. Such models are useful for the performance evaluation of communications and computer networks which are characterized by time-varying arrival, service and failure rates. To model this time-varying behavior, we study systems whose parameters are modulated by a finite Markov process. Two distinct cases are analyzed. The first considers systems with Markov-modulated arrival, service, retrial, failure and repair rates assuming all interevent and service times are exponentially distributed. The joint process of the orbit size, environment state, and server status is shown to be a tri-layered, level-dependent quasi-birth-and-death (LDQBD) process, and we provide a necessary and sufficient condition for the positive recurrence of LDQBDs using classical techniques. Moreover, we apply efficient numerical algorithms, designed to exploit the matrix-geometric structure of the model, to compute the approximate steady-state orbit size distribution and mean congestion and delay measures. The second case assumes that customers bring generally distributed service requirements while all other processes are identical to the first case. We show that the joint process of orbit size, environment state and server status is a level-dependent, M/G/1-type stochastic process. By employing regenerative theory, and exploiting the M/G/1-type structure, we derive a necessary and sufficient condition for stability of the system. Finally, for the exponential model, we illustrate how the main results may be used to simultaneously select mean time customers spend in orbit, subject to bound and stability constraints

    Performability modelling of homogenous and heterogeneous multiserver systems with breakdowns and repairs

    Get PDF
    This thesis presents analytical modelling of homogeneous multi-server systems with reconfiguration and rebooting delays, heterogeneous multi-server systems with one main and several identical servers, and farm paradigm multi-server systems. This thesis also includes a number of other research works such as, fast performability evaluation models of open networks of nodes with repairs and finite queuing capacities, multi-server systems with deferred repairs, and two stage tandem networks with failures, repairs and multiple servers at the second stage. Applications of these for the popular Beowulf cluster systems and memory servers are also accomplished. Existing techniques used in performance evaluation of multi-server systems are investigated and analysed in detail. Pure performance modelling techniques, pure availability models, and performability models are also considered. First, the existing approaches for pure performance modelling are critically analysed with the discussions on merits and demerits. Then relevant terminology is defined and explained. Since the pure performance models tend to be too optimistic and pure availability models are too conservative, performability models are used for the evaluation of multi-server systems. Fault-tolerant multi-server systems can continue service in case of certain failures. If failure does not occur at a critical point (such as breakdown of the head processor of a farm paradigm system) the system continues serving in a degraded mode of operation. In such systems, reconfiguration and/or rebooting delays are expected while a processor is being mapped out from the system. These delay stages are also taken into account in addition to failures and repairs, in the exact performability models that are developed. Two dimensional Markov state space representations of the systems are used for performability modelling. Following the critical analysis of the existing solution techniques, the Spectral Expansion method is chosen for the solution of the models developed. In this work, open queuing networks are also considered. To evaluate their performability, existing modelling approaches are expanded and validated by simulations, for performability analysis of multistage open networks with finite queuing capacities. The performances of two extended modelling approaches are compared in terms of accuracy for open networks with various queuing capacities. Deferred repair strategies are becoming popular because of the cost reductions they can provide. Effects of using deferred repairs are analysed and performability models are provided for homogeneous multi-server systems and highly available farm paradigm multi-server systems. Since one of the random variables is used to represent the number of jobs in one of the queues, analytical models for performance evaluation of two stage tandem networks suffer because of numerical cumbersomeness. Existing approaches for modelling these systems are actually pure performance models since breakdowns and repairs cannot be considered. One way of modelling these systems can be to divide one of the random variables to present both the operative and non-operative states of the server in one dimension. However, this will give rise to state explosion problem severely limiting the maximum queue capacity that can be handled. In order to overcome this problem a new approach is presented for modelling two stage tandem networks in three dimensions. An approximate solution is presented to solve such a system. This approach manifests itself as a novel contribution for alleviating the state space explosion problem for large and/or complex systems. When two state tandem networks with feedback are modelled using this approach, the operative states can be handled independently and this makes it possible to consider multiple operative states at the second stage. The analytical models presented can be used with various parameters and they are extendible to consider systems with similar architectures. The developed three dimensional approach is capable to handle two stage tandem networks with various characteristics for performability measures. All the approaches presented give accurate results. Numerical solutions are presented for all models developed. In case the solution presented is not exact, simulations are performed to validate the accuracy of the results obtained

    Non-Intrusive Measurement in Packet Networks and its Applications

    Get PDF
    PhDNetwork measurementis becoming increasingly important as a meanst o assesst he performanceo f packet networks. Network performance can involve different aspects such as availability, link failure detection etc, but in this thesis, we will focus on Quality of Service (QoS). Among the metrics used to define QoS, we are particularly interested in end-to-end delay performance. Recently, the adoption of Service Level Agreements (SLA) between network operators and their customersh as becomea major driving force behind QoS measurementm: easurementi s necessaryt o produce evidence of fulfilment of the requirements specified in the SLA. Many attempts to do QoS based packet level measurement have been based on Active Measurement, in which the properties of the end-to-end path are tested by adding testing packets generated from the sending end. The main drawback of active probing is its intrusive nature which causes extraburden on the network, and has been shown to distort the measured condition of the network. The other category of network measurement is known as Passive Measurement. In contrast to Active Measurement, there are no testing packets injected into the network, therefore no intrusion is caused. The proposed applications using Passive Measurement are currently quite limited. But Passive Measurement may offer the potential for an entirely different perspective compared with Active Measurements In this thesis, the objective is to develop a measurement methodology for the end-to-end delay performance based on Passive Measurement. We assume that the nodes in a network domain are accessible.F or example, a network domain operatedb y a single network operator. The novel idea is to estimate the local per-hop delay distribution based on a hybrid approach (model and measurement-based)W. ith this approach,t he storagem easurementd ata requirement can be greatly alleviated and the overhead put in each local node can be minimized, so maintaining the fast switching operation in a local switcher or router. Per-hop delay distributions have been widely used to infer QoS at a single local node. However, the end-to-end delay distribution is more appropriate when quantifying delays across an end-to-end path. Our approach is to capture every local node's delay distribution, and then the end-to-end delay distribution can be obtained by convolving the estimated delay distributions. In this thesis, our algorithm is examined by comparing the proximity of the actual end-to-end delay distribution with the estimated one obtained by our measurement method under various conditions. e. g. in the presence of Markovian or Power-law traffic. Furthermore, the comparison between Active Measurement and our scheme is also studied. 2 Network operators may find our scheme useful when measuring the end-to-end delay performance. As stated earlier, our scheme has no intrusive effect. Furthermore, the measurement result in the local node can be re-usable to deduce other paths' end-to-end delay behaviour as long as this local node is included in the path. Thus our scheme is more scalable compared with active probing

    Spare parts planning and control for maintenance operations

    Get PDF
    This paper presents a framework for planning and control of the spare parts supply chain inorganizations that use and maintain high-value capital assets. Decisions in the framework aredecomposed hierarchically and interfaces are described. We provide relevant literature to aiddecision making and identify open research topics. The framework can be used to increasethe e¿ciency, consistency and sustainability of decisions on how to plan and control a spareparts supply chain. This point is illustrated by applying it in a case-study. Applicability of theframework in di¿erent environments is also investigated
    corecore