361 research outputs found
ASIdE: Using Autocorrelation-Based Size Estimation for Scheduling Bursty Workloads.
Temporal dependence in workloads creates peak congestion that can make service unavailable and reduce system performance. To improve system performability under conditions of temporal dependence, a server should quickly process bursts of requests that may need large service demands. In this paper, we propose and evaluateASIdE, an Autocorrelation-based SIze Estimation, that selectively delays requests which contribute to the workload temporal dependence. ASIdE implicitly approximates the shortest job first (SJF) scheduling policy but without any prior knowledge of job service times. Extensive experiments show that (1) ASIdE achieves good service time estimates from the temporal dependence structure of the workload to implicitly approximate the behavior of SJF; and (2) ASIdE successfully counteracts peak congestion in the workload and improves system performability under a wide variety of settings. Specifically, we show that system capacity under ASIdE is largely increased compared to the first-come first-served (FCFS) scheduling policy and is highly-competitive with SJF. © 2012 IEEE
A performance comparison of the contiguous allocation strategies in 3D mesh connected multicomputers
The performance of contiguous allocation strategies can be significantly affected by the distribution of job execution times. In this paper, the performance of the existing contiguous allocation strategies for 3D mesh multicomputers is re-visited in the context of heavy-tailed distributions (e.g., a Bounded Pareto distribution). The strategies are evaluated and compared using simulation experiments for both First-Come-First-Served (FCFS) and Shortest-Service-Demand (SSD) scheduling strategies under a variety of system loads and system sizes. The results show that the performance of the allocation strategies degrades considerably when job execution times follow a heavy-tailed distribution. Moreover, SSD copes much better than FCFS scheduling strategy in the presence of heavy-tailed job execution times. The results also show that the strategies that depend on a list of allocated sub-meshes for both allocation and deallocation have lower allocation overhead and deliver good system performance in terms of average turnaround time and mean system utilization
Application-centric bandwidth allocation in datacenters
Today's datacenters host a large number of concurrently executing applications with diverse intra-datacenter latency and bandwidth requirements.
Some of these applications, such as data analytics, graph processing, and machine learning training, are data-intensive and require high bandwidth to function properly.
However, these bandwidth-hungry applications can often congest the datacenter network, leading to queuing delays that hurt application completion time.
To remove the network as a potential performance bottleneck, datacenter operators have begun deploying high-end HPC-grade networks like InfiniBand.
These networks offer fully offloaded network stacks, remote direct memory access (RDMA) capability, and non-discarding links, which allow them to provide both low latency and high bandwidth for a single application.
However, it is unclear how well such networks accommodate a mix of latency- and bandwidth-sensitive traffic in a real-world deployment.
In this thesis, we aim to answer the above question.
To do so, we develop RPerf, a latency measurement tool for RDMA-based networks that can precisely measure the InfiniBand switch latency without hardware support.
Using RPerf, we benchmark a rack-scale InfiniBand cluster in both isolated and mixed-traffic scenarios.
Our key finding is that the evaluated switch can provide either low latency or high bandwidth, but not both simultaneously in a mixed-traffic scenario.
We also evaluate several options to improve the latency-bandwidth trade-off and demonstrate that none are ideal.
We find that while queue separation is a solution to protect latency-sensitive applications, it fails to properly manage the bandwidth of other applications.
We also aim to resolve the problem with bandwidth management for non-latency-sensitive applications.
Previous efforts to address this problem have generally focused on achieving max-min fairness at the flow level.
However, we observe that different workloads exhibit varying levels of sensitivity to network bandwidth.
For some workloads, even a small reduction in available bandwidth can significantly increase completion time, while for others, completion time is largely insensitive to available network bandwidth.
As a result, simply splitting the bandwidth equally among all workloads is sub-optimal for overall application-level performance.
To address this issue, we first propose a robust methodology capable of effectively measuring the sensitivity of applications to bandwidth.
We then design Saba, an application-aware bandwidth allocation framework that distributes network bandwidth based on application-level sensitivity.
Saba combines ahead-of-time application profiling to determine bandwidth sensitivity with runtime bandwidth allocation using lightweight software support, with no modifications to network hardware or protocols.
Experiments with a 32-server hardware testbed show that Saba can significantly increase overall performance by reducing the job completion time for bandwidth-sensitive jobs
Autonomic Provisioning with Self-Adaptive Neural Fuzzy Control for End-to-end Delay Guarantee
Abstract—Autonomic server provisioning for performance as-surance is a critical issue in data centers. It is important but challenging to guarantee an important performance metric, percentile-based end-to-end delay of requests flowing through a virtualized multi-tier server cluster. It is mainly due to dynamically varying workload and the lack of an accurate system performance model. In this paper, we propose a novel autonomic server allocation approach based on a model-independent and self-adaptive neural fuzzy control. There are model-independent fuzzy controllers that utilize heuristic knowledge in the form of rule base for performance assurance. Those controllers are designed manually on trial and error basis, often not effective in the face of highly dynamic workloads. We design the neural fuzzy controller as a hybrid of control theoretical and machine learning techniques. It is capable of self-constructing its structure and adapting its parameters through fast online learning. Unlike other supervised machine learning techniques, it does not require off-line training. We further enhance the neural fuzzy controller to compensate for the effect of server switching delays. Extensive simulations demonstrate the effectiveness of our new approach in achieving the percentile-based end-to-end delay guarantees. Com-pared to a rule-based fuzzy controller enabled server allocation approach, the new approach delivers superior performance in the face of highly dynamic workloads. It is robust to workload variation, change in delay target and server switching delays. I
Dynamic Service Contract Enforcement in Service-Oriented Networks
International audienceIn recent years, Service Oriented Architectures (SOA) have emerged as the main solution for the integration of legacy systems with new technologies in the enterprise world. A service is usually governed by a client service contract (CSC) that specifies, among other requirements, the rate at which a service should be accessed, and limits it to no more than a number of service requests during an observation period. Several approaches, using both static and dynamic, credit-based strategies, have been developed in order to enforce the rate specified in the CSC. Existing approaches have problems related to starvation, approximations used in calculations, and rapid credit consumption under certain conditions. In this paper, we propose and validate DoWSS, a doubly-weighted algorithm for service traffic shaping. We show via simulation that DoWSS possesses several advantages: it eliminates the approximation issues, prevents starvation and contains the rapid credit consumption issue in existing credit-based approaches
Recommended from our members
Performance modelling and analysis of e-commerce systems using class based priority scheduling. An investigation into the development of new class based priority scheduling mechanisms for e-commerce system combining different techniques.
Recently, technological developments have affected most lifestyles, especially with the growth in Internet usage. Internet applications highlight the E-commerce capabilities and applications which are now available everywhere; they receive a great number of users on a 24-7 basis because online services are easy to use, faster and cheaper to acquire. Thus E-commerce web sites have become crucial for companies to increase their revenues. This importance has identified certain effective requirements needed from the performance of these applications. In particular, if the web server is overloaded, poor performance can result, due to either a huge rate of requests being generated which are beyond the server¿s capacity, or due to saturation of the communication links capacity which connects the web server to the network.
Recent researches consider the overload issue and explore different mechanisms for managing the performance of E-commerce applications under overload condition.
This thesis proposes a formal approach in order to investigate the effects of the extreme load and the number of dropped requests on the performance of E-
III
commerce web servers. The proposed approach is based on the class-based priority scheme that classifies E-commerce requests into different classes. Because no single technique can solve all aspects of overload problems, this research combines several techniques including: admission control mechanism, session-based admission control, service differentiation, request scheduling and queuing model-based approach.
Request classification is based on the premise that some requests (e.g. buy) are generally considered more important than others (e.g. browse or search). Moreover, this research considers the extended models from Priority Scheduling Mechanism (PSM). These models add a new parameter, such as a review model or modify the basic PSM to low priority fair model, after the discovery of ineffectiveness with low priority customers or to add new features such as portal models.
The proposed model is formally specified using the ¿ -calculus in early stage of models design and a multi-actor simulation was developed to reflect the target models as accurately as possible and is implemented as a Java-based prototype system.
A formal specification that captures the essential PSM features while keeping the performance model sufficiently simple is presented. Furthermore, the simplicity of the UML bridges the gap between ¿-calculus and Java programming language.
IV
There are many metrics for measuring the performance of E-commerce web servers. This research focuses on the performance of E-commerce web servers that refer to the throughput, utilisation, average response time, dropped requests and arrival rate. A number of experiments are conducted in order to test the performance management of the proposed approaches
Dynamical Modeling of Cloud Applications for Runtime Performance Management
Cloud computing has quickly grown to become an essential component in many modern-day software applications. It allows consumers, such as a provider of some web service, to quickly and on demand obtain the necessary computational resources to run their applications. It is desirable for these service providers to keep the running cost of their cloud application low while adhering to various performance constraints. This is made difficult due to the dynamics imposed by, e.g., resource contentions or changing arrival rate of users, and the fact that there exist multiple ways of influencing the performance of a running cloud application. To facilitate decision making in this environment, performance models can be introduced that relate the workload and different actions to important performance metrics.In this thesis, such performance models of cloud applications are studied. In particular, we focus on modeling using queueing theory and on the fluid model for approximating the often intractable dynamics of the queue lengths. First, existing results on how the fluid model can be obtained from the mean-field approximation of a closed queueing network are simplified and extended to allow for mixed networks. The queues are allowed to follow the processor sharing or delay disciplines, and can have multiple classes with phase-type service times. An improvement to this fluid model is then presented to increase accuracy when the \emph{system size}, i.e., number of servers, initial population, and arrival rate, is small. Furthermore, a closed-form approximation of the response time CDF is presented. The methods are tested in a series of simulation experiments and shown to be accurate. This mean-field fluid model is then used to derive a general fluid model for microservices with interservice delays. The model is shown to be completely extractable at runtime in a distributed fashion. It is further evaluated on a simple microservice application and found to accurately predict important performance metrics in most cases. Furthermore, a method is devised to reduce the cost of a running application by tuning load balancing parameters between replicas. The method is built on gradient stepping by applying automatic differentiation to the fluid model. This allows for arbitrarily defined cost functions and constraints, most notably including different response time percentiles. The method is tested on a simple application distributed over multiple computing clusters and is shown to reduce costs while adhering to percentile constraints. Finally, modeling of request cloning is studied using the novel concept of synchronized service. This allows certain forms of cloning over servers, each modeled with a single queue, to be equivalently expressed as one single queue. The concept is very general regarding the involved queueing discipline and distributions, but instead introduces new, less realistic assumptions. How the equivalent queue model is affected by relaxing these assumptions is studied considering the processor sharing discipline, and an extension to enable modeling of speculative execution is made. In a simulation campaign, it is shown that these relaxations only has a minor effect in certain cases
- …