235 research outputs found

    Single system image servers on top of clusters of PCs


    Content Aware Request Distribution for High Performance Web Service: A Performance Study

    The World Wide Web is becoming a basic infrastructure for a variety of services, and the increases in audience size and client network bandwidth create service demands that are outpacing server capacity. Web clusters are one solution to this need for high-performance, highly available web server systems. We are interested in load distribution techniques, specifically Layer-7 algorithms that are content-aware. Layer-7 algorithms allow distribution control based on the specific content requested, which is advantageous for a system that offers highly heterogeneous services. We examine the performance of the Client Aware Policy (CAP) on a Linux/Apache web cluster consisting of a single web switch that directs requests to a pool of dual-processor SMP nodes. We show that the performance advantage of CAP over simple algorithms such as random and round-robin is as high as 29% on our testbed, which serves a mixture of static and dynamic content. Under heavily loaded conditions, however, the performance decreases to the level of random distribution. In studying SMP vs. uniprocessor performance using the same number of processors with CAP distribution, we find that SMP dual-processor nodes under moderate workload levels provide throughput equivalent to that of the same number of CPUs in a uniprocessor cluster. As the workload increases to a heavily loaded state, however, the SMP cluster shows reduced throughput compared to a cluster using uniprocessor nodes. We show that the web cluster's maximum throughput increases linearly with the addition of more nodes to the server pool. We conclude that CAP is advantageous over random or round-robin distribution under certain conditions for highly dynamic workloads, and suggest some future enhancements that may improve its performance.
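To make the contrast between content-blind and content-aware (Layer-7) distribution concrete, the following is a minimal sketch, not the CAP implementation evaluated in the paper. It assumes a hypothetical two-class split (static vs. dynamic) decided from the URL, and rotates over the nodes separately for each class so that expensive dynamic requests are spread evenly. The node names, the `classify` rule, and the class labels are all illustrative assumptions.

```python
from collections import defaultdict
from itertools import cycle


class RoundRobinDispatcher:
    """Content-blind baseline: rotate over all nodes, ignoring the request."""

    def __init__(self, nodes):
        self._ring = cycle(nodes)

    def pick(self, url):
        return next(self._ring)


class PerClassDispatcher:
    """Content-aware sketch: keep one independent rotation per request class."""

    def __init__(self, nodes, classify):
        self._nodes = list(nodes)
        self._classify = classify
        # Each class lazily gets its own round-robin ring over the same nodes.
        self._rings = defaultdict(lambda: cycle(self._nodes))

    def pick(self, url):
        return next(self._rings[self._classify(url)])


def classify(url):
    """Hypothetical classifier: treat CGI and PHP paths as dynamic content."""
    path = url.split("?", 1)[0]
    if path.startswith("/cgi-bin/") or path.endswith(".php"):
        return "dynamic"
    return "static"


nodes = ["node-a", "node-b"]  # assumed node names
requests = ["/cgi-bin/report", "/index.html", "/cgi-bin/search", "/logo.png"]

blind = RoundRobinDispatcher(nodes)
aware = PerClassDispatcher(nodes, classify)

blind_plan = [(blind.pick(u), classify(u)) for u in requests]
aware_plan = [(aware.pick(u), classify(u)) for u in requests]

print("round-robin:", blind_plan)
print("per-class:  ", aware_plan)
```

With this alternating request pattern, plain round-robin happens to send both dynamic requests to the same node, while the per-class rotation gives each node one dynamic and one static request.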

    An adaptive admission control and load balancing algorithm for a QoS-aware Web system

    The main objective of this thesis is the design of an adaptive algorithm for admission control and content-aware load balancing for Web traffic. In order to set the context of this work, several reviews are included to introduce the reader to the background concepts of Web load balancing, admission control and the Internet traffic characteristics that may affect the performance of a Web site. The admission control and load balancing algorithm described in this thesis manages the distribution of traffic to a Web cluster based on QoS requirements. The goal of the proposed scheduling algorithm is to avoid situations in which the system provides lower performance than desired due to server congestion. This is achieved through the implementation of forecasting calculations. The added computational cost of the algorithm results in some overhead. For this reason, we design an adaptive time slot scheduling that sets the execution times of the algorithm depending on the burstiness of the traffic arriving at the system. The proposed predictive scheduling algorithm therefore includes adaptive overhead control. Once the scheduling of the algorithm is defined, we design the admission control module based on throughput predictions. The results obtained by several throughput predictors are compared and one of them is selected for inclusion in our algorithm. The utilisation level that the Web servers will have in the near future is also forecast and reserved for each service depending on the Service Level Agreement (SLA). Our load balancing strategy is based on a classical policy. Hence, a comparison of several classical load balancing policies is also included in order to determine which of them best fits our algorithm. A simulation model has been designed to obtain the results presented in this thesis.
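The idea of admitting traffic based on a throughput forecast and a per-service reservation can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which predictor the thesis selects, so simple exponential smoothing is swapped in here, and the capacity figure, the reserved shares, and the names `SmoothedPredictor` and `admit` are hypothetical.

```python
class SmoothedPredictor:
    """Exponential smoothing, used here as an assumed stand-in predictor."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha      # weight given to the newest observation
        self.estimate = None    # no forecast until the first observation

    def observe(self, value):
        if self.estimate is None:
            self.estimate = float(value)
        else:
            self.estimate = self.alpha * value + (1 - self.alpha) * self.estimate
        return self.estimate


def admit(predicted_rps, capacity_rps, reserved_share):
    """Admit only if the forecast fits in the share reserved for the service."""
    return predicted_rps <= capacity_rps * reserved_share


predictor = SmoothedPredictor(alpha=0.5)
for observed in [100, 200, 400]:   # requests per second seen in past slots
    forecast = predictor.observe(observed)

# Hypothetical cluster capacity of 500 requests per second.
print(forecast)                    # 275.0
print(admit(forecast, 500, 0.5))   # False: 275 exceeds the 250 reserved
print(admit(forecast, 500, 0.6))   # True: 275 fits within the 300 reserved
```

A real system would recompute the forecast once per time slot, which is where the adaptive slot length described in the abstract would trade prediction freshness against overhead.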

    A Scalable Cluster-based Infrastructure for Edge-computing Services

    In this paper we present a scalable and dynamic intermediary infrastructure, SEcS (acronym of "Scalable Edge computing Services"), for developing and deploying advanced Edge computing services, by using a cluster of heterogeneous machines. Our goal is to address the challenges of the next-generation Internet services: scalability, high availability, fault-tolerance and robustness, as well as programmability and quick prototyping. The system is written in Java and is based on IBM's Web Based Intermediaries (WBI) [71] developed at IBM Almaden Research Center.

    Workload Schedulers - Genesis, Algorithms and Comparisons

    In this article we provide brief descriptions of three classes of schedulers: Operating Systems Process Schedulers, Cluster Systems Jobs Schedulers and Big Data Schedulers. We describe their evolution from early adoptions to modern implementations, considering both the use and features of algorithms. In summary, we discuss differences between all presented classes of schedulers and discuss their chronological development. In conclusion, we highlight similarities in the focus of scheduling strategies design, applicable to both local and distributed systems.

    Revisiting Actor Programming in C++

    The actor model of computation has gained significant popularity over the last decade. Its high level of abstraction makes it appealing for concurrent applications in parallel and distributed systems. However, designing a real-world actor framework that subsumes full scalability, strong reliability, and high resource efficiency requires many conceptual and algorithmic additives to the original model. In this paper, we report on designing and building CAF, the "C++ Actor Framework". CAF aims to provide a concurrent and distributed native environment for scaling up to very large, high-performance applications, and equally well down to small constrained systems. We present the key specifications and design concepts (in particular a message-transparent architecture, type-safe message interfaces, and pattern matching facilities) that make native actors a viable approach for many robust, elastic, and highly distributed developments. We demonstrate the feasibility of CAF in three scenarios: first for elastic, upscaling environments, second for including heterogeneous hardware like GPGPUs, and third for distributed runtime systems. Extensive performance evaluations indicate ideal runtime behaviour for up to 64 cores at very low memory footprint, or in the presence of GPUs. In these tests, CAF continuously outperforms the competing actor environments Erlang, Charm++, SalsaLite, Scala, ActorFoundry, and even OpenMPI.

    Improving the Performance of User-level Runtime Systems for Concurrent Applications

    Concurrency is an essential part of many modern large-scale software systems. Applications must handle millions of simultaneous requests from millions of connected devices. Handling such a large number of concurrent requests requires runtime systems that efficiently manage concurrency and communication among tasks in an application across multiple cores. Existing low-level programming techniques provide scalable solutions with low overhead, but require non-linear control flow. Alternative approaches to concurrent programming, such as Erlang and Go, support linear control flow by mapping multiple user-level execution entities across multiple kernel threads (M:N threading). However, these systems provide comprehensive execution environments that make it difficult to assess the performance impact of user-level runtimes in isolation. This thesis presents a nimble M:N user-level threading runtime that closes this conceptual gap and provides a software infrastructure to precisely study the performance impact of user-level threading. Multiple design alternatives are presented and evaluated for scheduling, I/O multiplexing, and synchronization components of the runtime. The performance of the runtime is evaluated in comparison to event-driven software, system-level threading, and other user-level threading runtimes. An experimental evaluation is conducted using benchmark programs, as well as the popular Memcached application. The user-level runtime supports high levels of concurrency without sacrificing application performance. In addition, the user-level scheduling problem is studied in the context of an existing actor runtime that maps multiple actors to multiple kernel-level threads. In particular, two locality-aware work-stealing schedulers are proposed and evaluated. It is shown that locality-aware scheduling can significantly improve the performance of a class of applications with a high level of concurrency.
In general, the performance and resource utilization of large-scale concurrent applications depend on the level of concurrency that can be expressed by the programming model. This fundamental effect is studied by refining and customizing existing concurrency models.

    Rethinking the design and implementation of the I/O software stack for high-performance computing

    The current I/O stack for high-performance computing is composed of multiple software layers in order to shield users from the complexity of I/O performance optimization. However, the design and implementation of a specific layer is usually carried out separately, with limited consideration of its impact on other layers. This can result in suboptimal I/O performance because data access locality is weakened, if not lost, on hard disk, a widely used storage medium in high-end storage systems. In this dissertation, we experimentally demonstrate such issues in four different layers: the operating system process management layer and the MPI-IO middleware layer on the compute server side, and the parallel file system layer and the disk I/O scheduling layer on the data server side. This dissertation makes four contributions, one towards solving each of these issues. First, we propose a data-driven execution model for DualPar to explore opportunities for effective I/O scheduling that alleviates the I/O bottleneck via cooperation between the I/O and process schedulers. Its novelty lies in its ability, in its data-driven execution mode, to supply the I/O scheduler with a pool of pre-sorted requests by using process pre-execution and prefetching techniques. Second, realizing that the well-formed locality an MPI program obtains by using collective I/O can be seriously compromised by non-determinism in process scheduling, we propose Resonant I/O, which matches the data request pattern with the pattern of file striping over multiple data servers to improve disk efficiency. Third, since the conventional practice of achieving I/O parallelism through file striping may compromise on-disk data access locality, we propose the IOrchestrator scheduling framework, implemented in the PVFS2 parallel file system, to improve the I/O performance of multi-node storage systems by orchestrating I/O services among programs when such inter-data-server coordination is dynamically determined to be cost effective.
Fourth, we develop iTransformer, a scheme that employs a small SSD to schedule requests for the data on disk. Because an SSD is less space-constrained than more expensive DRAM, iTransformer can buffer larger amounts of dirty data before writing it back to the disk, or prefetch a larger volume of data in a batch into the SSD. In both cases, high disk efficiency can be maintained for highly concurrent requests.