
    Single system image servers on top of clusters of PCs


    Middleware-based Database Replication: The Gaps between Theory and Practice

    The need for high availability and performance in data management systems has fueled a long-running interest in database replication in both academia and industry. However, academic groups often attack replication problems in isolation, overlooking the need for completeness in their solutions, while commercial teams take a holistic approach that often misses opportunities for fundamental innovation. Over time this has created a gap between academic research and industrial practice. This paper aims to characterize the gap along three axes: performance, availability, and administration. We build on our own experience developing and deploying replication systems in commercial and academic settings, as well as on a large body of prior related work. We sift through representative examples from the last decade of open-source, academic, and commercial database replication systems and combine this material with case studies from real systems deployed at Fortune 500 customers. We propose two agendas, one for academic research and one for industrial R&D, which we believe can bridge the gap within 5-10 years. In this way, we hope to both motivate and help researchers in making the theory and practice of middleware-based database replication more relevant to each other. Comment: 14 pages. Appears in Proc. ACM SIGMOD International Conference on Management of Data, Vancouver, Canada, June 200

    Scalable, Data-intensive Network Computation

    To enable groups of collaborating researchers at different locations to effectively share large datasets and investigate their spontaneous hypotheses on the fly, we are interested in developing a distributed system that can be easily leveraged by a variety of data-intensive applications. The system is composed of (i) a number of best-effort logistical depots to enable large-scale data sharing and in-network data processing, (ii) a set of end-to-end tools to effectively aggregate, manage and schedule a large number of network computations with attendant data movements, and (iii) a Distributed Hash Table (DHT) on top of the generic depot services for scalable data management. The logistical depot is extended by following the end-to-end principles and is modeled with a closed queuing network model. Its performance characteristics are studied by solving the steady-state distributions of the model using local balance equations. The modeling results confirm that the wide-area network is the performance bottleneck and that running concurrent jobs can increase resource utilization and system throughput. As a novel contribution, techniques to effectively support resource-demanding data-intensive applications using the fine-grained depot services are developed. These techniques include instruction-level scheduling of operations, dynamic co-scheduling of computation and replication, and adaptive workload control. Experiments in volume visualization have shown the effectiveness of these techniques. Due to the unique characteristics of data-intensive applications and our co-scheduling algorithm, a DHT is implemented on top of the basic storage and computation services. It demonstrates the potential of the Logistical Networking infrastructure to serve as a service creation platform.
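    The kind of bottleneck analysis this abstract describes can be illustrated with a standard product-form solution of a closed queuing network. The sketch below uses Buzen's convolution algorithm, a common alternative to solving the local balance equations directly, and the service demands are made-up illustrative numbers rather than measurements from the thesis.

```python
# Sketch: throughput of a closed queuing network via Buzen's convolution
# algorithm (product-form solution). The service demands below are made-up
# illustrative numbers, not measurements from the thesis.

def buzen_g(demands, n_jobs):
    """Normalization constants G(0..n_jobs) for single-server queues."""
    g = [1.0] + [0.0] * n_jobs           # G for an empty network
    for d in demands:                     # fold in one queue at a time
        for n in range(1, n_jobs + 1):
            g[n] += d * g[n - 1]          # G_m(n) = G_{m-1}(n) + D_m * G_m(n-1)
    return g

# Hypothetical service demands (seconds per job): depot disk, depot CPU, WAN.
demands = {"disk": 0.010, "cpu": 0.005, "wan": 0.050}

N = 8                                     # concurrent jobs circulating in the network
g = buzen_g(list(demands.values()), N)
throughput = g[N - 1] / g[N]              # X(N) = G(N-1) / G(N)

print(f"system throughput with {N} jobs: {throughput:.2f} jobs/s")
for name, d in demands.items():
    print(f"utilization of {name}: {d * throughput:.2%}")  # U_i = D_i * X(N)
```

    With these made-up numbers the WAN demand dominates, so its utilization approaches 100% as the job population grows, mirroring the abstract's conclusion that the wide-area network is the bottleneck and that running concurrent jobs raises utilization and throughput.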

    Gollach: configuration of a cluster-based Linux virtual server

    Includes bibliographical references. This thesis describes the Gollach cluster. The Gollach is an eight-machine computing cluster intended as a general-purpose computing resource for research, including image processing and simulations. The main goal of this project is to create a cluster server that provides increased computational power and a unified system image (at several levels) without requiring users to learn specialised tricks. At the same time, the cluster must not be burdensome to administer.

    Reducing Internet Latency: A Survey of Techniques and their Merit

    Bob Briscoe, Anna Brunstrom, Andreas Petlund, David Hayes, David Ros, Ing-Jyh Tsang, Stein Gjessing, Gorry Fairhurst, Carsten Griwodz, Michael Welzl. Peer reviewed. Preprint

    Improving Cloud Middlebox Infrastructure for Online Services

    Middleboxes are an indispensable part of datacenter networks, providing high availability, scalability and performance to online services. Using the load balancer as an example, this thesis shows that prevalent scale-out middlebox designs built on commodity servers are plagued by three fundamental problems: (1) server-based layer-4 middleboxes are costly and inflate round-trip time by as much as 2x because they process packets in software; (2) middlebox instances cause traffic detouring en route from sources to destinations, which inflates network bandwidth usage by as much as 3.2x and can cause transient congestion; (3) existing cloud providers do not offer layer-7 middleboxes as a service, and third-party proxy-based layer-7 middlebox designs exhibit poor availability because TCP state stored locally on a middlebox instance is lost when that instance fails. This thesis examines the root causes of these problems and proposes new cloud-scale middlebox design principles that systematically address all three. First, to address the performance problem, we make the key observation that existing commodity switches have idle resources that can implement key layer-4 middlebox functionality such as load balancing, and, by processing packets in hardware, switches offer low latency and high capacity at no additional cost. Motivated by this observation, we propose the design principle of using idle switch resources to accelerate middlebox functionality. To demonstrate the principle, we developed a complete L4 load balancer design that uses commodity switches for low cost and high performance, and carefully fuses a few software load balancer instances to provide high availability. Second, to address the network overhead caused by traffic detouring through middlebox instances, we propose to exploit locality and flexibility in placing middlebox instances and servers, handling traffic closer to its sources and reducing overall traffic and link utilization in the network. Third, to provide high availability in layer-7 middleboxes, we propose the novel design principle of decoupling TCP state from middlebox instances and storing it in a persistent key-value store, so that any middlebox instance can seamlessly take over any TCP connection when another instance fails. We demonstrate the effectiveness of these cloud-scale middlebox design principles using load balancers as an example. Specifically, we have prototyped the three design principles in three cloud-scale load balancers: Duet, Rubik, and Yoda, respectively. Our evaluation using a datacenter testbed and large-scale simulations shows that Duet lowers cost by 12x and latency overhead by 1000x, Rubik further lowers datacenter network traffic overhead by 3x, and Yoda, an L7 load-balancer-as-a-service, is practical: decoupling TCP state from load balancer instances has negligible overhead.
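    As a rough illustration of the third principle, not the actual Yoda implementation, the sketch below keeps per-connection state in a shared key-value store keyed by the TCP 5-tuple, so that any middlebox instance can resume a connection after another instance fails. A plain dictionary stands in for the persistent store, and the class and field names are invented for the example.

```python
# Sketch of decoupling per-connection TCP state from middlebox instances.
# A plain dict stands in for the shared, persistent key-value store; the
# state fields and class names are hypothetical, for illustration only.

from dataclasses import dataclass, asdict

# Shared key-value store: connection 5-tuple -> serialized connection state.
KV_STORE: dict[tuple, dict] = {}

@dataclass
class TcpState:
    seq: int           # toy stand-in for sequence tracking
    ack: int           # toy stand-in for ack tracking
    backend: str       # server this connection was proxied to

class MiddleboxInstance:
    def __init__(self, name: str):
        self.name = name

    def handle_packet(self, five_tuple: tuple, payload: bytes) -> None:
        # Look up (or create) connection state in the shared store rather
        # than in instance-local memory, so a failover loses nothing.
        state = KV_STORE.get(five_tuple)
        if state is None:
            state = asdict(TcpState(seq=0, ack=0, backend="10.0.0.1:80"))
        state["seq"] += len(payload)            # toy state update
        KV_STORE[five_tuple] = state            # write back after each packet
        print(f"{self.name} forwarded {len(payload)}B to {state['backend']}")

# Any instance can pick up the flow because the state lives in the store.
conn = ("1.2.3.4", 12345, "5.6.7.8", 443, "tcp")
MiddleboxInstance("mbox-1").handle_packet(conn, b"hello")
MiddleboxInstance("mbox-2").handle_packet(conn, b"world")   # takes over after mbox-1 fails
```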

    A Clouded Future: Analysis of Microsoft Windows Azure As a Platform for Hosting E-Science Applications

    Microsoft Windows Azure is Microsoft's cloud-based platform for hosting .NET applications. Azure provides a simple, cost-effective method for outsourcing application hosting. Windows Azure has caught the eye of researchers in e-science who require parallel computing infrastructures to process mountains of data. Windows Azure offers the same benefits to e-science as it does to other industries. This paper examines the technology behind Azure and analyzes two case studies of e-science projects built on the Windows Azure platform.

    Design and implementation of the clusterf load balancer for Docker clusters

    Docker uses the Linux container namespace and cgroup primitives to provide isolated application runtime environments, allowing the distribution of self-contained application images that can be run in the form of Docker containers. Container platforms provide the infrastructure needed to deploy horizontally scalable container services into the Cloud, including orchestration, networking, service discovery and load balancing. This thesis studies container networking architectures and existing Cloud load balancer implementations to create a design for a scalable load balancer using the Linux IPVS network-level load balancer implementation. The clusterf load balancer architecture uses a two-level load balancing scheme that combines different packet forwarding methods for scalability and compatibility with existing Docker applications. A distributed load balancer control plane is implemented to provide automatic load balancing for Docker containers. The clusterf load balancer is evaluated in a testbed environment, measuring the performance of the network-level Linux IPVS implementation. The scalability of the clusterf load balancer is tested using Equal-Cost Multi-Path (ECMP) routing and IPVS connection synchronization. The result is that the network-level Linux IPVS load balancer performs significantly better than the application-level HAProxy load balancer in the same configuration. The clusterf design allows for horizontal scaling with connection failover between IPVS load balancers. The current clusterf implementation requires the use of asymmetric routing within a network, as provided by local Ethernet networks. Extending the clusterf design to support deployment onto existing Cloud infrastructure platforms with different networking implementations would qualify the clusterf load balancer for use in general container platforms.
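    The ECMP scaling mentioned above relies on per-flow hashing: every packet of a connection hashes to the same next hop, so adding IPVS load balancer instances spreads flows across them without breaking established connections. The sketch below illustrates the idea with a hypothetical hash over the connection 5-tuple; it is not the Linux kernel's ECMP or IPVS scheduling code.

```python
# Sketch of per-flow ECMP-style hashing: all packets of a TCP connection
# map to the same load balancer / backend. Hypothetical example, not the
# Linux kernel's ECMP or IPVS scheduling code.

import hashlib

def pick_next_hop(five_tuple: tuple, next_hops: list[str]) -> str:
    """Hash the connection 5-tuple onto one of the equal-cost next hops."""
    key = "|".join(str(field) for field in five_tuple).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(next_hops)
    return next_hops[index]

ipvs_balancers = ["lb-1", "lb-2", "lb-3"]          # IPVS tier scaled out via ECMP

flow_a = ("192.0.2.10", 40001, "203.0.113.5", 80, "tcp")
flow_b = ("192.0.2.11", 40002, "203.0.113.5", 80, "tcp")

# Every packet of flow_a takes the same path, so its IPVS connection state
# stays on one balancer (with connection synchronization as the fallback).
print(pick_next_hop(flow_a, ipvs_balancers))
print(pick_next_hop(flow_b, ipvs_balancers))
```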