69 research outputs found

    An adaptive admission control and load balancing algorithm for a QoS-aware Web system

    Get PDF
    The main objective of this thesis focuses on the design of an adaptive algorithm for admission control and content-aware load balancing for Web traffic. In order to set the context of this work, several reviews are included to introduce the reader in the background concepts of Web load balancing, admission control and the Internet traffic characteristics that may affect the good performance of a Web site. The admission control and load balancing algorithm described in this thesis manages the distribution of traffic to a Web cluster based on QoS requirements. The goal of the proposed scheduling algorithm is to avoid situations in which the system provides a lower performance than desired due to servers' congestion. This is achieved through the implementation of forecasting calculations. Obviously, the increase of the computational cost of the algorithm results in some overhead. This is the reason for designing an adaptive time slot scheduling that sets the execution times of the algorithm depending on the burstiness that is arriving to the system. Therefore, the predictive scheduling algorithm proposed includes an adaptive overhead control. Once defined the scheduling of the algorithm, we design the admission control module based on throughput predictions. The results obtained by several throughput predictors are compared and one of them is selected to be included in our algorithm. The utilisation level that the Web servers will have in the near future is also forecasted and reserved for each service depending on the Service Level Agreement (SLA). Our load balancing strategy is based on a classical policy. Hence, a comparison of several classical load balancing policies is also included in order to know which of them better fits our algorithm. A simulation model has been designed to obtain the results presented in this thesis

    Low Cost - High Performance Streaming Server Architecture Making Use of Multiple Operating Systems

    Get PDF
    University of Tokyo (東京大学

    Improving Cloud Middlebox Infrastructure for Online Services

    Get PDF
    Middleboxes are an indispensable part of the datacenter networks that provide high availability, scalability and performance to the online services. Using load balancer as an example, this thesis shows that the prevalent scale-out middlebox designs using commodity servers are plagued with three fundamental problems: (1) The server-based layer-4 middleboxes are costly and inflate round-trip-time as much as 2x by processing the packets in software. (2) The middlebox instances cause traffic detouring en route from sources to destinations, which inflates network bandwidth usage by as much as 3.2x and can cause transient congestion. (3) Additionally, existing cloud providers do not support layer-7 middleboxes as a service, and third-party proxy-based layer-7 middlebox design exhibits poor availability as TCP state stored locally on middlebox instances are lost upon instance failure. This thesis examines the root causes of the above problems and proposes new cloud-scale middlebox design principles that systemically address all three problems. First, to address the performance problem, we make a key observation that existing commodity switches have resources available to implement key layer-4 middlebox functionalities such as load balancer, and by processing packets in hardware, switches offer low latency and high capacity benefits, at no additional cost as the switch resources are idle. Motivated by this observation, we propose the design principle of using idle switch resources to accelerate middlebox functionailites. To demonstrate the principle, we developed the complete L4 load balancer design that uses commodity switches for low cost and high performance, and carefully fuses a few software load balancer instances to provide for high availability. Second, to address the high network overhead problem from traffic detouring through middlebox instances, we propose to exploit the principles of locality and flexibility in placing the middlebox instances and servers to handle the traffic closer to the sources and reduce the overall traffic and link utilization in the network. Third, to provide high availability in a layer 7 middleboxes, we propose a novel middlebox design principle of decoupling the TCP state from middlebox instances and storing it in persistent key-value store so that any middlebox instance can seamlessly take over any TCP connection when middlebox instances fail. We demonstrate the effectiveness of the above cloud-scale middlebox design principles using load balancers as an example. Specifically, we have prototyped the three design principles in three cloud-scale load balancers: Duet, Rubik, and Yoda, respectively. Our evaluation using a datacenter testbed and large scale simulations show that Duet lowers the costs by 12x and latency overhead by 1000x, Rubik further lowers the datacenter network traffic overhead by 3x, and Yoda L7 Load balancer-as-a-service is practical; decoupling TCP state from load balancer instances has a negligible

    Monitoring and analysis system for performance troubleshooting in data centers

    Get PDF
    It was not long ago. On Christmas Eve 2012, a war of troubleshooting began in Amazon data centers. It started at 12:24 PM, with an mistaken deletion of the state data of Amazon Elastic Load Balancing Service (ELB for short), which was not realized at that time. The mistake first led to a local issue that a small number of ELB service APIs were affected. In about six minutes, it evolved into a critical one that EC2 customers were significantly affected. One example was that Netflix, which was using hundreds of Amazon ELB services, was experiencing an extensive streaming service outage when many customers could not watch TV shows or movies on Christmas Eve. It took Amazon engineers 5 hours 42 minutes to find the root cause, the mistaken deletion, and another 15 hours and 32 minutes to fully recover the ELB service. The war ended at 8:15 AM the next day and brought the performance troubleshooting in data centers to world’s attention. As shown in this Amazon ELB case.Troubleshooting runtime performance issues is crucial in time-sensitive multi-tier cloud services because of their stringent end-to-end timing requirements, but it is also notoriously difficult and time consuming. To address the troubleshooting challenge, this dissertation proposes VScope, a flexible monitoring and analysis system for online troubleshooting in data centers. VScope provides primitive operations which data center operators can use to troubleshoot various performance issues. Each operation is essentially a series of monitoring and analysis functions executed on an overlay network. We design a novel software architecture for VScope so that the overlay networks can be generated, executed and terminated automatically, on-demand. From the troubleshooting side, we design novel anomaly detection algorithms and implement them in VScope. By running anomaly detection algorithms in VScope, data center operators are notified when performance anomalies happen. We also design a graph-based guidance approach, called VFocus, which tracks the interactions among hardware and software components in data centers. VFocus provides primitive operations by which operators can analyze the interactions to find out which components are relevant to the performance issue. VScope’s capabilities and performance are evaluated on a testbed with over 1000 virtual machines (VMs). Experimental results show that the VScope runtime negligibly perturbs system and application performance, and requires mere seconds to deploy monitoring and analytics functions on over 1000 nodes. This demonstrates VScope’s ability to support fast operation and online queries against a comprehensive set of application to system/platform level metrics, and a variety of representative analytics functions. When supporting algorithms with high computation complexity, VScope serves as a ‘thin layer’ that occupies no more than 5% of their total latency. Further, by using VFocus, VScope can locate problematic VMs that cannot be found via solely application-level monitoring, and in one of the use cases explored in the dissertation, it operates with levels of perturbation of over 400% less than what is seen for brute-force and most sampling-based approaches. We also validate VFocus with real-world data center traces. The experimental results show that VFocus has troubleshooting accuracy of 83% on average.Ph.D

    End-to-End Resilience Mechanisms for Network Transport Protocols

    Get PDF
    The universal reliance on and hence the need for resilience in network communications has been well established. Current transport protocols are designed to provide fixed mechanisms for error remediation (if any), using techniques such as ARQ, and offer little or no adaptability to underlying network conditions, or to different sets of application requirements. The ubiquitous TCP transport protocol makes too many assumptions about underlying layers to provide resilient end-to-end service in all network scenarios, especially those which include significant heterogeneity. Additionally the properties of reliability, performability, availability, dependability, and survivability are not explicitly addressed in the design, so there is no support for resilience. This dissertation presents considerations which must be taken in designing new resilience mechanisms for future transport protocols to meet service requirements in the face of various attacks and challenges. The primary mechanisms addressed include diverse end-to-end paths, and multi-mode operation for changing network conditions

    Software Defined Application Delivery Networking

    Get PDF
    In this thesis we present the architecture, design, and prototype implementation details of AppFabric. AppFabric is a next generation application delivery platform for easily creating, managing and controlling massively distributed and very dynamic application deployments that may span multiple datacenters. Over the last few years, the need for more flexibility, finer control, and automatic management of large (and messy) datacenters has stimulated technologies for virtualizing the infrastructure components and placing them under software-based management and control; generically called Software-defined Infrastructure (SDI). However, current applications are not designed to leverage this dynamism and flexibility offered by SDI and they mostly depend on a mix of different techniques including manual configuration, specialized appliances (middleboxes), and (mostly) proprietary middleware solutions together with a team of extremely conscientious and talented system engineers to get their applications deployed and running. AppFabric, 1) automates the whole control and management stack of application deployment and delivery, 2) allows application architects to define logical workflows consisting of application servers, message-level middleboxes, packet-level middleboxes and network services (both, local and wide-area) composed over application-level routing policies, and 3) provides the abstraction of an application cloud that allows the application to dynamically (and automatically) expand and shrink its distributed footprint across multiple geographically distributed datacenters operated by different cloud providers. The architecture consists of a hierarchical control plane system called Lighthouse and a fully distributed data plane design (with no special hardware components such as service orchestrators, load balancers, message brokers, etc.) called OpenADN . The current implementation (under active development) consists of ~10000 lines of python and C code. AppFabric will allow applications to fully leverage the opportunities provided by modern virtualized Software-Defined Infrastructures. It will serve as the platform for deploying massively distributed, and extremely dynamic next generation application use-cases, including: Internet-of-Things/Cyber-Physical Systems: Through support for managing distributed gather-aggregate topologies common to most Internet-of-Things(IoT) and Cyber-Physical Systems(CPS) use-cases. By their very nature, IoT and CPS use cases are massively distributed and have different levels of computation and storage requirements at different locations. Also, they have variable latency requirements for their different distributed sites. Some services, such as device controllers, in an Iot/CPS application workflow may need to gather, process and forward data under near-real time constraints and hence need to be as close to the device as possible. Other services may need more computation to process aggregated data to drive long term business intelligence functions. AppFabric has been designed to provide support for such very dynamic, highly diversified and massively distributed application use-cases. Network Function Virtualization: Through support for heterogeneous workflows, application-aware networking, and network-aware application deployments, AppFabric will enable new partnerships between Application Service Providers (ASPs) and Network Service Providers (NSPs). An application workflow in AppFabric may comprise of application services, packet and message-level middleboxes, and network transport services chained together over an application-level routing substrate. The Application-level routing substrate allows policy-based service chaining where the application may specify policies for routing their application traffic over different services based on application-level content or context. Virtual worlds/multiplayer games: Through support for creating, managing and controlling dynamic and distributed application clouds needed by these applications. AppFabric allows the application to easily specify policies to dynamically grow and shrink the application\u27s footprint over different geographical sites, on-demand. Mobile Apps: Through support for extremely diversified and very dynamic application contexts typical of such applications. Also, AppFabric provides support for automatically managing massively distributed service deployment and controlling application traffic based on application-level policies. This allows mobile applications to provide the best Quality-of-Experience to its users without This thesis is the first to handle and provide a complete solution for such a complex and relevant architectural problem that is expected to touch each of our lives by enabling exciting new application use-cases that are not possible today. Also, AppFabric is a non-proprietary platform that is expected to spawn lots of innovations both in the design of the platform itself and the features it provides to applications. AppFabric still needs many iterations, both in terms of design and implementation maturity. This thesis is not the end of journey for AppFabric but rather just the beginning

    Diseño de los sistemas de comunicación en plantas de exploración y producción de hidrocarburos

    Full text link
    Las plantas industriales de exploración y producción de petróleo y gas disponen de numerosos sistemas de comunicación que permiten el correcto funcionamiento de los procesos que tienen lugar en ella así como la seguridad de la propia planta. Para el presente Proyecto Fin de Carrera se ha llevado a cabo el diseño del sistema de megafonía PAGA (Public Address and General Alarm) y del circuito cerrado de televisión (CCTV) en la unidad de procesos Hydrocrcaker encargada del craqueo de hidrógeno. Partiendo de los requisitos definidos por las especificaciones corporativas de los grupos petroleros para ambos sistemas, PAGA y CCTV, se han expuesto los principios teóricos sobre los que se fundamenta cada uno de ellos y las pautas a seguir para el diseño y demostración del buen funcionamiento a partir de software específico. Se ha empleado las siguientes herramientas software: EASE para la simulación acústica, PSpice para la simulación eléctrica de las etapas de amplificación en la megafonía; y JVSG para el diseño de CCTV. La sonorización tanto de las unidades como del resto de instalaciones interiores ha de garantizar la inteligibilidad de los mensajes transmitidos. La realización de una simulación acústica permite conocer cómo va a ser el comportamiento de la megafonía sin necesidad de instalar el sistema, lo cual es muy útil para este tipo de proyectos cuya ingeniería se realiza previamente a la construcción de la planta. Además se comprueba el correcto diseño de las etapas de amplificación basadas en líneas de alta impedancia o de tensión constante (100 V). El circuito cerrado de televisión (CCTV) garantiza la transmisión de señales visuales de todos los accesos a las instalaciones y unidades de la planta así como la visión en tiempo real del correcto funcionamiento de los procesos químicos llevados a cabo en la refinería. El sistema dispone de puestos de control remoto para el manejo y gestión de las cámaras desplegadas; y de un sistema de almacenamiento de las grabaciones en discos duros (RAID-5) a través de una red SAN (Storage Area Network). Se especifican las diferentes fases de un proyecto de ingeniería en el sector de E&P de hidrocarburos entre las que se destaca: propuesta y adquisición, reunión de arranque (KOM, Kick Off Meeting), estudio in situ (Site Survey), plan de proyecto, diseño y documentación, procedimientos de pruebas, instalación, puesta en marcha y aceptaciones del sistema. Se opta por utilizar terminología inglesa dado al ámbito global del sector. En la última parte del proyecto se presenta un presupuesto aproximado de los materiales empleados en el diseño de PAGA y CCTV. ABSTRACT. Integrated communications for Oil and Gas allows reducing risks, improving productivity, reducing costs, and countering threats to safety and security. Both PAGA system (Public Address and General Alarm) and Closed Circuit Television have been designed for this project in order to ensure a reliable security of an oil refinery. Based on the requirements defined by corporate specifications for both systems (PAGA and CCTV), theoretical principles have been presented as well as the guidelines for the design and demonstration of a reliable design. The following software has been used: EASE for acoustic simulation; PSpice for simulation of the megaphony amplification loops; and JVSG tool for CCTV design. Acoustic for both the units and the other indoor facilities must ensure intelligibility of the transmitted messages. An acoustic simulation allows us to know how will be the performance of the PAGA system without installing loudspeakers, which is very useful for this type of project whose engineering is performed prior to the construction of the plant. Furthermore, it has been verified the correct design of the amplifier stages based on high impedance lines or constant voltage (100 V). Closed circuit television (CCTV) ensures the transmission of visual signals of all access to facilities as well as real-time view of the proper functioning of chemical processes carried out at the refinery. The system has remote control stations for the handling and management of deployed cameras. It is also included a storage system of the recordings on hard drives (RAID - 5) through a SAN (Storage Area Network). Phases of an engineering project in Oil and Gas are defined in the current project. It includes: proposal and acquisition, kick-off meeting (KOM), Site Survey, project plan, design and documentation, testing procedures (SAT and FAT), installation, commissioning and acceptance of the systems. Finally, it has been presented an estimate budget of the materials used in the design of PAGA and CCTV

    Detection and Diagnosis of Memory Leaks in Web Applications

    Get PDF
    Memory leaks -- the existence of unused memory on the heap of applications -- result in low performance and may, in the worst case, cause applications to crash. The migration of application logic to the client side of modern web applications and the use of JavaScript as the main language for client-side development have made memory leaks in JavaScript an issue for web applications. Significant portions of modern web applications are executed on the client browser, with the server acting only as a data store. Client-side web applications communicate with the server asynchronously, remaining on the same web page during their lifetime. Thus, even minor memory leaks can eventually lead to excessive memory usage, negatively affecting user-perceived response time and possibly causing page crashes. This thesis demonstrates the existence of memory leaks in the client side of large and popular web applications, and develops prototype tools to solve this problem. The first approach taken to address memory leaks in web applications is to detect, diagnose, and x them during application development. This approach prevents such leaks from happening by finding and removing their causes. To achieve this goal, this thesis introduces LeakSpot, a tool that creates a runtime heap model of JavaScript applications by modifying web-application code in a browser-agnostic way to record object allocations, accesses, and references created on objects. LeakSpot reports the locations of the code that are allocating leaked objects, i.e., leaky allocation sites. It also identifies accumulation sites, which are the points in the program where references are created on objects but are not removed, e.g., the points where objects are added to a data structure but are not removed. To facilitate debugging and fixing the code, LeakSpot narrows down the space that must be searched for finding the cause of the leaks in two ways: First, it refines the list of leaky allocation sites and reports those allocation sites that are the main cause of the leaks. In addition, for every leaked object, LeakSpot reports all the locations in the program that create a reference to that object. To confirm its usefulness and e fficacy experimentally, LeakSpot is used to find and x memory leaks in JavaScript benchmarks and open-source web applications. In addition, the potential causes of the leaks in large and popular web applications are identified. The performance overhead of LeakSpot in large and popular web applications is also measured, which indirectly demonstrates the scalability of LeakSpot. The second approach taken to address memory leaks assumes memory leaks may still be present after development. This approach aims to reduce the effects of leaked memory during runtime and improve memory efficiency of web applications by removing the leaked objects or early triggering of garbage collection, Using a new tool, MemRed. MemRed automatically detects excessive use of memory during runtime and then takes actions to reduce memory usage. It detects the excessive use of memory by tracking the size of all objects on the heap. If an error is detected, MemRed applies recovery actions to reduce the overall size of the heap and hide the effects of excessive memory usage from users. MemRed is implemented as an extension for the Chrome browser. Evaluation demonstrates the effectiveness of MemRed in reducing memory usage of web applications. In summary, the first tool provided in this thesis, LeakSpot, can be used by developers in finding and fixing memory leaks in JavaScript Applications. Using both tools improves the experience of web-application users.4 month
    corecore