19 research outputs found

    Latency Based Approach for Characterization of Cloud Application Performance.

    Get PDF
    PhDPublic cloud infrastructures provide exible hosting for web application providers, but the rented virtual machines (VMs) often offer unpredictable performance to the deployed applications. Understanding cloud performance is challenging for application providers, as clouds provide limited information that would help them have expectations about their application performance. In this thesis I present a technique to measure the performance of cloud applications, based on observations of the application latency. I treat the cloud application as a black box, making no assumption about the underlying platform. From my measurements, I can observe the varying performance provided by the different VM profles across well-known commercial cloud platforms. I also identify a trade-of between the responsiveness and the load of the measured servers, which can help application providers in their deployment and provisioning

    Dependability where the mobile world meets the enterprise world

    Get PDF
    As we move toward increasingly larger scales of computing, complexity of systems and networks has increased manifold leading to massive failures of cloud providers (Amazon Cloudfront, November 2014) and geographically localized outages of cellular services (T-Mobile, June 2014). In this dissertation, we investigate the dependability aspects of two of the most prevalent computing platforms today, namely, smartphones and cloud computing. These two seemingly disparate platforms are part of a cohesive story—they interact to provide end-to-end services which are increasingly being delivered over mobile platforms, examples being iCloud, Google Drive and their smartphone counterparts iPhone and Android. ^ In one of the early work on characterizing failures in dominant mobile OSes, we analyzed bug repositories of Android and Symbian and found similarities in their failure modes [ISSRE2010]. We also presented a classification of root causes and quantified the impact of ease of customizing the smartphones on system reliability. Our evaluation of Inter-Component Communication in Android [DSN2012] show an alarming number of exception handling errors where a phone may be crashed by passing it malformed component invocation messages, even from unprivileged applications. In this work, we also suggest language extensions that can mitigate these problems. ^ Mobile applications today are increasingly being used to interact with enterprise-class web services commonly hosted in virtualized environments. Virutalization suffers from the problem of imperfect performance isolation where contention for low-level hardware resources can impact application performance. Through a set of rigorous experiments in a private cloud testbed and in EC2, we show that interference induced performance degradation is a reality. Our experiments have also shown that optimal configuration settings for web servers change during such phases of interference. Based on this observation, we design and implement the IC 2engine which can mitigate effects of interference by reconfiguring web server parameters [MW2014]. We further improve IC 2 by incorporating it into a two-level configuration engine, named ICE, for managing web server clusters [ICAC2015]. Our evaluations show that, compared to an interference agnostic configuration, IC 2 can improve response time of web servers by upto 40%, while ICE can improve response time by up to 94% during phases of interference

    Designing and Handling Failure issues in a Structured Overlay Network Based Grid

    Get PDF
    Grid computing is the computing paradigm that is concerned with coordinated resource sharing and problem solving in dynamic, autonomous multi-institutional virtual organizations. Data exchange and service allocation between virtual organizations are challenging problems in the field of Grid computing, due to the decentralization of Grid systems. The resource management in a Grid system ensures efficiency and usability. The required efficiency and usability of Grid systems can be achieved by building a decentralized multi-virtual Grid system. In this thesis we present a decentralized multi-virtual resource management framework in which the system is divided into virtual organizations, each controlled by a broker. An overlay network of brokers is responsible for global resource management and managing the allocation of services. We address two main issues for both local and global resource management: 1) decentralized allocation of tasks to suitable nodes to achieve both local and global load balancing; and 2) handling of both regular and broker failures. Experimental results verify that the system achieves dependable performance with various loads of services and broker failures

    Utilizing Advanced Network Context to Optimize Software-Defined Networks

    Get PDF
    Legacy network systems and protocols are mostly static and keep state information in silo-style storage, thus making state migration, transformation and re-use difficult. Software-Defined Network (SDN) approaches in unison with Network Function Virtualization (NFV) allow for more flexibility, yet they are currently restricted to a limited set of state migration options. Additionally, existing systems and protocols are mostly tailored to meet the requirements of specific application scenarios. As a result, the protocols cannot easily be adapted to novel application demands, organically growing networks, etc. Impeding the sharing of networking and system state, along with lacking support for dynamic transitions between systems and protocols, severely limits the ability to optimally manage resources and dynamically adapt to a desirable overall configuration. These limitations not only affect the network performance but also hinder the deployment of new and innovative protocols as a hard break is usually not feasible and thus full support for legacy systems is required. On the one hand, we propose a generalized way to collect, store, transform, and share context between systems and protocols in both the legacy Internet as well as NFV/SDN-driven networks. This allows us to share state information between multiple systems and protocols from NFs over BGP routers to protocols on all layers of the network stack. On the other hand, we introduce an architecture for designing modular protocols that are built with transition in mind. We argue that the modular design of systems and protocols can remove the key limitations of today’s monolithic protocols and allow for a more dynamic network management. First, we design and implement a Storage and Transformation Engine for Advanced Net- working context (STEAN) which constitutes a shared context storage, making network state information available to other systems and protocols. Its pivotal feature is the ability to allow for state transformation as well as for persisting state to enable future re-use. Second, we provide a Blueprint for Switching Between Mechanisms that serves as a framework and guideline for developers to standardize and ease the process of designing and implementing systems and protocols that support transitions as a first order principle. By means of experimentation, we show that our architecture covers a diverse set of challenging use cases in legacy systems—such as Wireless Multihop Networks (WMNs)—as well as in NFV/SDN-enabled systems. In particular, we demonstrate the feasibility of our approach by migrating state information between two instances of the PRADS NF in a virtualized Mininet environment, and show that our solution outperforms state of the art frameworks that are specifically built for NF migration. We further demonstrate that a dynamic switch between WMN routing protocols is possible at runtime and that the state information can be reutilized for bootstrapping novel protocol modules, thus minimizing the control overhead

    Towards Simulation and Emulation of Large-Scale Computer Networks

    Get PDF
    Developing analytical models that can accurately describe behaviors of Internet-scale networks is difficult. This is due, in part, to the heterogeneous structure, immense size and rapidly changing properties of today\u27s networks. The lack of analytical models makes large-scale network simulation an indispensable tool for studying immense networks. However, large-scale network simulation has not been commonly used to study networks of Internet-scale. This can be attributed to three factors: 1) current large-scale network simulators are geared towards simulation research and not network research, 2) the memory required to execute an Internet-scale model is exorbitant, and 3) large-scale network models are difficult to validate. This dissertation tackles each of these problems. First, this work presents a method for automatically enabling real-time interaction, monitoring, and control of large-scale network models. Network researchers need tools that allow them to focus on creating realistic models and conducting experiments. However, this should not increase the complexity of developing a large-scale network simulator. This work presents a systematic approach to separating the concerns of running large-scale network models on parallel computers and the user facing concerns of configuring and interacting with large-scale network models. Second, this work deals with reducing memory consumption of network models. As network models become larger, so does the amount of memory needed to simulate them. This work presents a comprehensive approach to exploiting structural duplications in network models to dramatically reduce the memory required to execute large-scale network experiments. Lastly, this work addresses the issue of validating large-scale simulations by integrating real protocols and applications into the simulation. With an emulation extension, a network simulator operating in real-time can run together with real-world distributed applications and services. As such, real-time network simulation not only alleviates the burden of developing separate models for applications in simulation, but as real systems are included in the network model, it also increases the confidence level of network simulation. This work presents a scalable and flexible framework to integrate real-world applications with real-time simulation

    QoS-Based Optimization of Runtime Management of Sensing Cloud Applications

    Get PDF
    Die vorliegende Arbeit präsentiert Ansätze und Techniken zur qualitätsbewussten Verbesserung des Laufzeitmanagements von IoT-Anwendungen. IoT-Anwendungen nehmen über die Sensorik von Smart Devices ihre Umgebung wahr, um diese zu analysieren oder mit ihr zu interagieren. Smart Devices sind in der Rechen- und Speicherleistung begrenzt, weshalb viele IoT-Anwendungen über eine IoT Plattform mit elastischen und skalierbaren Cloud Services verbunden sind. Die Last auf dem Cloud Service entsteht durch die verbundenen Smart Devices, die kontinuierlich Nachrichten transferieren. Die Ressourcenkonfiguration des Cloud Services beeinflusst dessen Kapazität. Ein Service Operator, der eine IoT-Anwendung betreibt, ist mit der Herausforderung konfrontiert, die Smart Devices und den Cloud Service so zu konfigurieren, dass eine hohe Datenqualität bei niedrigen Betriebskosten erreicht wird. Um hierbei den Service Operator zur Design Time zu unterstützen, modellieren wir Kostenfunktionen für Datenqualitäten, die durch das Wechselspiel der Smart Device- und Cloud Service-Konfiguration beeinflusst werden. Mit Hilfe dieser Kostenfunktionen kann ein Service Operator nach einer kostenminimalen Konfiguration für bestimmte Szenarien suchen. Existierende Ansätze zur Optimierung von Anwendungen zur Design Time fokussieren sich auf traditionelle Software-Architekturen und bieten daher nicht die notwendigen Konzepte zur Kostenmodellierung von IoT-Anwendungen an. Des Weiteren unterstützen wir den Service Operator durch Lastkontrollverfahren, die auf Kapazitätsengpässe des Cloud Services durch eine kontrollierte Reduktion der Nachrichtenrate reagieren. Während sich das auf die Genauigkeit der Messungen nachteilig auswirken kann, stabilisieren sich zeitliche Verzögerungen und die IoT-Anwendung bleibt auch in starken Überlastszenarien verfügbar. Existierende Laufzeittechniken fokussieren sich auf die automatische Ressourcenprovisionierung von Cloud Services durch Auto-Scaler. Diese ermöglichen zwar, auf Kapazitätsengpässe und Lastschwankungen zu reagieren, doch die erreichte Quality-of-Service (QoS) kann dadurch mit hohen Betriebskosten verbunden sein. Daher ermöglichen wir durch die Lastkontrollverfahren eine weitere Technik, mit der einerseits dynamisch auf Kapazitätsengpässe reagiert werden und andererseits die zur Verfügung stehende Kapazität eines Cloud Services effizient genutzt werden kann. Außerdem präsentieren wir Kopplungstechniken, die Auto-Scaling und Lastkontrollverfahren kombinieren. Bestehende Ansätze zur Rekonfiguration von Smart Devices konzentrieren sich auf Qualitäten wie Genauigkeit oder Energie-Effizienz und sind daher ungeeignet, um auf Kapazitätsengpässe zu reagieren. Zusammenfassend liefert die Dissertation die folgenden Beiträge: 1. Untersuchung von Performance Metriken für Skalierentscheidungen: Wir haben Infrastuktur- und Anwendungsebenen-Metriken daraufhin evaluiert, wie geeignet sie für Skalierentscheidungen von Microservices sind, die variierende Charakteristiken aufweisen. Auf Basis der Ergebnisse kann ein Service Operator eine fundierte Entscheidung darüber treffen, welche Performance Metrik zur Skalierung eines bestimmten Microservices am geeignesten ist. 2. Design von QoS Kostenfunktionen für IoT-Anwendungen: Wir haben ein QoS Kostenmodell aufgestellt, dass das Wirken von Smart Device- und Cloud Service-Konfiguration auf die Qualitäten einer IoT-Anwendung erfasst. Auf Grundlage dieser Kostenmodelle kann die Konfiguration von IoT-Anwendungen zur Design Time optimiert werden. Des Weiteren können mit den Kostenfunktionen Laufzeitverfahren hinsichtlich ihrem Beitrag zur QoS für verschiedene Szenarien evaluiert werden. 3. Entwicklung von Lastkontrollverfahren für IoT-Anwendungen: Die präsentierten Verfahren bieten einen komplementären Mechanismus zu Auto-Scaling an, um bei Kapazitätsengpässen die QoS aufrechtzuerhalten. Hierbei wird die Gesamtlast auf dem Cloud Service durch Anpassungen der Nachrichtenrate der Smart Devices reduziert. Ein Service Operator hat hiermit die Möglichkeit, Kapazitätsengpässen über eine Degradierung der Datenqualität zu begegnen. 4. Kopplung von Lastkontrollverfahren mit Ressourcen-Provisionierung: Wir präsentieren regelbasierte Kopplungsmechanismen, die reaktiv Lastkontrollverfahren oder Auto-Scaler aktivieren und diese damit koppeln. Das ermöglicht, auf Kapazitätsengpässe über eine Kombination von Datenqualitätsreduzierungen und Ressourcekostenerhöhungen zu reagieren. 5. Design eines Frameworks zur Entwicklung selbst-adaptiver Systeme: Das selbst-adaptive Framework bietet ein Anwendungsmodell für IoT-Anwendungen und Konzepte für die Rekonfiguration von Microservices und Smart Devices an. Es kann in verschiedenen Cloud-Umgebungen aufgesetzt werden und beschleunigt die prototypische Entwicklung von Laufzeitverfahren. Wir validierten die Ansätze anhand zweier Case Study Systeme unterschiedlicher Komplexität. Das erste Case Study System besteht aus einem Cloud Service, welcher über eine IoT Plattform Nachrichten von virtuellen Smart Devices verarbeitet. Mit diesem System haben wir für unterschiedliche Anwendungsszenarien die Charakteristiken der vorgestellten Lastkontrollverfahren analysiert, um diese gegen Auto-Scaling und einer Kopplung der Ansätze zu vergleichen. Hierbei stellte sich heraus, dass die Lastkontrollverfahren ähnlich effizient wie Auto-Scaler Überlastszenarien addressieren können und sich die QoS in einem vergleichbaren Bereich bewegt. Im Schnitt erreichten die Lastkontrollverfahren in den untersuchten Szenarien etwa 50 % geringere QoS Gesamtkosten. Es zeigte sich auch, dass sowohl Auto-Scaling als auch die Lastkontrollverfahren in bestimmten Anwendungsszenarien deutliche Nachteile haben, so z. B. wenn die Datengenauigkeit oder Ressourcenkosten im Vordergrund stehen. Es hat sich gezeigt, dass eine Kopplung hierbei immer vorteilhaft ist, um die QoS beizubehalten. Im zweiten Case Study System haben wir eine intelligente Heizungslösung der Robert Bosch GmbH implementiert, um die Ansätze an einem komplexeren System zu validieren. Auch hier zeigte sich, dass eine Kombination von Lastkontrolle und Auto-Scaling am vorteilhaftesten ist und zu einer hohen Datenqualität bei geringen Ressourcenkosten beiträgt. Die Ergebnisse zeigen, dass die vorgestellten Lastkontrollverfahren geeignet sind, die QoS von IoT Anwendungen zu verbessern. Es bietet einem Service Operator damit ein weiteres Werkzeug für das Laufzeitmanagement von IoT Anwendungen, dass einen zum Auto-Scaling komplementären Mechanismus verwendet. Das hier vorgestellte Framework zur Entwicklung selbst-adaptiver IoT Systeme haben wir zur empirischen Beantwortung der Forschungsfragen instanziiert und damit dessen Eignung demonstriert. Wir zeigen außerdem eine exemplarische Verwendung der vorgestellten Kostenfunktionen für verschiedene Anwendungsszenarien und binden diese im Zuge der Validierung in einem Optimierungs-Framework ein

    Improving end-to-end availability using overlay networks

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, February 2005.Includes bibliographical references (p. 139-150).The end-to-end availability of Internet services is between two and three orders of magnitude worse than other important engineered systems, including the US airline system, the 911 emergency response system, and the US public telephone system. This dissertation explores three systems designed to mask Internet failures, and, through a study of three years of data collected on a 31-site testbed, why these failures happen and how effectively they can be masked. A core aspect of many of the failures that interrupt end-to-end communication is that they fall outside the expected domain of well-behaved network failures. Many traditional techniques cope with link and router failures; as a result, the remaining failures are those caused by software and hardware bugs, misconfiguration, malice, or the inability of current routing systems to cope with persistent congestion.The effects of these failures are exacerbated because Internet services depend upon the proper functioning of many components-wide-area routing, access links, the domain name system, and the servers themselves-and a failure in any of them can prove disastrous to the proper functioning of the service. This dissertation describes three complementary systems to increase Internet availability in the face of such failures. Each system builds upon the idea of an overlay network, a network created dynamically between a group of cooperating Internet hosts. The first two systems, Resilient Overlay Networks (RON) and Multi-homed Overlay Networks (MONET) determine whether the Internet path between two hosts is working on an end-to-end basis. Both systems exploit the considerable redundancy available in the underlying Internet to find failure-disjoint paths between nodes, and forward traffic along a working path. RON is able to avoid 50% of the Internet outages that interrupt communication between a small group of communicating nodes.MONET is more aggressive, combining an overlay network of Web proxies with explicitly engineered redundant links to the Internet to also mask client access link failures. Eighteen months of measurements from a six-site deployment of MONET show that it increases a client's ability to access working Web sites by nearly an order of magnitude. Where RON and MONET combat accidental failures, the Mayday system guards against denial- of-service attacks by surrounding a vulnerable Internet server with a ring of filtering routers. Mayday then uses a set of overlay nodes to act as mediators between the service and its clients, permitting only properly authenticated traffic to reach the server.by David Godbe Andersen.Ph.D
    corecore