78 research outputs found

    A Holistic Approach to Lowering Latency in Geo-distributed Web Applications

    The user-perceived end-to-end latency of web applications has a huge impact on revenue for many businesses. The end-to-end latency of web applications is affected by: (i) user-to-application-server (front-end) latency, which includes downloading and parsing web pages and retrieving further objects requested by JavaScript execution; and (ii) application- and storage-server (back-end) latency, which includes retrieving the metadata required for an initial rendering and subsequent content based on user actions. Improving the user-perceived performance of web applications is challenging, given their complex operating environments involving user-facing web servers, content distribution network (CDN) servers, multi-tiered application servers, and storage servers. Further, the application and storage servers are often deployed on multi-tenant cloud platforms that show high performance variability. While many novel approaches, such as SPDY and geo-replicated datastores, have been developed to improve performance, many of these solutions are specific to certain layers and may have differing impacts on user-perceived performance. The primary goal of this thesis is to address the above challenges in a holistic manner, focusing specifically on improving the end-to-end latency of geo-distributed multi-tiered web applications. This thesis makes the following contributions: (i) First, it reduces user-facing latency by helping CDNs identify the objects that are most critical for page-load latency and map them to the faster CDN cache layers. Through controlled experiments on real-world web pages, we show the potential of our approach to reduce latency by hundreds of milliseconds without affecting overall CDN miss rates. (ii) Next, it reduces back-end latency by optimally adapting the datastore replication policies (including the number and location of replicas) to the heterogeneity in workloads. We show the benefits of our replication models using real-world traces of Twitter, Wikipedia, and Gowalla on an 8-datacenter Cassandra cluster deployed on EC2. (iii) Finally, it makes multi-tier applications resilient to the inherent performance variability in the cloud through fine-grained request redirection. We highlight the benefits of our approach by deploying three real-world applications on commercial cloud platforms.
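
    A minimal sketch of the idea behind contribution (i), under the assumption that the page is modeled as an object-dependency DAG with known fetch times (the page contents, timings, and tier names below are invented, and this is not the thesis's actual algorithm): objects on the load-time critical path are pinned to the faster cache tier.

```python
# Illustrative sketch: find the objects on the longest (critical) path of a
# page load and assign them to the fast cache tier.
from functools import lru_cache

# Hypothetical page: object -> (fetch time in ms, prerequisite objects)
page = {
    "index.html": (40, []),
    "app.js":     (90, ["index.html"]),
    "style.css":  (30, ["index.html"]),
    "data.json":  (70, ["app.js"]),
    "hero.jpg":   (60, ["style.css"]),
}

@lru_cache(maxsize=None)
def finish_time(obj):
    """Earliest time at which obj is fully fetched, given its dependencies."""
    fetch_ms, deps = page[obj]
    return fetch_ms + max((finish_time(d) for d in deps), default=0)

def critical_objects():
    """Walk back from the last-finishing object to recover the critical path."""
    obj = max(page, key=finish_time)
    path = [obj]
    while page[obj][1]:
        obj = max(page[obj][1], key=finish_time)
        path.append(obj)
    return set(path)

# Critical-path objects go to the fast tier; everything else to the slower tier.
placement = {o: ("fast-tier" if o in critical_objects() else "slow-tier") for o in page}
print(placement)
```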

    Global evaluation of CDNs performance using PlanetLab

    Since they were introduced to the market, Content Distribution Networks (CDNs) have grown in importance due to the "instantaneity" that today's web users expect. Thanks to increases in access speed, especially in the last mile with technologies such as xDSL, HFC, and FTTH, page-loading time has been reduced. However, the "instantaneity" those users want could not be achieved without techniques such as caching and content distribution via CDNs. These techniques aim to avoid fetching web objects from the origin web server, especially "heavy" objects such as multimedia files. CDNs provide not only a clever way of distributing content globally, but also a way to prevent problems such as flash crowd events. Such events can cause huge monetary losses because they attack the bottleneck introduced by clustering servers to reach scalability. The leading CDN provider is Akamai, and one of the most important decisions a CDN must make is which of the available servers is the best one for a user to fetch a specific web object from. This best-server selection employs a DNS-based technique whose objective is to map the request to the IP address of the best available server in terms of latency. This project presents a global performance evaluation of Akamai's server selection technique using tools such as PlanetLab and httperf. Different tests were run to compare the results obtained by globally distributed users and to identify the areas where Akamai performs well. To determine this, the results obtained with Akamai were also compared against a web page not distributed through a CDN. Finally, a linear correlation between the measured latencies and the number of hops was identified.
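
    As an illustration of the final analysis step, the sketch below computes the Pearson correlation between hop count and measured latency from a handful of made-up vantage-point measurements; it shows the kind of linear relationship the project reports, not the project's actual tooling.

```python
# Estimate the linear (Pearson) correlation between hop count and latency.
from math import sqrt

samples = [(8, 35.0), (11, 52.0), (14, 78.0), (17, 90.0), (21, 120.0)]  # (hops, latency_ms)

def pearson(pairs):
    n = len(pairs)
    xs, ys = [p[0] for p in pairs], [p[1] for p in pairs]
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

print(f"hops vs latency correlation: {pearson(samples):.3f}")
```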

    The future of networking is the future of Big Data

    2019 Summer. Includes bibliographical references. Scientific domains such as Climate Science, High Energy Particle Physics (HEP), Genomics, Biology, and many others are increasingly moving towards data-oriented workflows, where each of these communities generates, stores, and uses massive datasets that reach into terabytes and petabytes, and are projected soon to reach exabytes. These communities are also increasingly moving towards a global collaborative model where scientists routinely exchange significant amounts of data. The sheer volume of data, and the complexities associated with maintaining, transferring, and using it, continue to push the limits of current technologies in multiple dimensions: storage, analysis, networking, and security. This thesis tackles the networking aspect of big-data science. Networking is the glue that binds all the components of modern scientific workflows, and these communities are becoming increasingly dependent on high-speed, highly reliable networks. The network, as the common layer across big-science communities, provides an ideal place for implementing common services. Big-science applications also need to work closely with the network to ensure optimal usage of resources and intelligent routing of requests and data. Finally, as more communities move towards data-intensive, connected workflows, adopting a service model where the network provides some of the common services reduces not only application complexity but also the need for duplicate implementations. Named Data Networking (NDN) is a new network architecture whose service model aligns better with the needs of these data-oriented applications. NDN's name-based paradigm makes it easier to provide intelligent features at the network layer rather than at the application layer. This thesis shows that NDN can push several standard features to the network. This work is the first attempt to apply NDN in the context of large scientific data; in the process, this thesis touches upon scientific data naming, name discovery, real-world deployment of NDN for scientific data, feasibility studies, and the designs of in-network protocols for big-data science.
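
    As a toy illustration of name-based data access, the sketch below maps a climate-dataset catalog entry to a hierarchical, NDN-style name; the prefix, component order, and fields are hypothetical and not the thesis's actual naming scheme.

```python
# Build a hierarchical, NDN-style name from dataset attributes.
def to_ndn_name(catalog_entry, prefix="/science/climate"):
    order = ["project", "model", "experiment", "variable", "time_range"]
    components = [catalog_entry[k] for k in order]
    return prefix + "/" + "/".join(components)

entry = {
    "project": "CMIP5",
    "model": "CESM1",
    "experiment": "rcp85",
    "variable": "tas",
    "time_range": "2006-2100",
}
print(to_ndn_name(entry))  # /science/climate/CMIP5/CESM1/rcp85/tas/2006-2100
```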

    Dynamic content delivery infrastructure deployment using network cloud resources

    Millions of people value the Internet for the content and the applications it makes available. To cope with the increasing end-user demand for popular and often high-volume content, e.g., high-definition video or online social networks, massively distributed Content Delivery Infrastructures (CDIs) have been deployed. However, a highly competitive market requires CDIs to constantly investigate new ways to reduce operational costs and improve delivery performance. Today, CDIs mainly suffer from limited agility in server deployment and are largely unaware of network conditions and precise end-user locations, information that improves the efficiency and performance of content delivery. While newly emerging architectures try to address these challenges, none so far has considered collaboration, although ISPs have the information readily at hand. In this thesis, we assess the impact of collaboration on content delivery. We first evaluate the design and operating space of today's content delivery landscape and quantify possible benefits of collaboration by analyzing operational traces from a European Tier-1 ISP. We find that collaboration when assigning end-users to servers highly localizes CDI traffic and improves end-user performance. Moreover, we find significant path diversity, which enables new mechanisms for traffic management. We propose two key enablers, namely in-network server allocation and informed user-server assignment, to facilitate CDI-ISP collaboration, and present our system design, called NetPaaS (Network Platform as a Service), that realizes them. In-network server allocation offers agile server allocation close to the ISP's end-users by leveraging virtualization technology and cloud-style resources in the network. Informed user-server assignment enables ISPs to take network bottlenecks and precise end-user locations into account and to recommend the best possible candidate server for individual end-users to CDIs. Therefore, NetPaaS provides an additional degree of freedom to scale up or shrink the CDI footprint on demand. To quantify the potential of collaboration with NetPaaS, we perform a first-of-its-kind evaluation based on operational traces from the largest commercial CDI and a European Tier-1 ISP. Our findings reveal that dynamic server allocation based on accurate end-user locations and network conditions enables the CDI to better cope with increasing and highly volatile demand for content and improves end-user performance. Moreover, recommendations from NetPaaS result in better utilization of existing server infrastructure and enable the ISP to better manage traffic flows inside its network. We conclude that NetPaaS improves the performance and efficiency of content delivery architectures while potentially reducing the required capital investment and operational costs. Moreover, NetPaaS enables the ISP to achieve traffic engineering goals and therefore offers a true win-win situation to both CDIs and ISPs.
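
    A minimal sketch of informed user-server assignment, assuming the ISP scores each candidate server by estimated path latency plus a load penalty (the scoring function, weights, and server names are invented, not NetPaaS's actual mechanism):

```python
# Rank a CDI's candidate servers for a user region and return an ordered recommendation.
def rank_candidates(user_region, candidates, latency_map, load_weight=0.5):
    """candidates: {server: current_load_fraction}; latency_map: {(region, server): ms}."""
    def score(server):
        return latency_map[(user_region, server)] + load_weight * 100 * candidates[server]
    return sorted(candidates, key=score)

candidates = {"fra-1": 0.80, "ams-2": 0.35, "lon-3": 0.20}
latency_map = {("de-south", "fra-1"): 8.0, ("de-south", "ams-2"): 14.0, ("de-south", "lon-3"): 24.0}

print(rank_candidates("de-south", candidates, latency_map))
# ['ams-2', 'lon-3', 'fra-1']: the lightly loaded nearby server beats the closest, overloaded one.
```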

    Scalable hosting of web applications

    Modern Web sites have evolved from simple monolithic systems into complex multi-tiered systems. In contrast to traditional Web sites, these sites do not simply deliver pre-written content but dynamically generate content using (one or more) multi-tiered Web applications. In this thesis, we addressed the question: how can multi-tiered Web applications be hosted in a scalable manner? Scaling up a Web application requires scaling its individual tiers. To this end, various research works have proposed techniques that employ replication or caching solutions at different tiers. However, most of these techniques aim to optimize the performance of individual tiers and not the entire application. A key observation made in our research is that there exists no single elixir technique that performs best for all Web applications. Effective hosting of a Web application requires careful selection and deployment of several techniques at different tiers. To this end, we present several caching and replication strategies, such as GlobeCBC, GlobeDB, and GlobeTP, to improve the scalability of the different tiers of a Web application. While these techniques and systems improve the performance of the individual tiers (and eventually the application), an application's administrator is interested not only in the performance of its individual tiers but also in its end-to-end performance. To this end, we propose a resource provisioning approach that allows us to choose the best resource configuration for hosting a Web application such that its end-to-end response time is optimized with minimum usage of resources. The proposed approach is based on an analytical model for multi-tier systems, which allows us to derive expressions for estimating the mean end-to-end response time and its variance. Steen, M.R. van [Promotor]; Pierre, G.E.O. [Copromotor]
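
    A minimal sketch of such an analytical model, assuming each tier behaves as an independent M/M/1 queue (the thesis's actual model may differ): the mean end-to-end response time is the sum of the per-tier means 1/(mu - lambda), and, treating tiers as independent, the variance is the sum of the per-tier variances 1/(mu - lambda)^2.

```python
# Estimate mean and variance of end-to-end response time for a chain of tiers.
def end_to_end_response_time(tiers):
    """tiers: list of (arrival_rate, service_rate) per tier, in requests/sec."""
    mean = var = 0.0
    for lam, mu in tiers:
        if lam >= mu:
            raise ValueError("tier is overloaded (arrival rate >= service rate)")
        mean += 1.0 / (mu - lam)          # M/M/1 mean response time
        var += 1.0 / (mu - lam) ** 2      # M/M/1 response-time variance
    return mean, var

# Web, application, and database tiers with invented rates.
tiers = [(80.0, 200.0), (80.0, 120.0), (80.0, 100.0)]
mean, var = end_to_end_response_time(tiers)
print(f"mean = {mean*1000:.1f} ms, std dev = {var**0.5*1000:.1f} ms")
```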

    Latency-driven replication for globally distributed systems

    Steen, M.R. van [Promotor]; Pierre, G.E.O. [Copromotor]

    Fast-Flux Botnet Detection Based on Traffic Response and Search Engines Credit Worthiness

    Botnets are considered among the primary threats on the Internet, and there have been many research efforts to detect and mitigate them. Today, botnets use a DNS technique called fast-flux to hide malware sites behind a constantly changing network of compromised hosts. This technique is similar to the legitimate Round Robin DNS technique and to Content Delivery Networks (CDNs). Different techniques have been developed, with more or less success, to distinguish normal network traffic from botnet traffic. The aim of this paper is to improve botnet detection using an Intrusion Detection System (IDS) or router. A novel classification method for online botnet detection is presented, based on DNS traffic features that distinguish botnet traffic from CDN-based traffic. Botnet features are classified according to the possibility of their usage and implementation in an embedded system. Traffic response is analysed as a strong candidate for online detection. Its disadvantage lies in specific cases where a CDN behaves like a botnet. A new feature based on search-engine hits is proposed to reduce false positives. The experimental evaluations show that the proposed classification can significantly improve botnet detection. A procedure is suggested for implementing such a system as part of an IDS.
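
    The sketch below illustrates how such features could be combined; the feature names and thresholds are invented and not the paper's actual classifier, but they follow the same intuition: fast-flux-like DNS behavior plus very few search-engine hits flags a domain, while a well-indexed CDN-hosted domain is spared.

```python
# Combine DNS-response features with a search-engine-credit feature.
def looks_like_fast_flux(dns_obs, search_hits, hit_threshold=1000):
    """dns_obs: dict with distinct A records, distinct ASNs, and minimum TTL seen."""
    fluxy_dns = (
        dns_obs["distinct_a_records"] >= 10      # many changing IPs
        and dns_obs["distinct_asns"] >= 5        # spread across unrelated networks
        and dns_obs["min_ttl"] <= 300            # very short TTLs force re-resolution
    )
    # A well-indexed domain (high search-engine credit) is more likely a legitimate CDN site.
    return fluxy_dns and search_hits < hit_threshold

fluxy_looking = {"distinct_a_records": 12, "distinct_asns": 6, "min_ttl": 20}
print(looks_like_fast_flux(fluxy_looking, search_hits=250_000))  # False: popular, indexed domain
print(looks_like_fast_flux(fluxy_looking, search_hits=3))        # True: fluxy DNS, nearly no hits
```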

    Inferring Network Usage from Passive Measurements in ISP Networks: Bringing Visibility of the Network to Internet Operators

    The Internet is evolving with us over time; nowadays people are more dependent on it, using it for most of the simple activities of their lives. It is not uncommon to use the Internet for voice and video communications, social networking, banking, and shopping. Current trends in Internet applications such as Web 2.0, cloud computing, and the Internet of Things are bound to bring higher traffic volumes and more heterogeneous traffic. In addition, privacy concerns and network security threats have widely promoted the use of encryption in network communications. All these factors make network management an evolving environment that becomes more difficult every day. This thesis focuses on helping to keep track of some of these changes, observing the Internet from an ISP viewpoint and exploring several aspects of the visibility of a network, giving insights into what content or services are retrieved by customers and how that content is provided to them. Generally, this information is inferred by means of characterization and analysis of data collected using passive traffic monitoring tools on operational networks. Analysis and characterization of passively collected traffic is challenging: Internet end-users cannot be controlled in the network traffic they generate, and this traffic might be encrypted or encoded in a way that is unfeasible to decode, creating the need for reverse engineering in order to give the Internet operator a good picture. In spite of these challenges, the thesis presents a characterization of P2P-TV usage for a commercial, proprietary, and closed application that encrypts or encodes its traffic, making it quite difficult to discern what is going on by just observing the data carried by the protocol. It then presents DN-Hunter, an application that renders a great part of the network traffic visible even when encryption or encoding is used. Finally, it presents a case study of DN-Hunter for understanding Amazon Web Services, the most prominent cloud provider, which offers computing, storage, and content delivery platforms. This study unveils the infrastructure, the pervasiveness of content, and the traffic allocation policies. Findings reveal that most of the content residing on cloud computing and Internet storage infrastructures is served by one single Amazon datacenter located in Virginia, despite it appearing to be the worst-performing one for Italian users. This causes traffic to take long and expensive paths in the network. Since no automatic migration and load-balancing policies are offered by AWS among different locations, content is exposed to outages, as observed in the datasets presented.
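
    The core idea behind DN-Hunter can be sketched in a few lines: remember, per client, which DNS name most recently resolved to each server IP, then label later flows to that IP with the name even when the payload is encrypted. The class and example addresses below are illustrative, not the actual implementation.

```python
# Correlate observed DNS responses with subsequent flows to label encrypted traffic.
class FlowLabeler:
    def __init__(self):
        self._last_name = {}  # (client_ip, server_ip) -> dns_name

    def on_dns_response(self, client_ip, dns_name, resolved_ips):
        for ip in resolved_ips:
            self._last_name[(client_ip, ip)] = dns_name

    def label_flow(self, client_ip, server_ip):
        return self._last_name.get((client_ip, server_ip), "unknown")

labeler = FlowLabeler()
labeler.on_dns_response("10.0.0.5", "media.example-cdn.com", ["203.0.113.7", "203.0.113.8"])
print(labeler.label_flow("10.0.0.5", "203.0.113.7"))   # media.example-cdn.com
print(labeler.label_flow("10.0.0.5", "198.51.100.1"))  # unknown
```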

    Systems and Methods for Measuring and Improving End-User Application Performance on Mobile Devices

    In today's rapidly growing smartphone society, the time users spend on their smartphones continues to grow, and mobile applications are becoming the primary medium for providing services and content to users. With such fast-paced growth in smartphone usage, cellular carriers and Internet service providers continuously upgrade their infrastructure to the latest technologies and expand their capacities to improve the performance and reliability of their networks and to satisfy exploding user demand for mobile data. On the other side of the spectrum, content providers and e-commerce companies adopt the latest protocols and techniques to provide smooth and feature-rich user experiences in their applications. To ensure a good quality of experience, monitoring how applications perform on users' devices is necessary. Often, network and content providers lack such visibility into end-user application performance. In this dissertation, we demonstrate that having visibility into end-user perceived performance, through system design for efficient and coordinated active and passive measurements of end-user application and network performance, is crucial for detecting, diagnosing, and addressing performance problems on mobile devices. My dissertation consists of three projects to support this statement. First, to provide such continuous monitoring on smartphones with constrained resources that operate in a highly dynamic mobile environment, we devise efficient, adaptive, and coordinated systems, as a platform, for active and passive measurements of end-user performance. Second, using this platform and other passive data collection techniques, we conduct an in-depth user trial of mobile multipath to understand how Multipath TCP (MPTCP) performs in practice. Our measurement study reveals several limitations of MPTCP. Based on the insights gained from our measurement study, we propose two different schemes to address the identified limitations of MPTCP. Last, we show how to provide visibility into end-user application performance for Internet providers, and in particular home WiFi routers, by passively monitoring users' traffic and utilizing per-app models that map various network quality of service (QoS) metrics to application performance. PHD; Computer Science & Engineering; University of Michigan, Horace H. Rackham School of Graduate Studies; https://deepblue.lib.umich.edu/bitstream/2027.42/146014/1/ashnik_1.pd
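
    As an illustration of a per-app model, the sketch below maps passively measured QoS metrics (RTT, loss, throughput) to an application-level startup-delay estimate; the model form, coefficients, and app names are invented for illustration and are not those learned in the dissertation.

```python
# Per-app linear model: network QoS metrics -> estimated startup delay.
PER_APP_MODELS = {
    # app: (intercept_ms, per ms of RTT, per % of loss, per Mbps below target, target_mbps)
    "video_streaming": (400.0, 6.0, 120.0, 80.0, 5.0),
    "web_browsing":    (250.0, 9.0, 60.0, 30.0, 2.0),
}

def predict_startup_delay_ms(app, rtt_ms, loss_pct, throughput_mbps):
    b0, b_rtt, b_loss, b_tput, target = PER_APP_MODELS[app]
    tput_deficit = max(0.0, target - throughput_mbps)
    return b0 + b_rtt * rtt_ms + b_loss * loss_pct + b_tput * tput_deficit

print(predict_startup_delay_ms("video_streaming", rtt_ms=45, loss_pct=0.5, throughput_mbps=3.0))
# 400 + 6*45 + 120*0.5 + 80*2 = 890 ms under these made-up conditions
```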