8 research outputs found
Web Workload Generation According to the UniLoG Approach
Generating synthetic loads that are sufficiently close to reality is an important and challenging task in performance and quality-of-service (QoS) evaluations of computer networks and distributed systems. Here, the load to be generated represents sequences of requests at a well-defined service interface within a network node. The paper presents a tool (UniLoG.HTTP) which can be used in a flexible manner to generate realistic and representative server and network loads, in terms of access requests to Web servers as well as the creation of typical Web traffic within a communication network. The paper describes the architecture of this load generator, along with the critical design decisions and solution approaches that allowed us to obtain the desired flexibility.
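The abstract does not give UniLoG's internal load model, but the core idea of generating a sequence of requests at a service interface can be sketched as follows. The exponential interarrival times and the URL pool are illustrative assumptions, not details taken from the paper:

```python
import random

def generate_request_trace(n_requests, mean_interarrival_s, seed=0):
    """Generate a synthetic sequence of (timestamp, URL) request events.

    Interarrival times are drawn from an exponential distribution, a common
    (here purely hypothetical) choice for modeling independent arrivals;
    the URL pool is likewise made up for illustration.
    """
    rng = random.Random(seed)
    urls = ["/index.html", "/images/logo.png", "/video/clip.mp4"]
    t = 0.0
    trace = []
    for _ in range(n_requests):
        t += rng.expovariate(1.0 / mean_interarrival_s)  # next arrival time
        trace.append((t, rng.choice(urls)))
    return trace

trace = generate_request_trace(5, mean_interarrival_s=0.5)
```

A real load generator would replay such a trace against the service interface (e.g., issuing HTTP GETs at the scheduled timestamps); the sketch only produces the abstract request sequence.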
Global evaluation of CDNs performance using PlanetLab
Since they were introduced to the market, Content Distribution Networks (CDNs) have grown in importance due to the "instantaneity" that today's web users expect.
Thanks to increased access speeds, especially in the last mile with technologies such as xDSL, HFC, and FTTH, page loading times have been reduced. However, the "instantaneity" those users want could not be obtained without techniques such as caching and content distribution via CDNs. These techniques aim to avoid fetching web objects from the origin web server, especially "heavy" objects such as multimedia files.
CDNs provide not only a clever way of distributing content globally, but also prevent problems such as "flash crowd" events. Such situations can cause huge monetary losses because they attack the bottleneck introduced by clustering servers to achieve scalability.
The leading CDN provider is Akamai, and one of the most important decisions a CDN must make is which of the available servers is the best one for a user to fetch a specific web object from. This best-server selection employs a DNS-based technique whose objective is to map the request to the IP address of the best available server in terms of latency.
This project presents a global performance evaluation of Akamai's server selection technique using tools such as PlanetLab and Httperf. Different tests were run to compare results from globally distributed users and to identify the areas where Akamai performs well. To this end, the results obtained with Akamai were also compared with a web page distributed without a CDN. Finally, a linear correlation between the measured latencies and the number of hops was identified.
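The correlation analysis mentioned at the end of the abstract can be sketched with a plain Pearson correlation over per-vantage-point measurements. The hop counts and latencies below are invented sample data, not the project's actual PlanetLab results:

```python
import math

def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical measurements: hop count and latency (ms) per vantage point.
hops = [6, 9, 12, 15, 18]
latency_ms = [20.0, 35.0, 48.0, 66.0, 80.0]
r = pearson_correlation(hops, latency_ms)
```

A value of r close to 1 would indicate the kind of linear latency/hop-count relationship the study reports.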
Impact of Location on Content Delivery
The increasing number of users as well as their demand for more and richer content has led to an exponential growth of Internet traffic for more than 15 years. In addition, new applications and use cases have changed the type of traffic. For example, social networking enables users to publish their own content. This user-generated content is often published on popular sites such as YouTube, Twitter, and Facebook. Another example is the offering of interactive and multi-media content by content providers, e.g., Google Maps or IPTV services. With the introduction of peer-to-peer (P2P) protocols in 1998, an even more radical change emerged because P2P protocols allow users to directly exchange large amounts of content: the peers transfer data without the need for an intermediary and often centralized server.
However, as shown by recent studies, Internet traffic is again dominated by HTTP, mostly at the expense of P2P. This traffic growth increases the demands on the infrastructure components that form the Internet, e.g., servers and routers. Moreover, most of the traffic is generated by a few very popular services. The enormous demand for such popular content cannot be satisfied by the traditional hosting model, in which content is located on a single server. Instead, content providers need to scale up their delivery infrastructure, e.g., by replicating it in large data centers or by buying service from content delivery infrastructures such as Akamai or Limelight. Moreover, content providers are not the only ones that have to cope with the demand: the network infrastructure also needs to be constantly upgraded to keep up with the growing demand for content. In this thesis we characterize the impact of content delivery on the network. We utilize data sets from both active and passive measurements. This allows us to cover a wide range of abstraction levels, from a detailed protocol-level view of several content delivery mechanisms to the high-level picture of identifying and mapping the content infrastructures that host the most popular content. We find that caching content is still hard and that the user's choice of DNS resolver has a profound impact on the server selection mechanism of content distribution infrastructures. We propose Web content cartography to infer how content distribution infrastructures are deployed and what the roles of different organizations in the Internet are. We conclude by putting our findings in the context of contemporary work and give recommendations on how to improve content delivery to all parties involved: users, Internet service providers, and content distribution infrastructures.
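The thesis's actual cartography methodology is not spelled out in the abstract, but one simple building block for mapping content infrastructures is aggregating the servers behind many hostnames by IP prefix. The hostname/IP pairs below are made up for illustration:

```python
import ipaddress
from collections import defaultdict

def cluster_by_prefix(host_ips, prefix_len=24):
    """Group hostnames by the network prefix of their resolved IP address.

    Hostnames whose servers fall into the same prefix are candidates for
    being served by the same infrastructure; this is only a sketch of the
    idea, not the thesis's method.
    """
    clusters = defaultdict(list)
    for host, ip in host_ips:
        net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        clusters[str(net)].append(host)
    return dict(clusters)

sample = [
    ("cdn-a.example.com", "203.0.113.10"),
    ("cdn-b.example.com", "203.0.113.77"),
    ("origin.example.org", "198.51.100.5"),
]
clusters = cluster_by_prefix(sample)
```

In practice the input pairs would come from resolving popular hostnames from many vantage points, which is also where the DNS-resolver effect the thesis reports would show up.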
Optimization in Web Caching: Cache Management, Capacity Planning, and Content Naming
Caching is fundamental to performance in distributed information retrieval systems
such as the World Wide Web. This thesis introduces novel techniques for optimizing performance
and cost-effectiveness in Web cache hierarchies.
When requests are served by nearby caches rather than distant servers, server loads and
network traffic decrease and transactions are faster. Cache system design and management,
however, face extraordinary challenges in loosely-organized environments like the Web,
where the many components involved in content creation, transport, and consumption are
owned and administered by different entities. Such environments call for decentralized
algorithms in which stakeholders act on local information and private preferences.
In this thesis I consider problems of optimally designing new Web cache hierarchies
and optimizing existing ones. The methods I introduce span the Web from point of content
creation to point of consumption: I quantify the impact of content-naming practices on
cache performance; present techniques for variable-quality-of-service cache management;
describe how a decentralized algorithm can compute economically-optimal cache sizes in
a branching two-level cache hierarchy; and introduce a new protocol extension that eliminates
redundant data transfers and allows “dynamic” content to be cached consistently.
To evaluate several of my new methods, I conducted trace-driven simulations on an
unprecedented scale. This in turn required novel workload measurement methods and efficient
new characterization and simulation techniques. The performance benefits of my proposed
protocol extension are evaluated using two extraordinarily large and detailed workload
traces collected in a traditional corporate network environment and an unconventional
thin-client system.
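The trace-driven simulations described above can be illustrated with a minimal replay of a request trace through an LRU cache. Unit-sized objects and the tiny sample trace are simplifying assumptions for the sketch, not the thesis's actual workloads or policies:

```python
from collections import OrderedDict

def lru_hit_rate(trace, capacity):
    """Replay a request trace through an LRU cache and return the hit rate."""
    cache = OrderedDict()
    hits = 0
    for obj in trace:
        if obj in cache:
            hits += 1
            cache.move_to_end(obj)         # mark as most recently used
        else:
            cache[obj] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(trace)

rate = lru_hit_rate(["a", "b", "a", "c", "b", "a"], capacity=2)
```

Sweeping `capacity` over such a replay is the basic mechanism behind the capacity-planning questions the thesis studies, though at vastly larger trace scales.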
My empirical research follows a simple but powerful paradigm: measure on a large
scale an important production environment’s exogenous workload; identify performance
bounds inherent in the workload, independent of the system currently serving it; identify
gaps between actual and potential performance in the environment under study; and finally
devise ways to close these gaps through component modifications or through improved
inter-component integration. This approach may be applicable to a wide range of Web
services as they mature.
Ph.D., Computer Science and Engineering, University of Michigan
http://deepblue.lib.umich.edu/bitstream/2027.42/90029/1/kelly-optimization_web_caching.pdf
http://deepblue.lib.umich.edu/bitstream/2027.42/90029/2/kelly-optimization_web_caching.ps.bz
Modeling and acceleration of content delivery in world wide web
Ph.D., Doctor of Philosophy