16 research outputs found

    A taxonomy of web prediction algorithms

    Web prefetching techniques are an attractive solution for reducing user-perceived latency. These techniques are driven by a prediction engine or algorithm that guesses the next actions of web users. A large number of prediction algorithms have been proposed since the first prefetching approach was published, although only over the last two or three years have they begun to be successfully implemented in commercial products. These algorithms can be implemented in any element of the web architecture and can use a wide variety of information as input. This affects their structure, data system, computational resources, and accuracy. Knowledge of the input information, and an understanding of how it can be handled to make predictions, can help improve the design of current prediction engines and, consequently, of prefetching techniques. This paper analyzes fifty of the most relevant algorithms proposed over 15 years of prefetching research and proposes a taxonomy in which the algorithms are classified according to the input data they use. For each group, the main advantages and shortcomings are highlighted. © 2012 Elsevier Ltd. All rights reserved.
    This work has been partially supported by the Spanish Ministry of Science and Innovation under Grant TIN2009-08201, Generalitat Valenciana under Grant GV/2011/002, and Universitat Politecnica de Valencia under Grant PAID-06-10/2424.
    Domenech, J.; De La Ossa Perez, BA.; Sahuquillo Borrás, J.; Gil Salinas, JA.; Pont Sanjuan, A. (2012). A taxonomy of web prediction algorithms. Expert Systems with Applications, 39(9), 8496-8502. https://doi.org/10.1016/j.eswa.2012.01.140
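    The taxonomy groups prediction algorithms by the input data they consume, for example the sequence of pages a user has previously visited. As a concrete illustration of that history-based family, here is a minimal sketch of a first-order Markov predictor trained on per-user request sequences; the class name, confidence threshold, and example URLs are invented for illustration and are not taken from the paper.

```python
from collections import defaultdict

class MarkovPredictor:
    """First-order Markov predictor: counts page-to-page transitions seen in
    past sessions and predicts the most likely next pages to prefetch."""

    def __init__(self, confidence_threshold=0.3):
        self.transitions = defaultdict(lambda: defaultdict(int))
        self.confidence_threshold = confidence_threshold

    def train(self, session):
        # session: ordered list of URLs requested by one user
        for prev, nxt in zip(session, session[1:]):
            self.transitions[prev][nxt] += 1

    def predict(self, current_url):
        candidates = self.transitions.get(current_url, {})
        total = sum(candidates.values())
        if total == 0:
            return []
        # keep only candidates whose estimated probability clears the threshold
        return [url for url, count in candidates.items()
                if count / total >= self.confidence_threshold]

predictor = MarkovPredictor()
predictor.train(["/index", "/news", "/sports"])
predictor.train(["/index", "/news", "/weather"])
print(predictor.predict("/index"))   # ['/news']
```

    Higher-order variants condition on longer request histories and usually gain accuracy at the cost of more state and training data, which is the kind of trade-off a taxonomy of this sort makes explicit.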

    Deterministic Object Management in Large Distributed Systems

    Caching is a widely used technique to improve the scalability of distributed systems. A central issue with caching is maintaining object replicas consistent with their master copies. Large distributed systems, such as the Web, typically deploy heuristic-based consistency mechanisms, which increase delay and place extra load on the servers, while not providing guarantees that cached copies served to clients are up-to-date. Server-driven invalidation has been proposed as an approach to strong cache consistency, but it requires servers to keep track of which objects are cached by which clients. We propose an alternative approach to strong cache consistency, called MONARCH, which does not require servers to maintain per-client state. Our approach builds on a few key observations. Large and popular sites, which attract the majority of the traffic, construct their pages from distinct components with various characteristics. Components may have different content types, change characteristics, and semantics. These components are merged together to produce a monolithic page, and the information about their uniqueness is lost. In our view, pages should serve as containers holding distinct objects with heterogeneous type and change characteristics while preserving the boundaries between these objects. Servers compile object characteristics and information about relationships between containers and embedded objects into explicit object management commands. Servers piggyback these commands onto existing request/response traffic so that client caches can use these commands to make object management decisions. The use of explicit content control commands is a deterministic, rather than heuristic, object management mechanism that gives content providers more control over their content. The deterministic object management with strong cache consistency offered by MONARCH allows content providers to make more of their content cacheable. Furthermore, MONARCH enables content providers to expose internal structure of their pages to clients. We evaluated MONARCH using simulations with content collected from real Web sites. The results show that MONARCH provides strong cache consistency for all objects, even for unpredictably changing ones, and incurs smaller byte and message overhead than heuristic policies. The results also show that as the request arrival rate or the number of clients increases, the amount of server state maintained by MONARCH remains the same while the amount of server state incurred by server invalidation mechanisms grows
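    To make the idea of piggybacked object-management commands more concrete, the following is a hypothetical sketch of a server attaching per-object commands to a container response; the header name, command vocabulary, and volatility classes are invented and should not be read as the actual MONARCH protocol.

```python
# Hypothetical piggybacking of cache-management commands on a response,
# in the spirit of MONARCH; header and command names are invented.

def build_response(container_url, body, object_registry):
    """Attach explicit management commands for the objects embedded in a
    container page so that client caches can act deterministically."""
    commands = []
    for obj in object_registry.get(container_url, []):
        if obj["volatility"] == "static":
            commands.append(f"RETAIN {obj['url']}")                      # safe to cache
        elif obj["volatility"] == "periodic":
            commands.append(f"REVALIDATE {obj['url']} {obj['period']}")  # recheck after N seconds
        else:
            commands.append(f"INVALIDATE {obj['url']}")                  # always refetch
    return {"body": body,
            "headers": {"X-Object-Commands": "; ".join(commands)}}

registry = {
    "/front-page": [
        {"url": "/logo.png", "volatility": "static"},
        {"url": "/stock-ticker", "volatility": "volatile"},
        {"url": "/headlines", "volatility": "periodic", "period": 300},
    ]
}
print(build_response("/front-page", "<html>...</html>", registry)["headers"])
```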

    Adaptive Caching of Distributed Components

    Locality of reference is an important property of distributed applications. Caching is typically employed during the development of such applications to exploit this property by locally storing queried data: subsequent accesses can be accelerated by serving their results directly from the local store. Current middleware architectures, however, hardly support this non-functional aspect. The thesis at hand therefore tries to factor caching out into a separate, configurable middleware service. Integration into the software development lifecycle provides for early capturing, modeling, and later reuse of caching-related metadata. At runtime, the implemented system can additionally adapt to changed usage behavior with respect to the cacheability of data, thus healing misconfigurations and optimizing itself toward an appropriate configuration. Speculative prefetching of data likely to be queried in the immediate future complements the presented approach.
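    As a rough illustration of caching factored out into a configurable, self-adapting service, the following sketch wraps a remote lookup in a decorator that stops storing results once the observed hit ratio no longer justifies it; the parameter names and the adaptation rule are assumptions, not the design developed in the thesis.

```python
import time
from functools import wraps

def adaptive_cache(ttl=30.0, min_hit_ratio=0.2, window=50):
    """Illustrative caching decorator that disables itself for a call site
    whose hit ratio stays below min_hit_ratio after `window` lookups."""
    def decorator(fetch):
        store, stats = {}, {"hits": 0, "lookups": 0}

        @wraps(fetch)
        def wrapper(key):
            stats["lookups"] += 1
            entry = store.get(key)
            if entry and time.time() - entry[1] < ttl:
                stats["hits"] += 1
                return entry[0]
            value = fetch(key)
            # adapt: keep caching only while the hit ratio justifies the cost
            if stats["lookups"] < window or stats["hits"] / stats["lookups"] >= min_hit_ratio:
                store[key] = (value, time.time())
            else:
                store.clear()   # data proved non-cacheable; stop storing it
            return value
        return wrapper
    return decorator

@adaptive_cache(ttl=10.0)
def load_component(component_id):
    return f"remote state of {component_id}"   # stands in for a remote call

print(load_component("order-42"))
print(load_component("order-42"))   # second call served from the local store
```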

    Measuring named data networks

    Spring 2020. Includes bibliographical references.
    Named Data Networking (NDN) is a promising information-centric networking (ICN) Internet architecture that addresses content directly rather than addressing servers. NDN provides new features, such as content-centric security, stateful forwarding, and in-network caches, to better satisfy the needs of today's applications. After many years of technological research and experimentation, the community has started to explore the deployment path for NDN. One NDN deployment challenge is measurement. Unlike IP, which has a suite of measurement approaches and tools, NDN has only a few. NDN routing and forwarding are based on name prefixes that do not refer to individual endpoints. While rich NDN functionality facilitates data distribution, it also breaks traditional end-to-end, probing-based measurement methods. In this dissertation, we present our work to investigate NDN measurement and fill some research gaps in the field. The thesis of this dissertation is that we can capture a substantial amount of useful and actionable measurements of NDN networks from end hosts. We start by comparing IP and NDN to propose a conceptual framework for NDN measurements. We claim that NDN can be seen as a superset of IP: it supports functionality similar to that of IP but has unique features that facilitate data retrieval. The framework helps identify the aspects in which NDN lacks measurement capabilities. This dissertation focuses on active measurements from end hosts. We present our studies in two directions to support the thesis statement. We first leverage the similarities to replicate IP approaches in NDN networks. We present the first work to measure the NDN-DPDK forwarder, a high-speed NDN forwarder designed and implemented by the National Institute of Standards and Technology (NIST), in a real testbed. The results demonstrate that Data payload sizes dominate forwarding performance and that using every fragment efficiently improves goodput. We then present the first work to replicate packet dispersion techniques in NDN networks. Based on the findings of the NDN-DPDK forwarder benchmark, we devise techniques to measure interarrival times of Data packets. The results show that the techniques successfully estimate the capacity on end hosts when 1 Gbps network cards are used. Our measurements also indicate that the NDN-DPDK forwarder introduces variance in Data packet interarrivals; we identify the potential bottlenecks and the possible causes of the variance. We then address NDN-specific measurements: measuring the caching state in NDN networks from end hosts. We propose a novel method to extract fingerprints for various caching decision mechanisms. Our simulation results demonstrate that the method can detect caching decisions in a few rounds. We also show that the method is not sensitive to cross-traffic and can be deployed on real topologies for caching policy detection.
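    The packet-dispersion approach mentioned above can be illustrated in a few lines: back-to-back Data packets of size S that arrive dt seconds apart suggest a bottleneck capacity of roughly S/dt. The sketch below uses the median interarrival gap to damp cross-traffic and forwarder jitter; the function and numbers are illustrative and are not the dissertation's actual tooling.

```python
import statistics

def estimate_capacity(arrival_times, packet_size_bytes):
    """Packet-pair style estimate: capacity ~= packet size / dispersion."""
    gaps = [t2 - t1 for t1, t2 in zip(arrival_times, arrival_times[1:]) if t2 > t1]
    if not gaps:
        raise ValueError("need at least two distinct arrival timestamps")
    dispersion = statistics.median(gaps)        # damp jitter and cross-traffic
    return packet_size_bytes * 8 / dispersion   # bits per second

# Example: ~4400-byte Data packets arriving roughly 35 microseconds apart
arrivals = [0.000000, 0.000035, 0.000071, 0.000106, 0.000142]
print(f"{estimate_capacity(arrivals, 4400) / 1e9:.2f} Gbps")   # about 0.99 Gbps
```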

    Evaluation, Analysis and adaptation of web prefetching techniques in current web

    This dissertation focuses on the study of the prefetching technique applied to the World Wide Web. This technique consists of processing (e.g., downloading) a Web request before the user actually makes it. By doing so, the waiting time perceived by the user can be reduced, which is the main goal of Web prefetching techniques. The study of the state of the art of Web prefetching showed the heterogeneity that exists in its performance evaluation. This heterogeneity mainly concerns four issues: i) there was no open framework to simulate and evaluate the already proposed prefetching techniques; ii) there was no uniform selection, or even definition, of the performance indexes to be maximized; iii) there were no comparative studies of prediction algorithms taking into account the costs and benefits of Web prefetching at the same time; and iv) techniques were evaluated under very different workloads, or under too few representative ones. During the research work, we have contributed to homogenizing the evaluation of prefetching performance by developing an open simulation framework that reproduces in detail all the aspects that impact prefetching performance. In addition, prefetching performance metrics have been analyzed in order to clarify their definition and to detect the most meaningful ones from the user's point of view. We also proposed an evaluation methodology that considers the cost and the benefit of prefetching at the same time. Finally, the importance of using current workloads to evaluate prefetching techniques has been highlighted; otherwise, wrong conclusions could be reached. The potential benefits of each Web prefetching architecture were analyzed, finding that collaborative predictors could reduce almost all the latency perceived by users. The first step in developing a collaborative predictor is to make predictions at the server, so this thesis focuses on an architecture with a server-located predictor. The environment conditions that can be found in the web are als
    Doménech I De Soria, J. (2007). Evaluation, Analysis and adaptation of web prefetching techniques in current web [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/1841
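    As an illustration of weighing the cost and the benefit of prefetching at the same time, the sketch below computes precision, recall, and traffic overhead from the sets of prefetched and actually requested objects in a log; the exact index definitions used in the thesis may differ, so treat the formulas as assumptions.

```python
def prefetch_metrics(prefetched, requested, sizes):
    """Benefit (precision, recall) and cost (extra traffic) of a prefetch run."""
    useful = prefetched & requested
    precision = len(useful) / len(prefetched) if prefetched else 0.0
    recall = len(useful) / len(requested) if requested else 0.0
    wasted_bytes = sum(sizes[u] for u in prefetched - requested)
    demanded_bytes = sum(sizes[u] for u in requested)
    traffic_overhead = wasted_bytes / demanded_bytes if demanded_bytes else 0.0
    return {"precision": precision, "recall": recall,
            "traffic_overhead": traffic_overhead}

sizes = {"/a": 10_000, "/b": 25_000, "/c": 5_000, "/d": 40_000}
print(prefetch_metrics(prefetched={"/a", "/b", "/c"},
                       requested={"/a", "/c", "/d"},
                       sizes=sizes))
```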

    Review of Web Caching and Replication by Michael Rabinovich and Oliver Spatscheck, Addison-Wesley, 2002.


    Optimization in Web Caching: Cache Management, Capacity Planning, and Content Naming

    Caching is fundamental to performance in distributed information retrieval systems such as the World Wide Web. This thesis introduces novel techniques for optimizing performance and cost-effectiveness in Web cache hierarchies. When requests are served by nearby caches rather than distant servers, server loads and network traffic decrease and transactions are faster. Cache system design and management, however, face extraordinary challenges in loosely-organized environments like the Web, where the many components involved in content creation, transport, and consumption are owned and administered by different entities. Such environments call for decentralized algorithms in which stakeholders act on local information and private preferences. In this thesis I consider problems of optimally designing new Web cache hierarchies and optimizing existing ones. The methods I introduce span the Web from point of content creation to point of consumption: I quantify the impact of content-naming practices on cache performance; present techniques for variable-quality-of-service cache management; describe how a decentralized algorithm can compute economically-optimal cache sizes in a branching two-level cache hierarchy; and introduce a new protocol extension that eliminates redundant data transfers and allows “dynamic” content to be cached consistently. To evaluate several of my new methods, I conducted trace-driven simulations on an unprecedented scale. This in turn required novel workload measurement methods and efficient new characterization and simulation techniques. The performance benefits of my proposed protocol extension are evaluated using two extraordinarily large and detailed workload traces collected in a traditional corporate network environment and an unconventional thin-client system. My empirical research follows a simple but powerful paradigm: measure on a large scale an important production environment’s exogenous workload; identify performance bounds inherent in the workload, independent of the system currently serving it; identify gaps between actual and potential performance in the environment under study; and finally devise ways to close these gaps through component modifications or through improved inter-component integration. This approach may be applicable to a wide range of Web services as they mature.
    Ph.D., Computer Science and Engineering, University of Michigan.
    http://deepblue.lib.umich.edu/bitstream/2027.42/90029/1/kelly-optimization_web_caching.pdf
    http://deepblue.lib.umich.edu/bitstream/2027.42/90029/2/kelly-optimization_web_caching.ps.bz
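    One way to picture the economically-optimal cache sizing question is to compare the marginal value of the hits an extra unit of storage would add against the marginal cost of that storage. The sketch below applies this rule to a hit-count curve such as one obtained from a trace-driven simulation; the cost model and numbers are invented and are not the decentralized algorithm developed in the thesis.

```python
def optimal_cache_size(hit_curve, value_per_hit, cost_per_slot):
    """Largest cache size whose marginal hits still pay for the extra slot.
    hit_curve[i] = hits observed with a cache of i slots (e.g., from an
    LRU trace simulation); assumes diminishing returns."""
    best = 0
    for size in range(1, len(hit_curve)):
        marginal_hits = hit_curve[size] - hit_curve[size - 1]
        if marginal_hits * value_per_hit >= cost_per_slot:
            best = size
    return best

# Diminishing returns: each added slot yields fewer additional hits
hit_curve = [0, 500, 900, 1200, 1400, 1500, 1550, 1570]
print(optimal_cache_size(hit_curve, value_per_hit=0.01, cost_per_slot=2.0))  # 4
```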

    Dynamic content delivery infrastructure deployment using network cloud resources

    Millions of people value the Internet for the content and the applications it makes available. To cope with the increasing end-user demand for popular and often high-volume content, e.g., high-definition video or online social networks, massively distributed Content Delivery Infrastructures (CDIs) have been deployed. However, a highly competitive market requires CDIs to constantly investigate new ways to reduce operational costs and improve delivery performance. Today, CDIs mainly suffer from limited agility in server deployment and are largely unaware of network conditions and precise end-user locations, information that would improve the efficiency and performance of content delivery. While newly emerging architectures try to address these challenges, none has so far considered collaboration, although ISPs have the relevant information readily at hand. In this thesis, we assess the impact of collaboration on content delivery. We first evaluate the design and operating space of today's content delivery landscape and quantify the possible benefits of collaboration by analyzing operational traces from a European Tier-1 ISP. We find that collaboration when assigning end-users to servers highly localizes CDI traffic and improves end-user performance. Moreover, we find significant path diversity, which enables new mechanisms for traffic management. We propose two key enablers, namely in-network server allocation and informed user-server assignment, to facilitate CDI-ISP collaboration, and present our system design, called NetPaaS (Network Platform as a Service), that realizes them. In-network server allocation offers agile server allocation close to the ISP's end-users by leveraging virtualization technology and cloud-style resources in the network. Informed user-server assignment enables ISPs to take network bottlenecks and precise end-user locations into account and to recommend the best possible candidate server for individual end-users to CDIs. NetPaaS thus provides an additional degree of freedom to scale up or shrink the CDI footprint on demand. To quantify the potential of collaboration with NetPaaS, we perform a first-of-its-kind evaluation based on operational traces from the largest commercial CDI and a European Tier-1 ISP. Our findings reveal that dynamic server allocation based on accurate end-user locations and network conditions enables the CDI to better cope with increasing and highly volatile demand for content and improves end-user performance. Moreover, recommendations from NetPaaS result in better utilization of the existing server infrastructure and enable the ISP to better manage traffic flows inside its network. We conclude that NetPaaS improves the performance and efficiency of content delivery architectures while potentially reducing the required capital investment and operational costs. Moreover, NetPaaS enables the ISP to achieve traffic engineering goals and therefore offers a true win-win situation to both CDIs and ISPs.
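    The informed user-server assignment enabler can be pictured as the ISP ranking a CDI's candidate servers using information only the ISP has, such as path latency and current link utilization, and returning the best candidates for a given user. The sketch below is a hypothetical illustration; the scoring weights and data fields are assumptions, not NetPaaS internals.

```python
def rank_candidate_servers(user_location, candidates):
    """Rank CDI candidate servers for a user by combining path latency with
    a penalty for highly utilized links (illustrative scoring only)."""
    def score(server):
        latency_ms = server["latency_ms"][user_location]
        utilization = server["link_utilization"]          # 0.0 .. 1.0
        return latency_ms * (1.0 + 2.0 * utilization)     # penalize hot links
    return sorted(candidates, key=score)

candidates = [
    {"name": "pop-frankfurt", "latency_ms": {"berlin": 9},  "link_utilization": 0.85},
    {"name": "pop-berlin",    "latency_ms": {"berlin": 2},  "link_utilization": 0.40},
    {"name": "pop-paris",     "latency_ms": {"berlin": 22}, "link_utilization": 0.10},
]
print([s["name"] for s in rank_candidate_servers("berlin", candidates)])
# ['pop-berlin', 'pop-frankfurt', 'pop-paris']
```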

    Systems for Challenged Network Environments.

    Developing regions face significant challenges in network access, making even simple network tasks unpleasant and rich media prohibitively difficult to access. Even as cellular network coverage is approaching a near-universal reach, good network connectivity remains scarce and expensive in many emerging markets. The underlying theme in this dissertation is designing network systems that better accommodate users in emerging markets. To do so, this dissertation begins with a nuanced analysis of content access behavior for web users in developing regions. This analysis finds the personalization of content access---and the fragmentation that results from it---to be significant factors in undermining many existing web acceleration mechanisms. The dissertation explores content access behavior from logs collected at shared internet access sites, as well as user activity information obtained from a commercial social networking service with over a hundred million members worldwide. Based on these observations, the dissertation then discusses two systems designed for improving end-user experience in accessing and using content in constrained networks. First, it deals with the challenge of distributing private content in these networks. By leveraging the wide availability of cellular telephones, the dissertation describes a system for personal content distribution based on user access behavior. The system enables users to request future data accesses, and it schedules content transfers according to current and expected capacity. Second, the dissertation looks at routing bulk data in challenged networks, and describes an experimentation platform for building systems for challenged networks. This platform enables researchers to quickly prototype systems for challenged networks, and iteratively evaluate these systems using mobility and network emulation. The dissertation describes a few data routing systems that were built atop this experimentation platform. Finally, the dissertation discusses the marketplace and service discovery considerations that are important in making these systems viable for developing-region use. In particular, it presents an extensible, auction-based market platform that relies on widely available communication tools for conveniently discovering and trading digital services and goods in developing regions. Collectively, this dissertation brings together several projects that aim to understand and improve end-user experience in challenged networks endemic to developing regions.
    Ph.D., Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies.
    http://deepblue.lib.umich.edu/bitstream/2027.42/91401/1/azarias_1.pd
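    The personal content distribution system described above lets users request data ahead of time and schedules the transfers into periods when capacity is expected to be available. The sketch below shows one greedy, deadline-ordered way such scheduling could look; the field names and the capacity-window model are assumptions, not the dissertation's actual scheduler.

```python
def schedule_transfers(requests, capacity_windows):
    """Greedily pack pre-requested content, earliest deadline first, into
    forecast capacity windows (e.g., off-peak hours); illustrative only."""
    plan = []
    remaining = sorted(requests, key=lambda r: r["deadline"])
    for window in capacity_windows:
        budget = window["expected_bytes"]
        for req in list(remaining):
            if req["size_bytes"] <= budget:
                plan.append((window["start"], req["name"]))
                budget -= req["size_bytes"]
                remaining.remove(req)
    return plan, remaining          # anything left over is deferred

requests = [
    {"name": "lecture.mp4", "size_bytes": 80_000_000, "deadline": "2024-05-03"},
    {"name": "news.html",   "size_bytes": 200_000,    "deadline": "2024-05-01"},
]
windows = [{"start": "01:00", "expected_bytes": 100_000_000}]
print(schedule_transfers(requests, windows))
```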

    Adaptation-Aware Placement of Replicas in Adaptive Content Distribution Networks

    Adaptive Content Distribution Networks (A-CDNs) are application-independent, distributed infrastructures that use content adaptation and distributed replication of content to allow the scalable delivery of adaptable multimedia content to heterogeneous clients. The placement of replicas in an A-CDN is controlled by the placement mechanism of the A-CDN. As opposed to traditional CDNs, which do not take content adaptation into consideration, a replica placement mechanism in an A-CDN has to decide not only which object shall be stored in which surrogate but also which representation or representations of the object to replicate. Traditional replica placement mechanisms are incapable of taking different representations of the same object into consideration. A-CDNs that use traditional replica placement mechanisms can therefore replicate only generic or statically pre-adapted representations. Replicating statically pre-adapted representations reduces the reusability of the replicas, while replicating generic representations incurs adaptation costs and delays with every request. That is why this dissertation proposes adaptation-aware replica placement mechanisms. By taking the adaptability of the content into account, adaptation-aware replica placement mechanisms may replicate generic, statically pre-adapted, and even partially adapted representations of an object. They are thus able to combine static and dynamic content adaptation flexibly. The goal of the dissertation is to evaluate the performance advantages of taking knowledge about the adaptability of content into consideration when computing a placement of replicas in an A-CDN. To this end, the problem of adaptation-aware replica placement is formalized as an optimization problem; algorithms for solving the optimization problem are proposed and implemented in a simulator. The underlying simulation model describes an Internet-wide distributed A-CDN that is used for the delivery of JPEG images to heterogeneous mobile and stationary clients. Based on this simulation model, the performance of the adaptation-aware replica placement mechanisms is evaluated and compared with that of traditional replica placement mechanisms. The simulations show that, depending on the system and load model as well as the storage capacity of the surrogates of the A-CDN, the adaptation-aware approach is in many cases superior to traditional replica placement mechanisms. However, if the request loads of different types of clients hardly overlap, or if the storage capacity of the surrogates is sufficiently large, the adaptation-aware approach has no significant advantages over traditional replica placement mechanisms.
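    The adaptation-aware placement problem can be approximated greedily: rank candidate (object, representation) pairs by the delivery and adaptation cost they would save per byte of surrogate storage, and place them until the surrogate is full. The sketch below implements that heuristic under an invented cost model; it is not the optimization formulation or the algorithms evaluated in the thesis.

```python
def place_replicas(surrogate_capacity, candidates):
    """Greedy adaptation-aware placement: pick the (object, representation)
    pairs with the highest saved cost per stored byte until storage runs out."""
    ranked = sorted(candidates, key=lambda c: c["saved_cost"] / c["size"],
                    reverse=True)
    placed, used = [], 0
    for cand in ranked:
        if used + cand["size"] <= surrogate_capacity:
            placed.append((cand["object"], cand["representation"]))
            used += cand["size"]
    return placed

candidates = [
    {"object": "img1", "representation": "generic",      "size": 900, "saved_cost": 50},
    {"object": "img1", "representation": "mobile-low",   "size": 120, "saved_cost": 30},
    {"object": "img2", "representation": "desktop-high", "size": 700, "saved_cost": 45},
]
print(place_replicas(surrogate_capacity=1000, candidates=candidates))
# [('img1', 'mobile-low'), ('img2', 'desktop-high')]
```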