11 research outputs found

    Document replication strategies for geographically distributed web search engines

    Large-scale web search engines are composed of multiple data centers that are geographically distant from one another. Typically, a user query is processed in a data center that is geographically close to the origin of the query, over a replica of the entire web index. Compared to a centralized, single-center search engine, this architecture offers lower query response times because the network latencies between users and data centers are reduced. However, it does not scale well with increasing index sizes and query traffic volumes, because queries are evaluated on the entire web index, which has to be replicated and maintained in all data centers. As a remedy to this scalability problem, we propose a document replication framework in which documents are selectively replicated on data centers based on regional user interests. Within this framework, we propose three document replication strategies, each optimizing a different objective: reducing the potential search quality loss, the average query response time, or the total query workload of the search system. For all three strategies, we consider two alternative types of capacity constraints on the index sizes of data centers. Moreover, we investigate the performance impact of query forwarding and result caching. We evaluate our strategies via detailed simulations, using a large query log and a document collection obtained from the Yahoo! web search engine.
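    The abstract describes the framework only at a high level, so the following is a minimal sketch of the general idea rather than the paper's actual strategies: each data center replicates the documents that attract the most regional interest, up to a per-center index capacity. All names (plan_replication, regional_scores, capacities) and the greedy selection are illustrative assumptions.

```python
# Hypothetical sketch: capacity-constrained, interest-based document replication.
# Per-region document scores are assumed inputs (e.g., derived from a query log);
# none of these names come from the paper itself.

def plan_replication(regional_scores, capacities):
    """Pick, for each data center, the documents to replicate.

    regional_scores: dict mapping data center -> {doc_id: interest score}
    capacities:      dict mapping data center -> max number of replicated docs
    Returns a dict mapping data center -> set of doc_ids to replicate.
    """
    plan = {}
    for dc, scores in regional_scores.items():
        # Greedy: keep the locally most "interesting" documents until the
        # per-center index capacity is exhausted.
        ranked = sorted(scores, key=scores.get, reverse=True)
        plan[dc] = set(ranked[: capacities[dc]])
    return plan


if __name__ == "__main__":
    scores = {
        "dc_europe": {"d1": 50, "d2": 5, "d3": 30},
        "dc_asia": {"d1": 2, "d2": 60, "d3": 10},
    }
    # dc_europe keeps d1 and d3; dc_asia keeps only d2.
    print(plan_replication(scores, {"dc_europe": 2, "dc_asia": 1}))
```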

    Dynamic distance maps of the Internet

    An increasing number of Internet applications attempt to optimize their network communication by considering the network distance over which data is transferred. Such applications range from replication management to mobile agent applications. One major problem for these applications is efficiently acquiring distance information for large computer networks. This paper presents an approach for creating a global view of the Internet, a so-called network distance map, which realizes a hierarchical decomposition of the network into regions and allows the network distance between any two hosts to be estimated. This view is not a single snapshot but is dynamically adapted to continuously changing network conditions. The main idea is to use a designated set of hosts to perform distance measurements and to use the information gained in this way to estimate the distance between arbitrary hosts. A hierarchical clustering provides the notion of regions and allows the measurements to be coordinated in such a way that the resulting network load is minimized. An experimental evaluation based on 119 globally distributed measurement servers shows that even a small number of measurement servers suffices to construct fairly accurate distance maps at low cost.
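    As a rough illustration of how a region-based distance map can answer queries between arbitrary hosts, the sketch below assumes a precomputed host-to-region assignment and measured inter-region distances; the function name estimate_distance, the intra-region default, and the data layout are assumptions, not the paper's interface.

```python
# Hypothetical sketch of distance estimation from a region-based distance map.
# A real system would build the clustering from measurements; here the region
# assignment and the inter-region measurements are assumed inputs.

def estimate_distance(host_a, host_b, region_of, region_distance, intra_region=5.0):
    """Estimate the network distance between two arbitrary hosts.

    region_of:       dict mapping host -> region id (from hierarchical clustering)
    region_distance: dict mapping (region, region) -> distance measured between
                     the regions' measurement servers
    intra_region:    assumed default estimate for hosts in the same region
    """
    ra, rb = region_of[host_a], region_of[host_b]
    if ra == rb:
        return intra_region
    # Measurements are treated as symmetric; only one orientation may be stored.
    return region_distance.get((ra, rb), region_distance.get((rb, ra)))


if __name__ == "__main__":
    region_of = {"hostA": "eu", "hostB": "us", "hostC": "eu"}
    region_distance = {("eu", "us"): 90.0}
    print(estimate_distance("hostA", "hostB", region_of, region_distance))  # 90.0
    print(estimate_distance("hostA", "hostC", region_of, region_distance))  # 5.0
```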

    A Dynamic Object Replication and Migration Protocol for an Internet Hosting Service

    This paper proposes a protocol suite for dynamic replication and migration of Internet objects. It consists of an algorithm for deciding on the number and location of object replicas and an algorithm for distributing requests among currently available replicas. Our approach attempts to place replicas in the vicinity of a majority of requests while ensuring at the same time that no servers are overloaded. The request distribution algorithm uses the same simple mechanism to take into account both server proximity and load, without actually knowing the latter. The replica placement algorithm executes autonomously on each node, without the knowledge of other object replicas in the system. The proposed algorithms rely on the information available in databases maintained by Internet routers. A simulation study using synthetic workloads and the network backbone of UUNET, one of the largest Internet service providers, shows that the proposed protocol is effective in eliminating hot spots and ...
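    Since the abstract only outlines the replica placement algorithm, the sketch below illustrates one plausible autonomous per-node decision rule: replicate an object locally when enough requests are being forwarded elsewhere and the node has spare capacity, and drop a replica that sees little demand. The decide function and its thresholds are hypothetical; the paper's protocol additionally uses information from router databases and load-aware request distribution.

```python
# Hypothetical sketch of an autonomous per-node replication decision for one object.
# Thresholds and counters are illustrative assumptions, not the paper's protocol.

def decide(local_requests, forwarded_requests, has_replica, load,
           create_threshold=100, drop_threshold=10, max_load=0.8):
    """Return 'create', 'drop', or 'keep' for one object on one node."""
    if not has_replica and forwarded_requests > create_threshold and load < max_load:
        # Enough nearby demand and spare capacity: host a replica locally.
        return "create"
    if has_replica and local_requests < drop_threshold:
        # The replica is no longer earning its keep: release its resources.
        return "drop"
    return "keep"


if __name__ == "__main__":
    print(decide(local_requests=0, forwarded_requests=250,
                 has_replica=False, load=0.4))  # create
    print(decide(local_requests=3, forwarded_requests=0,
                 has_replica=True, load=0.2))   # drop
```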

    WACCO and LOKO: Strong Consistency at Global Scale

    Motivated by a vision for future global-scale services supporting frequent updates and widespread concurrent reads, we propose a scalable object-sharing system called WACCO offering strong consistency semantics. WACCO propagates read responses on a tree-based topology to satisfy broad demand and migrates objects dynamically to place them close to that demand. To demonstrate WACCO, we use it to develop a service called LOKO that could roughly encompass the current duties of the DNS and simultaneously support granular status updates (e.g., currently preferred routes) in a future Internet. We evaluate LOKO, including the performance impact of updates, migration, and fault tolerance, using both traces of DNS queries served by Akamai and traces of NFS traffic on the UNC campus. WACCO uses a novel consistency model that is both stronger than sequential consistency and more scalable than linearizability. Our results show that this model performs better in the DNS case than in the NFS case, because the former represents a global, shared-object system that better fits the design goals of WACCO. We evaluate two different migration techniques, one of which considers not just client-visible latency but also the budget for the network (e.g., for public and hybrid clouds), among other factors.
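    To make the demand-driven migration idea concrete, here is a minimal sketch (not WACCO's actual policy) in which the host holding an object compares the request volume arriving from each neighboring subtree of the tree topology with local demand and moves the object toward a subtree that clearly dominates; the hysteresis margin and all names are assumptions.

```python
# Hypothetical sketch of demand-driven object migration on a tree topology.
# Counters and the hysteresis margin are assumptions; the actual policy in the
# paper also weighs factors such as the network budget.

def migration_target(local_demand, subtree_demand, margin=1.5):
    """Return the neighbor to migrate the object toward, or None to stay put.

    local_demand:   requests served directly at the current host
    subtree_demand: dict mapping neighbor -> requests arriving via that subtree
    margin:         hysteresis factor to avoid oscillating migrations
    """
    if not subtree_demand:
        return None
    best = max(subtree_demand, key=subtree_demand.get)
    # Move only if one subtree clearly dominates local demand plus all other subtrees.
    rest = local_demand + sum(d for n, d in subtree_demand.items() if n != best)
    if subtree_demand[best] > margin * max(rest, 1):
        return best
    return None


if __name__ == "__main__":
    print(migration_target(10, {"left": 200, "right": 15}))   # left
    print(migration_target(100, {"left": 60, "right": 50}))   # None (stay put)
```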