6 research outputs found

    Document replication strategies for geographically distributed web search engines

    Large-scale web search engines are composed of multiple data centers that are geographically distant from each other. Typically, a user query is processed in a data center that is geographically close to the origin of the query, over a replica of the entire web index. Compared to a centralized, single-center search engine, this architecture offers lower query response times because the network latencies between users and data centers are reduced. However, it does not scale well with increasing index sizes and query traffic volumes because queries are evaluated on the entire web index, which has to be replicated and maintained in all data centers. As a remedy to this scalability problem, we propose a document replication framework in which documents are selectively replicated on data centers based on regional user interests. Within this framework, we propose three different document replication strategies, each optimizing a different objective: reducing the potential search quality loss, the average query response time, or the total query workload of the search system. For all three strategies, we consider two alternative types of capacity constraints on index sizes of data centers. Moreover, we investigate the performance impact of query forwarding and result caching. We evaluate our strategies via detailed simulations, using a large query log and a document collection obtained from the Yahoo! web search engine. (C) 2012 Elsevier Ltd. All rights reserved.
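    As a rough illustration of selective replication under an index-size constraint, the sketch below picks, for each region, the documents that region's users clicked most often, up to a per-data-center capacity. This is not one of the three strategies from the article; the function name `select_replicas`, the `(region, doc_id)` click-log format, and the simple click-count objective are all assumptions made for the example.

```python
from collections import Counter


def select_replicas(click_log, capacity):
    """Hypothetical greedy selection of documents to replicate per region.

    click_log: iterable of (region, doc_id) pairs (assumed log format).
    capacity:  dict mapping region -> maximum number of documents its
               data center may index (assumed capacity constraint).
    """
    interest = {}
    for region, doc_id in click_log:
        # Count how often each document was of interest in each region.
        interest.setdefault(region, Counter())[doc_id] += 1

    # Keep only the most frequently clicked documents that fit the capacity.
    return {
        region: [doc for doc, _ in counts.most_common(capacity.get(region, 0))]
        for region, counts in interest.items()
    }


log = [("eu", "a"), ("eu", "a"), ("eu", "b"),
       ("us", "c"), ("us", "b"), ("us", "b")]
replicas = select_replicas(log, {"eu": 1, "us": 2})
```

    Queries whose relevant documents fall outside a region's selection would have to be forwarded to another data center, which is the trade-off the abstract refers to under query forwarding.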

    Performance and Analysis of Transfer Control Protocol Over Voice Over Wireless Local Area Network

    A thesis presented to the faculty of the College of Science and Technology at Morehead State University in partial fulfillment of the requirements for the degree of Master of Science by Rajendra Patil in August of 2008.

    Swarm Based Implementation of a Virtual Distributed Database System in a Sensor Network

    The deployment of unmanned aerial vehicles (UAVs) in recent military operations has had success in carrying out surveillance and combat missions in sensitive areas. An area of intense research on UAVs has been on controlling a group of small-sized UAVs to carry out reconnaissance missions normally undertaken by large UAVs such as Predator or Global Hawk. A control strategy for coordinating the UAV movements of such a group of UAVs adopts the bio-inspired swarm model to produce autonomous group behavior. This research proposes establishing a distributed database system on a group of swarming UAVs, providing for data storage during a reconnaissance mission. A distributed database system model is simulated treating each UAV as a distributed database site connected by a wireless network. In this model, each UAV carries a sensor and communicates to a command center when queried. Drawing equivalence to a sensor network, the network of UAVs poses as a dynamic ad-hoc sensor network. The distributed database system based on a swarm of UAVs is tested against a set of reconnaissance test suites with respect to evaluating system performance. The design of experiments focuses on the effects of varying the query input and types of swarming UAVs on overall system performance. The results show that the topology of the UAVs has a distinct impact on the output of the sensor database. The experiments measuring system delays also confirm the expectation that in a distributed system, inter-node communication costs outweigh processing costs

    Partial collection replication versus caching for information retrieval systems

    The explosion of content in distributed information retrieval (IR) systems requires new mechanisms to attain timely and accurate retrieval of unstructured text. In this paper, we compare two mechanisms to improve IR system performance: partial collection replication and caching. When queries have locality, both mechanisms return results more quickly than sending queries to the original collection(s). Caches return results when queries exactly match a previous one. Partial replicas are a form of caching that return results when the IR technology determines the query is a good match. Caches are simpler and faster, but replicas can increase locality by detecting similarity between queries that are not exactly the same. We use real traces from THOMAS and Excite to measure query locality and similarity. With a very restrictive definition of query similarity, similarity improves query locality up to 15% over exact match. We use a validated simulator to compare their performance, and find that even if the partial replica hit rate increases only 3 to 6%, it will outperform simple caching under a variety of configurations. A combined approach will probably yield the best performance.

    Partial Collection Replication versus Caching for Information Retrieval Systems

    The explosion of content in distributed information retrieval (IR) systems requires new mechanisms to attain timely and accurate retrieval of unstructured text. In this paper, we compare two mechanisms to improve IR system performance: partial collection replication and caching. When queries have locality, both mechanisms return results more quickly than sending queries to the original collection(s). Caches return results when queries exactly match a previous one. Partial replicas are a form of caching that return results when the IR technology determines the query is a good match. Caches are simpler and faster, but replicas can increase locality by detecting similarity between queries that are not exactly the same. We use real traces from THOMAS and Excite to measure query locality and similarity. With a very restrictive definition of query similarity, similarity improves query locality up to 15% over exact match. We use a validated simulator to compare their performance, and find that even if the partial replica hit rate increases only 3 to 6%, it will outperform simple caching under a variety of configurations. A combined approach will probably yield the best performance.
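    The difference between the two hit conditions can be sketched as follows. The exact-match cache answers only when the query string was seen before, while the second class stands in for a partial replica by answering when a stored query is similar enough. Jaccard overlap of query terms and the 0.6 threshold are assumptions chosen for illustration; the abstract does not say how the paper defines query similarity, and a real partial replica would hold a subset of the collection rather than stored result lists.

```python
def normalize(query):
    """Reduce a query to its set of lowercase terms."""
    return frozenset(query.lower().split())


def jaccard(a, b):
    """Jaccard overlap of two term sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class ExactCache:
    """Returns results only when the query string matches a previous one."""

    def __init__(self):
        self.store = {}

    def put(self, query, results):
        self.store[query] = results

    def get(self, query):
        return self.store.get(query)


class SimilarityReplica:
    """Hypothetical stand-in for a partial replica: hits on similar queries."""

    def __init__(self, threshold=0.6):
        self.threshold = threshold
        self.entries = []

    def put(self, query, results):
        self.entries.append((normalize(query), results))

    def get(self, query):
        terms = normalize(query)
        # Find the stored query with the highest term overlap.
        best = max(self.entries, key=lambda e: jaccard(terms, e[0]), default=None)
        if best is not None and jaccard(terms, best[0]) >= self.threshold:
            return best[1]
        return None


cache, replica = ExactCache(), SimilarityReplica()
cache.put("senate budget bill", [1, 2])
replica.put("senate budget bill", [1, 2])
```

    With this setup, the reordered query "budget bill senate" misses the exact cache but hits the similarity-based store, which is the kind of extra locality the abstract attributes to replicas.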

    Filtrado de respuestas parciales en arquitecturas de recuperación de información distribuidas (Filtering of partial responses in distributed information retrieval architectures)

    [Abstract] The large increase in the volume of information available online over the past few decades makes information retrieval techniques increasingly necessary in order to manage, retrieve, and filter the information available through these media. Current lines of research have uncovered a bottleneck in the response channel of distributed information retrieval architectures, due mainly to the large number of partial responses and their consequent impact on the communication architectures and on the user brokers. On the one hand, if we reduce the number of partial responses generated by each query server, we reduce the precision of the final response; on the other hand, if we design a communication architecture that supports the generated traffic, we obtain a clear bottleneck when sorting the partial results in the user brokers in order to select the final response. The solution proposed in this work consists of a filtered communication architecture implemented by means of programmable nodes, which form a subnetwork capable of transparently processing the traffic that passes through it, with the aim of reducing the cardinality of the partial responses. The main advantage of using programmable nodes instead of, for example, brokers is therefore transparency, which makes it possible to build response-channel infrastructures that are highly flexible and easily modifiable at run time.
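    One way to picture the reduction in cardinality is an intermediate node that merges the partial result lists passing through it and forwards only the top k entries. The sketch below is an assumption about what such filtering could look like, not the architecture described in the thesis; the function name `filter_partial_responses` and the `(score, doc_id)` tuple format are hypothetical.

```python
import heapq


def filter_partial_responses(partial_lists, k):
    """Merge partial result lists and keep only the k highest-scoring entries.

    partial_lists: iterable of lists of (score, doc_id) tuples (assumed format).
    """
    merged = (item for results in partial_lists for item in results)
    return heapq.nlargest(k, merged, key=lambda item: item[0])


# Four query servers, each returning a partial response.
s1 = [(0.91, "a"), (0.20, "b")]
s2 = [(0.85, "c"), (0.40, "d")]
s3 = [(0.77, "e"), (0.10, "f")]
s4 = [(0.95, "g"), (0.30, "h")]

# Two intermediate nodes each filter the traffic of two servers.
node1 = filter_partial_responses([s1, s2], 2)
node2 = filter_partial_responses([s3, s4], 2)

# The broker sorts only what the nodes forwarded.
final = filter_partial_responses([node1, node2], 2)
```

    When scores are distinct, filtering at the nodes leaves the final top k unchanged while the broker receives 4 entries instead of 8, since any entry in the global top k is also in the top k of the subset it came from.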