
6 research outputs found

Document replication strategies for geographically distributed web search engines

Authors: Aykanat C., Cambazoglu B. B., Kayaaslan E.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2013

Cataloged from PDF version of article. Large-scale web search engines are composed of multiple data centers that are geographically distant from each other. Typically, a user query is processed in a data center that is geographically close to the origin of the query, over a replica of the entire web index. Compared to a centralized, single-center search engine, this architecture offers lower query response times as the network latencies between the users and data centers are reduced. However, it does not scale well with increasing index sizes and query traffic volumes because queries are evaluated on the entire web index, which has to be replicated and maintained in all data centers. As a remedy to this scalability problem, we propose a document replication framework in which documents are selectively replicated on data centers based on regional user interests. Within this framework, we propose three different document replication strategies, each optimizing a different objective: reducing the potential search quality loss, the average query response time, or the total query workload of the search system. For all three strategies, we consider two alternative types of capacity constraints on index sizes of data centers. Moreover, we investigate the performance impact of query forwarding and result caching. We evaluate our strategies via detailed simulations, using a large query log and a document collection obtained from the Yahoo! web search engine. (C) 2012 Elsevier Ltd. All rights reserved.

Bilkent University Institutional Repository

Performance and Analysis of Transfer Control Protocol Over Voice Over Wireless Local Area Network

Author: Patil Rajendra
Publication venue: Scholarworks @ Morehead State
Publication date: 01/08/2008

A thesis presented to the faculty of the College of Science and Technology at Morehead State University in partial fulfillment of the requirements for the Degree Master of Science by Rajendra Patil in August of 2008

Morehead State University

Swarm Based Implementation of a Virtual Distributed Database System in a Sensor Network

Author: Lee Wen C.
Publication venue: AFIT Scholar
Publication date: 01/03/2004

The deployment of unmanned aerial vehicles (UAVs) in recent military operations has had success in carrying out surveillance and combat missions in sensitive areas. An area of intense research on UAVs has been on controlling a group of small-sized UAVs to carry out reconnaissance missions normally undertaken by large UAVs such as Predator or Global Hawk. A control strategy for coordinating the UAV movements of such a group of UAVs adopts the bio-inspired swarm model to produce autonomous group behavior. This research proposes establishing a distributed database system on a group of swarming UAVs, providing for data storage during a reconnaissance mission. A distributed database system model is simulated treating each UAV as a distributed database site connected by a wireless network. In this model, each UAV carries a sensor and communicates to a command center when queried. Drawing equivalence to a sensor network, the network of UAVs poses as a dynamic ad-hoc sensor network. The distributed database system based on a swarm of UAVs is tested against a set of reconnaissance test suites with respect to evaluating system performance. The design of experiments focuses on the effects of varying the query input and types of swarming UAVs on overall system performance. The results show that the topology of the UAVs has a distinct impact on the output of the sensor database. The experiments measuring system delays also confirm the expectation that in a distributed system, inter-node communication costs outweigh processing costs

AFIT Scholar (Air Force Institute of Technology)

Partial collection replication versus caching for information retrieval systems

Author: Zhihong Lu
Publication venue: ACM Press
Publication date: 01/01/2000

The explosion of content in distributed information retrieval (IR) systems requires new mechanisms to attain timely and accurate retrieval of unstructured text. In this paper, we compare two mechanisms to improve IR system performance: partial collection replication and caching. When queries have locality, both mechanisms return results more quickly than sending queries to the original collection(s). Caches return results when queries exactly match a previous one. Partial replicas are a form of caching that return results when the IR technology determines the query is a good match. Caches are simpler and faster, but replicas can increase locality by detecting similarity between queries that are not exactly the same. We use real traces from THOMAS and Excite to measure query locality and similarity. With a very restrictive definition of query similarity, similarity improves query locality up to 15% over exact match. We use a validated simulator to compare their performance, and find that even if the partial replica hit rate increases only 3 to 6%, it will outperform simple caching under a variety of configurations. A combined approach will probably yield the best performance

CiteSeerX

Partial Collection Replication versus Caching for Information Retrieval Systems

Authors: Kathryn S. McKinley, Zhihong Lu
Publication venue: ACM Press
Publication date: 01/01/2000

The explosion of content in distributed information retrieval (IR) systems requires new mechanisms to attain timely and accurate retrieval of unstructured text. In this paper, we compare two mechanisms to improve IR system performance: partial collection replication and caching. When queries have locality, both mechanisms return results more quickly than sending queries to the original collection(s). Caches return results when queries exactly match a previous one. Partial replicas are a form of caching that return results when the IR technology determines the query is a good match. Caches are simpler and faster, but replicas can increase locality by detecting similarity between queries that are not exactly the same. We use real traces from THOMAS and Excite to measure query locality and similarity. With a very restrictive definition of query similarity, similarity improves query locality up to 15% over exact match. We use a validated simulator to compare their performance, and find that even if the partial replica hit rate increases only 3 to 6%, it will outperform simple caching under a variety of configurations. A combined approach will probably yield the best performance

CiteSeerX

Filtrado de respuestas parciales en arquitecturas de recuperación de información distribuidas

Author: Puentes Calvo Juan Francisco
Publication date: 01/01/2006

[Abstract] The large growth in the volume of information available online over the past few decades has made information retrieval techniques increasingly necessary for managing, retrieving, and filtering that information. Current lines of research have identified a bottleneck in the response channel of distributed information retrieval architectures, caused mainly by the large number of partial responses and their resulting impact on the communication architectures and on the user brokers. On the one hand, reducing the number of partial responses generated by each query server lowers the precision of the final response; on the other hand, designing a communication architecture that can carry the generated traffic creates a clear bottleneck when the user brokers sort the partial results in order to select the final response. The solution proposed in this work is a filtered communication architecture implemented with programmable nodes, which form a subnetwork able to transparently process the traffic passing through it, with the goal of reducing the cardinality of the partial responses. The main advantage of using programmable nodes rather than, for example, brokers is therefore transparency, which makes it possible to build response-channel infrastructures that are highly flexible and easily modifiable at run time

Repositorio da Universidade da Coruña

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas