3 research outputs found

The SPIRIT collection: an overview of a large web collection

Authors: Cacheda F., Hideo Joho, Mark Sanderson
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2004

A large-scale collection of web pages has been essential for research in information retrieval and related areas. This paper provides an overview of a large web collection used in the SPIRIT project for the design and testing of spatially-aware retrieval systems. Several statistics are derived and presented to show the characteristics of the collection.

CiteSeerX

Crossref

White Rose Research Online

Performance comparison of clustered and replicated information retrieval systems

Authors: Cacheda F., Carneiro V., Ounis I., Plachouras V.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2007

The amount of information available over the Internet is increasing daily, as are the importance and magnitude of Web search engines. Systems based on a single centralised index present several problems (such as lack of scalability), which lead to the use of distributed information retrieval systems to effectively search for and locate the required information. A distributed retrieval system can be clustered and/or replicated. In this paper, using simulations, we present a detailed performance analysis, both in terms of throughput and response time, of a clustered system compared to a replicated system. In addition, we consider the effect of changes in the query topics over time. We show that the performance obtained for a clustered system does not improve on the performance obtained by the best replicated system. Indeed, the main advantage of a clustered system is the reduction of network traffic. However, the use of a switched network eliminates the bottleneck in the network, markedly improving the performance of the replicated systems. Moreover, we illustrate the negative performance effect of the changes over time in the query topics when a distributed clustered system is used. In contrast, the performance of a distributed replicated system is query independent.

Enlighten

Filtrado de respuestas parciales en arquitecturas de recuperación de información distribuidas [Filtering of partial responses in distributed information retrieval architectures]

Author: Puentes Calvo Juan Francisco
Publication date: 01/01/2006

[Abstract] The sharp growth in the volume of information available online over the past few decades makes information retrieval techniques increasingly necessary for managing, retrieving, and filtering that information. Current lines of research have identified a bottleneck in the response channel of distributed information retrieval architectures, caused mainly by the large number of partial responses and their consequent impact on the communication architectures and on the user brokers. On the one hand, reducing the number of partial responses generated by each query server reduces the precision of the final response; on the other hand, designing a communication architecture that can carry the generated traffic creates a clear bottleneck when the partial results are sorted at the user brokers to select the final response. The solution proposed in this work is a filtered communication architecture implemented with programmable nodes, which form a subnetwork able to process the traffic passing through it transparently, with the aim of reducing the cardinality of the partial responses. The main advantage of using programmable nodes rather than, for example, brokers is therefore transparency, which makes it possible to build response-channel infrastructures that are highly flexible and easily modified at run time.
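
The abstract does not say how the programmable nodes filter traffic, so the following is only a minimal sketch under an assumed policy: each intermediate node merges the partial responses passing through it and forwards just the k best-scored hits. The names `Hit`, `filter_node`, the scores, and the document ids are all hypothetical and are not taken from the thesis.

```python
import heapq
from typing import List, Tuple

# Hypothetical representation of one hit: (relevance score, document id).
Hit = Tuple[float, str]


def filter_node(partials: List[List[Hit]], k: int) -> List[Hit]:
    """Merge the partial responses crossing a node and forward only the k best.

    Assumption: "reducing cardinality" is modelled here as top-k filtering.
    """
    merged = [hit for partial in partials for hit in partial]
    return heapq.nlargest(k, merged)


# Made-up partial responses from four query servers.
s1 = [(0.91, "d1"), (0.40, "d2"), (0.15, "d3")]
s2 = [(0.85, "d4"), (0.30, "d5"), (0.10, "d6")]
s3 = [(0.77, "d7"), (0.62, "d8"), (0.05, "d9")]
s4 = [(0.95, "d10"), (0.20, "d11"), (0.12, "d12")]
k = 3

# Two intermediate nodes each filter the traffic of two servers.
node_a = filter_node([s1, s2], k)
node_b = filter_node([s3, s4], k)

# The user broker now sorts 6 hits instead of 12.
final = filter_node([node_a, node_b], k)

# Baseline: the broker receives every partial response unfiltered.
direct = filter_node([s1, s2, s3, s4], k)

print(final)
print(final == direct)
```

Under this assumed policy the broker's final top-k is unchanged, because any hit in the global top k is also in the top k of whichever node it passes through, while the number of hits reaching the broker is halved in this toy setup.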

Repositorio da Universidade da Coruña

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas