5 research outputs found
Incorporating String Search in a Hypertext System: User Interface and Signature File Design Issues
Hypertext systems provide an appealing mechanism for
informally browsing databases by traversing selectable links.
However, in many fact-finding situations, string search is an
effective complement to browsing. This paper describes the
application of the signature file method to achieve rapid and
convenient string search in small personal computer hypertext
environments. The method has been implemented in a prototype,
as well as in a commercial product. Performance data for search
times and storage space are presented from a commercial
hypertext database. User interface issues are then discussed.
Experience with the string search interface indicates that it was
used successfully by novice users.
(Also cross-referenced as CAR-TR-448)
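The signature file method named in this abstract can be illustrated with a minimal sketch. It assumes word-level superimposed coding: each word hashes to a few bit positions, a block's signature is the OR of its word signatures, and a query matches a block only if every query bit is set. The constants `SIG_BITS` and `BITS_PER_WORD`, the choice of hash, and all function names are illustrative assumptions, not details taken from the paper.

```python
import hashlib
import re

SIG_BITS = 64       # signature width in bits (assumed value)
BITS_PER_WORD = 3   # bit positions set per word (assumed value)

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"\w+", text.lower())

def word_signature(word):
    """Map a word to a bit pattern with at most BITS_PER_WORD bits set."""
    sig = 0
    for i in range(BITS_PER_WORD):
        digest = hashlib.sha256(f"{i}:{word}".encode("utf-8")).digest()
        sig |= 1 << (int.from_bytes(digest[:4], "big") % SIG_BITS)
    return sig

def block_signature(text):
    """Superimpose (bitwise OR) the signatures of all words in a block."""
    sig = 0
    for word in tokenize(text):
        sig |= word_signature(word)
    return sig

def search(blocks, signatures, query):
    """Return indices of blocks containing every query word.

    The signature test discards most non-matching blocks cheaply; the
    word-level check afterwards removes false drops caused by bit collisions.
    """
    query_words = tokenize(query)
    query_sig = block_signature(query)
    hits = []
    for i, (text, sig) in enumerate(zip(blocks, signatures)):
        if sig & query_sig == query_sig:          # candidate by signature
            words = set(tokenize(text))
            if all(w in words for w in query_words):  # verify against text
                hits.append(i)
    return hits

blocks = [
    "Hypertext systems support browsing",
    "Signature files speed up string search",
    "Browsing complements string search",
]
signatures = [block_signature(b) for b in blocks]
```

Calling `search(blocks, signatures, "string search")` scans only the stored signatures first and reads block text just for the candidates, which is the trade-off that makes the method attractive when storage and memory are tight.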
A Survey of Information Retrieval and Filtering Methods
We survey the major techniques for information retrieval. In the first
part, we provide an overview of the traditional ones (full text scanning,
inversion, signature files and clustering). In the second part we discuss
attempts to include semantic information (natural language processing,
latent semantic indexing and neural networks).
An effective Chinese indexing method based on partitioned signature files.
Wong Chi Yin. Thesis (M.Phil.)--Chinese University of Hong Kong, 1998. Includes bibliographical references (leaves 107-114). Abstract also in Chinese.
Contents:
Abstract
Acknowledgements
Chapter 1: Introduction
    1.1 Introduction to Chinese IR
    1.2 Contributions
    1.3 Organization of this Thesis
Chapter 2: Background
    2.1 Indexing methods
        2.1.1 Full-text scanning
        2.1.2 Inverted files
        2.1.3 Signature files
        2.1.4 Clustering
    2.2 Information Retrieval Models
        2.2.1 Boolean model
        2.2.2 Vector space model
        2.2.3 Probabilistic model
        2.2.4 Logical model
Chapter 3: Investigation of Segmentation on the Vector Space Retrieval Model
    3.1 Segmentation of Chinese Texts
        3.1.1 Character-based segmentation
        3.1.2 Word-based segmentation
        3.1.3 N-Gram segmentation
    3.2 Performance Evaluation of Three Segmentation Approaches
        3.2.1 Experimental Setup
        3.2.2 Experimental Results
        3.2.3 Discussion
Chapter 4: Signature File Background
    4.1 Superimposed coding
    4.2 False drop probability
Chapter 5: Partitioned Signature File Based On Chinese Word Length
    5.1 Fixed Weight Block (FWB) Signature File
    5.2 Overview of PSFC
    5.3 Design Considerations
Chapter 6: New Hashing Techniques for Partitioned Signature Files
    6.1 Direct Division Method
    6.2 Random Number Assisted Division Method
    6.3 Frequency-based hashing method
    6.4 Chinese character-based hashing method
Chapter 7: Experiments and Results
    7.1 Performance evaluation of partitioned signature file based on Chinese word length
        7.1.1 Retrieval Performance
        7.1.2 Signature Reduction Ratio
        7.1.3 Storage Requirement
        7.1.4 Discussion
    7.2 Performance evaluation of different dynamic signature generation methods
        7.2.1 Collision
        7.2.2 Retrieval Performance
        7.2.3 Discussion
Chapter 8: Conclusions and Future Work
    8.1 Conclusions
    8.2 Future work
Chapter A: Notations of Signature Files
Chapter B: False Drop Probability
Chapter C: Experimental Results
Bibliography
Advanced data architecture for a web directory, with optimization of queries restricted to a zone of the category graph (original title: Arquitectura de datos avanzada de un directorio web, con optimización de consultas restringidas a una zona del grafo de categorías)
[Abstract]
Since its origin, the World Wide Web has undergone exponential growth, generating a large volume of heterogeneous information accessible to any user. This has led to the use of efficient tools to manage, retrieve, and filter that information. In particular, Web directories are taxonomies that classify web documents, over which queries are later performed. This type of information retrieval system presents a specific type of search in which the document collection is restricted to a zone of the category graph. This dissertation presents a data architecture specific to Web directories that improves performance for restricted searches. The architecture is based on a hybrid data structure consisting of an inverted file with multiple signature files embedded in it.
Based on the proposed model, two variants are defined: the hybrid architecture with total information and the hybrid architecture with partial information.
The validity of this architecture has been analyzed by implementing both variants and comparing them with a basic model, demonstrating a clear improvement in the performance of restricted queries; the hybrid model with partial information stands out in particular, as it responds adequately under any load on the search system. In general, the proposed architecture is characterized by its ease of implementation, which derives from the data structures used, its flexibility with respect to system growth, and, especially, the good performance it offers for restricted searches.
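One way to picture an inverted file with embedded signatures is the sketch below. It is an assumption-laden illustration, not the dissertation's design: it supposes that each posting carries a small signature superimposing the document's categories (including ancestor categories), so a query restricted to one category can discard postings by a bit test before any exact check. The class `HybridIndex`, the helper `category_bit`, the signature width, and the use of CRC32 are all hypothetical choices.

```python
import zlib
from collections import defaultdict

SIG_BITS = 32  # width of the per-posting category signature (assumed value)

def category_bit(category):
    """Map a category name to one bit position (hypothetical hashing scheme)."""
    return 1 << (zlib.crc32(category.encode("utf-8")) % SIG_BITS)

class HybridIndex:
    """Inverted file whose postings embed a category signature (a sketch)."""

    def __init__(self):
        self.postings = defaultdict(list)   # term -> [(doc_id, signature)]
        self.doc_categories = {}            # doc_id -> set of category names

    def add(self, doc_id, text, categories):
        """Index a document; `categories` lists its category and all ancestors."""
        sig = 0
        for category in categories:
            sig |= category_bit(category)
        self.doc_categories[doc_id] = set(categories)
        for term in sorted(set(text.lower().split())):
            self.postings[term].append((doc_id, sig))

    def search(self, term, category):
        """Return documents containing `term` inside the given category zone."""
        wanted = category_bit(category)
        hits = []
        for doc_id, sig in self.postings.get(term.lower(), []):
            if sig & wanted == wanted:                       # signature filter
                if category in self.doc_categories[doc_id]:  # drop false hits
                    hits.append(doc_id)
        return hits

index = HybridIndex()
index.add(1, "cheap flights", ["Travel", "Travel/Air"])
index.add(2, "flights of fancy", ["Arts", "Arts/Literature"])
index.add(3, "flights timetable", ["Travel", "Travel/Rail"])
```

Under these assumptions, `index.search("flights", "Travel")` walks a single posting list and rejects documents outside the requested zone with one bitwise test each, instead of intersecting the term's postings with a separately stored list of every document under the category.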