5 research outputs found

    Improving Information Retrieval Systems using Part of Speech Tagging

    The object of Information Retrieval is to retrieve all relevant documents for a user query and only those relevant documents. Much research has focused on achieving this objective with little regard for storage overhead or performance. In this paper we evaluate the use of Part of Speech Tagging to improve the index storage overhead and general speed of the system with only a minimal reduction in precision-recall measurements. We tagged 500 MB of the Los Angeles Times 1990 and 1989 document collection provided by TREC for parts of speech. We then experimented to find the most relevant part of speech to index. We show that 90 percent of precision recall is achieved with 40 percent of the document collection's terms. We also show that this is an improvement in overhead with only a 1 percent reduction in precision recall.
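
The idea of indexing only selected parts of speech can be illustrated with a minimal sketch. It assumes the documents have already been tagged and arrive as `(word, tag)` pairs. The function name `build_filtered_index`, the Penn Treebank style tag names, and the choice of nouns as the indexed class are all hypothetical, since the abstract does not say which part of speech proved most relevant.

```python
from collections import defaultdict

# Hypothetical choice: keep only noun tags (Penn Treebank style names).
# The abstract does not state which part of speech the authors indexed.
INDEXED_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def build_filtered_index(tagged_docs, indexed_tags=INDEXED_TAGS):
    """Build an inverted index that keeps only terms with a selected tag.

    tagged_docs maps a document id to a list of (word, tag) pairs.
    Returns a dict mapping each lowercased term to the set of ids
    of the documents that contain it.
    """
    index = defaultdict(set)
    for doc_id, tagged_tokens in tagged_docs.items():
        for word, tag in tagged_tokens:
            if tag in indexed_tags:
                index[word.lower()].add(doc_id)
    return dict(index)


# Tiny hand-tagged example collection (illustrative data only).
tagged_docs = {
    1: [("The", "DT"), ("earthquake", "NN"), ("damaged", "VBD"), ("buildings", "NNS")],
    2: [("Buildings", "NNS"), ("collapsed", "VBD")],
}
index = build_filtered_index(tagged_docs)
```

Dropping every token outside the selected tags is what shrinks the index; how much precision and recall are lost depends on the tags chosen, which is the question the paper investigates.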

    Inter-relação das técnicas Term Extration e Query Expansion aplicadas na recuperação de documentos textuais

    Doctoral thesis, Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-graduação em Engenharia e Gestão do Conhecimento. According to Sighal (2006), people recognize the importance of storing and searching for information, and with the advent of computers it became possible to store large amounts of it in databases. Consequently, cataloguing the information in these databases became indispensable. In this context, the field of Information Retrieval emerged in the 1950s with the aim of promoting the construction of computational tools that would allow users to make more efficient use of these databases. The main objective of this research is to develop a computational model that enables the retrieval of textual documents ranked by semantic similarity, based on the intersection of the Term Extraction and Query Expansion techniques.

    Proceedings of the Third Dutch-Belgian Information Retrieval Workshop (DIR 2002)

    La recuperación de información en el siglo XX : Revisión y aplicación de aspectos de la lingüística cuantitativa y la modelización matemática de la información

    This thesis examines, within the field of Information Technology, the various developments in the automatic interpretation of text semantics and their relation to Information Retrieval Systems. Starting from a selective literature review, it seeks to systematize the documentation by tracing the main precedents and techniques in evolutionary order, synthesizing the fundamental concepts and highlighting the aspects that justify the choice of one procedure over another in solving the problems. Facultad de Humanidades y Ciencias de la Educació

    MDS TREC6 Report

    Introduction. This year the MDS group has participated in the ad hoc task, the Chinese task, the speech track, and the interactive track. It is our first year of participation in the speech and interactive tracks. We found the participation in both of these tracks of great benefit and interest. 2 Full Description of Techniques. In this section of the paper we will give as complete a description as we can of our methodology. We do so by describing the following: term definition, casefolding, stopping, and stemming. This defines the terms that we use. We then give the formula used for matching. After this we give exact descriptions of how we carry out passage retrieval, term expansion, and combination. A term is a sequence of characters chosen from the alphabet {a-z, A-Z, 0-9}. The sequence has a maximum length of 256, but if the string consists solely of numbers a maximum length of 4 applies. All other characters are treated as term d
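
The term definition quoted in this excerpt can be sketched roughly as follows. Two points are assumptions, because the excerpt is cut off: that every character outside the stated alphabet acts as a delimiter, and that an overlong sequence is truncated to the limit rather than split. Casefolding is modelled as plain lowercasing, stopping and stemming are left out, and the name `extract_terms` is hypothetical.

```python
import re

MAX_TERM_LEN = 256    # maximum term length stated in the report
MAX_NUMBER_LEN = 4    # maximum length for terms made only of digits

# Assumption: any character outside a-z, A-Z, 0-9 separates terms.
_TERM_RE = re.compile(r"[A-Za-z0-9]+")


def extract_terms(text):
    """Split text into casefolded terms following the quoted definition."""
    terms = []
    for match in _TERM_RE.finditer(text):
        run = match.group(0)
        limit = MAX_NUMBER_LEN if run.isdigit() else MAX_TERM_LEN
        # Assumption: overlong sequences are truncated, not split.
        terms.append(run[:limit].lower())
    return terms


terms = extract_terms("Los Angeles Times, 1990-1989!")
```

Under these assumptions a mixed token such as `TREC6` stays a single term, because the shorter limit applies only when every character is a digit.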