5 research outputs found

    Fuzzy Proximity Ranking with Boolean Queries

    http://trec.nist.gov/pubs/trec14/papers/ecole-des-mines.pdf
    International audience
    Based on the idea that the closer the query terms are in a document, the more relevant this document is, we experiment with an IR method that uses a fuzzy proximity degree of the query term occurrences in a document to compute its relevance to the query. Our model is able to deal with Boolean queries, but contrary to the traditional extensions of the basic Boolean IR model, it does not explicitly use a proximity operator. The fuzzy term proximity is controlled with an influence function. Given a query term and a document, the influence function associates to each position in the text a value that depends on the distance to the nearest occurrence of this query term. To model proximity, this function decreases with distance. Different forms of function can be used: triangular, Gaussian, etc. For practical reasons, only functions with finite support were used. The support of the function is limited by a constant called k. A fuzzy term proximity function is associated with every leaf of the query tree. Fuzzy proximities are then computed for every node with a post-order tree traversal. Given the fuzzy proximities of the children of a node, its fuzzy proximity is computed, as in fuzzy IR models, with a minimum (resp. maximum) combination for conjunctive (resp. disjunctive) nodes. Finally, a fuzzy query proximity value is obtained for each position in the document at the root of the query tree. The score of the document is the integral of the function obtained at the tree root. For the experiments, we modified Lucy (version 0.5.2) to implement our IR model. Three query sets are used for our eight runs. One set is manually built with the title words and some description words. Each of these words is OR'ed with its derivatives, such as plurals. The resulting OR nodes are then AND'ed at the tree root. The two automatic query sets are built with an AND of terms automatically extracted from either the title field or the description field. These three query sets are submitted to our system with two values of k: 50 and 200. As our method is aimed at high precision, it sometimes gives fewer than one thousand answers. In such cases, the documents retrieved by the BM-25 method implemented in Lucy were appended to our result list.
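
The scoring procedure described in this abstract can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' modified Lucy code: the triangular influence function is assumed to be `(k - d) / k` clipped at zero, the "integration" at the tree root is approximated by a sum over discrete token positions, and the query tree is represented as nested tuples. All function names (`triangular_influence`, `term_proximity`, `query_proximity`, `score`) are hypothetical.

```python
from typing import Dict, List, Tuple, Union

# Assumed query-tree representation: a leaf is a term (str);
# an internal node is a tuple ("AND" | "OR", [children]).
Query = Union[str, Tuple[str, list]]


def triangular_influence(distance: int, k: int) -> float:
    """Assumed triangular shape: 1 at an occurrence, 0 from distance k onward."""
    return max(0.0, (k - distance) / k)


def term_proximity(positions: List[int], doc_len: int, k: int) -> List[float]:
    """Leaf value: influence of the nearest occurrence at every text position."""
    if not positions:
        return [0.0] * doc_len
    return [
        triangular_influence(min(abs(x - p) for p in positions), k)
        for x in range(doc_len)
    ]


def query_proximity(
    node: Query, index: Dict[str, List[int]], doc_len: int, k: int
) -> List[float]:
    """Post-order traversal: min for AND nodes, max for OR nodes, per position."""
    if isinstance(node, str):
        return term_proximity(index.get(node, []), doc_len, k)
    op, children = node
    child_values = [query_proximity(c, index, doc_len, k) for c in children]
    combine = min if op == "AND" else max
    return [combine(values) for values in zip(*child_values)]


def score(query: Query, tokens: List[str], k: int) -> float:
    """Document score: discrete sum (stand-in for the integral) at the root."""
    index: Dict[str, List[int]] = {}
    for pos, tok in enumerate(tokens):
        index.setdefault(tok, []).append(pos)
    return sum(query_proximity(query, index, len(tokens), k))


if __name__ == "__main__":
    near = "a b x x x x".split()
    far = "a x x x x b".split()
    q = ("AND", ["a", "b"])
    # With k = 3, the document whose terms are adjacent scores higher.
    print(score(q, near, 3), score(q, far, 3))
```

In this sketch a larger `k` widens each term's zone of influence, so terms that are farther apart can still contribute to a conjunctive score. That is consistent with the abstract's use of two settings, k = 50 and k = 200, although the effect on the actual runs is not something this toy example can show.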

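    Agent for Automatic Information Retrieval in Diverse Environments Based on Computational Intelligence Techniques (original title: Agente para recuperación automática de información en diversos entornos basado en técnicas de inteligencia computacional)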

    This thesis addresses the problem of information retrieval, understood as the automatic search, within a collection of diverse documents, for all documents related, with a certain degree of relevance, to a query formulated by a user. In particular, it presents a novel system, based on fuzzy logic, for implementing intelligent agents that solve real information retrieval problems in diverse environments. The information retrieval and weight assignment methods developed gave rise to the publications included in the compendium of this thesis, and their application led to an entry in the intellectual property registry office. In the collaborative work with companies described in Chapter 5, several prototypes of intelligent agents were implemented by applying the computational intelligence techniques that underpin the information retrieval methods developed, with the aim of creating intelligent agents for solving real problems that require information retrieval. The intelligent agents developed use the information retrieval method, the weight assignment method, and the information storage structure developed in the publications that make up the compendium of this thesis. Those publications justify the sound operation of these methods, as well as the improved performance in retrieving information contained in web portals compared with the Vector Space Model (VSM) and the tf-idf weight assignment method.
