1,652 research outputs found

    Fuzzy Proximity Ranking with Boolean Queries

    http://trec.nist.gov/pubs/trec14/papers/ecole-des-mines.pdf (International audience)
    Based on the idea that the closer the query terms are in a document, the more relevant this document is, we experiment with an IR method based on a fuzzy proximity degree of the query term occurrences in a document to compute its relevance to the query. Our model is able to deal with Boolean queries, but contrary to the traditional extensions of the basic Boolean IR model, it does not explicitly use a proximity operator. The fuzzy term proximity is controlled with an influence function. Given a query term and a document, the influence function associates to each position in the text a value dependent on the distance to the nearest occurrence of this query term. To model proximity, this function is decreasing with distance. Different forms of function can be used: triangular, Gaussian, etc. For practical reasons only functions with finite support were used. The support of the function is limited by a constant called k. The fuzzy term proximity functions are associated with every leaf of the query tree. Then fuzzy proximities are computed for every node with a post-order tree traversal. Given the fuzzy proximities of the children of a node, its fuzzy proximity is computed, as in the fuzzy IR models, with a minimum (resp. maximum) combination for conjunctive (resp. disjunctive) nodes. Finally, a fuzzy query proximity value is obtained for each position in this document at the root of the query tree. The score of this document is the integral of the function obtained at the tree root. For the experiments, we modified Lucy (version 0.5.2) to implement our IR model. Three query sets are used for our eight runs. One set is manually built with the title words and some description words. Each of these words is OR'ed with its derivatives, such as plurals. Then the OR nodes obtained are AND'ed at the tree root.
The two automatic query sets are built with an AND of automatically extracted terms from either the title field or the description field. These three query sets are submitted to our system with two values of k: 50 and 200. As our method is aimed at high precision, it sometimes gives fewer than one thousand answers. In such cases, the documents retrieved by the BM-25 method implemented in Lucy were concatenated after our result list.
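
The scoring scheme described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' Lucy modification: the class names (`Term`, `And`, `Or`), the function names, the whitespace-token document model, and the exact triangular shape `max(0, (k - d) / k)` are all assumptions made for illustration, and the integration at the root is approximated by a plain sum over token positions.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Term:
    word: str


@dataclass
class And:
    children: List["Node"]


@dataclass
class Or:
    children: List["Node"]


Node = Union[Term, And, Or]


def triangular_influence(distance: int, k: int) -> float:
    """Assumed triangular shape: 1 at an occurrence, 0 once distance >= k."""
    return max(0.0, (k - distance) / k)


def term_proximity(positions: List[int], doc_len: int, k: int) -> List[float]:
    """Leaf value: influence of the nearest occurrence at every text position."""
    if not positions:
        return [0.0] * doc_len
    return [
        triangular_influence(min(abs(x - p) for p in positions), k)
        for x in range(doc_len)
    ]


def node_proximity(node: Node, index: Dict[str, List[int]],
                   doc_len: int, k: int) -> List[float]:
    """Post-order traversal: children first, then min (AND) or max (OR)."""
    if isinstance(node, Term):
        return term_proximity(index.get(node.word, []), doc_len, k)
    child_values = [node_proximity(c, index, doc_len, k) for c in node.children]
    combine = min if isinstance(node, And) else max
    return [combine(values) for values in zip(*child_values)]


def score(query: Node, tokens: List[str], k: int) -> float:
    """Document score: discrete integration (sum) of the root function."""
    index: Dict[str, List[int]] = {}
    for pos, tok in enumerate(tokens):
        index.setdefault(tok, []).append(pos)
    return sum(node_proximity(query, index, len(tokens), k))
```

With the query `And([Term("cat"), Term("dog")])` and `k = 2`, a document where the two words are adjacent receives a positive score, while one where they are five tokens apart scores zero, which matches the high-precision behaviour the abstract describes.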

    Fuzzy term proximity with boolean queries at 2006 TREC Terabyte task

    http://trec.nist.gov/pubs/trec15/papers/ecole.tera.final.pdf (International audience)
    We report here the results of the fuzzy term proximity method applied to the Terabyte Task. The main feature of fuzzy proximity is based on the idea that the closer the query terms are in a document, the more relevant this document is. This principle yields a high-precision method, so we complete its results with those obtained with the Zettair search engine's default method (Dirichlet). Our model is able to deal with Boolean queries, but contrary to the traditional extensions of the basic Boolean IR model, it does not explicitly use a proximity operator, because such an operator cannot be generalized to nodes. The fuzzy term proximity is controlled with an influence function. Given a query term and a document, the influence function associates to each position in the text a value dependent on the distance to the nearest occurrence of this query term. To model proximity, this function is decreasing with distance. Different forms of function can be used: triangular, Gaussian, etc. For practical reasons only functions with finite support were used. The support of the function is limited by a constant called k. The fuzzy term proximity functions are associated with every leaf of the query tree. Then fuzzy proximities are computed for every node with a post-order tree traversal. Given the fuzzy proximities of the children of a node, its fuzzy proximity is computed, as in the fuzzy IR models, with a minimum (resp. maximum) combination for conjunctive (resp. disjunctive) nodes. Finally, a fuzzy query proximity value is obtained for each position in this document at the root of the query tree. The score of this document is the integral of the function obtained at the tree root. For the experiments, we modify Lucy (version 0.5.2) to implement our matching function. Two query sets are used for our runs. One set is manually built with the title words (and sometimes some description words).
Each of these words is OR'ed with its derivatives, such as plurals. Then the OR nodes obtained are AND'ed at the tree root. Another, automatic query set is built with an AND of automatically extracted terms from the title field. These two query sets are submitted to our system with two values of k: 50 and 200. The two corresponding query sets with flat queries are also submitted to the Zettair search engine.
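
The completion step mentioned in this abstract, where a short high-precision result list is padded with documents from a baseline ranker, might be sketched as follows. The function name `complete_ranking`, the default cut-off of 1000, and the choice to skip documents already present in the primary list are assumptions for illustration; the abstract does not say how duplicates were handled.

```python
from typing import List


def complete_ranking(primary: List[str], fallback: List[str],
                     limit: int = 1000) -> List[str]:
    """Append fallback results after the primary list, up to `limit` entries.

    Assumption: a document already returned by the primary method is not
    repeated when it also appears in the fallback list.
    """
    merged = list(primary[:limit])
    seen = set(merged)
    for doc_id in fallback:
        if len(merged) >= limit:
            break
        if doc_id not in seen:
            merged.append(doc_id)
            seen.add(doc_id)
    return merged
```

The primary ranking is never reordered, so the fallback can only affect ranks below the last document found by the proximity method.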

    ENSM-SE at CLEF 2005: Uses of Fuzzy Proximity Matching Function

    Extended version to appear in LNCS. http://clef.isti.cnr.it/2005/working_notes/workingnotes2005/mercier05.pdf
    Based on the idea that the closer the query terms in a document are, the more relevant this document is, we propose an information retrieval method based on a fuzzy proximity degree of term occurrences to compute document relevance to a query. Our model is able to deal with Boolean queries, but contrary to the traditional extensions of the basic Boolean information retrieval model, it does not explicitly use a proximity operator. A single parameter controls the proximity degree required. We explain how we construct the queries and we report the results of the experiments of the CLEF 2005 campaign before the conclusion.

    ENSM-SE at CLEF 2006: AdHoc Uses of fuzzy proximity matching function

    http://clef.isti.cnr.it/2006/working_notes/workingnotes2006/mercierCLEF2006.pdf (International audience)
    Starting from the idea that the closer the query terms in a document are to each other the more relevant the document, we propose an information retrieval method that uses the degree of fuzzy proximity of key terms in a document to compute the relevance of the document to the query. Our model handles Boolean queries but, contrary to the traditional extensions of the basic Boolean information retrieval model, does not use a proximity operator explicitly. A single parameter makes it possible to control the proximity degree required. To improve our system we use a stemming algorithm before indexing, we take a specific influence function, and we merge fuzzy proximity result lists built with different spreads of the influence function. We explain how we construct the queries and report the results of our experiments in the ad-hoc monolingual French task of the CLEF 2006 evaluation campaign.

    A Database Approach to Content-based XML retrieval

    This paper describes a first prototype system for content-based retrieval from XML data. The system's design supports both XPath queries and complex information retrieval queries based on a language modelling approach to information retrieval. Evaluation using the INEX benchmark shows that it is beneficial if the system is biased to retrieve large XML fragments over small fragments.

    Using Search Term Positions for Determining Document Relevance

    The technological advancements in computer networks and the substantial reduction of their production costs have caused a massive explosion of digitally stored information. In particular, textual information is becoming increasingly available in electronic form. Finding text documents dealing with a certain topic is not a simple task. Users need tools to sift through non-relevant information and retrieve only pieces of information relevant to their needs. The traditional methods of information retrieval (IR) based on search term frequency have reached their limitations, and novel ranking methods based on hyperlink information are not applicable to unlinked documents. The retrieval of documents based on the positions of search terms in a document has the potential of yielding improvements, because other terms in the environment where a search term appears (i.e. the neighborhood) are considered. That is to say, the grammatical type, position and frequency of other words help to clarify and specify the meaning of a given search term. However, the required additional analysis task makes position-based methods slower than methods based on term frequency and requires more storage to save the positions of terms. These drawbacks directly affect the performance of the most user-critical phase of the retrieval process, namely query evaluation time, which explains the scarce use of positional information in contemporary retrieval systems. This thesis explores the possibility of extending traditional information retrieval systems with positional information in an efficient manner that permits us to optimize the retrieval performance by handling term positions at query evaluation time. To achieve this task, several abstract representations of term positions to efficiently store and operate on term positional data are investigated.
In the Gauss model, descriptive statistics methods are used to estimate term positional information, because they minimize outliers and irregularities in the data. The Fourier model is based on Fourier series to represent positional information. In the Hilbert model, functional analysis methods are used to provide reliable term position estimations and simple mathematical operators to handle positional data. The proposed models are experimentally evaluated using standard resources of the IR research community (Text Retrieval Conference). All experiments demonstrate that the use of positional information can enhance the quality of search results. The suggested models outperform state-of-the-art retrieval utilities. The term position models open new possibilities to analyze and handle textual data. For instance, document clustering and compression of positional data based on these models could be interesting topics to be considered in future research.