Search CORE

164,986 research outputs found

A Hybrid Model for Document Retrieval Systems.

Author: Zou Zhen-bao
Publication venue: LSU Digital Commons
Publication date: 01/01/1988

A methodology for the design of document retrieval systems is presented. First, a composite index term weighting model is developed based on term frequency statistics, including document frequency, relative frequency within a document, and relative frequency within the collection; its coefficients can be adjusted to fit different indexing environments. Then, a composite retrieval model is proposed that processes a user's information request, expressed as a weighted Phrase-Oriented Fixed-Level Expression (POFLE) that may use operators beyond the Boolean ones, in two phases. The first phase searches for documents that are topically relevant to the information request by means of a descriptor matching mechanism. This mechanism incorporates a partial matching facility based on a structurally restricted relationship imposed by the indexing model, and it is more general than the matching functions of the traditional Boolean and vector space models. The second phase ranks these topically relevant documents, by means of two types of heuristic-based selection rules and a knowledge-based evaluation function, in descending order of a preference score that predicts the combined effect of user preference for the quality, recency, fitness, and reachability of documents.
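The abstract names the three statistics behind the composite term weight but not the formula that combines them. The following is only a minimal sketch, assuming a linear combination with hypothetical coefficients `a`, `b`, and `c`; the function name and the use of a logarithmic inverse document frequency are illustrative assumptions, not the thesis's model.

```python
import math
from collections import Counter


def composite_weights(docs, a=1.0, b=1.0, c=1.0):
    """Return one {term: weight} dict per document.

    docs is a list of token lists. The linear form and the coefficients
    a, b, c are assumptions made for illustration only.
    """
    n_docs = len(docs)
    doc_freq = Counter()   # number of documents containing each term
    coll_freq = Counter()  # occurrences of each term in the whole collection
    for tokens in docs:
        coll_freq.update(tokens)
        doc_freq.update(set(tokens))
    total_tokens = sum(coll_freq.values())

    weights = []
    for tokens in docs:
        term_counts = Counter(tokens)
        doc_weights = {}
        for term, count in term_counts.items():
            rel_in_doc = count / len(tokens)               # relative frequency within document
            rel_in_coll = coll_freq[term] / total_tokens   # relative frequency within collection
            rarity = math.log(n_docs / doc_freq[term])     # derived from document frequency
            doc_weights[term] = a * rel_in_doc + b * rarity - c * rel_in_coll
        weights.append(doc_weights)
    return weights
```

Setting a coefficient to zero switches the corresponding statistic off, which is one simple way to read "adjusted by selecting various coefficients".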

Louisiana State University

Fuzzy term proximity with boolean queries at 2006 TREC Terabyte task

Author: Beigbeder Michel
Mercier Annabelle
Publication venue: HAL CCSD
Publication date: 14/11/2006

http://trec.nist.gov/pubs/trec15/papers/ecole.tera.final.pdf

We report here the results of the fuzzy term proximity method applied to the Terabyte Task. The main feature of fuzzy proximity is the idea that the closer the query terms are in a document, the more relevant this document is. This principle gives a high-precision method, so we complete its results with those obtained with the default method (Dirichlet) of the Zettair search engine. Our model is able to deal with Boolean queries, but contrary to the traditional extensions of the basic Boolean IR model, it does not explicitly use a proximity operator, because such an operator cannot be generalized to nodes. The fuzzy term proximity is controlled with an influence function. Given a query term and a document, the influence function associates with each position in the text a value that depends on the distance to the nearest occurrence of this query term. To model proximity, this function decreases with distance. Different forms of function can be used: triangular, Gaussian, etc. For practical reasons, only functions with finite support were used. The support of the function is limited by a constant called k. The fuzzy term proximity functions are associated with every leaf of the query tree. Then fuzzy proximities are computed for every node with a post-order tree traversal. Given the fuzzy proximities of the children of a node, its fuzzy proximity is computed, as in the fuzzy IR models, with a minimum (resp. maximum) combination for conjunctive (resp. disjunctive) nodes. Finally, a fuzzy query proximity value is obtained for each position in the document at the root of the query tree. The score of the document is the integration of the function obtained at the tree root. For the experiments, we modified Lucy (version 0.5.2) to implement our matching function. Two query sets are used for our runs. One set is manually built with the title words (and sometimes some description words). Each of these words is OR'ed with its derivatives, such as plurals. Then the OR nodes obtained are AND'ed at the tree root. Another query set is built automatically as an AND of terms extracted from the title field. These two query sets are submitted to our system with two values of k: 50 and 200. The two corresponding query sets with flat queries are also submitted to the Zettair search engine.
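The scoring procedure described in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' Lucy-based implementation: it assumes a triangular influence function of the form max(0, (k - d) / k), represents the query tree as nested tuples such as `("AND", "a", ("OR", "b", "c"))`, and approximates the integration at the root by a sum over token positions. All function names are hypothetical.

```python
def term_proximity(tokens, term, k):
    """Triangular influence of `term` at every position of the document."""
    positions = [i for i, tok in enumerate(tokens) if tok == term]
    values = []
    for x in range(len(tokens)):
        if not positions:
            values.append(0.0)  # term absent: no influence anywhere
            continue
        distance = min(abs(x - p) for p in positions)  # nearest occurrence
        values.append(max(0.0, (k - distance) / k))    # finite support k
    return values


def node_proximity(tokens, node, k):
    """Post-order evaluation: leaves are terms, inner nodes are AND/OR tuples."""
    if isinstance(node, str):
        return term_proximity(tokens, node, k)
    operator, *children = node
    if operator not in ("AND", "OR"):
        raise ValueError(f"unknown operator: {operator}")
    child_values = [node_proximity(tokens, child, k) for child in children]
    combine = min if operator == "AND" else max  # min for conjunction, max for disjunction
    return [combine(at_position) for at_position in zip(*child_values)]


def proximity_score(tokens, query, k):
    """Document score: sum of the root proximity over all positions."""
    return sum(node_proximity(tokens, query, k))
```

For example, `proximity_score("the cat sat on the mat".split(), ("AND", "cat", "mat"), k=50)` rewards the document more as the two terms move closer together, and a document missing either term scores zero under AND.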

A Contextual Information Retrieval Model based on Influence Diagrams

Author: Boughanem Mohand
Tamine Lynda
Zemirli Nesrine
Publication venue: HAL CCSD
Publication date: 01/01/2007

A key challenge in information retrieval is the use of contextual evidence within ad-hoc retrieval. Our contribution is based on the belief that contextual retrieval is a decision-making problem. For this reason, we propose to apply influence diagrams, which are an extension of Bayesian networks to such problems, in order to solve the hard problem of user-based relevance estimation. The basic underlying idea is to replace the traditional relevance function, which measures the degree of document-query matching, with a function indexed by the user. In our approach, the user profile is represented by the user's long-term interests. In order to validate our model, we furthermore propose a novel evaluation protocol suitable for the contextual retrieval task. The test collection is an expansion of the standard TREC test data, obtained using a learning scenario of the user's interests. The experimental results show that our model is promising.

A Deep Relevance Matching Model for Ad-hoc Retrieval

Author: Giles R. C. S. L. L.
Hu B.
Lu Z.
Mikolov T.
Qiu X.
Socher R.
Wan S.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 23/11/2017

In recent years, deep neural networks have led to exciting breakthroughs in speech recognition, computer vision, and natural language processing (NLP) tasks. However, there have been few positive results of deep models on ad-hoc retrieval tasks. This is partially due to the fact that many important characteristics of the ad-hoc retrieval task have not yet been well addressed in deep models. Typically, existing work using deep models formalizes the ad-hoc retrieval task as a matching problem between two pieces of text and treats it as equivalent to many NLP tasks such as paraphrase identification, question answering, and automatic conversation. However, we argue that the ad-hoc retrieval task is mainly about relevance matching, while most NLP matching tasks concern semantic matching, and there are some fundamental differences between these two matching tasks. Successful relevance matching requires proper handling of the exact matching signals, query term importance, and diverse matching requirements. In this paper, we propose a novel deep relevance matching model (DRMM) for ad-hoc retrieval. Specifically, our model employs a joint deep architecture at the query term level for relevance matching. By using matching histogram mapping, a feed forward matching network, and a term gating network, we can effectively deal with the three relevance matching factors mentioned above. Experimental results on two representative benchmark collections show that our model can significantly outperform some well-known retrieval models as well as state-of-the-art deep matching models. Comment: CIKM 2016, long paper
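The three components named in this abstract can be wired together as follows. This is a rough, untrained sketch in plain Python, not the paper's implementation: the cosine similarity, the log-count histogram with a dedicated exact-match bin, the tanh activations, the softmax gate, and every function and parameter name are assumptions made for illustration. Embedding vectors are assumed to be non-zero.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def matching_histogram(q_vec, doc_vecs, n_bins=5):
    """Log-count histogram of similarities; the last bin counts exact matches."""
    hist = [0] * n_bins
    width = 2.0 / (n_bins - 1)  # the other bins split [-1, 1) evenly
    for d_vec in doc_vecs:
        sim = max(-1.0, min(1.0, cosine(q_vec, d_vec)))
        if sim >= 1.0 - 1e-9:
            hist[-1] += 1
        else:
            hist[int((sim + 1.0) / width)] += 1
    return [math.log1p(count) for count in hist]


def feed_forward(hist, W1, b1, w2, b2):
    """One hidden layer; W1 holds one weight list (over bins) per hidden unit."""
    hidden = [math.tanh(sum(h * w for h, w in zip(hist, unit)) + b)
              for unit, b in zip(W1, b1)]
    return math.tanh(sum(h * w for h, w in zip(hidden, w2)) + b2)


def term_gates(query_vecs, gate_w):
    """Softmax over query terms, standing in for query term importance."""
    logits = [sum(a * b for a, b in zip(q, gate_w)) for q in query_vecs]
    top = max(logits)
    exps = [math.exp(logit - top) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


def drmm_score(query_vecs, doc_vecs, W1, b1, w2, b2, gate_w):
    """Gate-weighted sum of per-query-term matching scores."""
    gates = term_gates(query_vecs, gate_w)
    scores = [feed_forward(matching_histogram(q, doc_vecs, len(W1[0])), W1, b1, w2, b2)
              for q in query_vecs]
    return sum(g * s for g, s in zip(gates, scores))
```

Keeping a separate bin for exact matches is how this sketch reflects the abstract's point that relevance matching depends on exact matching signals, while the gate gives each query term its own importance.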

arXiv.org e-Print Archive

SemRank: ranking refinement strategy by using the semantic intensity

Author: Aslam Nida
Loo Jonathan
Loomes Martin
RoohUllah
Ullah Irfan
Publication venue: Published by Elsevier B.V.
Publication date: 31/12/2011

The ubiquity of multimedia has raised the need for a system that can store, manage, and structure multimedia data in such a way that it can be retrieved intelligently. One of the current issues in media management and data mining research is the ranking of retrieved documents. Ranking is one of the challenging problems for information retrieval systems. A user query may return millions of relevant results, but if the ranking function cannot rank them according to relevancy, then the results are of little use. However, current ranking techniques remain at the level of keyword matching, and ranking among the results is usually done using term frequency. This paper is concerned with ranking documents by relying on the rich semantics inside the document rather than on its contents. Our proposed ranking refinement strategy, known as SemRank, ranks documents based on semantic intensity. Our approach has been applied to the open benchmark LabelMe dataset and compared against one of the well-known ranking models, the Vector Space Model (VSM). The experimental results show that our approach achieves a significant improvement in retrieval performance over state-of-the-art ranking methods.
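For context, the term-frequency baseline this abstract compares against can be sketched as follows. This is a generic Vector Space Model ranking by cosine similarity over raw term counts, assumed here for illustration; it is not the SemRank method, whose semantic intensity measure the abstract does not define, and the function names are hypothetical.

```python
import math
from collections import Counter


def cosine_tf(query_tokens, doc_tokens):
    """Cosine similarity between raw term-frequency vectors."""
    q, d = Counter(query_tokens), Counter(doc_tokens)
    dot = sum(q[term] * d[term] for term in q)
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in d.values())))
    return dot / norm if norm else 0.0


def rank_documents(query_tokens, docs):
    """Return document indices, best match first (ties keep input order)."""
    scored = [(cosine_tf(query_tokens, doc), index) for index, doc in enumerate(docs)]
    return [index for _, index in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]
```

Because the score depends only on shared keywords and their counts, two documents about the same concept with different vocabulary score zero against each other, which is the limitation the abstract attributes to keyword-level ranking.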