17 research outputs found

    Rocchio's Model Based on Vector Space Basis Change for Pseudo Relevance Feedback

    Rocchio's relevance feedback model is a classic query expansion method that has been shown to be effective in boosting information retrieval performance. The main problem with this method is that relevant and irrelevant documents overlap in the vector space because they often share the same terms (at least the terms of the query). With respect to the initial vector space basis (index terms), it is difficult to select terms that separate relevant from irrelevant documents. Vector Space Basis Change is used to separate relevant and irrelevant documents without any modification of the query term weights. In this paper, we first study how to incorporate Vector Space Basis Change into Rocchio's model. Second, we propose Rocchio models based on Vector Space Basis Change, called VSBCRoc models. Experimental results on a TREC collection show that the proposed models are effective.
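
    For orientation, the classic Rocchio update that the abstract builds on can be sketched as follows. This is a minimal illustration in the original index-term basis, not the VSBCRoc models themselves; the function name `rocchio`, the dict-based vectors, and the default weights `alpha`, `beta`, `gamma` are assumptions chosen for the example.

```python
def rocchio(query, relevant, irrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return an expanded query as a term -> weight dict (classic Rocchio).

    `query` and every document are sparse vectors: dicts of term -> weight.
    The default alpha/beta/gamma values are illustrative assumptions.
    """
    # Candidate terms: everything in the query or in any feedback document.
    terms = set(query)
    for doc in relevant + irrelevant:
        terms.update(doc)

    def centroid(docs, term):
        # Mean weight of `term` over a document set; 0.0 for an empty set.
        return sum(d.get(term, 0.0) for d in docs) / len(docs) if docs else 0.0

    expanded = {}
    for t in terms:
        w = (alpha * query.get(t, 0.0)
             + beta * centroid(relevant, t)
             - gamma * centroid(irrelevant, t))
        if w > 0:  # negative weights are dropped, a common convention
            expanded[t] = w
    return expanded


if __name__ == "__main__":
    q = {"xml": 1.0}
    rel = [{"xml": 1.0, "retrieval": 1.0}]
    irr = [{"xml": 1.0, "cooking": 1.0}]
    print(rocchio(q, rel, irr))
```

    Note that the query term "xml" appears in both the relevant and the irrelevant document, which is exactly the overlap the abstract describes.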

    Impact of Ngrams-based indexing on XML retrieval

    We present in this paper a statistical approach to term clustering. The approach is based on a statistical analysis of the n-grams shared by a pair of terms and is inspired by the tf × idf criterion commonly used in information retrieval. Being statistical, the approach is completely independent of the lexical and grammatical characteristics of the language in which the documents to be indexed are written. Classical indexing is often based on stemming, which consists of reducing a term to its stem. This opens up broad possibilities for customized information access. We argue that the same can be achieved by building term clusters and performing information retrieval on the basis of these clusters. The approach is applied to XML retrieval, and experiments were carried out on a dataset provided by INEX to show its impact compared with the Porter stemming method.
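
    To make the idea of comparing terms by their shared n-grams concrete, here is a minimal sketch. It uses a plain Dice coefficient over character trigrams, which is a stand-in chosen for illustration and not the tf × idf-inspired measure of the paper; the function names `char_ngrams` and `dice_similarity` and the default `n=3` are assumptions.

```python
def char_ngrams(term, n=3):
    """Return the set of character n-grams of `term` (lowercased, no padding)."""
    term = term.lower()
    return {term[i:i + n] for i in range(len(term) - n + 1)}


def dice_similarity(a, b, n=3):
    """Dice coefficient over the character n-gram sets of two terms."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    if not ga or not gb:
        # Terms shorter than n characters have no n-grams to compare.
        return 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


if __name__ == "__main__":
    # Morphological variants share most of their trigrams.
    print(dice_similarity("retrieval", "retrieve"))
    # Unrelated terms share none.
    print(dice_similarity("cat", "dog"))
```

    Terms whose pairwise similarity exceeds a chosen threshold could then be grouped into one cluster, without any language-specific stemming rules.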

    Proposition pour l'intégration des réseaux petits mondes en recherche d'information (A proposal for integrating small-world networks into information retrieval)

    We propose in this paper an approach to document clustering. It consists of representing the corpus as a document graph whose links are defined by certain criteria and quantified by similarity measures. We aim to integrate this context into the classification approach in order to build small-world networks of homogeneous documents. The homogeneity of the clusters is measured according to the properties of small-world networks. The clusters, together with their properties, are used to re-rank search results. Experiments were carried out on a corpus provided by TREC, and the results show the contribution of small-world networks to information retrieval.

    XML Retrieval


    A Re-Ranking Method Based on Irrelevant Documents in Ad-Hoc Retrieval

    In this paper, we propose a novel approach to document re-ranking that relies on the concept of negative feedback represented by irrelevant documents. A previous paper introduced a pseudo-relevance feedback method using an absorbing document ~d which best fits the user's need. The document ~d is orthogonal to the majority of irrelevant documents. In this paper, this document is used to re-rank the initial set of ranked documents in ad hoc retrieval. An evaluation carried out on a standard document collection shows the effectiveness of the proposed approach.

    Automatic Diagnosis of Breast Tissue


    Investigating the combination of structural and textual information about multimedia retrieval (Sana FAKHFAKH)

    The expansion of structured information in different applications introduces a new ambiguity in multimedia retrieval in semi-structured documents. In this paper we investigate the combination of textual and structural context for multimedia retrieval in XML documents, and we present an indexing model that combines textual and structural information. We propose a geometric method that implicitly uses the textual and structural context of XML elements, and we are particularly interested in improving the effectiveness of various structural factors for multimedia retrieval. Using a geometric metric, we can represent the structural information in an XML document with a vector for each element. Given a textual query, our model lets us combine the scores obtained from each source of evidence and return a list of relevant retrieved multimedia elements. Experimental evaluation is carried out using the INEX Ad Hoc Task 2007 and the ImageCLEF Wikipedia Retrieval Task 2010. The results show that combining the scores of the textual and structural modalities significantly improves on the results obtained with a single modality. Keywords: geometric distance; multimedia retrieval; element; structure; document modeling