5 research outputs found

    Measuring retrieval effectiveness based on user preference of documents

    A quadratic lower bound for Rocchio’s similarity-based relevance feedback algorithm with a fixed query updating factor

    Rocchio’s similarity-based relevance feedback algorithm, one of the most important query reformation methods in information retrieval, is essentially an adaptive supervised learning algorithm from examples. In practice, Rocchio’s algorithm often uses a fixed query updating factor. When this is the case, we strengthen the linear Ω(n) lower bound obtained by Chen and Zhu (Inf. Retr. 5:61–86, 2002) and prove that Rocchio’s algorithm makes Ω(k(n−k)) mistakes in searching for a collection of documents represented by a monotone disjunction of k relevant features over the n-dimensional binary vector space {0,1}^n, when the inner product similarity measure is used. A quadratic lower bound is obtained when k is linearly proportional to n. We also prove an O(k(n−k)^3) upper bound for Rocchio’s algorithm with the inner product similarity measure in searching for such a collection of documents with a constant query updating factor and a zero classification threshold.
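
    The setting in this abstract can be illustrated with a minimal sketch of a mistake-driven Rocchio learner. It is not the paper's construction: the zero initial query vector, the strict "score > threshold" decision rule, the repeated passes over the full document space, and the names rocchio_mistakes and is_relevant are all assumptions made for illustration. The fixed updating factor alpha and the zero classification threshold follow the abstract.

```python
from itertools import product


def is_relevant(doc, relevant_features):
    """Monotone disjunction: relevant if any relevant feature is present."""
    return any(doc[i] for i in relevant_features)


def rocchio_mistakes(docs, relevant_features, n, alpha=1, threshold=0,
                     max_passes=1000):
    """Run a mistake-driven Rocchio-style learner and count its mistakes.

    Assumed for illustration: the query starts at zero, a document is
    predicted relevant when its inner product with the query exceeds the
    threshold, and the query moves by a fixed factor alpha on each mistake.
    """
    q = [0] * n
    mistakes = 0
    for _ in range(max_passes):
        clean_pass = True
        for doc in docs:
            score = sum(qi * xi for qi, xi in zip(q, doc))  # inner product
            predicted = score > threshold
            actual = is_relevant(doc, relevant_features)
            if predicted != actual:
                mistakes += 1
                clean_pass = False
                # Add the document on a missed relevant one, subtract it on
                # a false alarm; the step size alpha never changes.
                step = alpha if actual else -alpha
                q = [qi + step * xi for qi, xi in zip(q, doc)]
        if clean_pass:
            break
    return q, mistakes


# Example: n = 4 features, of which features 0 and 1 are relevant (k = 2).
n = 4
relevant = {0, 1}
docs = [tuple(bits) for bits in product((0, 1), repeat=n)]
query, mistakes = rocchio_mistakes(docs, relevant, n)
print("final query:", query, "mistakes:", mistakes)
```

    The mistake count depends on the order in which documents are presented; the bounds in the abstract concern worst-case presentation orders, which this sketch does not attempt to reproduce.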

    IfD - information for discrimination

    The problem of term mismatch and ambiguity has long been a serious and outstanding one in IR. It can cause the system to formulate an incomplete and imprecise query representation, leading to retrieval failure. Many query reformulation methods have been proposed to address the problem. These methods employ term classes that are considered related to individual query terms. They are hindered by the computational cost of term classification, and by the fact that the terms in a class are generally related to one specific query term belonging to that class rather than relevant to the context of the query. In this thesis we propose a series of methods for automatic query reformulation (AQR). The methods constitute a formal model called IfD, standing for Information for Discrimination. In IfD, each discrimination measure is modelled as information contained in terms supporting one of two opposite hypotheses. The extent of association of terms with the query can thus be defined based directly on the discrimination. The strength of association of candidate terms with the query can then be computed, and good terms can be selected to enhance the query. Justifications for IfD are presented from several aspects: formal interpretations of information for discrimination are introduced to show its soundness; criteria are put forward to show its rationality; properties of discrimination measures are analysed to show its appropriateness; examples are examined to show its usability; extension is discussed to show its potential; implementation is described to show its feasibility; comparisons with other methods are made to show its flexibility; improvements in retrieval performance are exhibited to show its capability. Our conclusion is that the advantages and promise of IfD should make it an indispensable methodology for AQR, which we believe can be an effective technique for improving retrieval performance.

    Linear structure in information retrieval
