7 research outputs found

    Query Performance Analyser - a tool for bridging information retrieval research and instruction

    Get PDF
    Information retrieval experiments usually measure the average effectiveness of IR methods developed. The analysis of individual queries is neglected although test results may contain individual test topics where general findings do not hold. The paper argues that, for the real user of an IR system, the study of variation in results is even more important than averages. The Interactive Query Performance Analyser (QPA) for information retrieval systems is a tool for analysing and comparing the performance of individual queries. On top of a standard test collection, it gives an instant visualisation of the performance achieved in a given search topic by any user-generated query. In addition to experimental IR research, QPA can be used in user training to demonstrate the characteristics of and compare differences between IR systems and searching strategies. The experiences in applying the tool both in IR experiments and in IR instruction are reported. The need for bridging research and instruction is underlined

    Proceedings of the 6th Dutch-Belgian Information Retrieval Workshop

    Get PDF

    Mixed-Language Arabic- English Information Retrieval

    Get PDF
    Includes abstract.Includes bibliographical references.This thesis attempts to address the problem of mixed querying in CLIR. It proposes mixed-language (language-aware) approaches in which mixed queries are used to retrieve most relevant documents, regardless of their languages. To achieve this goal, however, it is essential firstly to suppress the impact of most problems that are caused by the mixed-language feature in both queries and documents and which result in biasing the final ranked list. Therefore, a cross-lingual re-weighting model was developed. In this cross-lingual model, term frequency, document frequency and document length components in mixed queries are estimated and adjusted, regardless of languages, while at the same time the model considers the unique mixed-language features in queries and documents, such as co-occurring terms in two different languages. Furthermore, in mixed queries, non-technical terms (mostly those in non-English language) would likely overweight and skew the impact of those technical terms (mostly those in English) due to high document frequencies (and thus low weights) of the latter terms in their corresponding collection (mostly the English collection). Such phenomenon is caused by the dominance of the English language in scientific domains. Accordingly, this thesis also proposes reasonable re-weighted Inverse Document Frequency (IDF) so as to moderate the effect of overweighted terms in mixed queries
    corecore