111 research outputs found

    Exploiting the similarity of non-matching terms at retrieval time

    Get PDF
    In classic information retrieval systems a relevant document will not be retrieved in response to a query if the document and query representations do not share at least one term. This problem, known as 'term mismatch', has been recognised for a long time by the information retrieval community and a number of possible solutions have been proposed. Here I present a preliminary investigation into a new class of retrieval models that attempt to solve the term mismatch problem by exploiting complete or partial knowledge of term similarity in the term space. The use of term similarity can enhance classic retrieval models by taking into account non-matching terms. The theoretical advantages and drawbacks of these models are presented and compared with other models tackling the same problem. A preliminary experimental investigation into the performance gain achieved by exploiting term similarity with the proposed models is presented and discussed

    Information Search Patterns in Complex Tasks

    Get PDF
    acceptedVersionPeer reviewe

    Physicists' Information Tasks: Structure, Length and Retrieval Performance

    Get PDF
    In this poster, we describe central aspects of 65 natural information tasks from 23 senior researchers, PhDs, and experienced MSc students from three different university departments of physics. We analyze 1) the main purpose of the information task, 2) which and how many search facets were used to describe the tasks, 3) what semantic categories were used to express the search facets, and 4) retrieval performance. Results show variety in structure and length across task descriptions and task purposes. The results indicate effect of length and, in particular, of task purpose on retrieval performance of different document description levels that should be examined further

    The Effectiveness of Query-Based Hierarchic Clustering of Documents for Information Retrieval

    Get PDF
    Hierarchic document clustering has been applied to Information Retrieval (IR) for over three decades. Its introduction to IR was based on the grounds of its potential to improve the effectiveness of IR systems. Central to the issue of improved effectiveness is the Cluster Hypothesis. The hypothesis states that relevant documents tend to be highly similar to each other, and therefore tend to appear in the same clusters. However, previous research has been inconclusive as to whether document clustering does bring improvements. The main motivation for this work has been to investigate methods for the improvement of the effectiveness of document clustering, by challenging some assumptions that implicitly characterise its application. Such assumptions relate to the static manner in which document clustering is typically performed, and include the static application of document clustering prior to querying, and the static calculation of interdocument associations. The type of clustering that is investigated in this thesis is query-based, that is, it incorporates information from the query into the process of generating clusters of documents. Two approaches for incorporating query information into the clustering process are examined: clustering documents which are returned from an IR system in response to a user query (post-retrieval clustering), and clustering documents by using query-sensitive similarity measures. For the first approach, post-retrieval clustering, an analytical investigation into a number of issues that relate to its retrieval effectiveness is presented in this thesis. This is in contrast to most of the research which has employed post-retrieval clustering in the past, where it is mainly viewed as a convenient and efficient means of presenting documents to users. In this thesis, post-retrieval clustering is employed based on its potential to introduce effectiveness improvements compared both to static clustering and best-match IR systems. The motivation for the second approach, the use of query-sensitive measures, stems from the role of interdocument similarities for the validity of the cluster hypothesis. In this thesis, an axiomatic view of the hypothesis is proposed, by suggesting that documents relevant to the same query (co-relevant documents) display an inherent similarity to each other which is dictated by the query itself. Because of this inherent similarity, the cluster hypothesis should be valid for any document collection. Past research has attributed failure to validate the hypothesis for a document collection to characteristics of the collection. Contrary to this, the view proposed in this thesis suggests that failure of a document set to adhere to the hypothesis is attributed to the assumptions made about interdocument similarity. This thesis argues that the query determines the context and the purpose for which the similarity between documents is judged, and it should therefore be incorporated in the similarity calculations. By taking the query into account when calculating interdocument similarities, co-relevant documents can be "forced" to be more similar to each other. This view challenges the typically static nature of interdocument relationships in IR. Specific formulas for the calculation of query-sensitive similarity are proposed in this thesis. Four hierarchic clustering methods and six document collections are used in the experiments. Three main issues are investigated: the effectiveness of hierarchic post-retrieval clustering which uses static similarity measures, the effectiveness of query-sensitive measures at increasing the similarity of pairs of co-relevant documents, and the effectiveness of hierarchic clustering which uses query-sensitive similarity measures. The results demonstrate the effectiveness improvements that are introduced by the use of both approaches of query-based clustering, compared both to the effectiveness of static clustering and to the effectiveness of best-match IR systems. Query-sensitive similarity measures, in particular, introduce significant improvements over the use of static similarity measures for document clustering, and they also significantly improve the structure of the document space in terms of the similarity of pairs of co-relevant documents. The results provide evidence for the effectiveness of hierarchic query-based clustering of documents, and also challenge findings of previous research which had dismissed the potential of hierarchic document clustering as an effective method for information retrieval

    Devise: a framework for the evaluation of internet search engines

    Get PDF
    This project investigates the feasibility of the use of user-satisfaction as a multidimensional evaluative construct of search engines. Search engine developments are reviewed to reveal a range of indexing and retrieval techniques that may assist casual users in the information retrieval task. Yet few evaluation studies have considered the impact of system features, in particular those with which the user interacts for search assistance. A broad review of retrieval system evaluation highlights the complex environment in which measures of both the utility of the search results and the usability of the system are sought from a user-perspective. Our proposed approach for a user-centered evaluation is based on a conceptual framework in which user-satisfaction is characterised as a variable dependent on system features and functions and expressed in a moderating context of user-task requirement. Towards this end, the research reported here focuses on the definition of the construct of user satisfaction on the multi dimensions of the retrieval process, an expression of what a typical user is trying to do. Empirical work was then undertaken to test the feasibility and potential value of the implementation of the framework for the evaluation of three search engines. Initial results are presented which provide a degree of understanding of how users are satisfied and on what criteria. This provides the basis on which we make recommendations for the refinement of the multidimensional framework and its use as a methodology for the evaluation of search engines from a user perspective
    corecore