23 research outputs found
Interactive query expansion and relevance feedback for document retrieval systems
This thesis is aimed at investigating interactive query expansion within the context of a relevance feedback system that uses term weighting and ranking in searching online databases that are available through online vendors. Previous evaluations of relevance feedback systems have been made in laboratory conditions and not in a real operational environment. The research presented in this thesis followed the idea of testing probabilistic retrieval techniques in an operational environment. The overall aim of this research was to investigate the process of interactive query expansion (IQE) from various points of view including effectiveness. The INSPEC database, on both Data-Star and ESA-IRS, was searched online using CIRT, a front-end system that allows probabilistic term weighting, ranking and relevance feedback. The thesis is divided into three parts. Part I of the thesis covers background information and appropriate literature reviews with special emphasis on the relevance weighting theory (Binary Independence Model), the approaches to automatic and semi-automatic query expansion, the ZOOM facility of ESA/IRS and the CIRT front-end. Part II is comprised of three Pilot case studies. It introduces the idea of interactive query expansion and places it within the context of the weighted environment of CIRT. Each Pilot study looked at different aspects of the query expansion process by using a front-end. The Pilot studies were used to answer methodological questions and also research questions about the query expansion terms. The knowledge and experience that was gained from the Pilots was then applied to the methodology of the study proper (Part III). Part III discusses the Experiment and the evaluation of the six ranking algorithms. The Experiment was conducted under real operational conditions using a real system, real requests, and real interaction. 
Emphasis was placed on the characteristics of the interaction, especially on the selection of terms for query expansion. Data were collected from 25 searches. The data collection mechanisms included questionnaires, transaction logs, and relevance evaluations. The results of the Experiment are presented according to their treatment of query expansion as main results and other findings in Chapter 10. The main results discuss issues that relate directly to query expansion, retrieval effectiveness, the correspondence of the online-to-offline relevance judgements, and the performance of the w(p − q) ranking algorithm. Finally, a comparative evaluation of six ranking algorithms was performed. The yardstick for the evaluation was provided by the user relevance judgements on the lists of the candidate terms for query expansion. The evaluation focused on whether there are any similarities in the performance of the algorithms and how those algorithms with similar performance treat terms. This abstract refers only to the main conclusions drawn from the results of the Experiment: (1) One third of the terms presented in the list of candidate terms was on average identified by the users as potentially useful for query expansion; (2) These terms were mainly judged as either variant expressions (synonyms) or alternative (related) terms to the initial query terms. However, a substantial portion of the selected terms were identified as representing new ideas. (3) The relationship of the 5 best terms chosen by the users for query expansion to the initial query terms was: (a) 34% have no relationship or other type of correspondence with a query term; (b) 66% of the query expansion terms have a relationship which makes the term: (b1) narrower term (70%), (b2) broader term (5%), (b3) related term (25%). (4) The results provide some evidence for the effectiveness of interactive query expansion. 
The initial search produced on average 3 highly relevant documents at a precision of 34%; the query expansion search produced on average 9 further highly relevant documents at slightly higher precision. (5) The results demonstrated the effectiveness of the w(p − q) algorithm, for the ranking of terms for query expansion, within the context of the Experiment. (6) The main results of the comparative evaluation of the six ranking algorithms, i.e. w(p − q), EMIM, F4, F4modified, Porter and ZOOM, are that: (a) w(p − q) and EMIM performed best; and (b) the performance of w(p − q) and EMIM, and of F4 and F4modified, is very similar; (7) A new ranking algorithm is proposed as the result of the evaluation of the six algorithms. Finally, an investigation is by definition an exploratory study which generates hypotheses for future research. Recommendations and proposals for future research are given. The conclusions highlight the need for more research on weighted systems in operational environments, for a comparative evaluation of automatic vs interactive query expansion, and for user studies in searching weighted systems.
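The abstract names w(p − q) as a term-ranking algorithm but does not spell out the formula. The sketch below assumes the commonly cited form, in which w is the F4 relevance weight with the 0.5 correction, p is the proportion of relevant documents containing the term, and q is the proportion of non-relevant documents containing it. The function names and the term counts in the usage note are hypothetical, and the thesis may use a different variant.

```python
import math

def f4_weight(r, n, R, N):
    """F4 relevance weight with the 0.5 correction (assumed form).

    r: relevant documents containing the term
    n: documents in the collection containing the term
    R: known relevant documents
    N: documents in the collection
    """
    return math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                    ((n - r + 0.5) * (R - r + 0.5)))

def wpq(r, n, R, N):
    """Term selection value w * (p - q)."""
    p = r / R                # share of relevant documents with the term
    q = (n - r) / (N - R)    # share of non-relevant documents with the term
    return f4_weight(r, n, R, N) * (p - q)

def rank_terms(stats, R, N):
    """Order candidate terms by descending w(p - q).

    stats maps each term to a (r, n) pair.
    """
    return sorted(stats, key=lambda t: wpq(stats[t][0], stats[t][1], R, N),
                  reverse=True)
```

For example, `rank_terms({"a": (8, 20), "b": (2, 500), "c": (5, 10)}, R=10, N=1000)` places the term that is frequent in relevant documents and rare elsewhere at the top of the candidate list.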
Relevance feedback and query expansion for searching the web: a model for searching a digital library
A fully operational large-scale digital library is likely to be based on a distributed architecture, so a number of independent search engines may be used to index different, overlapping portions of the library's contents. In any case, different media (text, audio, image, etc.) will be indexed for retrieval by different search engines, so techniques that provide a coherent and unified search over a suite of underlying independent search engines are likely to be an important part of navigating a digital library. In this paper we present an architecture and a system for searching the world's largest digital library, the World Wide Web. What makes our system novel is that we use a suite of underlying web search engines to do the bulk of the work, while our system orchestrates them in parallel to provide a higher level of information retrieval functionality. Thus it is our meta search engine, and not the underlying direct search engines, that provides the relevance feedback and query expansion options for the user. The paper presents the design and architecture of the system, which has been implemented; describes an initial version, which has been operational for almost a year; and outlines the operation of the advanced version.
“A term is known by the company it keeps”: On selecting a good expansion set in pseudo-relevance feedback
Abstract. It is well known that pseudo-relevance feedback (PRF) generally improves the retrieval performance of Information Retrieval (IR) systems. However, a recent study by Cao et al. [3] has shown that a non-negligible fraction of the expansion terms used by PRF algorithms are harmful to retrieval. In other words, a PRF algorithm would be better off using only a subset of the feedback terms. The challenge then is to find a good expansion set within the set of all candidate expansion terms. A natural approach to the problem is to make a term independence assumption and use one or more term selection criteria, or a statistical classifier, to identify good expansion terms independently of each other. In this work, we challenge this approach and show empirically that a feedback term is, in general, neither good nor bad in itself; the behavior of a term depends very much on the other expansion terms. Our finding implies that, in general, a good expansion set cannot be found under a term independence assumption. As a principled solution to the problem, we propose spectral partitioning of expansion terms using a specific term-term interaction matrix. We demonstrate on several test collections that expansion terms can be partitioned into two sets and that the better of the two sets gives substantial improvements in retrieval performance over model-based feedback.
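The abstract does not define its term-term interaction matrix, so the following is only a generic sketch of spectral bipartitioning under stated assumptions: the input is a symmetric matrix of non-negative interaction scores (hypothetical here), and terms are split by the sign of the Fiedler vector of the graph Laplacian. The power iteration on a shifted Laplacian is a choice made for this illustration to keep it dependency-free; it is not taken from the paper.

```python
def spectral_bipartition(weights, iterations=500):
    """Split items into two groups by the sign of the Fiedler vector.

    weights: symmetric n x n list of lists (n >= 2) of non-negative
    interaction scores; diagonal entries are ignored.
    """
    n = len(weights)
    degree = [sum(weights[i][j] for j in range(n) if j != i)
              for i in range(n)]
    # 2 * max degree bounds the largest eigenvalue of the Laplacian L = D - W,
    # so (shift * I - L) has non-negative eigenvalues.
    shift = 2.0 * max(degree) + 1e-9

    def step(v):
        # Multiply v by (shift * I - L).
        out = []
        for i in range(n):
            lv = degree[i] * v[i] - sum(weights[i][j] * v[j]
                                        for j in range(n) if j != i)
            out.append(shift * v[i] - lv)
        return out

    def normalise(v):
        # Remove the constant component (the trivial eigenvector), then rescale.
        mean = sum(v) / n
        v = [x - mean for x in v]
        norm = sum(x * x for x in v) ** 0.5
        return [x / norm for x in v]

    v = normalise([float(i) for i in range(n)])
    for _ in range(iterations):
        v = normalise(step(v))
    # With the constant direction removed, v converges to the Fiedler vector.
    return [x >= 0 for x in v]
```

The function returns one boolean label per term. Which of the two resulting sets is the better expansion set still has to be decided separately, for instance by comparing retrieval performance as the abstract describes.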
A Novel Combined Term Suggestion Service for Domain-Specific Digital Libraries
Interactive query expansion can assist users during their query formulation
process. We conducted a user study with over 4,000 unique visitors and four
different design approaches for a search term suggestion service. As a basis
for our evaluation we have implemented services which use three different
vocabularies: (1) user search terms, (2) terms from a terminology service and
(3) thesaurus terms. Additionally, we have created a new combined service which
utilizes thesaurus terms and terms from a domain-specific search term
recommender. Our results show that the thesaurus-based method is clearly used
more often compared to the other single-method implementations. We interpret
this as a strong indicator that term suggestion mechanisms should be
domain-specific to be close to the user terminology. Our novel combined
approach which interconnects a thesaurus service with additional statistical
relations out-performed all other implementations. All our observations show
that domain-specific vocabulary can support the user in finding alternative
concepts and formulating queries.
Comment: To be published in Proceedings of Theories and Practice in Digital
Libraries (TPDL), 201
Interactive query expansion and relevance feedback for document retrieval systems
SIGLE. Available from British Library Document Supply Centre - DSC:DX173253 / BLDSC - British Library Document Supply Centre. GB. United Kingdom
Online searching aids A subject bibliography on front-ends, gateways..
SIGLE. Available from British Library Document Supply Centre - DSC:2327.6789R(11) / BLDSC - British Library Document Supply Centre. GB. United Kingdom
A classified bibliography on online public access catalogues
SIGLE. Available from British Library Document Supply Centre - DSC:2327.6789R(10) / BLDSC - British Library Document Supply Centre. 2. ed. GB. United Kingdom
Non-english web search: An evaluation of indexing and searching the Greek web
The study reports on a longitudinal and comparative evaluation of Greek language searching on the web. Ten engines, five global (A9, AltaVista, Google, MSN Search, and Yahoo!) and five Greek (Anazitisi, Ano-Kato, Phantis, Trinity, and Visto), were evaluated (a) using navigational queries in 2004 and 2006; and (b) by measuring the freshness of the search engine indices in 2005 and 2006. Homepage-finding queries for known Greek organizations were created and searched. Queries included the name of the organization in its Greek and non-Greek (English or transliterated) equivalent forms. The organizations represented ten categories: government departments, universities, colleges, travel agencies, museums, media (TV, radio, newspapers), transportation, and banks. The freshness of the indices was evaluated by examining the status of the returned URLs (live versus dead) from the navigational queries, and by identifying whether the engines had indexed 32480 active (live) Greek domain URLs. Effectiveness measures included (a) qualitative assessment of how engines handle the Greek language; (b) precision at 10 documents (P@10); (c) mean reciprocal rank (MRR); (d) Navigational Query Discounted Cumulative Gain (NQ-DCG), a new heuristic evaluation measure; (e) response time; (f) the ratio of the dead URL links returned; and (g) the presence or absence of URLs and the decay observed over the period of the study. The results report on which of the global and Greek search engines perform best, and whether the performance achieved is good enough from a user's perspective. © 2009 Springer Science+Business Media, LLC
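Two of the effectiveness measures listed above, P@10 and MRR, have standard definitions that can be sketched briefly. The code assumes each query's results are given as a rank-ordered list of booleans marking correct hits; the function names are hypothetical. NQ-DCG is not defined in the abstract and is therefore not sketched.

```python
def precision_at_k(relevance, k=10):
    """Fraction of the top k results that are relevant.

    relevance: rank-ordered list of booleans (True = relevant).
    Lists shorter than k are treated as padded with non-relevant results.
    """
    return sum(1 for rel in relevance[:k] if rel) / k

def reciprocal_rank(relevance):
    """1 / rank of the first relevant result, or 0.0 if there is none."""
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(runs):
    """Average reciprocal rank over a list of per-query result lists."""
    return sum(reciprocal_rank(r) for r in runs) / len(runs)
```

For a homepage-finding query there is typically a single correct URL, so the reciprocal rank directly reflects how far down the list a user has to look for it.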
A review of online searching aids
SIGLE. Available from British Library Document Supply Centre - DSC:2327.684R(BL-RP--86) / BLDSC - British Library Document Supply Centre. GB. United Kingdom