625 research outputs found

Recommended from our members

An experimental comparison of a genetic algorithm and a hill-climber for term selection

Author: MacFarlane A.
May P.
Secker A.
Timmis J.
Publication venue: 'Emerald'
Publication date: 01/01/2010

Purpose – The term selection problem for selecting query terms in information filtering and routing has been investigated using hill-climbers of various kinds, largely through the Okapi experiments in the TREC series of conferences. Although these are simple deterministic approaches which examine the effect of changing the weight of one term at a time, they have been shown to improve the retrieval effectiveness of filtering queries in these TREC experiments. Hill-climbers are, however, likely to get trapped in local optima, and the use of more sophisticated local search techniques that attempt to break out of these optima is worth investigating. To this end, we apply a genetic algorithm (GA) to the same problem. Design/Methodology/Approach – We use a standard TREC test collection from the TREC-8 filtering track, recording mean average precision and recall measures to allow comparison between the hill-climber and the GA. We also vary elements of the GA, such as the probability of a word being included, the probability of mutation and the population size, in order to measure the effect of these variables. Different strategies such as Elitist and Non-Elitist methods are used, as well as Roulette Wheel and Rank selection algorithms. Findings – The results of tests suggest that both techniques are, on average, better than the baseline, but the implemented GA does not match the overall performance of a hill-climber. The Rank selection algorithm does better on average than the Roulette Wheel algorithm. There is no evidence in this study that varying word inclusion probability, mutation probability or Elitist method makes much difference to the overall results. Small population sizes do not appear to be as effective as larger population sizes.
Research limitations/implications – The evidence provided here suggests that being stuck in a local optimum for the term selection optimization problem does not appear to be detrimental to the overall success of the hill-climber. Term rank order would appear to provide extra useful evidence which hill-climbers can use efficiently and effectively to narrow the search space. Originality/Value – The paper represents the first attempt to compare hill-climbers with GAs on a problem of this type.
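
The abstract describes hill-climbers that change the weight of one term at a time and keep a change only when it helps. A minimal sketch of that idea follows. It is not the Okapi implementation: the function `hill_climb`, the step size, the `TARGET` weights and `toy_fitness` are all hypothetical stand-ins, and a real experiment would score each candidate query with a retrieval measure such as average precision.

```python
from typing import Callable, Dict, Tuple

Weights = Dict[str, float]


def hill_climb(
    start: Weights,
    fitness: Callable[[Weights], float],
    step: float = 0.5,
    max_passes: int = 20,
) -> Tuple[Weights, float]:
    """Deterministic hill-climber: perturb one term weight at a time."""
    current = dict(start)
    best = fitness(current)
    for _ in range(max_passes):
        improved = False
        for term in sorted(current):          # fixed order keeps the run deterministic
            for delta in (step, -step):       # try raising, then lowering, this weight
                candidate = dict(current)
                candidate[term] = max(0.0, candidate[term] + delta)
                score = fitness(candidate)
                if score > best:              # accept only strict improvements
                    current, best = candidate, score
                    improved = True
                    break
        if not improved:                      # no single-term change helps: local optimum
            break
    return current, best


# Hypothetical fitness: closeness to an assumed "ideal" weighting.
TARGET: Weights = {"genetic": 2.0, "algorithm": 1.0, "term": 0.0}


def toy_fitness(weights: Weights) -> float:
    return -sum((weights[t] - TARGET[t]) ** 2 for t in TARGET)


final_weights, final_score = hill_climb({t: 1.0 for t in TARGET}, toy_fitness)
print(final_weights, final_score)
```

The toy fitness surface has a single optimum, so the climber reaches it. On a surface with several optima the same loop would stop at whichever local optimum it meets first, which is the behaviour the study set out to test.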

City Research Online

Crossref

Aberystwyth Research Portal

Examining and improving the effectiveness of relevance feedback for retrieval of scanned text documents

Author: Jones Gareth J.F.
Lam-Adesina Adenike M.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2006

Important legacy paper documents are digitized and collected in online accessible archives. This enables the preservation, sharing and, significantly, the searching of these documents. The text contents of these document images can be transcribed automatically using OCR systems and then stored in an information retrieval system. However, OCR systems make errors in character recognition, which have previously been shown to affect document retrieval behaviour. In particular, relevance feedback query-expansion methods, which are often effective for improving electronic text retrieval, are observed to be less reliable for retrieval of scanned document images. Our experimental examination of the effects of character recognition errors on an ad hoc OCR retrieval task demonstrates that, while baseline information retrieval can remain relatively unaffected by transcription errors, relevance feedback via query expansion becomes highly unstable. This paper examines the reason for this behaviour, and introduces novel modifications to standard relevance feedback methods. These methods are shown experimentally to improve the effectiveness of relevance feedback for errorful OCR transcriptions. The new methods combine similar recognised character strings based on term collection frequency and a string edit-distance measure. The techniques are domain independent and make no use of external resources such as dictionaries or training data.
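
The abstract mentions combining similar recognised strings using collection frequency and an edit-distance measure. The sketch below illustrates one way such a combination could work; it is an assumption for illustration, not the authors' method. The function names, the distance threshold of 1 and the sample frequencies are all hypothetical. Terms are visited from most to least frequent, and each rarer term is folded into the first more frequent term within the threshold.

```python
from typing import Dict, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def merge_variants(freq: Dict[str, int], max_dist: int = 1) -> Dict[str, str]:
    """Map each term to a canonical form: the most frequent nearby string."""
    canonical: Dict[str, str] = {}
    heads: List[str] = []
    # Most frequent first; ties broken alphabetically for determinism.
    for term in sorted(freq, key=lambda t: (-freq[t], t)):
        for head in heads:
            if edit_distance(term, head) <= max_dist:
                canonical[term] = head     # treat as an OCR variant of head
                break
        else:
            heads.append(term)             # no close match: term is its own head
            canonical[term] = term
    return canonical


# Hypothetical collection frequencies with typical OCR confusions (l/1, l/i).
sample_freq = {"retrieval": 120, "retrieva1": 3, "retrievai": 2,
               "relevance": 80, "re1evance": 4}
variant_map = merge_variants(sample_freq)
print(variant_map)
```

A threshold this small keeps distinct short words from being merged by accident; any real setting would need tuning against the collection.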

Irish Universities

DCU Online Research Access Service

Dublin City University at CLEF 2007: Cross-Language Speech Retrieval Experiments

Author: Jones Gareth J.F.
Zhang Ke
Zhang Ying
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2008

The Dublin City University participation in the CLEF 2007 CL-SR English task concentrated primarily on issues of topic translation. Our retrieval system used the BM25F model and pseudo relevance feedback. Topics were translated into English using the Yahoo! BabelFish free online service combined with domain-specific translation lexicons gathered automatically from Wikipedia. We explored alternative topic translation methods using these resources. Our results indicate that extending machine translation tools using automatically generated domain-specific translation lexicons can provide improved CLIR effectiveness for this task.

Irish Universities

DCU Online Research Access Service

Local search: A guide for the information retrieval practitioner

Author: Andrew MacFarlane
Andrew Tuson
Publication venue: 'Elsevier BV'
Publication date: 01/01/2009

There are a number of combinatorial optimisation problems in information retrieval in which the use of local search methods is worthwhile. The purpose of this paper is to show how local search can be used to solve some well-known tasks in information retrieval (IR), how previous research in the field is piecemeal, bereft of a structure and methodologically flawed, and to suggest more rigorous ways of applying local search methods to solve IR problems. We provide a query-based taxonomy for analysing the use of local search in IR tasks and an overview of issues such as fitness functions, statistical significance and test collections when conducting experiments on combinatorial optimisation problems. The paper gives a guide to the pitfalls and problems for IR practitioners who wish to use local search to solve their research issues, and gives practical advice on the use of such methods. The query-based taxonomy is a novel structure which can be used by the IR practitioner in order to examine the use of local search in IR.

City Research Online

Crossref

Beyond English text: Multilingual and multimedia information retrieval.

Author: Jones Gareth J.F.
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2005

CiteSeerX

DCU Online Research Access Service

Concept-based Interactive Query Expansion Support Tool (CIQUEST)

Author: Beaulieu M.
Joho H.
Sanderson M.
Publication venue: Resource: The Council for Museums, Archives and Libraries
Publication date: 01/01/2003

This report describes a three-year project (2000-03) undertaken in the Information Studies Department at The University of Sheffield and funded by Resource, The Council for Museums, Archives and Libraries. The overall aim of the research was to provide user support for query formulation and reformulation in searching large-scale textual resources including those of the World Wide Web. More specifically the objectives were: to investigate and evaluate methods for the automatic generation and organisation of concepts derived from retrieved document sets, based on statistical methods for term weighting; and to conduct user-based evaluations on the understanding, presentation and retrieval effectiveness of concept structures in selecting candidate terms for interactive query expansion. The TREC test collection formed the basis for the seven evaluative experiments conducted in the course of the project. These formed four distinct phases in the project plan. In the first phase, a series of experiments was conducted to investigate further techniques for concept derivation and hierarchical organisation and structure. The second phase was concerned with user-based validation of the concept structures. Results of phases 1 and 2 informed the design of the test system, and the user interface was developed in phase 3. The final phase entailed a user-based summative evaluation of the CiQuest system. The main findings demonstrate that concept hierarchies can effectively be generated from sets of retrieved documents and displayed to searchers in a meaningful way. The approach provides the searcher with an overview of the contents of the retrieved documents, which in turn facilitates the viewing of documents and selection of the most relevant ones. Concept hierarchies are a good source of terms for query expansion and can improve precision. The extraction of descriptive phrases as an alternative source of terms was also effective.
With respect to presentation, cascading menus were easy to browse for selecting terms and for viewing documents. In conclusion, the project dissemination programme and future work are outlined.

White Rose Research Online

The effect of dyslexia on information retrieval: A pilot study

Author: Al-Wabil A.
Albrair A.
Jones S. A.
MacFarlane A.
Marshall C. R.
Zaphiris P.
Publication venue: 'Emerald'
Publication date: 01/01/2010

Purpose – The purpose of the paper is to resolve a gap in our knowledge of how people with dyslexia interact with Information Retrieval (IR) systems, specifically an understanding of their information searching behaviour. Very little research has been undertaken with this particular user group, and given the size of the group (an estimated 10% of the population) this lack of knowledge needs to be addressed. Design/Methodology/Approach – We use elements of the dyslexia cognitive profile to design a logging system recording the difference between two sets of participants: dyslexic and control users. We use a standard Okapi interface together with two standard TREC topics in order to record the information searching behaviour of these users. We gather evidence from various sources, including quantitative information on search logs, together with qualitative information from interviews and questionnaires. We record variables on queries, documents, relevance assessments and sessions in the search logs. We use this evidence to examine the difference in searching between the two sets of users, in order to understand the effect of dyslexia on the information searching behaviour. A topic analysis is also conducted on the quantitative data to show any effect on the results from the information need. Research limitations/implications – As this is a pilot study, only 10 participants were recruited for the study, 5 for each user group. Due to ethical issues, the number of topics per search was restricted to one topic only. The study shows that the methodology applied is useful for distinguishing between the two user groups, taking into account differences between topics. We outline further research on the back of this pilot study in four main areas. A different approach from the proposed methodology is needed to measure the effect on query variables, which takes account of topic variation.
More details on users are needed, such as reading abilities, speed of language processing and working memory, to distinguish the user groups. The effect of topic on search interaction must be measured in order to record the potential impact on the dyslexic user group. Work is needed on relevance assessment and the effect on precision and recall for users who may not read many documents. Findings – Using the log data, we establish the differences in information searching behaviour of control and dyslexic users, i.e. in the way the two groups interact with Okapi, and that the qualitative information collected (such as experience) may not be able to account for these differences. Evidence from query variables was unable to distinguish between groups, but differences on topic for the same variables were recorded. Users who viewed more documents tended to judge more documents as being relevant, either in terms of the user group or topic. Session data indicated that there may be an important difference between the number of iterations used in a search between the user groups, as there may be little effect from the topic on this variable. Originality/Value – This is the first study of the effect of dyslexia on information search behaviour, and provides some evidence to take the field forward.

City Research Online

Ktisis

UCL Discovery

Query exhaustivity, relevance feedback and search success in automatic and interactive query expansion

Author: Jones S.
MacFarlane A.
Sormunen E.
Vakkari P.
Publication venue: 'Emerald'
Publication date: 01/04/2004

This study explored how the expression of search facets and relevance feedback (RF) by users was related to search success in interactive and automatic query expansion in the course of the search process. Search success was measured both in the number of relevant documents retrieved and in the relevance scores of these items on a four-point scale. The research design consisted of 26 users searching for four TREC topics in the Okapi IR system, half using interactive and half using automatic query expansion based on RF. The search logs were recorded, and the users filled in a questionnaire for each topic concerning various features of searching. The results showed that the exhaustivity of the query was the most significant predictor of search success, and that interactive expansion led to better search success than automatic expansion.

City Research Online

Crossref

DutchHatTrick: semantic query modeling, ConText, section detection, and match score maximization

Author: Meij Edgar
Schuemie Martijn
Trieschnigg Dolf
Publication venue: National Institute of Standards and Technology (NIST)
Publication date: 01/01/2011

This report discusses the collaborative work of the ErasmusMC, the University of Twente, and the University of Amsterdam on the TREC 2011 Medical track. Here, the task is to retrieve patient visits from the University of Pittsburgh NLP Repository for 35 topics. The repository consists of 101,711 patient reports, and a patient visit was recorded in one or more reports.

University of Twente Research Information

International Migration, Integration and Social Cohesion online publications

Relevance feedback for best match term weighting algorithms in information retrieval

Author: Hiemstra D.
Robertson S.E.
Publication venue: European Research Consortium for Informatics and Mathematics
Publication date: 01/01/2001

Personalisation in full text retrieval or full text filtering implies reweighting of the query terms based on some explicit or implicit feedback from the user. Relevance feedback uses the user's judgements on previously retrieved documents to construct a personalised query or user profile. This paper studies relevance feedback within two probabilistic models of information retrieval: the first based on statistical language models and the second based on the binary independence probabilistic model. The paper shows the resemblance of the approaches to relevance feedback of these models, introduces new approaches to relevance feedback for both models, and evaluates the new relevance feedback algorithms on the TREC collection. The paper shows that there are no significant differences between simple and sophisticated approaches to relevance feedback.

CiteSeerX

Radboud Repository

University of Twente Research Information