Retrieving with good sense

Author: Sanderson M.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2000

Although always present in text, word sense ambiguity has only recently come to be regarded as a potentially solvable problem for information retrieval. The growth of interest in word senses resulted from new directions taken in disambiguation research. This paper first outlines this research and surveys the resulting efforts in information retrieval. Although the majority of attempts to improve retrieval effectiveness were unsuccessful, much was learnt from the research, most notably a notion of the circumstances under which disambiguation may prove of use to retrieval

CiteSeerX

White Rose Research Online

Accurate user directed summarization from existing tools

Author: Sanderson M.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/1998

This paper describes a set of experimental results produced from the TIPSTER SUMMAC initiative on user directed summaries: document summaries generated in the context of an information need expressed as a query. The summarizer that was evaluated was based on a set of existing statistical techniques that had been applied successfully to the INQUERY retrieval system. The techniques proved to have a wider utility, however, as the summarizer was one of the better performing systems in the SUMMAC evaluation. The design of this summarizer is presented with a range of evaluations: both those provided by SUMMAC as well as a set of preliminary, more informal, evaluations that examined additional aspects of the summaries. Amongst other conclusions, the results reveal that users can judge the relevance of documents from their summary almost as accurately as if they had had access to the document’s full text

CiteSeerX

Crossref

White Rose Research Online

Word sense disambiguation and information retrieval

Author: Sanderson M.
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/1994

It has often been thought that word sense ambiguity is a cause of poor performance in Information Retrieval (IR) systems. The belief is that if ambiguous words can be correctly disambiguated, IR performance will increase. However, recent research into the application of a word sense disambiguator to an IR system failed to show any performance increase. From these results it has become clear that more basic research is needed to investigate the relationship between sense ambiguity, disambiguation, and IR. Using a technique that introduces additional sense ambiguity into a collection, this paper presents research that goes beyond previous work in this field to reveal the influence that ambiguity and disambiguation have on a probabilistic IR system. We conclude that word sense ambiguity is only problematic to an IR system when it is retrieving from very short queries. In addition we argue that if a word sense disambiguator is to be of any use to an IR system, the disambiguator must be able to resolve word senses to a high degree of accuracy

MIT Libraries Dome

White Rose Research Online

Revisiting h measured on UK LIS and IR academics

Author: Sanderson M.
Publication venue: 'Wiley'
Publication date: 01/01/2008

A brief communication appearing in this journal ranked UK LIS and (some) IR academics by their h-index using data derived from Web of Science. In this brief communication, the same academics were re-ranked, using other popular citation databases. It was found that for academics who publish more in computer science forums, their h was significantly different due to highly cited papers missed by Web of Science; consequently their rank changed substantially. The study was widened to a broader set of UK LIS and IR academics where results showed similar statistically significant differences. A variant of h, hmx, was introduced that allowed a ranking of the academics using all citation databases together

RMIT Research Repository

White Rose Research Online
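
The h-index used in the entry above can be illustrated with a minimal Python sketch. The `h_index` function follows the standard definition (the largest h such that h papers each have at least h citations). The `pooled_h` function is an assumption about how a combined measure such as hmx might be computed, here by taking each paper's highest citation count across databases; the abstract does not spell out the procedure, and the database names and counts below are hypothetical.

```python
from typing import Dict, List


def h_index(citations: List[int]) -> int:
    """Return the largest h such that at least h papers have >= h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def pooled_h(per_database: Dict[str, Dict[str, int]]) -> int:
    """Assumed pooling rule: keep each paper's highest count across
    databases, then compute h on those pooled counts."""
    best: Dict[str, int] = {}
    for counts in per_database.values():
        for paper, n in counts.items():
            best[paper] = max(best.get(paper, 0), n)
    return h_index(list(best.values()))


# Hypothetical citation counts for one academic in two databases.
db_a = {"p1": 1, "p2": 0, "p3": 3}
db_b = {"p1": 5, "p2": 4, "p3": 2, "p4": 3}

h_a = h_index(list(db_a.values()))
h_pooled = pooled_h({"db_a": db_a, "db_b": db_b})
print(h_a, h_pooled)
```

In this made-up case the pooled value is higher than the single-database value, which is the kind of shift the abstract reports for academics whose highly cited papers are missing from one database.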

Duplicate Detection in the Reuters Collection

Author: Sanderson M.
Publication venue: Department of Computing Science
Publication date: 01/01/1997

While conducting some experiments with the Reuters collection, it was discovered that contained within it were a number of documents that were exact duplicates of each other (see Figure 1). A short study was conducted to try to discover how many such documents there were. The results of this study revealed that the notion of a duplicate document was not as simple as first thought. The contents of this report are as follows. A brief review of previous duplicate detection research will be presented, followed by a description of the methods and results of the duplicate detection work conducted here. In addition, there is an appendix holding the document ids of the various types of duplicate found

CiteSeerX

White Rose Research Online

Word sense disambiguation and information retrieval

Author: Sanderson M.
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/1994

White Rose Research Online

The Reuters collection

Author: Sanderson M.
Publication date: 01/01/1994

This short paper presents the little-known Reuters 22,173 test collection, which is significantly larger than most traditional test collections. In addition, Reuters has none of the recall calculation problems normally associated with some of the larger test collections now available. This paper explains the method (derived from Lewis [Lewis 91]) used to perform retrieval experiments on the Reuters collection. Then, to illustrate the use of Reuters, some simple retrieval experiments are also presented that compare the performance of stemming algorithms

White Rose Research Online

The infinite disk: challenges from no limitations

Author: Sanderson M.

Challenge: Managing and searching across multi-terabyte and potentially multi-petabyte personal stores of multimedia information

White Rose Research Online

Search of spoken documents retrieves well recognized transcripts

Author: Sanderson M., Shou X.M.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/06/2007

This paper presents a series of analyses and experiments on spoken document retrieval systems: search engines that retrieve transcripts produced by speech recognizers. Results show that transcripts that match queries well tend to be recognized more accurately than transcripts that match a query less well. This result was described in past literature; however, no study or explanation of the effect has been provided until now. This paper provides such an analysis, showing a relationship between word error rate and query length. The paper expands on past research by increasing the number of recognition systems that are tested as well as showing the effect in an operational speech retrieval system. Potential future lines of enquiry are also described

White Rose Research Online

Keep It Simple Sheffield – a KISS approach to the Arabic track

Author: Alberair A., Sanderson M.
Publication venue: 'University of Aden - Faculty of Economics and Administration'
Publication date: 01/01/2001

Sheffield’s participation in the inaugural Arabic cross language track is described here. Our goal was to examine how well one could achieve retrieval of Arabic text with the minimum of resources and adaptation of existing retrieval systems. To this end the public translators used for query translation and the minimal changes to our retrieval system are described. While the effectiveness of our resulting system is not as high as one might desire, it nevertheless provides reasonable performance particularly in the monolingual track: on average, just under four relevant documents were found in the 10 top ranked documents

White Rose Research Online