11,096 research outputs found

Retrieving with good sense

Author: Sanderson M.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2000

Although always present in text, word sense ambiguity has only recently come to be regarded as a potentially solvable problem for information retrieval. The growth of interest in word senses resulted from new directions taken in disambiguation research. This paper first outlines this research and then surveys the resulting efforts in information retrieval. Although the majority of attempts to improve retrieval effectiveness were unsuccessful, much was learnt from the research, most notably a notion of the circumstances under which disambiguation may prove of use to retrieval.

CiteSeerX

White Rose Research Online

Accurate user directed summarization from existing tools

Author: Sanderson M.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/1998

This paper describes a set of experimental results produced from the TIPSTER SUMMAC initiative on user directed summaries: document summaries generated in the context of an information need expressed as a query. The summarizer that was evaluated was based on a set of existing statistical techniques that had been applied successfully to the INQUERY retrieval system. The techniques proved to have a wider utility, however, as the summarizer was one of the better performing systems in the SUMMAC evaluation. The design of this summarizer is presented with a range of evaluations: those provided by SUMMAC, as well as a set of preliminary, more informal evaluations that examined additional aspects of the summaries. Amongst other conclusions, the results reveal that users can judge the relevance of documents from their summary almost as accurately as if they had had access to the document’s full text.

CiteSeerX

Crossref

White Rose Research Online

Word sense disambiguation and information retrieval

Author: Sanderson M.
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/1994

It has often been thought that word sense ambiguity is a cause of poor performance in Information Retrieval (IR) systems. The belief is that if ambiguous words can be correctly disambiguated, IR performance will increase. However, recent research into the application of a word sense disambiguator to an IR system failed to show any performance increase. From these results it has become clear that more basic research is needed to investigate the relationship between sense ambiguity, disambiguation, and IR. Using a technique that introduces additional sense ambiguity into a collection, this paper presents research that goes beyond previous work in this field to reveal the influence that ambiguity and disambiguation have on a probabilistic IR system. We conclude that word sense ambiguity is only problematic to an IR system when it is retrieving from very short queries. In addition, we argue that if a word sense disambiguator is to be of any use to an IR system, the disambiguator must be able to resolve word senses to a high degree of accuracy.

MIT Libraries Dome

White Rose Research Online

Revisiting h measured on UK LIS and IR academics

Author: Sanderson M.
Publication venue: 'Wiley'
Publication date: 01/01/2008

A brief communication appearing in this journal ranked UK LIS and (some) IR academics by their h-index using data derived from Web of Science. In this brief communication, the same academics were re-ranked using other popular citation databases. It was found that the h of academics who publish more in computer science forums was significantly different, due to highly cited papers missed by Web of Science; consequently, their rank changed substantially. The study was widened to a broader set of UK LIS and IR academics, where results showed similar statistically significant differences. A variant of h, hmx, was introduced that allowed a ranking of the academics using all citation databases together.

RMIT Research Repository

White Rose Research Online
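
The h-index referred to in the abstract above can be illustrated with a short calculation. The following is a minimal sketch, not the paper's own code: an author has index h when h of their papers have at least h citations each. The function `combined_h_index` is hypothetical and assumes that a combined variant takes, for each paper, the highest citation count reported by any database; the abstract itself only states that hmx uses all citation databases together.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def combined_h_index(per_database):
    """Hypothetical combined variant (an assumption, not the paper's definition).

    per_database maps a database name to a dict of paper id -> citation count.
    For each paper the highest count seen in any database is kept, then the
    ordinary h-index is computed over those maxima.
    """
    best = {}
    for counts in per_database.values():
        for paper, n in counts.items():
            best[paper] = max(best.get(paper, 0), n)
    return h_index(best.values())


# Made-up counts: each database misses citations that the other one records.
example = {
    "database_a": {"p1": 3, "p2": 3, "p3": 0},
    "database_b": {"p1": 0, "p2": 0, "p3": 3},
}
per_db = {name: h_index(counts.values()) for name, counts in example.items()}
together = combined_h_index(example)
```

With these invented counts the per-database values differ from the combined one, which is the kind of effect the abstract describes when a database misses highly cited papers.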

Duplicate Detection in the Reuters Collection

Author: Sanderson M.
Publication venue: Department of Computing Science
Publication date: 01/01/1997

While conducting some experiments with the Reuters collection, it was discovered that a number of the documents it contained were exact duplicates of each other (see Figure 1). A short study was conducted to try to discover how many such documents there were. The results of this study revealed that the notion of a duplicate document was not as simple as first thought. The contents of this report are as follows. A brief review of previous duplicate detection research will be presented, followed by a description of the methods and results of the duplicate detection work conducted here. In addition, there is an appendix holding the document ids of the various types of duplicate found.

CiteSeerX

White Rose Research Online
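
As an illustration of the simplest case mentioned in the abstract above, exact duplicates, the sketch below groups documents whose text is identical after case and whitespace normalisation. It is an assumption-laden example rather than the report's method: the function names, the normalisation rule, and the sample documents are all hypothetical, and the report notes that the notion of a duplicate turned out to be less simple than this.

```python
import hashlib
from collections import defaultdict


def normalise(text):
    """Lower-case the text and collapse runs of whitespace (an assumed rule)."""
    return " ".join(text.lower().split())


def find_exact_duplicates(documents):
    """Group document ids whose normalised text is identical.

    documents maps a document id to its text. Returns a list of id groups,
    each holding two or more ids, with the ids in each group sorted.
    """
    groups = defaultdict(list)
    for doc_id, text in documents.items():
        digest = hashlib.sha1(normalise(text).encode("utf-8")).hexdigest()
        groups[digest].append(doc_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


# Made-up documents, not taken from the Reuters collection.
docs = {
    "d1": "Oil prices rise",
    "d2": "oil  prices\nrise",
    "d3": "Gold falls",
}
duplicates = find_exact_duplicates(docs)
```

Hashing the normalised text keeps the comparison linear in the number of documents; near-duplicates that differ by a few words would need a different technique.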