47,751 research outputs found

    Visualising the structure of document search results: A comparison of graph theoretic approaches

    Get PDF
    This is the post-print of the article - Copyright @ 2010 Sage PublicationsPrevious work has shown that distance-similarity visualisation or ‘spatialisation’ can provide a potentially useful context in which to browse the results of a query search, enabling the user to adopt a simple local foraging or ‘cluster growing’ strategy to navigate through the retrieved document set. However, faithfully mapping feature-space models to visual space can be problematic owing to their inherent high dimensionality and non-linearity. Conventional linear approaches to dimension reduction tend to fail at this kind of task, sacrificing local structural in order to preserve a globally optimal mapping. In this paper the clustering performance of a recently proposed algorithm called isometric feature mapping (Isomap), which deals with non-linearity by transforming dissimilarities into geodesic distances, is compared to that of non-metric multidimensional scaling (MDS). Various graph pruning methods, for geodesic distance estimation, are also compared. Results show that Isomap is significantly better at preserving local structural detail than MDS, suggesting it is better suited to cluster growing and other semantic navigation tasks. Moreover, it is shown that applying a minimum-cost graph pruning criterion can provide a parameter-free alternative to the traditional K-neighbour method, resulting in spatial clustering that is equivalent to or better than that achieved using an optimal-K criterion

    Image retrieval by hypertext links

    Get PDF
    This paper presents a model for retrieval of images from a large World Wide Web based collection. Rather than considering complex visual recognition algorithms, the model presented is based on combining evidence of the text content and hypertext structure of the Web. The paper shows that certain types of query are amply served by this form of representation. It also presents a novel means of gathering relevance judgements

    TopSig: Topology Preserving Document Signatures

    Get PDF
    Performance comparisons between File Signatures and Inverted Files for text retrieval have previously shown several significant shortcomings of file signatures relative to inverted files. The inverted file approach underpins most state-of-the-art search engine algorithms, such as Language and Probabilistic models. It has been widely accepted that traditional file signatures are inferior alternatives to inverted files. This paper describes TopSig, a new approach to the construction of file signatures. Many advances in semantic hashing and dimensionality reduction have been made in recent times, but these were not so far linked to general purpose, signature file based, search engines. This paper introduces a different signature file approach that builds upon and extends these recent advances. We are able to demonstrate significant improvements in the performance of signature file based indexing and retrieval, performance that is comparable to that of state of the art inverted file based systems, including Language models and BM25. These findings suggest that file signatures offer a viable alternative to inverted files in suitable settings and from the theoretical perspective it positions the file signatures model in the class of Vector Space retrieval models.Comment: 12 pages, 8 figures, CIKM 201

    Development and evaluation of clustering techniques for finding people

    Get PDF
    Typically in a large organisation much expertise and knowledge is held informally within employees' own memories. When employees leave an organisation many documented links that go through that person are broken and no mechanism is usually available to overcome these broken links. This match making problem is related to the problem of finding potential work partners in a large and distributed organisation. This paper reports a comparative investigation into using standard information retrieval techniques to group employees together based on their webpages. This information can, hopefully, be subsequently used to redirect broken links to people who worked closely with a departed employee or used to highlight people, say indifferent departments, who work on similar topics. The paper reports the design and positive results of an experiment conducted at RisĂž National Laboratory comparing four different IR searching and clustering approaches using real users' web pages

    The University of Glasgow at ImageClefPhoto 2009

    Get PDF
    In this paper we describe the approaches adopted to generate the five runs submitted to ImageClefPhoto 2009 by the University of Glasgow. The aim of our methods is to exploit document diversity in the rankings. All our runs used text statistics extracted from the captions associated to each image in the collection, except one run which combines the textual statistics with visual features extracted from the provided images. The results suggest that our methods based on text captions significantly improve the performance of the respective baselines, while the approach that combines visual features with text statistics shows lower levels of improvements
    • 

    corecore