7,388 research outputs found
TopSig: Topology Preserving Document Signatures
Performance comparisons between File Signatures and Inverted Files for text
retrieval have previously shown several significant shortcomings of file
signatures relative to inverted files. The inverted file approach underpins
most state-of-the-art search engine algorithms, such as Language and
Probabilistic models. It has been widely accepted that traditional file
signatures are inferior alternatives to inverted files. This paper describes
TopSig, a new approach to the construction of file signatures. Many advances in
semantic hashing and dimensionality reduction have been made in recent times,
but these were not so far linked to general purpose, signature file based,
search engines. This paper introduces a different signature file approach that
builds upon and extends these recent advances. We are able to demonstrate
significant improvements in the performance of signature file based indexing
and retrieval, performance that is comparable to that of state of the art
inverted file based systems, including Language models and BM25. These findings
suggest that file signatures offer a viable alternative to inverted files in
suitable settings and from the theoretical perspective it positions the file
signatures model in the class of Vector Space retrieval models.Comment: 12 pages, 8 figures, CIKM 201
EGO: a personalised multimedia management tool
The problems of Content-Based Image Retrieval (CBIR) sys- tems can be attributed to the semantic gap between the low-level data representation and the high-level concepts the user associates with images, on the one hand, and the time-varying and often vague nature of the underlying information need, on the other. These problems can be addressed by improving the interaction between the user and the system. In this paper, we sketch the development of CBIR interfaces, and introduce our view on how to solve some of the problems of the studied interfaces. To address the semantic gap and long-term multifaceted information needs, we propose a "retrieval in context" system. EGO is a tool for the management of image collections, supporting the user through personalisation and adaptation. We will describe how it learns from the user's personal organisation, allowing it to recommend relevant images to the user. The recommendation algorithm is detailed, which is based on relevance feedback techniques
Unsupervised Visual and Textual Information Fusion in Multimedia Retrieval - A Graph-based Point of View
Multimedia collections are more than ever growing in size and diversity.
Effective multimedia retrieval systems are thus critical to access these
datasets from the end-user perspective and in a scalable way. We are interested
in repositories of image/text multimedia objects and we study multimodal
information fusion techniques in the context of content based multimedia
information retrieval. We focus on graph based methods which have proven to
provide state-of-the-art performances. We particularly examine two of such
methods : cross-media similarities and random walk based scores. From a
theoretical viewpoint, we propose a unifying graph based framework which
encompasses the two aforementioned approaches. Our proposal allows us to
highlight the core features one should consider when using a graph based
technique for the combination of visual and textual information. We compare
cross-media and random walk based results using three different real-world
datasets. From a practical standpoint, our extended empirical analysis allow us
to provide insights and guidelines about the use of graph based methods for
multimodal information fusion in content based multimedia information
retrieval.Comment: An extended version of the paper: Visual and Textual Information
Fusion in Multimedia Retrieval using Semantic Filtering and Graph based
Methods, by J. Ah-Pine, G. Csurka and S. Clinchant, submitted to ACM
Transactions on Information System
- …