
    RGU at ImageCLEF2010 Wikipedia Retrieval Task

    This working notes paper describes our first participation in the ImageCLEF2010 Wikipedia Retrieval Task. In this task, we mainly test our Quantum Theory inspired retrieval function on cross-media retrieval. Instead of heuristically combining ranking scores obtained independently from different media types, we develop a tensor product based model that represents the textual and visual content features of an image as a non-separable composite system. Such a system incorporates the statistical/semantic dependencies between certain features. The ranking scores of the images are then computed in a manner analogous to quantum measurement. In addition, we test a new local feature that we have developed for content-based image retrieval.
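
    The following is a minimal, hypothetical sketch (Python with NumPy, not the authors' code) of the tensor-product idea in the abstract above: textual and visual feature vectors are combined into a composite representation via an outer (tensor) product, and a ranking score is computed as the squared overlap between the query and document composites, loosely mirroring a quantum measurement. All dimensions, vectors, and function names are illustrative assumptions; the paper's non-separable system would go beyond this simple rank-1 product.

    ```python
    import numpy as np

    def composite(text_vec: np.ndarray, visual_vec: np.ndarray) -> np.ndarray:
        """Tensor (outer) product of unit-normalised textual and visual features."""
        t = text_vec / np.linalg.norm(text_vec)
        v = visual_vec / np.linalg.norm(visual_vec)
        # A plain outer product is still separable (rank-1); modelling the
        # statistical/semantic dependencies between features is what would
        # make the composite system non-separable in the paper's sense.
        return np.outer(t, v)

    def score(query_state: np.ndarray, doc_state: np.ndarray) -> float:
        """Measurement-style score: squared overlap between composite states."""
        return float(np.abs(np.vdot(query_state, doc_state)) ** 2)

    # Toy example with made-up 3-dim textual and 4-dim visual features.
    q = composite(np.array([0.2, 0.7, 0.1]), np.array([0.5, 0.1, 0.3, 0.1]))
    d = composite(np.array([0.3, 0.6, 0.1]), np.array([0.4, 0.2, 0.3, 0.1]))
    print(score(q, d))
    ```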

    Hybrid models for combination of visual and textual features in context-based image retrieval.

    Visual information retrieval poses a challenge to intelligent information search systems. This is due to the semantic gap: the difference between human perception (information needs) and the machine representation of multimedia objects. Most existing image retrieval systems are monomodal, as they utilize only visual or only textual information about images.

    The semantic gap can be reduced by improving existing visual representations, making them suitable for large-scale generic image retrieval. The best current candidates for large-scale content-based image retrieval are models based on the Bag of Visual Words framework. Existing approaches, however, produce high-dimensional representations that are expensive to store and compute with. Because the standard Bag of Visual Words framework disregards the relationships between the histogram bins, the model can be further enhanced by exploiting the correlations between the visual words.

    Even improved visual features will struggle to capture the abstract semantic meaning of some queries, e.g. "straight road in the USA". Textual features, on the other hand, would struggle with queries such as "church with more than two towers", as in many cases the information about the number of towers would be missing. Thus, visual and textual features represent complementary yet correlated aspects of the same information object, the image. Existing hybrid approaches for the combination of visual and textual features do not take these inherent relationships into account, and so the performance improvement obtained from the combination is limited.

    Visual and textual features can also be combined in the context of relevance feedback. Relevance feedback can help narrow down and correct the search. The feedback mechanism produces subsets of visual query and visual feedback representations, as well as subsets of textual query and textual feedback representations. A meaningful feature combination in the context of relevance feedback should take the inherent inter-modal (visual-textual) and intra-modal (visual-visual, textual-textual) relationships into account.

    In this work, we propose a principled framework for semantic gap reduction in large-scale generic image retrieval. The proposed framework comprises the development and enhancement of novel visual features, a hybrid model for the combination of visual and textual features, and a hybrid model for the combination of features in the context of relevance feedback, with both fixed and adaptive weighting schemes (reflecting the importance of a query and its context). Apart from the experimental evaluation of our models, theoretical validations of some interesting findings on feature fusion strategies were also performed. The proposed models were incorporated into a prototype system with an interactive user interface.
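
    Below is a minimal, hypothetical sketch of the two weighting schemes mentioned in the abstract, fixed versus adaptive, for fusing visual and textual ranking scores in a relevance-feedback setting. The concrete adaptive rule used here (weighting a modality by the inverse variance of its feedback scores), along with all names and numbers, is an assumption for illustration only, not the thesis' actual model.

    ```python
    import numpy as np

    def fixed_fusion(vis_score: float, txt_score: float, w_vis: float = 0.5) -> float:
        """Fixed-weight late fusion of visual and textual ranking scores."""
        return w_vis * vis_score + (1.0 - w_vis) * txt_score

    def adaptive_fusion(vis_scores: np.ndarray, txt_scores: np.ndarray) -> float:
        """Adaptive weighting: the modality whose feedback scores vary less
        is treated as more reliable and receives the larger weight."""
        v_var, t_var = np.var(vis_scores), np.var(txt_scores)
        w_vis = t_var / (v_var + t_var + 1e-12)  # low variance -> high weight
        return w_vis * float(np.mean(vis_scores)) + (1.0 - w_vis) * float(np.mean(txt_scores))

    # Toy relevance-feedback example: per-feedback-image similarity scores.
    print(fixed_fusion(0.72, 0.40))
    print(adaptive_fusion(np.array([0.70, 0.72, 0.71]), np.array([0.20, 0.80, 0.50])))
    ```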