
    Visual information retrieval using annotated free-hand sketches

The availability of commodity camera hardware, coupled with the falling cost of bandwidth and storage, has prompted explosive growth in visual media collections, motivating new techniques to efficiently search this deluge of visual data. This thesis develops new scalable solutions for content-based image and video retrieval (CBIR/CBVR) using free-hand user-sketched queries. Compared with other query mechanisms, free-hand sketches concisely and intuitively depict object shape, colour, relative position and even scene dynamics (movement). The orthogonal expressive power of keywords and sketches is also investigated, fusing appearance and semantics for sketch-based retrieval (SBR).

Several contributions to SBR are made. First, we extend the Bag-of-Visual-Words (BoVW) framework to sketch-based image retrieval (SBIR). Although BoVW is extensively used in image retrieval systems with photo-real queries, it is non-trivial to apply to SBIR because relevant spatial structure information is destroyed during indexing. We propose 'Gradient Field - Histogram of Oriented Gradients' (GF-HOG), a novel structure-preserving sketch descriptor for BoVW, which we show to outperform existing popular descriptors at the tasks of SBIR and of localising the sketched object within an image. Furthermore, we combine sketch with keyword retrieval, enabling for the first time the scalable search of image databases using keyword-annotated (semantic) sketches.

Second, we present two fast sketch-based video retrieval (SBVR) algorithms driven by storyboard sketch queries depicting both objects and their dynamics. Videos are first segmented into a spatio-temporal volume representation, which is then matched against the query sketch using its appearance, motion and (if available) semantic annotations.

Third, we propose a novel probabilistic algorithm for SBVR that uses a Markov Random Field (MRF) to model the ambiguity of sketch during video matching.
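As background for the BoVW indexing used in the first contribution, a minimal bag-of-visual-words pipeline can be sketched as follows. This is an illustrative stand-in, not the thesis implementation: it uses plain k-means for the vocabulary and hard assignment for quantisation, and all names, sizes and parameters are invented for the example.

```python
import numpy as np

def build_codebook(descriptors, k, iters=10, seed=0):
    """Cluster local descriptors into k visual words with plain k-means."""
    rng = np.random.default_rng(seed)
    centres = descriptors[rng.choice(len(descriptors), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign every descriptor to its nearest centre.
        dists = np.linalg.norm(descriptors[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centre as the mean of its assigned descriptors.
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):
                centres[j] = members.mean(axis=0)
    return centres

def bovw_histogram(descriptors, centres):
    """Quantise descriptors against the codebook and return an
    L1-normalised term-frequency histogram (the BoVW image signature)."""
    dists = np.linalg.norm(descriptors[:, None, :] - centres[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=len(centres)).astype(float)
    return hist / max(hist.sum(), 1.0)
```

In a real system the descriptors would be GF-HOG features sampled from the gradient field of each image, the vocabulary would be far larger, and the histograms would be indexed for fast nearest-neighbour search.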
Video is represented as a graph in which each node is a spatio-temporal over-segmented super-voxel. The sketch matching problem is formulated as a graph cut optimisation that simultaneously identifies relevant video clips and localises the sketched object within each clip. The proposed system is the first to combine consideration of colour, shape, motion and semantic information for SBVR. Finally, we also propose a novel semantic image segmentation algorithm that outperforms existing texton-based approaches and can be of benefit in pre-processing image and video into the region-based representation that underpins a number of our proposed retrieval algorithms.

EThOS - Electronic Theses Online Service, United Kingdom
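The binary labelling of a super-voxel graph described above can be illustrated with a deliberately simplified stand-in: iterated conditional modes (ICM) on a small binary MRF, with a Potts smoothness term between neighbouring nodes. ICM is a greedy substitute for the graph-cut optimiser used in the thesis, and the unary costs, edges and parameters below are invented purely for illustration.

```python
import numpy as np

def icm_label(unary, edges, smoothness=1.0, sweeps=10):
    """Binary labelling of a super-voxel graph by iterated conditional
    modes (ICM): node i gets label 1 if it matches the sketch, else 0.

    unary : (n, 2) array, cost of giving node i label 0 or label 1
            (e.g. from colour/shape/motion disagreement with the sketch).
    edges : list of (i, j) pairs linking neighbouring super-voxels;
            a Potts term penalises neighbours that take different labels.
    """
    n = unary.shape[0]
    labels = unary.argmin(axis=1)  # greedy initialisation from unaries alone
    neighbours = [[] for _ in range(n)]
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)
    for _ in range(sweeps):
        changed = False
        for i in range(n):
            # Cost of each label given the current labels of the neighbours.
            costs = unary[i].copy()
            for lab in (0, 1):
                costs[lab] += smoothness * sum(labels[j] != lab for j in neighbours[i])
            best = int(costs.argmin())
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:  # converged: a full sweep changed nothing
            break
    return labels
```

Unlike a graph cut, ICM only reaches a local optimum, but it makes the structure of the problem concrete: per-node match costs traded off against label smoothness across the super-voxel adjacency graph.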