Content based video retrieval via spatial-temporal information discovery.

Abstract

Content based video retrieval (CBVR) has been strongly motivated by a variety of real-world applications. Most state-of-the-art CBVR systems are built on the Bag-of-visual-Words (BovW) framework for representing and accessing visual resources. The framework, however, ignores the spatial and temporal information contained in videos, which plays a fundamental role in unveiling semantic meaning. This information includes not only the spatial layout of visual content in a still frame (image), but also temporal changes across sequential frames. Specifically, spatially and temporally co-occurring visual words, which are extracted under the BovW framework, tend to collaboratively represent objects, scenes, or events in the videos. Discovering this spatial and temporal information would therefore help advance CBVR technology. In this thesis, we propose to explore and analyse the spatial and temporal information from a new perspective: i) co-occurrence of the visual words is formulated as a correlation matrix, and ii) spatial proximity and temporal coherence are analytically and empirically studied to refine this correlation. Following this, a quantitative spatial and temporal correlation (STC) model is defined. The STC discovered from either the query example (denoted by QC) or the data collection (denoted by DC) is assumed to determine the specificity of the visual words in the retrieval model, i.e. selected Words-Of-Interest are found to be more important for certain topics. Based on this hypothesis, we use the STC matrix to establish a novel visual content similarity measurement method and a query reformulation scheme for the retrieval model. Additionally, the STC characterizes the context of the visual words, and accordingly an STC-based context similarity measurement is proposed to detect synonymous visual words. The method partially solves an inherent error of the visual vocabulary under the BovW framework.
Systematic experimental evaluations on the public TRECVID and CC_WEB_VIDEO video collections demonstrate that the proposed STC-based methods can substantially improve the retrieval effectiveness of the BovW framework. The retrieval model based on STC outperforms state-of-the-art CBVR methods on these data collections without storage and computational expense. Furthermore, the visual vocabulary rebuilt in this thesis is more compact and effective. The above methods can be combined to implement an effective and efficient CBVR system. Based on the experimental results, it is concluded that the spatial-temporal correlation effectively approximates the semantic correlation. This discovered correlation approximation can be used for both visual content representation and similarity measurement, which are key issues for CBVR technology development.
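
As a rough illustration of the idea that co-occurrence of visual words can be written as a correlation matrix refined by spatial proximity and temporal coherence, the following minimal sketch may help. It is not the STC model defined in the thesis: the function name `stc_matrix`, the Gaussian weights, the parameters `sigma_s` and `sigma_t`, and the count-based normalisation are all assumptions chosen for illustration only.

```python
import math
from itertools import combinations


def stc_matrix(detections, vocab_size, sigma_s=50.0, sigma_t=2.0):
    """Toy spatial-temporal correlation between visual words.

    detections: list of (word_id, frame_index, x, y) tuples.
    Returns a symmetric vocab_size x vocab_size list of lists.
    The Gaussian weights and the normalisation are illustrative
    assumptions, not the model defined in the thesis.
    """
    corr = [[0.0] * vocab_size for _ in range(vocab_size)]
    counts = [0] * vocab_size
    for word, _, _, _ in detections:
        counts[word] += 1

    for (w1, f1, x1, y1), (w2, f2, x2, y2) in combinations(detections, 2):
        if w1 == w2:
            continue  # only cross-word co-occurrence is accumulated
        dist_sq = (x1 - x2) ** 2 + (y1 - y2) ** 2
        gap_sq = (f1 - f2) ** 2
        # Nearby in the frame (assumed Gaussian spatial proximity weight).
        spatial = math.exp(-dist_sq / (2.0 * sigma_s ** 2))
        # Nearby in time (assumed Gaussian temporal coherence weight).
        temporal = math.exp(-gap_sq / (2.0 * sigma_t ** 2))
        weight = spatial * temporal
        corr[w1][w2] += weight
        corr[w2][w1] += weight

    # Normalise by word frequency so frequent words do not dominate.
    for i in range(vocab_size):
        for j in range(vocab_size):
            norm = math.sqrt(counts[i] * counts[j])
            if norm > 0:
                corr[i][j] /= norm
    return corr


# Hypothetical detections: words 0 and 1 appear close together in the
# same frame; word 2 appears far away and five frames later.
detections = [
    (0, 0, 10.0, 10.0),
    (1, 0, 12.0, 11.0),
    (2, 5, 300.0, 200.0),
]
stc = stc_matrix(detections, vocab_size=3)
```

In this toy setting, words 0 and 1 receive a correlation close to 1, while the correlation between words 0 and 2 is close to 0, which mirrors the intuition that words co-occurring closely in space and time are more likely to describe the same object, scene, or event.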
