24,231 research outputs found
Vectors of Locally Aggregated Centers for Compact Video Representation
We propose a novel vector aggregation technique for compact video
representation, with application in accurate similarity detection within large
video datasets. The current state-of-the-art in visual search is formed by the
vector of locally aggregated descriptors (VLAD) of Jegou et. al. VLAD generates
compact video representations based on scale-invariant feature transform (SIFT)
vectors (extracted per frame) and local feature centers computed over a
training set. With the aim to increase robustness to visual distortions, we
propose a new approach that operates at a coarser level in the feature
representation. We create vectors of locally aggregated centers (VLAC) by first
clustering SIFT features to obtain local feature centers (LFCs) and then
encoding the latter with respect to given centers of local feature centers
(CLFCs), extracted from a training set. The sum-of-differences between the LFCs
and the CLFCs are aggregated to generate an extremely-compact video description
used for accurate video segment similarity detection. Experimentation using a
video dataset, comprising more than 1000 minutes of content from the Open Video
Project, shows that VLAC obtains substantial gains in terms of mean Average
Precision (mAP) against VLAD and the hyper-pooling method of Douze et. al.,
under the same compaction factor and the same set of distortions.Comment: Proc. IEEE International Conference on Multimedia and Expo, ICME
2015, Torino, Ital
Content-Based Video Retrieval in Historical Collections of the German Broadcasting Archive
The German Broadcasting Archive (DRA) maintains the cultural heritage of
radio and television broadcasts of the former German Democratic Republic (GDR).
The uniqueness and importance of the video material stimulates a large
scientific interest in the video content. In this paper, we present an
automatic video analysis and retrieval system for searching in historical
collections of GDR television recordings. It consists of video analysis
algorithms for shot boundary detection, concept classification, person
recognition, text recognition and similarity search. The performance of the
system is evaluated from a technical and an archival perspective on 2,500 hours
of GDR television recordings.Comment: TPDL 2016, Hannover, Germany. Final version is available at Springer
via DO
- …