Query-Aware Sparse Coding for Multi-Video Summarization
Given the explosive growth of online videos, it is becoming increasingly
important to relieve the tedious work of browsing and managing the video
content of interest. Video summarization aims at providing such a technique by
transforming one or multiple videos into a compact one. However, conventional
multi-video summarization methods often fail to produce satisfying results as
they ignore the user's search intent. To this end, this paper proposes a novel
query-aware approach that formulates multi-video summarization in a sparse
coding framework, where the web images retrieved by the query serve as
important preference information that reveals the query intent. To provide a
user-friendly summarization, this paper also develops an event-keyframe
presentation structure to present keyframes in groups of specific events
related to the query by using an unsupervised multi-graph fusion method. We
release a new public dataset named MVS1K, which contains about 1,000 videos
from 10 queries and their video tags, manual annotations, and associated web
images. Extensive experiments on the MVS1K dataset validate that our approaches
produce superior objective and subjective results against several recently
proposed approaches.
Comment: 10 pages, 8 figures
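The abstract above does not state its sparse coding objective, so the following is only a minimal sketch of the general idea under assumed choices. It uses a row-sparse self-representation model (in the style of sparse dictionary selection) in which the video frames must reconstruct both themselves and the query's web images: minimize over C the quantity 0.5 * ||[X, sqrt(alpha) * Q] - X C||_F^2 + lam * ||C||_{2,1}. Frames whose rows of C keep a large norm are picked as keyframes. The names `query_aware_selection`, `row_shrink`, `lam`, and `alpha`, and the proximal-gradient (ISTA) solver, are hypothetical choices for illustration, not the MVS1K authors' method.

```python
import numpy as np


def row_shrink(A, tau):
    """Proximal operator of tau * ||A||_{2,1}: shrink every row's l2 norm by tau."""
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    return A * np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))


def query_aware_selection(X, Q, k, lam=0.1, alpha=1.0, iters=500):
    """Pick k keyframes from X (d x n, one column per frame), guided by
    web-image features Q (d x m), via row-sparse self-representation.

    Hypothetical objective (assumption, not from the paper):
        min_C 0.5*||[X, sqrt(alpha)*Q] - X C||_F^2 + lam*||C||_{2,1}
    solved with proximal gradient descent (ISTA). A large row norm of C means
    the frame helps reconstruct both the videos and the query images.
    """
    Y = np.hstack([X, np.sqrt(alpha) * Q])  # targets: all frames + web images
    G = X.T @ X
    XtY = X.T @ Y
    step = 1.0 / max(np.linalg.norm(G, 2), 1e-12)  # 1 / Lipschitz constant
    C = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(iters):
        grad = G @ C - XtY  # gradient of the smooth reconstruction term
        C = row_shrink(C - step * grad, step * lam)
    scores = np.linalg.norm(C, axis=1)  # per-frame importance
    return np.argsort(-scores)[:k], scores
```

On real data the columns of X and Q would be visual feature vectors (for example, normalized CNN descriptors); a larger `lam` drives more rows of C to zero, and a larger `alpha` gives the query images more weight relative to plain video coverage. The event-keyframe grouping via multi-graph fusion is not modeled here.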
Efficient Detection of Points of Interest from Georeferenced Visual Content
Many people take photos and videos with smartphones and more recently with
360-degree cameras at popular places and events, and share them in social
media. Such visual content is produced in large volumes in urban areas, and it
is a source of information that online users could exploit to learn what has
attracted the interest of the general public on the streets of the cities where they
live or plan to visit. A key step to providing users with that information is
to identify the k most popular spots in specified areas. In this paper, we
propose a clustering and incremental sampling (C&IS) approach that trades off
accuracy of top-k results for detection speed. It uses clustering to determine
areas with high density of visual content, and incremental sampling, controlled
by stopping criteria, to limit the amount of computational work. It leverages
spatial metadata, which represent the scenes in the visual content, to rapidly
detect the hotspots, and uses a recently proposed Gaussian probability model to
describe the capture intention distribution in the query area. We evaluate the
approach with metadata derived from a non-synthetic, user-generated dataset
of regular mobile and 360-degree visual content. Our results show that the
C&IS approach offers 2.8x-19x reductions in processing time over an optimized
baseline, while in most cases correctly identifying 4 out of 5 top locations.
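The C&IS abstract names its two ingredients, clustering to find dense areas and incremental sampling controlled by a stopping criterion, without specifying either. The sketch below illustrates that structure under assumed choices: grid-cell density clustering, fixed-size batches drawn per cluster, an isotropic Gaussian kernel standing in for the capture-intention model, and stopping once the top-k ranking is unchanged for `patience` consecutive rounds. All names and parameters (`cis_top_k`, `grid_clusters`, `gaussian_score`, `cell`, `min_count`, `batch`, `sigma`, `patience`) are hypothetical, and the points are plain (x, y) locations rather than the paper's scene metadata.

```python
import math
import random


def grid_clusters(points, cell, min_count):
    """Bucket (x, y) points into square cells of side `cell`; keep only dense cells."""
    cells = {}
    for x, y in points:
        key = (math.floor(x / cell), math.floor(y / cell))
        cells.setdefault(key, []).append((x, y))
    return {key: pts for key, pts in cells.items() if len(pts) >= min_count}


def gaussian_score(center, samples, sigma):
    """Sum of isotropic Gaussian kernels around `center` (assumed intention model)."""
    cx, cy = center
    return sum(math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
               for x, y in samples)


def cis_top_k(points, k, cell=1.0, min_count=5, batch=10, sigma=0.5,
              patience=2, seed=0):
    """Return up to k (centre, estimated_score) pairs, best first (hypothetical)."""
    rng = random.Random(seed)
    clusters = grid_clusters(points, cell, min_count)
    # Shuffle each dense cluster once; sampling then just advances a cursor.
    pools = {key: rng.sample(pts, len(pts)) for key, pts in clusters.items()}
    seen = {key: [] for key in pools}
    prev_top, stable = None, 0
    while True:
        progressed = False
        for key, pool in pools.items():
            start = len(seen[key])
            chunk = pool[start:start + batch]
            if chunk:
                seen[key].extend(chunk)
                progressed = True
        ranking = []
        for key, s in seen.items():
            centre = (sum(x for x, _ in s) / len(s), sum(y for _, y in s) / len(s))
            # Extrapolate the score of the sampled points to the full cluster size.
            est = gaussian_score(centre, s, sigma) * len(pools[key]) / len(s)
            ranking.append((est, key, centre))
        ranking.sort(reverse=True)
        top = [key for _, key, _ in ranking[:k]]
        stable = stable + 1 if top == prev_top else 0
        prev_top = top
        # Stopping criterion: ranking unchanged for `patience` rounds, or data exhausted.
        if stable >= patience or not progressed:
            return [(centre, est) for est, _, centre in ranking[:k]]


if __name__ == "__main__":
    hot = [(0.5 + 0.01 * i, 0.5) for i in range(-20, 20)]   # 40 points in one cell
    warm = [(5.5, 5.5 + 0.01 * i) for i in range(-5, 5)]    # 10 points in another
    noise = [(3.2, 8.7), (9.1, 1.4)]                        # filtered out as sparse
    print(cis_top_k(hot + warm + noise, k=2))
```

The early stop is where accuracy is traded for speed in this sketch: a cluster may be ranked from a fraction of its points, so a small `patience` finishes sooner but can settle on a wrong ordering when two areas have similar density.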