2 research outputs found

    Query-Aware Sparse Coding for Multi-Video Summarization

    Full text link
    Given the explosive growth of online videos, it is becoming increasingly important to relieve the tedious work of browsing and managing the video content of interest. Video summarization aims at providing such a technique by transforming one or multiple videos into a compact one. However, conventional multi-video summarization methods often fail to produce satisfying results as they ignore the user's search intent. To this end, this paper proposes a novel query-aware approach by formulating the multi-video summarization in a sparse coding framework, where the web images searched by the query are taken as the important preference information to reveal the query intent. To provide a user-friendly summarization, this paper also develops an event-keyframe presentation structure to present keyframes in groups of specific events related to the query by using an unsupervised multi-graph fusion method. We release a new public dataset named MVS1K, which contains about 1, 000 videos from 10 queries and their video tags, manual annotations, and associated web images. Extensive experiments on MVS1K dataset validate our approaches produce superior objective and subjective results against several recently proposed approaches.Comment: 10 pages, 8 figure

    Efficient Detection of Points of Interest from Georeferenced Visual Content

    Full text link
    Many people take photos and videos with smartphones and more recently with 360-degree cameras at popular places and events, and share them in social media. Such visual content is produced in large volumes in urban areas, and it is a source of information that online users could exploit to learn what has got the interest of the general public on the streets of the cities where they live or plan to visit. A key step to providing users with that information is to identify the most popular k spots in specified areas. In this paper, we propose a clustering and incremental sampling (C&IS) approach that trades off accuracy of top-k results for detection speed. It uses clustering to determine areas with high density of visual content, and incremental sampling, controlled by stopping criteria, to limit the amount of computational work. It leverages spatial metadata, which represent the scenes in the visual content, to rapidly detect the hotspots, and uses a recently proposed Gaussian probability model to describe the capture intention distribution in the query area. We evaluate the approach with metadata, derived from a non-synthetic, user-generated dataset, for regular mobile and 360-degree visual content. Our results show that the C&IS approach offers 2.8x-19x reductions in processing time over an optimized baseline, while in most cases correctly identifying 4 out of 5 top locations
    corecore