156 research outputs found

    Real-Time Near-Duplicate Elimination for Web Video Search With Content and Context

    Full text link

    Novel perspectives and approaches to video summarization

    Get PDF
    The increasing volume of videos requires efficient and effective techniques to index and structure videos. Video summarization is such a technique that extracts the essential information from a video, so that tasks such as comprehension by users and video content analysis can be conducted more effectively and efficiently. The research presented in this thesis investigates three novel perspectives of the video summarization problem and provides approaches to such perspectives. Our first perspective is to employ local keypoint to perform keyframe selection. Two criteria, namely Coverage and Redundancy, are introduced to guide the keyframe selection process in order to identify those representing maximum video content and sharing minimum redundancy. To efficiently deal with long videos, a top-down strategy is proposed, which splits the summarization problem to two sub-problems: scene identification and scene summarization. Our second perspective is to formulate the task of video summarization to the problem of sparse dictionary reconstruction. Our method utilizes the true sparse constraint L0 norm, instead of the relaxed constraint L2,1 norm, such that keyframes are directly selected as a sparse dictionary that can reconstruct the video frames. In addition, a Percentage Of Reconstruction (POR) criterion is proposed to intuitively guide users in selecting an appropriate length of the summary. In addition, an L2,0 constrained sparse dictionary selection model is also proposed to further verify the effectiveness of sparse dictionary reconstruction for video summarization. Lastly, we further investigate the multi-modal perspective of multimedia content summarization and enrichment. There are abundant images and videos on the Web, so it is highly desirable to effectively organize such resources for textual content enrichment. With the support of web scale images, our proposed system, namely StoryImaging, is capable of enriching arbitrary textual stories with visual content

    High-level feature detection from video in TRECVid: a 5-year retrospective of achievements

    Get PDF
    Successful and effective content-based access to digital video requires fast, accurate and scalable methods to determine the video content automatically. A variety of contemporary approaches to this rely on text taken from speech within the video, or on matching one video frame against others using low-level characteristics like colour, texture, or shapes, or on determining and matching objects appearing within the video. Possibly the most important technique, however, is one which determines the presence or absence of a high-level or semantic feature, within a video clip or shot. By utilizing dozens, hundreds or even thousands of such semantic features we can support many kinds of content-based video navigation. Critically however, this depends on being able to determine whether each feature is or is not present in a video clip. The last 5 years have seen much progress in the development of techniques to determine the presence of semantic features within video. This progress can be tracked in the annual TRECVid benchmarking activity where dozens of research groups measure the effectiveness of their techniques on common data and using an open, metrics-based approach. In this chapter we summarise the work done on the TRECVid high-level feature task, showing the progress made year-on-year. This provides a fairly comprehensive statement on where the state-of-the-art is regarding this important task, not just for one research group or for one approach, but across the spectrum. We then use this past and on-going work as a basis for highlighting the trends that are emerging in this area, and the questions which remain to be addressed before we can achieve large-scale, fast and reliable high-level feature detection on video

    Probabilistic temporal multimedia datamining

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH

    KS-SIFT: a keyframe extraction method based on local features

    Get PDF
    In this work we propose a new keyframe extraction method based on SIFT local features. We extracted feature vectors from a carefully selected group of frames from a vídeo shot, analyzing those vectors to eliminate near duplicate keyframes, helping to keep a compact set. Moreover, as the keyframe extraction is based on local features, it keeps frames latent semantics and, therefore, helps to keep shot representativeness. We evaluated our method in the scene segmentation context, with videos from movies domain, developing a comparative study with three state of the art approaches based on local features. The results show that our method overcomes those approaches.FAPESP (grant 2012/19025-0)CNP

    Unsupervised t-Distributed Video Hashing and Its Deep Hashing Extension

    Full text link

    Video copy detection using multiple visual cues and MPEG-7 descriptors

    Get PDF
    We propose a video copy detection framework that detects copy segments by fusing the results of three different techniques: facial shot matching, activity subsequence matching, and non-facial shot matching using low-level features. In facial shot matching part, a high-level face detector identifies facial frames/shots in a video clip. Matching faces with extended body regions gives the flexibility to discriminate the same person (e.g., an anchor man or a political leader) in different events or scenes. In activity subsequence matching part, a spatio-temporal sequence matching technique is employed to match video clips/segments that are similar in terms of activity. Lastly, the non-facial shots are matched using low-level MPEG-7 descriptors and dynamic-weighted feature similarity calculation. The proposed framework is tested on the query and reference dataset of CBCD task of TRECVID 2008. Our results are compared with the results of top-8 most successful techniques submitted to this task. Promising results are obtained in terms of both effectiveness and efficiency. © 2010 Elsevier Inc. All rights reserved

    Detection and tracking of repeated sequences in videos

    Get PDF
    Ankara : The Department of Computer Engineering and the Institute of Engineering and Science of Bilkent University, 2007.Thesis (Master's) -- Bilkent University, 2007.Includes bibliographical references leaves 87-92.In this thesis, we propose a new method to search different instances of a video sequence inside a long video. The proposed method is robust to view point and illumination changes which may occur since the sequences are captured in different times with different cameras, and to the differences in the order and the number of frames in the sequences which may occur due to editing. The algorithm does not require any query to be given for searching, and finds all repeating video sequences inside a long video in a fully automatic way. First, the frames in a video are ranked according to their similarity on the distribution of salient points and colour values. Then, a tree based approach is used to seek for the repetitions of a video sequence if there is any. These repeating sequences are pruned for more accurate results in the last step. Results are provided on two full length feature movies, Run Lola Run and Groundhog Day, on commercials of TRECVID 2004 news video corpus and on dataset created for CIVR Copy Detection Showcase 2007. In these experiments, we obtain %93 precision values for CIVR2007 Copy Detection Showcase dataset and exceed %80 precision values for other sets.Can, TolgaM.S
    corecore