
    Novel perspectives and approaches to video summarization

    The increasing volume of videos requires efficient and effective techniques to index and structure them. Video summarization is such a technique: it extracts the essential information from a video so that tasks such as comprehension by users and video content analysis can be conducted more effectively and efficiently. The research presented in this thesis investigates three novel perspectives on the video summarization problem and provides approaches for each. Our first perspective is to employ local keypoints for keyframe selection. Two criteria, Coverage and Redundancy, are introduced to guide the keyframe selection process so as to identify keyframes that represent maximum video content while sharing minimum redundancy. To deal efficiently with long videos, a top-down strategy is proposed that splits the summarization problem into two sub-problems: scene identification and scene summarization. Our second perspective is to formulate video summarization as a sparse dictionary reconstruction problem. Our method uses the true sparsity constraint (the L0 norm) instead of the relaxed L2,1-norm constraint, so that keyframes are directly selected as a sparse dictionary that can reconstruct the video frames. In addition, a Percentage Of Reconstruction (POR) criterion is proposed to intuitively guide users in selecting an appropriate summary length, and an L2,0-constrained sparse dictionary selection model is proposed to further verify the effectiveness of sparse dictionary reconstruction for video summarization. Lastly, we investigate the multi-modal perspective of multimedia content summarization and enrichment. There are abundant images and videos on the Web, so it is highly desirable to organize such resources effectively for textual content enrichment. With the support of web-scale images, our proposed system, StoryImaging, is capable of enriching arbitrary textual stories with visual content.
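
    As an illustration of the Coverage/Redundancy idea described above (a sketch, not the thesis's actual algorithm), the following Python fragment greedily picks frames whose quantized local-keypoint "visual words" add the most uncovered content while overlapping least with what is already selected; the function name and the `alpha` trade-off are hypothetical.

```python
import numpy as np

def select_keyframes(frame_words, k, alpha=0.5):
    """Greedy keyframe selection: at each step pick the frame whose visual
    words add the most new content (Coverage) while overlapping least with
    the words of already-selected keyframes (Redundancy)."""
    covered, selected = set(), []
    candidates = set(range(len(frame_words)))
    for _ in range(min(k, len(frame_words))):
        best, best_score = None, -np.inf
        for i in candidates:
            coverage = len(frame_words[i] - covered)    # new visual words
            redundancy = len(frame_words[i] & covered)  # repeated words
            score = coverage - alpha * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        covered |= frame_words[best]
        candidates.remove(best)
    return sorted(selected)

# Toy usage: five frames, each reduced to a set of visual-word ids.
frames = [{1, 2, 3}, {2, 3}, {4, 5}, {1, 5, 6}, {6, 7}]
print(select_keyframes(frames, k=2))  # -> [0, 2]
```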

    Large-scale interactive exploratory visual search

    Large-scale visual search has been one of the challenging issues in the era of big data. It demands techniques that are not only highly effective and efficient but also allow users to conveniently express their information needs and refine their intents. In this thesis, we focus on developing an exploratory framework for large-scale visual search, together with a number of enabling techniques: compact visual content representation for scalable search, near-duplicate video shot detection, and action-based event detection. We propose a novel scheme for extremely low bit-rate visual search, which sends compressed visual words, consisting of a vocabulary tree histogram and descriptor orientations, rather than raw descriptors. Compact representation of video data is achieved by identifying the keyframes of a video, which also helps users comprehend visual content efficiently; for this we propose a novel Bag-of-Importance model for static video summarization. Near-duplicate detection is a key issue for large-scale visual search, since there exist a large number of nearly identical images and videos, and we propose an improved near-duplicate video shot detection approach for more effective shot representation. Event detection has been one of the solutions for bridging the semantic gap in visual search. We particularly focus on human-action-centred event detection and propose an enhanced sparse coding scheme to model human actions; the proposed approach significantly reduces computational cost while achieving recognition accuracy highly comparable to state-of-the-art methods. Finally, we propose an integrated solution addressing the prime challenges raised by large-scale interactive visual search. The proposed system is also one of the first attempts at exploratory visual search, providing users with more robust results to support their exploration.
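
    A minimal sketch of the compact-query idea behind extremely low bit-rate search, under simplifying assumptions: quantize each local descriptor to its nearest visual word and transmit only a sparse word histogram instead of raw descriptors. The thesis additionally uses a vocabulary tree and sends descriptor orientations, which are not reproduced here; all names below are illustrative.

```python
import numpy as np

def quantize(descriptors, vocabulary):
    """Assign each local descriptor to its nearest visual word."""
    # (n, d) vs (k, d) -> (n, k) squared distances, then nearest word.
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)

def sparse_histogram(word_ids, vocab_size):
    """Compact query: only (word id, count) pairs need to be sent,
    instead of hundreds of high-dimensional descriptors."""
    counts = np.bincount(word_ids, minlength=vocab_size)
    nonzero = np.flatnonzero(counts)
    return list(zip(nonzero.tolist(), counts[nonzero].tolist()))

rng = np.random.default_rng(0)
vocab = rng.normal(size=(64, 128))   # 64 visual words, 128-d (SIFT-like)
desc = rng.normal(size=(200, 128))   # 200 local descriptors from one image
print(sparse_histogram(quantize(desc, vocab), 64)[:5])
```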

    Finding Semantically Related Videos in Closed Collections

    Modern newsroom tools offer advanced functionality for automatic and semi-automatic content collection from the web and social media to accompany news stories. However, content collected in this way often tends to be unstructured and may include irrelevant items. An important step in the verification process is to organize this content, both with respect to what it shows and with respect to its origin. This chapter presents our efforts in this direction, which resulted in two components. The first aims to detect semantic concepts in video shots, to help annotate and organize content collections; we implement a system based on deep learning, featuring a number of advances and adaptations of existing algorithms that increase performance on the task. The second aims to detect logos in videos in order to identify their provenance; here we present our progress from a keypoint-based detection system to a system based on deep learning.
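
    As a rough illustration of the earlier keypoint-based logo detection stage (neither the chapter's pipeline nor its deep-learning successor is reproduced here), the sketch below matches a reference logo against a video frame with OpenCV's ORB detector and Lowe's ratio test; the function name, feature budget, and ratio threshold are assumptions.

```python
import cv2

def count_logo_matches(logo_path, frame_path, ratio=0.75):
    """Count ratio-test-surviving ORB matches between a reference logo
    and a frame; a high count suggests the logo appears in the frame."""
    logo = cv2.imread(logo_path, cv2.IMREAD_GRAYSCALE)
    frame = cv2.imread(frame_path, cv2.IMREAD_GRAYSCALE)
    orb = cv2.ORB_create(nfeatures=1000)
    _, d1 = orb.detectAndCompute(logo, None)
    _, d2 = orb.detectAndCompute(frame, None)
    if d1 is None or d2 is None:
        return 0
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    pairs = matcher.knnMatch(d1, d2, k=2)
    good = [p for p in pairs if len(p) == 2 and p[0].distance < ratio * p[1].distance]
    return len(good)
```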

    Detection and tracking of repeated sequences in videos

    Ankara: The Department of Computer Engineering and the Institute of Engineering and Science of Bilkent University, 2007. Thesis (Master's), Bilkent University, 2007. Includes bibliographical references (leaves 87-92).
    In this thesis, we propose a new method to search for different instances of a video sequence inside a long video. The proposed method is robust to viewpoint and illumination changes, which may occur because the sequences are captured at different times with different cameras, and to differences in the order and number of frames in the sequences, which may occur due to editing. The algorithm does not require any query to be given for searching, and finds all repeating video sequences inside a long video fully automatically. First, the frames of a video are ranked according to their similarity in the distribution of salient points and colour values. Then, a tree-based approach is used to search for repetitions of a video sequence, if any exist. These repeating sequences are pruned for more accurate results in the last step. Results are provided on two full-length feature movies, Run Lola Run and Groundhog Day, on commercials from the TRECVID 2004 news video corpus, and on the dataset created for the CIVR 2007 Copy Detection Showcase. In these experiments, we obtain 93% precision on the CIVR 2007 Copy Detection Showcase dataset and exceed 80% precision on the other sets.
    Can, Tolga (M.S.)
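
    The tree-based search itself is not reproduced here, but the core idea of spotting repetitions can be illustrated with a toy sketch: if each frame is first reduced to a coarse label (e.g., a quantised salient-point/colour signature), repeated sequences surface as label n-grams that occur more than once; in the method above such candidates would then be verified and pruned.

```python
from collections import defaultdict

def find_repeats(labels, n=4):
    """Return the n-grams of frame labels that occur more than once,
    with the frame indices where each occurrence starts."""
    positions = defaultdict(list)
    for i in range(len(labels) - n + 1):
        positions[tuple(labels[i:i + n])].append(i)
    return {gram: starts for gram, starts in positions.items() if len(starts) > 1}

# Toy usage: the sequence (1, 2, 3, 4) repeats at frames 0 and 9.
labels = [1, 2, 3, 4, 5, 6, 7, 8, 0, 1, 2, 3, 4, 9]
print(find_repeats(labels))  # -> {(1, 2, 3, 4): [0, 9]}
```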

    Large Scale Pattern Detection in Videos and Images from the Wild

    PhD thesis. Pattern detection is a well-studied area of computer vision, but current methods are still unstable on images of poor quality. This thesis describes improvements over contemporary methods in the fast detection of unseen patterns in a large corpus of videos that vary tremendously in colour and texture definition, captured “in the wild” by mobile devices and surveillance cameras. We focus on three key areas of this broad subject. First, we identify consistency weaknesses in existing techniques when processing an image and its horizontally reflected (mirror) image. This is important in police investigations, where subjects change their appearance to try to avoid recognition, and we propose that invariance to horizontal reflection should be more widely considered in image description and recognition tasks. We observe the behaviour of online deep learning systems in this respect, and provide a comprehensive assessment of 10 popular low-level feature detectors. Second, we develop simple and fast algorithms that combine to provide memory- and processing-efficient feature matching. These involve static scene elimination in the presence of noise and on-screen time indicators, blur-sensitive feature detection that finds a greater number of corresponding features in images of varying sharpness, and a combinatorial texture-and-colour feature matching algorithm that matches features when either attribute may be poorly defined. A comprehensive evaluation is given, showing improvements over existing feature correspondence methods. Finally, we study random decision forests for pattern detection. A new method of indexing patterns in video sequences is devised and evaluated. We automatically label positive and negative image training data, reducing a task of unsupervised learning to one of supervised learning, and devise a node split function that is invariant to mirror reflection and to rotation through 90-degree angles. A high-dimensional vote accumulator encodes the hypothesis support, yielding implicit back-projection for pattern detection.
    Funded by the European Union’s Seventh Framework Programme, specific topic “framework and tools for (semi-)automated exploitation of massive amounts of digital data for forensic purposes”, under grant agreement number 607480 (LASIE IP project).
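
    A minimal sketch of the invariance property such a node split function needs (an illustration, not the thesis's actual node test): pooling any base feature over the eight dihedral transforms of a patch makes the response identical for the patch, its mirror image, and its 90-degree rotations, so split decisions are unchanged under those transforms.

```python
import numpy as np

def dihedral_transforms(patch):
    """All 8 combinations of 90-degree rotations and a horizontal flip."""
    for k in range(4):
        rot = np.rot90(patch, k)
        yield rot
        yield np.fliplr(rot)

def invariant_split_feature(patch):
    """Pool a simple base feature over the dihedral group; the maximum is
    the same for a patch, its mirror image, and its 90-degree rotations."""
    def base(p):
        return p[: p.shape[0] // 2].mean() - p.mean()  # top-half contrast
    return max(base(t) for t in dihedral_transforms(patch))

rng = np.random.default_rng(1)
p = rng.random((16, 16))
assert np.isclose(invariant_split_feature(p), invariant_split_feature(np.fliplr(p)))
assert np.isclose(invariant_split_feature(p), invariant_split_feature(np.rot90(p)))
```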

    Organising and structuring a visual diary using visual interest point detectors

    As wearable cameras become more popular, researchers are increasingly focusing on novel applications to manage the large volume of data these devices produce. One such application is the construction of a Visual Diary from an individual’s photographs. Microsoft’s SenseCam, a device designed to passively record a Visual Diary covering a typical day of the user wearing the camera, is one example. The vast quantity of images generated by such devices means that managing and organising these collections is not a trivial matter. We believe wearable cameras such as SenseCam will become more popular in the future, and the management of the volume of data they generate is a key issue. Although there is a significant volume of work in the literature on object detection and recognition and on scene classification, there is little work in the area of setting detection. Furthermore, few authors have examined the issues involved in analysing extremely large image collections (like a Visual Diary) gathered over a long period of time. An algorithm developed for setting detection should be capable of clustering images captured at the same real-world locations (e.g. in the dining room at home, in front of the computer in the office, in the park). This requires the selection and implementation of suitable methods to identify visually similar backgrounds in images from their visual features. We present a number of approaches to setting detection based on the extraction of visual interest points from the images, and analyse the performance of two of the most popular descriptors: the Scale Invariant Feature Transform (SIFT) and Speeded Up Robust Features (SURF). We present an implementation of a Visual Diary application and evaluate its performance via a series of user experiments. Finally, we outline techniques that allow the Visual Diary to automatically detect new settings, to scale as the image collection continues to grow substantially over time, and to allow the user to generate a personalised summary of their data.
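
    One way the setting-detection step could look in code, as a sketch under stated assumptions: each image is described by a bag-of-visual-words histogram over its SIFT descriptors, and the histograms are clustered so that images from the same real-world location fall together. The vocabulary is assumed to have been built elsewhere (e.g., by k-means over training descriptors), and all names are illustrative, not the thesis's own implementation.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def bow_histogram(image_path, vocabulary, sift):
    """Normalised visual-word histogram of one image's SIFT descriptors."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    _, desc = sift.detectAndCompute(img, None)
    if desc is None:
        return np.zeros(len(vocabulary))
    d2 = ((desc[:, None, :] - vocabulary[None, :, :]) ** 2).sum(-1)
    hist = np.bincount(d2.argmin(axis=1), minlength=len(vocabulary)).astype(float)
    return hist / max(hist.sum(), 1.0)

def cluster_into_settings(image_paths, vocabulary, n_settings):
    """Group images into candidate settings by clustering their histograms."""
    sift = cv2.SIFT_create()
    X = np.stack([bow_histogram(p, vocabulary, sift) for p in image_paths])
    return KMeans(n_clusters=n_settings, n_init=10).fit_predict(X)
```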

    Detection of near-duplicates in large image collections

    The vast numbers of images on the Web include many duplicates, and an even larger number of near-duplicate variants derived from the same original. These include thumbnails stored by search engines, copies shared by various news portals, and images that appear on multiple web sites, legitimately or otherwise. Such near-duplicates appear in the results of many web image searches; they constitute redundancy and may also represent infringements of copyright. Digital images can easily be altered through simple manipulations such as conversion to grey-scale, colour balance changes, rescaling, rotation, and cropping, any of which defeats simple duplicate detection methods such as bit-level hashing. The ability to detect such variants with a reasonable degree of reliability and accuracy would support the reduction of redundancy in collections and in the presentation of search results, and would also allow the detection of possible copyright violations. Some existing methods for identifying near-duplicates are derived from computer vision techniques; these have shown high effectiveness in this domain, but are computationally expensive and therefore impractical for large image collections. Other methods address the problem using conventional CBIR approaches that are more efficient but typically not as robust. None of the previous methods has addressed the problem in its entirety, and none has addressed the large-scale near-duplicate problem on the Web; there has been no analysis of the kinds of alterations that are common on the Web, nor any evaluation of whether real cases of near-duplication can in fact be identified. In this thesis, we analyse the different types of alterations and near-duplicates present in a range of popular web image searches, and establish a collection and an evaluation ground truth using real-world near-duplicate examples. We present a simple ranking approach that reduces the number of local descriptors and therefore improves the efficiency of descriptor-based retrieval for near-duplicate detection. The descriptor-based method has been shown to produce near-perfect detection of near-duplicates, but was previously computationally very expensive; we show that, while maintaining comparable effectiveness, our method scales well to collections of hundreds of thousands of images. We also explore a more compact indexing structure to support near-duplicate image detection. We develop a method to automatically detect the pair-wise near-duplicate relationship between images without the use of a query, adapting the hash-based probabilistic counting method, originally used for near-duplicate text document detection, to work with local descriptors; our adaptation offers the first effective and efficient non-query-based approach in this domain. We further incorporate our pair-wise detection approach into the clustering of near-duplicates, presenting a clustering method designed specifically for near-duplicate images that is arguably the first to achieve a high level of effectiveness in this domain. We also show that the near-duplicates within a collection of a million images can be effectively clustered using our approach in less than an hour, using relatively modest computational resources. Overall, our proposed methods provide practical approaches to the detection and management of near-duplicate images in large collections.
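
    The probabilistic-counting adaptation itself is not reproduced here, but the flavour of query-free pair-wise detection can be conveyed with a plain MinHash sketch over each image's set of quantised local descriptors: near-duplicates share most of their visual words, and the fraction of agreeing signature positions estimates that overlap without any pairwise descriptor matching. Names and parameters below are assumptions.

```python
import numpy as np

def minhash_signature(word_ids, num_hashes=64, seed=0):
    """MinHash signature of an image's set of visual words. Two signatures
    agree in a fraction of positions that estimates the Jaccard similarity
    of the underlying word sets (high for near-duplicate images)."""
    rng = np.random.default_rng(seed)  # same seed -> same hash functions
    a = rng.integers(1, 2**31 - 1, size=num_hashes, dtype=np.int64)
    b = rng.integers(0, 2**31 - 1, size=num_hashes, dtype=np.int64)
    ids = np.fromiter(set(word_ids), dtype=np.int64)
    hashes = (a[:, None] * ids[None, :] + b[:, None]) % (2**31 - 1)
    return hashes.min(axis=1)

def estimated_jaccard(sig1, sig2):
    return float((sig1 == sig2).mean())

s1 = minhash_signature([1, 2, 3, 4, 5, 6])
s2 = minhash_signature([1, 2, 3, 4, 9, 10])  # true Jaccard = 4/8 = 0.5
print(estimated_jaccard(s1, s2))
```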

    Content-based video copy detection using multimodal analysis

    Ankara: The Department of Computer Engineering and the Institute of Engineering and Science of Bilkent University, 2009. Thesis (Master's), Bilkent University, 2009. Includes bibliographical references (leaves 67-76).
    The huge and increasing amount of video broadcast through networks has raised the need for automatic video copy detection for copyright protection. Recent developments in multimedia technology have introduced content-based copy detection (CBCD) as a new research field, an alternative to the watermarking approach for identifying video sequences. This thesis presents a multimodal framework for matching video sequences using a three-step approach. First, a high-level face detector identifies facial frames/shots in a video clip; matching faces with extended body regions gives the flexibility to discriminate the same person (e.g., an anchorman or a political leader) across different events or scenes. Second, a spatiotemporal sequence matching technique is employed to match video clips/segments that are similar in terms of activity. Finally, the non-facial shots are matched using low-level visual features. In addition, we utilize a fuzzy-logic approach for extracting color histograms to detect the shot boundaries of heavily manipulated video clips. Methods for detecting noise, frame drops, and picture-in-picture transformation windows, and for extracting masks for still regions, are also proposed and evaluated. The proposed method was tested on the query and reference dataset of the CBCD task of TRECVID 2008, and our results were compared with those of the top eight most successful techniques submitted to this task. Experimental results show that the proposed method performs better than most of the state-of-the-art techniques in terms of both effectiveness and efficiency.
    Küçüktunç, Onur (M.S.)
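
    As a rough illustration of histogram-based shot-boundary detection (the thesis's fuzzy-logic histogram extraction for heavily manipulated clips is not reproduced), the sketch below flags frames whose HSV color histogram differs sharply from the previous frame's; the bin counts and threshold are arbitrary assumptions.

```python
import cv2

def shot_boundaries(video_path, threshold=0.4):
    """Flag frame indices where the HSV histogram changes sharply,
    a simple indicator of a hard cut between shots."""
    cap = cv2.VideoCapture(video_path)
    prev, cuts, i = None, [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [16, 8], [0, 180, 0, 256])
        hist = cv2.normalize(hist, None, alpha=1.0, norm_type=cv2.NORM_L1).flatten()
        if prev is not None:
            # Intersection is 1 for identical histograms, near 0 across a cut.
            if 1.0 - cv2.compareHist(prev, hist, cv2.HISTCMP_INTERSECT) > threshold:
                cuts.append(i)
        prev, i = hist, i + 1
    cap.release()
    return cuts
```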

    Finding relevant videos in big data environments - how to utilize graph processing systems for video retrieval

    The fast-growing amount of video on the web raises new challenges. The first is to find relevant videos for specific queries; this can be addressed by Content-Based Video Retrieval (CBVR), in which the video data itself is used for retrieval. A second challenge is to perform such CBVR over large amounts of data. In this work both challenges are targeted by using a distributed big-graph processing system for CBVR. A graph framework for CBVR is built with Apache Giraph; the system is generic with regard to the feature set used, and a similarity graph is built from the chosen features. The graph system provides an insert operation for adding new videos and a query operation for retrieval. The query uses a fast fuzzy search to obtain seeds for a personalized PageRank, which exploits the locality of the similarity graph to improve on the fuzzy search. The graph system is tested with SIFT features for object recognition and matching, and evaluated on the Stanford I2V dataset.
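
    A minimal dense sketch of the personalized-PageRank ranking step, assuming the similarity graph fits in memory as a matrix (the system above runs it vertex-centrically on Apache Giraph): the fuzzy-search hits seed the restart distribution, and power iteration spreads relevance mass along similarity edges.

```python
import numpy as np

def personalized_pagerank(adj, seeds, damping=0.85, iters=50):
    """Rank all videos by relevance to the seed videos: `adj` is a
    non-negative similarity matrix and `seeds` are the fuzzy-search hits."""
    n = adj.shape[0]
    col_sums = adj.sum(axis=0)
    P = adj / np.where(col_sums == 0, 1.0, col_sums)  # column-stochastic
    restart = np.zeros(n)
    restart[list(seeds)] = 1.0 / len(seeds)           # jump back to seeds
    scores = np.full(n, 1.0 / n)
    for _ in range(iters):
        scores = damping * (P @ scores) + (1 - damping) * restart
    return scores

# Toy usage: four videos, edges weighted by similarity, seeded with video 0.
adj = np.array([[0.0, 0.9, 0.1, 0.0],
                [0.9, 0.0, 0.2, 0.0],
                [0.1, 0.2, 0.0, 0.8],
                [0.0, 0.0, 0.8, 0.0]])
print(personalized_pagerank(adj, seeds=[0]).round(3))
```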