
    DC-image for real time compressed video matching

    This chapter presents a framework for video matching based on local features extracted from the DC-image of MPEG compressed videos, without full decompression, and discusses the relevant arguments and supporting evidence. Several local feature detectors are examined to select the best one for matching on the DC-image. Two experiments are carried out to support this. The first compares the DC-image with the full I-frame in terms of matching performance and computational complexity. The second compares local features against global features for compressed video matching on the DC-image. The results confirm that the DC-image, despite its highly reduced size, is promising: it produces higher matching precision than the full I-frame. SIFT, as a local feature, also outperforms most of the standard global features. Its computational complexity is relatively higher, but still within the real-time margin, which leaves room for further optimization.
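
    As a rough illustration of the matching step, the sketch below (Python with OpenCV; not the chapter's implementation) extracts SIFT keypoints from two DC-image-sized thumbnails and matches them with Lowe's ratio test. True DC-images come from the DC coefficients of the compressed stream; here they are approximated by 8x downscaling, and the file names are placeholders.

        import cv2

        def dc_image(frame):
            # Approximate a DC-image: each 8x8 block of an MPEG I-frame
            # contributes one DC coefficient, i.e. roughly an 8x downscale.
            h, w = frame.shape[:2]
            return cv2.resize(frame, (w // 8, h // 8), interpolation=cv2.INTER_AREA)

        def match_dc_images(img_a, img_b, ratio=0.75):
            sift = cv2.SIFT_create()
            _, des_a = sift.detectAndCompute(img_a, None)
            _, des_b = sift.detectAndCompute(img_b, None)
            if des_a is None or des_b is None:
                return 0
            pairs = cv2.BFMatcher(cv2.NORM_L2).knnMatch(des_a, des_b, k=2)
            # Lowe's ratio test keeps only distinctive matches.
            good = [p for p in pairs if len(p) == 2 and p[0].distance < ratio * p[1].distance]
            return len(good)

        a = dc_image(cv2.imread("iframe_a.png", cv2.IMREAD_GRAYSCALE))  # placeholder files
        b = dc_image(cv2.imread("iframe_b.png", cv2.IMREAD_GRAYSCALE))
        print("good matches:", match_dc_images(a, b))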

    Extraction of Key-Frames from an Unstable Video Feed

    The APOLI project deals with Automated Power Line Inspection using highly automated Unmanned Aerial Systems. Besides real-time damage assessment through on-board high-resolution image data exploitation, post-processing of the video data is necessary. This Master's thesis deals with the implementation of an Isolator Detector Framework and a workflow in the Automotive Data and Time-triggered Framework (ADTF) that loads a video directly from a camera or from storage and extracts the key frames which contain objects of interest. This is done by implementing an object detection system in C++ and creating ADTF filters that detect the objects of interest and extract the key frames using a supervised learning platform. The use case is the extraction of frames from video samples that contain images of isolators from power transmission lines.
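
    The thesis implements this in C++ as ADTF filters; purely as an illustration of the filter logic, the Python sketch below reads a video with OpenCV, runs a stand-in detector on every Nth frame, and keeps the frames in which objects of interest are found. The detector and the file name are hypothetical placeholders for the trained isolator model.

        import cv2

        def detect_isolators(frame):
            # Hypothetical stand-in for the supervised isolator detector;
            # should return a list of detections (empty if none).
            return []

        def extract_key_frames(source, stride=5):
            cap = cv2.VideoCapture(source)  # camera index or video file path
            key_frames = []
            idx = 0
            while True:
                ok, frame = cap.read()
                if not ok:
                    break
                if idx % stride == 0 and detect_isolators(frame):
                    key_frames.append((idx, frame))  # frame shows an object of interest
                idx += 1
            cap.release()
            return key_frames

        frames = extract_key_frames("powerline_inspection.mp4")  # placeholder file
        print(len(frames), "key frames extracted")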

    Ridiculously Fast Shot Boundary Detection with Fully Convolutional Neural Networks

    Shot boundary detection (SBD) is an important component of many video analysis tasks, such as action recognition, video indexing, summarization and editing. Previous work typically used a combination of low-level features like color histograms, in conjunction with simple models such as SVMs. Instead, we propose to learn shot detection end-to-end, from pixels to final shot boundaries. For training such a model, we rely on our insight that all shot boundaries are generated. Thus, we create a dataset with one million frames and automatically generated transitions such as cuts, dissolves and fades. In order to efficiently analyze hours of video, we propose a Convolutional Neural Network (CNN) that is fully convolutional in time, allowing it to use a large temporal context without repeatedly processing frames. With this architecture our method obtains state-of-the-art results while running at an unprecedented speed of more than 120x real-time.
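
    A minimal sketch of the "fully convolutional in time" idea in PyTorch (an illustrative stand-in, not the paper's architecture): temporal convolutions without fixed-length pooling, so a clip of T frames yields one boundary score per valid time step, and long videos can be slid through without recomputing overlapping frames.

        import torch
        import torch.nn as nn

        class TinySBDNet(nn.Module):
            def __init__(self):
                super().__init__()
                self.features = nn.Sequential(
                    nn.Conv3d(3, 16, kernel_size=(3, 5, 5), padding=(0, 2, 2)),
                    nn.ReLU(),
                    nn.Conv3d(16, 32, kernel_size=(3, 5, 5), padding=(0, 2, 2)),
                    nn.ReLU(),
                    nn.AdaptiveAvgPool3d((None, 1, 1)),  # pool space, keep time
                )
                self.head = nn.Conv3d(32, 1, kernel_size=1)  # per-step boundary logit

            def forward(self, clip):  # clip: (batch, 3, T, H, W)
                x = self.features(clip)
                return self.head(x).flatten(1)  # (batch, T - 4): no fixed clip length

        net = TinySBDNet()
        print(net(torch.randn(1, 3, 16, 64, 64)).shape)  # torch.Size([1, 12])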

    Egocentric Hand Detection Via Dynamic Region Growing

    Egocentric videos, which mainly record the activities carried out by the users of wearable cameras, have drawn much research attention in recent years. Due to their lengthy content, a large number of ego-related applications have been developed to abstract the captured videos. As users are accustomed to interacting with target objects using their own hands, and their hands usually appear within their visual fields during the interaction, an egocentric hand detection step is involved in tasks like gesture recognition, action recognition and social interaction understanding. In this work, we propose a dynamic region growing approach for hand region detection in egocentric videos, by jointly considering hand-related motion and egocentric cues. We first determine seed regions that most likely belong to the hand, by analyzing the motion patterns across successive frames. The hand regions can then be located by extending from the seed regions, according to the scores computed for the adjacent superpixels. These scores are derived from four egocentric cues: contrast, location, position consistency and appearance continuity. We discuss how to apply the proposed method in real-life scenarios, where multiple hands irregularly appear and disappear from the videos. Experimental results on public datasets show that the proposed method achieves superior performance compared with the state-of-the-art methods, especially in complicated scenarios.
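
    As a toy version of the seed-and-grow idea (not the authors' implementation, which scores superpixels with the four egocentric cues), the Python sketch below seeds hand candidates from optical-flow magnitude between two frames, then grows the region by repeated dilation, accepting neighboring pixels whose color stays close to the seed's mean color as a crude appearance-continuity cue.

        import cv2
        import numpy as np

        def grow_hand_mask(prev_bgr, curr_bgr, flow_thresh=4.0, color_thresh=35.0, iters=10):
            prev_g = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2GRAY)
            curr_g = cv2.cvtColor(curr_bgr, cv2.COLOR_BGR2GRAY)
            flow = cv2.calcOpticalFlowFarneback(prev_g, curr_g, None,
                                                0.5, 3, 15, 3, 5, 1.2, 0)
            seed = (np.linalg.norm(flow, axis=2) > flow_thresh).astype(np.uint8)
            if seed.sum() == 0:
                return seed  # no moving region found
            mean_color = curr_bgr[seed.astype(bool)].mean(axis=0)
            dist = np.linalg.norm(curr_bgr.astype(np.float32) - mean_color, axis=2)
            mask, kernel = seed, np.ones((5, 5), np.uint8)
            for _ in range(iters):
                grown = cv2.dilate(mask, kernel)
                # accept newly reached pixels only where the color matches the seed
                mask = np.where(dist < color_thresh, grown, mask).astype(np.uint8)
            return mask

        # usage with two consecutive placeholder frames:
        # mask = grow_hand_mask(cv2.imread("frame0.png"), cv2.imread("frame1.png"))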

    Summarization of human activity videos via low-rank approximation


    Extraction of Exclusive Video Content from One Shot Video

    With the popularity of personal digital devices, the amount of home video data is growing explosively. Many videos contain only a single shot, are very short, and their contents are diverse yet related through a few major subjects or events. Users often need to maintain their own video clip collections captured at different locations and times, and these unedited and unorganized videos are difficult to manage and manipulate. This video composition system generates aesthetically enhanced long-shot videos from short video clips. Our proposed system extracts the video content about a specific topic and composes it into a virtual one-shot presentation. All input short video clips are pre-processed and converted into a one-shot video. Video frames are detected and categorized using transition clues such as human and object. Human and object frames are separated by applying a face detection algorithm to the input one-shot video; the Viola-Jones face detection algorithm is used for this separation. Three ingredients in this algorithm work in concert to enable fast and accurate detection: the integral image for feature computation, AdaBoost for feature selection, and an attentional cascade for efficient allocation of computational resources. Objects are then categorized using the SIFT (Scale-Invariant Feature Transform) and SURF (Speeded-Up Robust Features) algorithms.
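
    OpenCV ships a pretrained Viola-Jones cascade, so the human/object split described above can be sketched directly; any frame with at least one face detection goes to the human bin, the rest to the object bin. The video file name is a placeholder.

        import cv2

        # Viola-Jones detector: Haar features on the integral image, selected
        # by AdaBoost and evaluated through an attentional cascade.
        cascade = cv2.CascadeClassifier(
            cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

        def split_frames(path):
            cap = cv2.VideoCapture(path)
            human, objects = [], []
            while True:
                ok, frame = cap.read()
                if not ok:
                    break
                gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
                faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
                (human if len(faces) > 0 else objects).append(frame)
            cap.release()
            return human, objects

        humans, objs = split_frames("one_shot_video.mp4")  # placeholder file
        print(len(humans), "human frames,", len(objs), "object frames")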

    An Open Source Quantitative Evaluation Framework for Automatic Video Summarization Algorithms

    The creation, consumption, and manipulation of video play a central role in everyday life, as the amount of video data is growing at an exponential rate. Video summarization consists of producing a condensed output from a video that allows humans to rapidly understand and browse the content of the original source. Although several evaluation approaches have been proposed in the literature, multiple challenges make the quantitative evaluation of a summarization a complex process. In this paper we present a completely open video summarization evaluation framework that is compatible with existing datasets and published results. Standard metrics are considered and a new metric that captures unbalanced-class video summarization evaluation is proposed. Two legacy datasets are integrated in a standard format. Finally, new quantitative results based on already published algorithms are presented.
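
    Video summary evaluation is commonly framed as comparing a binary per-frame selection against a ground-truth selection. A minimal version of the standard precision/recall/F-measure computation (the paper's framework and its new unbalanced-class metric are not reproduced here) looks like this:

        import numpy as np

        def f_measure(machine, user):
            """machine, user: binary arrays, 1 where a frame is in the summary."""
            machine = np.asarray(machine, dtype=bool)
            user = np.asarray(user, dtype=bool)
            overlap = np.logical_and(machine, user).sum()
            precision = overlap / max(machine.sum(), 1)
            recall = overlap / max(user.sum(), 1)
            if precision + recall == 0:
                return 0.0
            return 2 * precision * recall / (precision + recall)

        print(f_measure([1, 0, 1, 1, 0], [1, 1, 0, 1, 0]))  # 0.666...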