387 research outputs found
DC-image for real time compressed video matching
This chapter presents a framework for video matching based on local features extracted from the DC-image of MPEG compressed videos, without full decompression. The relevant arguments and supporting evidence are discussed, and several local feature detectors are examined to select the best one for matching on the DC-image. Two experiments are carried out to support the approach. The first compares the DC-image with the full I-frame in terms of matching performance and computational complexity. The second compares local features against global features for compressed-video matching on the DC-image. The results confirm that the DC-image, despite its highly reduced size, is promising: it produces higher matching precision than the full I-frame. SIFT, as a local feature, also outperforms most standard global features. Its computational complexity is relatively higher, but it remains within the real-time margin, leaving room for further optimization.
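The DC-image itself is cheap to form: the DC coefficient of each 8x8 DCT block is proportional to the block's mean intensity, so a DC-image is essentially a 64-fold-reduced thumbnail. A minimal sketch of that relationship in Python with NumPy (a real compressed-domain pipeline would read the DC coefficients straight from the MPEG bitstream rather than averaging decoded pixels):

```python
import numpy as np

def dc_image(frame: np.ndarray, block: int = 8) -> np.ndarray:
    """Approximate DC-image: each output pixel is the mean of one
    8x8 block (the DCT DC coefficient is 8x this mean).  A real
    compressed-domain pipeline would take the DC coefficients from
    the MPEG bitstream without fully decoding the frame."""
    h = frame.shape[0] - frame.shape[0] % block  # drop ragged edges
    w = frame.shape[1] - frame.shape[1] % block
    f = frame[:h, :w].astype(np.float64)
    return f.reshape(h // block, block, w // block, block).mean(axis=(1, 3))
```

A 1920x1080 luminance frame collapses to a 240x135 DC-image, which is what keeps the matching cost low despite using local features.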
Extraction of Key-Frames from an Unstable Video Feed
The APOLI project deals with Automated Power Line Inspection using highly automated Unmanned Aerial Systems. Besides real-time damage assessment through on-board high-resolution image data exploitation, post-processing of the video data is necessary. This Master's thesis deals with the implementation of an Isolator Detector Framework and a workflow in the Automotive Data and Time-triggered Framework (ADTF) that loads a video directly from a camera or from storage and extracts the key frames that contain objects of interest. This is done by implementing an object detection system in C++ and creating ADTF filters that detect the objects of interest and extract the key frames using a supervised learning platform. The use case is the extraction of frames from video samples that contain images of isolators from power transmission lines.
Ridiculously Fast Shot Boundary Detection with Fully Convolutional Neural Networks
Shot boundary detection (SBD) is an important component of many video
analysis tasks, such as action recognition, video indexing, summarization and
editing. Previous work typically used a combination of low-level features like
color histograms, in conjunction with simple models such as SVMs. Instead, we
propose to learn shot detection end-to-end, from pixels to final shot
boundaries. For training such a model, we rely on our insight that all shot
boundaries are generated. Thus, we create a dataset with one million frames and
automatically generated transitions such as cuts, dissolves and fades. In order
to efficiently analyze hours of videos, we propose a Convolutional Neural
Network (CNN) which is fully convolutional in time, thus allowing to use a
large temporal context without the need to repeatedly processing frames. With
this architecture our method obtains state-of-the-art results while running at
an unprecedented speed of more than 120x real-time
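As a point of comparison for the end-to-end approach, the color-histogram baseline mentioned above can be sketched in a few lines. The bin count and threshold below are hypothetical tuning choices, not values from the paper:

```python
import numpy as np

def histogram_cuts(frames, bins=16, threshold=0.5):
    """Baseline cut detector: flag a boundary wherever the L1 distance
    between normalized gray-level histograms of consecutive frames
    spikes above `threshold` (a hypothetical tuning value)."""
    cuts, prev = [], None
    for i, frame in enumerate(frames):
        hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
        hist = hist / hist.sum()
        if prev is not None and np.abs(hist - prev).sum() > threshold:
            cuts.append(i)
        prev = hist
    return cuts
```

A detector like this handles hard cuts but tends to miss gradual dissolves and fades, which is part of what motivates learning the boundaries end-to-end instead.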
Egocentric Hand Detection Via Dynamic Region Growing
Egocentric videos, which mainly record the activities carried out by the
users of wearable cameras, have drawn much research attention in recent
years. Due to their lengthy content, a large number of ego-related applications
have been developed to abstract the captured videos. As the users are
accustomed to interacting with the target objects using their own hands while
their hands usually appear within their visual fields during the interaction,
an egocentric hand detection step is involved in tasks like gesture
recognition, action recognition and social interaction understanding. In this
work, we propose a dynamic region growing approach for hand region detection in
egocentric videos, by jointly considering hand-related motion and egocentric
cues. We first determine seed regions that most likely belong to the hand, by
analyzing the motion patterns across successive frames. The hand regions can
then be located by extending from the seed regions, according to the scores
computed for the adjacent superpixels. These scores are derived from four
egocentric cues: contrast, location, position consistency and appearance
continuity. We discuss how to apply the proposed method in real-life scenarios,
where multiple hands irregularly appear and disappear from the videos.
Experimental results on public datasets show that the proposed method achieves
superior performance compared with state-of-the-art methods, especially in
complicated scenarios.
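The growing step itself can be sketched generically: starting from seed superpixels, absorb adjacent regions whose cue score clears a threshold. The `score` callable and threshold `tau` below stand in for the paper's combined four-cue score and are assumptions for illustration:

```python
from collections import deque

def grow_region(seeds, neighbors, score, tau=0.5):
    """Region growing: starting from seed nodes (e.g. superpixels judged
    hand-like from their motion pattern), repeatedly absorb adjacent
    nodes whose cue score exceeds tau.  `score` stands in for the
    paper's combined contrast/location/position-consistency/appearance
    score (an assumption for this sketch)."""
    region = set(seeds)
    frontier = deque(seeds)
    while frontier:
        node = frontier.popleft()
        for nb in neighbors(node):
            if nb not in region and score(nb) > tau:
                region.add(nb)
                frontier.append(nb)
    return region
```

On a toy one-dimensional chain of ten superpixels where only the first five score highly, a seed at index 0 grows to exactly the first five nodes and stops at the low-scoring boundary.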
Extraction of Exclusive Video Content from One Shot Video
With the popularity of personal digital devices, the amount of home video data is growing explosively. Many videos contain only a single shot, are very short, and their contents are diverse yet related to a few major subjects or events. Users often need to maintain their own collections of video clips captured at different locations and times, and these unedited, unorganized videos are difficult to manage and manipulate. This video composition system generates aesthetically enhanced long-shot videos from short video clips. The proposed system extracts the video content about a specific topic and composes it into a virtual one-shot presentation. All input short video clips are pre-processed and converted into a one-shot video. Video frames are detected and categorized using transition clues such as human and object. Human and object frames are separated by applying a face detection algorithm to the input one-shot video; the Viola-Jones face detection algorithm is used for this separation. Three ingredients in this algorithm work in concert to enable fast and accurate detection: the integral image for feature computation, AdaBoost for feature selection, and an attentional cascade for efficient allocation of computational resources. Objects are then categorized using the SIFT (Scale-Invariant Feature Transform) and SURF (Speeded-Up Robust Features) algorithms.
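The first of those three ingredients, the integral image, is simple to sketch: once the summed-area table is built, any rectangular (Haar-like) feature sum costs four lookups regardless of its size. A minimal NumPy version:

```python
import numpy as np

def integral_image(img: np.ndarray) -> np.ndarray:
    """Summed-area table with a zero border: ii[y, x] = img[:y, :x].sum()."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def rect_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1, x0:x1] in O(1): four lookups, as used for
    Haar-like features in the Viola-Jones detector."""
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]
```

Constant-time rectangle sums are what make evaluating thousands of candidate features per window affordable before AdaBoost selection and the cascade prune the work further.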
An Open Source Quantitative Evaluation Framework for Automatic Video Summarization Algorithms
The creation, consumption, and manipulation of video play a central role in everyday life, as the amount of video data is growing at an exponential rate. Video summarization consists of producing a condensed output from a video that allows humans to rapidly understand and browse the content of the original source. Although several evaluation approaches have been proposed in the literature, multiple challenges make the quantitative evaluation of a summary a complex process. In this paper we present a completely open video summarization evaluation framework that is compatible with existing datasets and published results. Standard metrics are considered and a new metric that captures unbalanced-class video summarization evaluation is proposed. Two legacy datasets are integrated in a standard format. Finally, new quantitative results based on already published algorithms are presented.
Sociedad Argentina de Informática e Investigación Operativa
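A common frame-level metric in such evaluations is the precision/recall/F-score between predicted and ground-truth keyframe selections; the framework's exact protocol, and its new unbalanced-class metric, may well differ from this plain sketch:

```python
def summary_fscore(pred, truth):
    """Frame-level precision, recall, and F1 between a predicted and a
    ground-truth keyframe selection, each given as frame indices.
    A generic summarization metric, not this framework's definition."""
    pred, truth = set(pred), set(truth)
    tp = len(pred & truth)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1
```

Because selected keyframes are typically a small minority of all frames, plain F-score can reward trivial selections, which is the kind of imbalance an unbalanced-class metric is meant to address.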