Scene extraction in motion pictures
This paper addresses the challenge of bridging the semantic gap between the rich meaning users desire when they query to locate and browse media and the shallowness of the media descriptions that can be computed in today's content management systems. To facilitate high-level, semantics-based content annotation and interpretation, we tackle the problem of automatically decomposing motion pictures into meaningful story units, namely scenes. Since a scene is a complicated and subjective concept, we first propose guidelines from film production to determine when a scene change occurs. We then investigate the different rules and conventions followed as part of Film Grammar that would guide and shape an algorithmic solution for determining a scene. Two different techniques using intershot analysis are proposed as solutions in this paper. In addition, we present different refinement mechanisms, such as film-punctuation detection founded on Film Grammar, to further improve the results. These refinement techniques demonstrate significant improvements in overall performance. Furthermore, we analyze errors in the context of film-production techniques, which offers useful insights into the limitations of our method.
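The intershot analysis the abstract describes can be illustrated with a minimal sketch: group consecutive shots into a scene while each shot's colour histogram stays similar to the running scene average, and start a new scene when similarity drops. This is our own illustration of the general idea, not the paper's algorithm; function and parameter names are assumptions.

```python
import numpy as np

def group_shots_into_scenes(shot_histograms, threshold=0.5):
    """Greedy intershot grouping: start a new scene when a shot's
    histogram-intersection similarity with the running scene average
    drops below `threshold` (both names are illustrative).
    `shot_histograms` holds one normalized histogram per shot."""
    scenes = []
    current = [0]
    scene_avg = np.asarray(shot_histograms[0], dtype=float)
    for i in range(1, len(shot_histograms)):
        h = np.asarray(shot_histograms[i], dtype=float)
        # Histogram intersection lies in [0, 1] for normalized histograms.
        sim = np.minimum(scene_avg, h).sum()
        if sim >= threshold:
            current.append(i)
            # Update the running average over the shots in this scene.
            scene_avg = (scene_avg * (len(current) - 1) + h) / len(current)
        else:
            scenes.append(current)
            current = [i]
            scene_avg = h
    scenes.append(current)
    return scenes
```

A refinement pass in the spirit of the paper's film-punctuation detection could then merge scenes that a dissolve or fade split spuriously.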
The TREC-2002 video track report
TREC-2002 saw the second running of the Video Track, the goal of which was to promote progress in content-based retrieval from digital video via open, metrics-based evaluation. The track used 73.3 hours of publicly available digital video (in MPEG-1/VCD format) downloaded by the participants directly from the Internet Archive (Prelinger Archives) (internetarchive, 2002) and some from the Open Video Project (Marchionini, 2001). The material comprised advertising, educational, industrial, and amateur films produced between the 1930s and the 1970s by corporations, nonprofit organizations, trade associations, community and interest groups, educational institutions, and individuals. 17 teams representing 5 companies and 12 universities - 4 from Asia, 9 from Europe, and 4 from the US - participated in one or more of the three tasks in the 2002 video track: shot boundary determination, feature extraction, and search (manual or interactive). Results were scored by NIST using manually created truth data for shot boundary determination and manual assessment of feature extraction and search results. This paper is an introduction to, and an overview of, the track framework - the tasks, data, and measures - the approaches taken by the participating groups, the results, and issues regarding the evaluation. For detailed information about the approaches and results, the reader should see the various site reports in the final workshop proceedings.
Using Graphics Processor Units (GPUs) for automatic video structuring
The rapid pace of development of Graphics Processor Units (GPUs) in recent years, in terms of both performance and programmability, has attracted the attention of those seeking to leverage alternative architectures for better performance than commodity CPUs can provide. In this paper, the potential of the GPU in automatically structuring video is examined, specifically in shot boundary detection and representative keyframe selection. We first introduce the programming model of the GPU and outline the implementation of techniques for shot boundary detection and representative keyframe selection on both the CPU and GPU, using histogram comparisons. We compare the approaches and present performance results for both the CPU and GPU. Overall these results demonstrate the significant potential for the GPU in this domain.
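The histogram-comparison approach mentioned in the abstract can be sketched as follows: compute a normalized intensity histogram per frame and flag a shot boundary wherever the distance between consecutive histograms exceeds a threshold. This is a CPU-side illustration of the general technique under our own assumptions (bin count, L1 distance, threshold), not the authors' implementation.

```python
import numpy as np

def detect_shot_boundaries(frames, bins=16, threshold=0.4):
    """Shot boundary detection via inter-frame histogram comparison.
    `frames` is a list of 2-D grayscale arrays with values in [0, 255];
    `bins` and `threshold` are assumed parameters for this sketch."""
    hists = []
    for f in frames:
        h, _ = np.histogram(f, bins=bins, range=(0, 256))
        hists.append(h / h.sum())  # normalize so distances are comparable
    boundaries = []
    for i in range(1, len(hists)):
        # L1 distance between consecutive normalized histograms, in [0, 2].
        d = np.abs(hists[i] - hists[i - 1]).sum()
        if d > threshold:
            boundaries.append(i)
    return boundaries
```

On a GPU, the per-frame histogram build and the per-pair distance reduction are the data-parallel steps one would offload.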
A Cosmic Watershed: the WVF Void Detection Technique
On megaparsec scales the Universe is permeated by an intricate filigree of clusters, filaments, sheets and voids, the Cosmic Web. For the understanding of its dynamical and hierarchical history it is crucial to identify objectively its complex morphological components. One of the most characteristic aspects is that of the dominant underdense Voids, the product of a hierarchical process driven by the collapse of minor voids in addition to the merging of large ones. In this study we present an objective void finder technique which involves a minimum of assumptions about the scale, structure and shape of voids. Our void finding method, the Watershed Void Finder (WVF), is based upon the Watershed Transform, a well-known technique for the segmentation of images. Importantly, the technique has the potential to trace the existing manifestations of a void hierarchy. The basic watershed transform is augmented by a variety of correction procedures to remove spurious structure resulting from sampling noise. This study contains a detailed description of the WVF. We demonstrate how it is able to trace and identify, relatively parameter free, voids and their surrounding (filamentary and planar) boundaries. We test the technique on a set of Kinematic Voronoi models, heuristic spatial models for a cellular distribution of matter. Comparison of the WVF segmentations of low-noise and high-noise Voronoi models with the quantitatively known spatial characteristics of the intrinsic Voronoi tessellation shows that the size and shape of the voids are successfully retrieved. WVF even manages to reproduce the full void size distribution function.
Comment: 24 pages, 15 figures, MNRAS accepted; for full resolution, see http://www.astro.rug.nl/~weygaert/tim1publication/watershed.pd
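The watershed transform underlying the WVF can be illustrated with a minimal priority-flood implementation on a 2-D density field: basins grow outward from seed minima in order of increasing density, so each pixel joins the basin it drains into. This is a from-scratch sketch of the generic image-segmentation technique, not the authors' code; the WVF adds the noise-correction procedures the abstract describes.

```python
import heapq
import numpy as np

def watershed_segment(density, seeds):
    """Minimal priority-flood watershed on a 2-D density field.
    `seeds` is an int array: 0 = unlabeled, k > 0 = basin seed label.
    Pixels are flooded in order of increasing density, inheriting the
    label of the neighbour that reached them first."""
    labels = seeds.copy()
    heap = []
    rows, cols = density.shape
    # Initialize the flood front with all seed pixels.
    for r in range(rows):
        for c in range(cols):
            if seeds[r, c] > 0:
                heapq.heappush(heap, (density[r, c], r, c))
    while heap:
        _, r, c = heapq.heappop(heap)
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and labels[nr, nc] == 0:
                labels[nr, nc] = labels[r, c]  # inherit the basin label
                heapq.heappush(heap, (density[nr, nc], nr, nc))
    return labels
```

In the cosmological setting, the basins are the voids and the watershed ridges between them trace the filamentary and planar boundaries.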
Activity-driven content adaptation for effective video summarisation
In this paper, we present a novel method for content adaptation and video summarization implemented fully in the compressed domain. Firstly, summarization of generic videos is modeled as the process of extracting human objects under various activities/events. Accordingly, frames are classified via fuzzy decision into five categories, including shot changes (cut and gradual transitions), motion activities (camera motion and object motion) and others, by using two inter-frame measurements. Secondly, human objects are detected using Haar-like features. With the detected human objects and the attained frame categories, activity levels for each frame are determined to adapt to the video content. Continuous frames belonging to the same category are grouped to form one activity entry as content of interest (COI), which converts the original video into a series of activities. An overall adjustable quota is used to control the size of the generated summarization for efficient streaming. Given this quota, the frames selected for summarization are determined by evenly sampling the accumulated activity levels for content adaptation. Quantitative evaluations have demonstrated the effectiveness and efficiency of our proposed approach, which provides a more flexible and general solution for this topic, since domain-specific tasks such as accurate recognition of objects can be avoided.
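The quota-controlled selection step can be sketched directly: sample the cumulative activity curve at evenly spaced targets, so stretches of high activity contribute more frames to the summary because the curve rises faster there. This is our own illustration of the sampling idea; the function and variable names are assumptions, not the paper's.

```python
import numpy as np

def select_summary_frames(activity_levels, quota):
    """Quota-controlled frame selection by evenly sampling the
    cumulative activity curve. `activity_levels` is one nonnegative
    score per frame; `quota` caps the summary size."""
    cumulative = np.cumsum(activity_levels, dtype=float)
    # `quota` evenly spaced targets along the accumulated-activity axis.
    targets = np.linspace(cumulative[0], cumulative[-1], quota)
    # Pick the first frame whose cumulative activity reaches each target.
    indices = np.searchsorted(cumulative, targets)
    return sorted(set(int(i) for i in indices))
```

Duplicate hits (several targets landing on one frame) are collapsed, so the returned summary can be shorter than the quota in low-activity videos.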
An audio-based sports video segmentation and event detection algorithm
In this paper, we present an audio-based event detection algorithm shown to be effective when applied to soccer video. The main benefit of this approach is the ability to recognise patterns that display high levels of crowd response correlated to key events. The soundtrack from a soccer sequence is first parameterised using Mel-frequency cepstral coefficients. It is then segmented into homogeneous components using a windowing algorithm with a decision process based on Bayesian model selection. This decision process eliminates the need for defining a heuristic set of rules for segmentation. Each audio segment is then labelled using a series of hidden Markov model (HMM) classifiers, each a representation of one of six predefined semantic content classes found in soccer video. Exciting events are identified as those segments belonging to a crowd-cheering class. Experimentation indicated that the algorithm was more effective at classifying crowd response than traditional model-based segmentation and classification techniques.
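The Bayesian model-selection step behind such segmentation is commonly realised as a Bayesian Information Criterion (BIC) test: within a window of feature vectors (e.g. MFCC frames), compare modelling the window with one Gaussian against two Gaussians split at a candidate boundary. The sketch below is a generic ΔBIC change detector under our own parameterisation, not the authors' implementation.

```python
import numpy as np

def bic_change_point(features, penalty_weight=1.0):
    """Find the best split of a feature window under a Gaussian ΔBIC
    criterion. A positive returned ΔBIC means two Gaussians explain the
    window better than one, i.e. a segment boundary is likely.
    `penalty_weight` is the usual BIC lambda (assumed value 1.0)."""
    X = np.asarray(features, dtype=float)
    n, d = X.shape
    # Penalty for the extra Gaussian's parameters (mean + full covariance).
    penalty = penalty_weight * 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)

    def logdet_cov(Z):
        cov = np.cov(Z, rowvar=False) + 1e-6 * np.eye(d)  # regularize
        return np.linalg.slogdet(cov)[1]

    best_delta, best_t = -np.inf, None
    for t in range(d + 2, n - d - 2):  # keep both halves well-populated
        delta = (0.5 * n * logdet_cov(X)
                 - 0.5 * t * logdet_cov(X[:t])
                 - 0.5 * (n - t) * logdet_cov(X[t:])
                 - penalty)
        if delta > best_delta:
            best_delta, best_t = delta, t
    return best_t, best_delta
```

Sliding this test over the soundtrack, accepting splits with positive ΔBIC, yields the homogeneous components that the HMM classifiers then label.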
Maximal adaptive-decision speedups in quantum-state readout
The average time required for high-fidelity readout of quantum states can be significantly reduced via a real-time adaptive decision rule. An adaptive decision rule stops the readout as soon as a desired level of confidence has been achieved, as opposed to setting a fixed readout time. The performance of the adaptive decision is characterized by the "adaptive-decision speedup." In this work, we reformulate this readout problem in terms of the first-passage time of a particle undergoing stochastic motion. This formalism allows us to theoretically establish the maximum achievable adaptive-decision speedups for several physical two-state readout implementations. We show that for two common readout schemes (the Gaussian latching readout and a readout relying on state-dependent decay), the speedup is bounded in the limit of high single-shot readout fidelity. We experimentally study the achievable speedup in a real-world scenario by applying the adaptive decision rule to a readout of the nitrogen-vacancy-center (NV-center) charge state, and find a speedup with our experimental parameters. In addition, we propose a simple readout scheme for which the speedup can, in principle, be increased without bound as the fidelity is increased. Our results should lead to immediate improvements in nanoscale magnetometry based on spin-to-charge conversion of the NV-center spin, and provide a theoretical framework for further optimization of the bandwidth of quantum measurements.
Comment: 18 pages, 11 figures. This version is close to the published version.
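The adaptive decision rule for a Gaussian latching readout can be illustrated with a small Monte-Carlo sketch: accumulate the log-likelihood ratio between the two state hypotheses sample by sample, and stop as soon as the evidence for either state reaches the desired confidence. The model and every parameter below (SNR, confidence, sample statistics) are our own assumptions for illustration, not the paper's experimental values.

```python
import numpy as np
from statistics import NormalDist

def adaptive_readout_times(true_state, snr=2.0, confidence=0.99,
                           n_trials=2000, seed=1):
    """Mean stopping time of a sequential (adaptive) decision rule for a
    Gaussian latching readout. Each time step yields a sample from
    N(+snr/2, 1) for state 1 or N(-snr/2, 1) for state 0; we stop once
    the posterior odds for either state exceed confidence/(1-confidence)."""
    rng = np.random.default_rng(seed)
    threshold = np.log(confidence / (1.0 - confidence))
    mean = snr / 2.0 if true_state == 1 else -snr / 2.0
    stop_times = []
    for _ in range(n_trials):
        llr, t = 0.0, 0
        while abs(llr) < threshold:
            x = rng.normal(mean, 1.0)
            # LLR increment for N(+snr/2, 1) vs N(-snr/2, 1) is snr * x.
            llr += snr * x
            t += 1
        stop_times.append(t)
    return np.mean(stop_times)

def fixed_time_equivalent(snr=2.0, confidence=0.99):
    """Fixed readout time achieving the same single-shot fidelity in the
    same Gaussian model: the summed signal after T steps is
    N(+/- T*snr/2, T), so the error is Phi(-snr*sqrt(T)/2)."""
    return (2.0 * NormalDist().inv_cdf(confidence) / snr) ** 2
```

The ratio `fixed_time_equivalent() / adaptive_readout_times(1)` is then an empirical estimate of the adaptive-decision speedup for this toy model.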