    Scene extraction in motion pictures

    This paper addresses the challenge of bridging the semantic gap between the rich meaning users desire when they query to locate and browse media and the shallowness of the media descriptions that can be computed in today's content management systems. To facilitate high-level, semantics-based content annotation and interpretation, we tackle the problem of automatically decomposing motion pictures into meaningful story units, namely scenes. Since a scene is a complicated and subjective concept, we first propose guidelines drawn from film production to determine when a scene change occurs. We then investigate the rules and conventions of Film Grammar that can guide and shape an algorithmic solution for determining a scene. Two different techniques using intershot analysis are proposed as solutions in this paper. In addition, we present refinement mechanisms, such as film-punctuation detection founded on Film Grammar, to further improve the results. These refinement techniques demonstrate significant improvements in overall performance. Furthermore, we analyze errors in the context of film-production techniques, an analysis that offers useful insights into the limitations of our method.
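
    As a concrete illustration of intershot analysis, the sketch below groups shots into scenes by checking whether a shot's colour histogram links back to any of the few shots preceding it; a shot with no visual link starts a new scene. This is a common overlapping-links heuristic, not the paper's exact algorithm, and the threshold, window size, and function names are our own assumptions (histograms are assumed L1-normalised).

```python
import numpy as np

def group_shots_into_scenes(shot_histograms, similarity_threshold=0.6, window=3):
    """Group consecutive shots into scenes via intershot histogram links.

    shot_histograms: list of L1-normalised 1-D colour histograms, one per shot.
    Returns the indices of shots that begin a new scene.
    (Illustrative heuristic only; not the paper's algorithm.)
    """
    scene_starts = [0]
    for i in range(1, len(shot_histograms)):
        # Compare shot i against the preceding `window` shots.
        prev = shot_histograms[max(0, i - window):i]
        # Histogram intersection: 1.0 for identical normalised histograms.
        sims = [np.minimum(h, shot_histograms[i]).sum() for h in prev]
        if max(sims) < similarity_threshold:
            scene_starts.append(i)  # no visual link back: a new scene begins
    return scene_starts
```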

    The TREC-2002 video track report

    TREC-2002 saw the second running of the Video Track, the goal of which was to promote progress in content-based retrieval from digital video via open, metrics-based evaluation. The track used 73.3 hours of publicly available digital video (in MPEG-1/VCD format) downloaded by the participants directly from the Internet Archive (Prelinger Archives) (internetarchive, 2002) and some from the Open Video Project (Marchionini, 2001). The material comprised advertising, educational, industrial, and amateur films produced between the 1930s and the 1970s by corporations, nonprofit organizations, trade associations, community and interest groups, educational institutions, and individuals. Seventeen teams representing 5 companies and 12 universities - 4 from Asia, 9 from Europe, and 4 from the US - participated in one or more of the three tasks in the 2002 video track: shot boundary determination, feature extraction, and search (manual or interactive). Results were scored by NIST using manually created truth data for shot boundary determination and manual assessment of the feature extraction and search results. This paper is an introduction to, and an overview of, the track framework - the tasks, data, and measures - the approaches taken by the participating groups, the results, and issues regarding the evaluation. For detailed information about the approaches and results, the reader should see the various site reports in the final workshop proceedings.
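
    To make the scoring concrete, here is a minimal sketch of how shot-boundary submissions can be scored against manually created truth data with precision and recall; the frame tolerance and one-to-one matching policy are our assumptions, and NIST's actual protocol (e.g. its handling of gradual transitions) differs in detail.

```python
def score_shot_boundaries(detected, truth, tolerance=5):
    """Precision/recall for shot-boundary determination.

    detected, truth: frame indices of cuts; each detected boundary may match
    at most one reference boundary within `tolerance` frames.
    (Illustrative only; not NIST's exact evaluation protocol.)
    """
    unmatched = sorted(truth)
    hits = 0
    for d in sorted(detected):
        match = next((t for t in unmatched if abs(t - d) <= tolerance), None)
        if match is not None:
            unmatched.remove(match)  # enforce one-to-one matching
            hits += 1
    precision = hits / len(detected) if detected else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall
```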

    Using Graphics Processor Units (GPUs) for automatic video structuring

    The rapid pace of development of Graphics Processor Units (GPUs) in recent years, in terms of both performance and programmability, has attracted the attention of those seeking to leverage alternative architectures for better performance than commodity CPUs can provide. In this paper, the potential of the GPU for automatically structuring video is examined, specifically for shot boundary detection and representative keyframe selection. We first introduce the programming model of the GPU and outline the implementation of techniques for shot boundary detection and representative keyframe selection on both the CPU and the GPU, using histogram comparisons. We compare the approaches and present performance results for both the CPU and the GPU. Overall, these results demonstrate the significant potential of the GPU in this domain.
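
    For reference, a minimal CPU version of the histogram-comparison approach is sketched below: a cut is flagged wherever the L1 distance between consecutive frame histograms exceeds a threshold. The bin count and threshold are illustrative assumptions; a GPU port parallelises the per-frame histogram and distance computations.

```python
import numpy as np

def frame_histogram(frame, bins=64):
    """Concatenated per-channel histogram, L1-normalised (frame: HxWx3 uint8)."""
    h = np.concatenate([np.histogram(frame[..., c], bins=bins, range=(0, 256))[0]
                        for c in range(3)]).astype(float)
    return h / h.sum()

def detect_cuts(frames, threshold=0.4, bins=64):
    """Flag a shot boundary where consecutive histograms differ strongly.

    (CPU reference sketch; the threshold is illustrative, not from the paper.)
    """
    hists = [frame_histogram(f, bins) for f in frames]
    return [i for i in range(1, len(hists))
            if np.abs(hists[i] - hists[i - 1]).sum() > threshold]
```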

    A Cosmic Watershed: the WVF Void Detection Technique

    On megaparsec scales the Universe is permeated by an intricate filigree of clusters, filaments, sheets and voids: the Cosmic Web. For an understanding of its dynamical and hierarchical history, it is crucial to identify its complex morphological components objectively. One of its most characteristic aspects is the dominant underdense voids, the product of a hierarchical process driven by the collapse of minor voids in addition to the merging of large ones. In this study we present an objective void-finding technique which involves a minimum of assumptions about the scale, structure and shape of voids. Our void-finding method, the Watershed Void Finder (WVF), is based upon the watershed transform, a well-known technique for the segmentation of images. Importantly, the technique has the potential to trace the existing manifestations of a void hierarchy. The basic watershed transform is augmented by a variety of correction procedures to remove spurious structure resulting from sampling noise. This study contains a detailed description of the WVF. We demonstrate how it is able to trace and identify, relatively parameter free, voids and their surrounding (filamentary and planar) boundaries. We test the technique on a set of kinematic Voronoi models, heuristic spatial models for a cellular distribution of matter. Comparison of the WVF segmentations of low-noise and high-noise Voronoi models with the quantitatively known spatial characteristics of the intrinsic Voronoi tessellation shows that the size and shape of the voids are successfully retrieved. The WVF even manages to reproduce the full void size distribution function.
    Comment: 24 pages, 15 figures, MNRAS accepted; for full resolution, see http://www.astro.rug.nl/~weygaert/tim1publication/watershed.pd
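
    The core idea can be sketched in a few lines with off-the-shelf image-segmentation tools: treat the density field as a landscape, seed basins at local minima, and let the watershed transform carve out the voids, with the ridge lines between basins tracing the filamentary and planar boundaries. The Gaussian smoothing below is a crude stand-in for the paper's noise-correction procedures, and all parameter values are illustrative assumptions.

```python
import numpy as np
from scipy import ndimage
from skimage.segmentation import watershed

def watershed_voids(density, smoothing_sigma=2.0):
    """Toy watershed void finder on a gridded density field.

    (Sketch only; the WVF's actual noise corrections are more elaborate.)
    """
    smoothed = ndimage.gaussian_filter(density, smoothing_sigma)
    # Local minima of the density field act as void centres (basin seeds).
    minima = smoothed == ndimage.minimum_filter(smoothed, size=3)
    markers, _ = ndimage.label(minima)
    # Watershed floods the landscape from the minima; ridge lines between
    # basins trace the (filamentary/planar) void boundaries.
    return watershed(smoothed, markers)
```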

    Activity-driven content adaptation for effective video summarisation

    In this paper, we present a novel method for content adaptation and video summarization fully implemented in the compressed domain. Firstly, summarization of generic videos is modeled as the process of extracting human objects under various activities/events. Accordingly, frames are classified via fuzzy decision into five categories, including shot changes (cut and gradual transitions), motion activities (camera motion and object motion) and others, using two inter-frame measurements. Secondly, human objects are detected using Haar-like features. With the detected human objects and the attained frame categories, activity levels for each frame are determined to adapt to the video content. Continuous frames belonging to the same category are grouped to form one activity entry as content of interest (COI), which converts the original video into a series of activities. An overall adjustable quota is used to control the size of the generated summary for efficient streaming. Given this quota, the frames selected for the summary are determined by evenly sampling the accumulated activity levels for content adaptation. Quantitative evaluations demonstrate the effectiveness and efficiency of our proposed approach, which provides a more flexible and general solution for this topic, as domain-specific tasks such as accurate recognition of objects can be avoided.
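
    The quota-based selection step can be made concrete with a short sketch: accumulate the per-frame activity levels and sample the cumulative curve at evenly spaced targets, so highly active stretches contribute proportionally more frames to the summary. The function and variable names are our own; the paper's exact sampling details may differ.

```python
import numpy as np

def select_summary_frames(activity_levels, quota):
    """Pick `quota` frame indices by evenly sampling cumulative activity.

    (Sketch of quota-based content adaptation; duplicate indices can occur
    for very small quotas and may be removed with np.unique if desired.)
    """
    cum = np.cumsum(np.asarray(activity_levels, dtype=float))
    # Evenly spaced targets along the accumulated-activity axis.
    targets = np.linspace(cum[0], cum[-1], num=quota)
    # First frame whose cumulative activity reaches each target.
    return np.searchsorted(cum, targets).tolist()
```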

    An audio-based sports video segmentation and event detection algorithm

    In this paper, we present an audio-based event detection algorithm shown to be effective when applied to soccer video. The main benefit of this approach is the ability to recognise patterns of high crowd response that correlate with key events. The soundtrack of a soccer sequence is first parameterised using Mel-frequency cepstral coefficients. It is then segmented into homogeneous components using a windowing algorithm with a decision process based on Bayesian model selection. This decision process eliminates the need for a heuristic set of segmentation rules. Each audio segment is then labelled using a series of hidden Markov model (HMM) classifiers, each representing one of six predefined semantic content classes found in soccer video. Exciting events are identified as those segments belonging to a crowd-cheering class. Experimentation indicated that the algorithm was more effective at classifying crowd response than traditional model-based segmentation and classification techniques.
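
    The Bayesian model-selection decision at the heart of the segmentation step is commonly formulated as a delta-BIC test: model a window of MFCC vectors either as one Gaussian or as two Gaussians split at a candidate change point, and declare a segment boundary when the two-model hypothesis wins after a complexity penalty. The sketch below uses this standard formulation; the paper's exact criterion and the penalty weight are assumptions.

```python
import numpy as np

def delta_bic(mfcc_window, split, penalty_lambda=1.0):
    """Delta-BIC change test on a window of MFCC vectors (rows = frames).

    A positive value favours two segments split at `split`, i.e. an
    acoustic change point. Both sides of the split need enough frames
    for a well-conditioned covariance estimate. (Standard BIC sketch;
    not necessarily the paper's exact criterion.)
    """
    X = np.asarray(mfcc_window)
    n, d = X.shape
    logdet = lambda m: np.linalg.slogdet(np.cov(m, rowvar=False))[1]
    # Penalty for the extra parameters of the two-model hypothesis.
    penalty = penalty_lambda * 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)
    return 0.5 * (n * logdet(X)
                  - split * logdet(X[:split])
                  - (n - split) * logdet(X[split:])) - penalty
```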

    Maximal adaptive-decision speedups in quantum-state readout

    The average time T required for high-fidelity readout of quantum states can be significantly reduced via a real-time adaptive decision rule. An adaptive decision rule stops the readout as soon as a desired level of confidence has been achieved, as opposed to setting a fixed readout time t_f. The performance of the adaptive decision is characterized by the "adaptive-decision speedup," t_f/T. In this work, we reformulate this readout problem in terms of the first-passage time of a particle undergoing stochastic motion. This formalism allows us to theoretically establish the maximum achievable adaptive-decision speedups for several physical two-state readout implementations. We show that for two common readout schemes (the Gaussian latching readout and a readout relying on state-dependent decay), the speedup is bounded by 4 and 2, respectively, in the limit of high single-shot readout fidelity. We experimentally study the achievable speedup in a real-world scenario by applying the adaptive decision rule to a readout of the nitrogen-vacancy-center (NV-center) charge state. We find a speedup of approximately 2 with our experimental parameters. In addition, we propose a simple readout scheme for which the speedup can, in principle, be increased without bound as the fidelity is increased. Our results should lead to immediate improvements in nanoscale magnetometry based on spin-to-charge conversion of the NV-center spin, and provide a theoretical framework for further optimization of the bandwidth of quantum measurements.
    Comment: 18 pages, 11 figures. This version is close to the published version.
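
    The idea behind the speedup is easy to simulate. The sketch below implements an adaptive stopping rule for a Gaussian latching readout: accumulate the log-likelihood ratio of unit-time noisy samples and stop once it crosses a confidence threshold. Averaging the stopping time over many trials estimates T, and comparing it with the fixed readout time t_f needed for the same fidelity gives the speedup t_f/T. All parameter values here are illustrative assumptions, not the paper's experimental settings.

```python
import numpy as np

def adaptive_readout(true_state, snr=0.5, log_odds_threshold=4.0,
                     max_steps=10_000, rng=None):
    """Simulate one adaptive-decision readout of a latching two-state system.

    Samples are unit-variance Gaussians whose means differ by `snr`.
    Returns (decided_state, stopping_time). (Illustrative sketch only.)
    """
    if rng is None:
        rng = np.random.default_rng()
    means = {0: -0.5 * snr, 1: +0.5 * snr}
    llr = 0.0
    for t in range(1, max_steps + 1):
        x = rng.normal(means[true_state], 1.0)
        # Log-likelihood-ratio increment for state 1 vs state 0; for
        # equal-variance Gaussians with symmetric means this is x * snr.
        llr += x * snr
        if abs(llr) >= log_odds_threshold:
            return int(llr > 0), t
    return int(llr > 0), max_steps
```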