
    A novel optical flow-based representation for temporal video segmentation

    Temporal video segmentation is a field of multimedia research that aims to split video data temporally into semantically coherent scenes. Detecting scene boundaries is one of the most widely used approaches to this problem, which makes the representation of temporal information important. We propose a new temporal video segment representation that formalizes video scenes as a sequence of temporal motion-change information. The idea is that a change in the character of the optical flow signals a motion change and therefore a cut between consecutive scenes. The problem is thus reduced to an optical flow-based cut detection problem, for which we put forward the concept of the average motion vector. This concept underlies a pixel-based representation enriched with a novel motion-based approach. Temporal video segment points are classified as cuts or non-cuts according to the proposed representation. Finally, the proposed method and representation are applied to benchmark data sets and the results are compared to other state-of-the-art methods.
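    The abstract does not spell the representation out in code, but the general pipeline it describes can be illustrated with a minimal Python/OpenCV sketch: compute dense optical flow between consecutive frames, reduce each flow field to an average motion vector, and declare a cut where that vector changes abruptly. This is an illustrative reconstruction, not the authors' exact representation; the Farneback parameters and the threshold value are assumptions.

        # Minimal sketch: average-motion-vector cut detection from dense optical flow.
        # Not the authors' exact representation; parameters are illustrative only.
        import cv2
        import numpy as np

        def average_motion_vectors(video_path):
            """Yield (frame_index, mean_dx, mean_dy) for consecutive frame pairs."""
            cap = cv2.VideoCapture(video_path)
            ok, prev = cap.read()
            if not ok:
                return
            prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
            idx = 1
            while True:
                ok, frame = cap.read()
                if not ok:
                    break
                gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
                flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                                    0.5, 3, 15, 3, 5, 1.2, 0)
                # Average motion vector: mean per-pixel displacement for this frame pair.
                yield idx, float(flow[..., 0].mean()), float(flow[..., 1].mean())
                prev_gray, idx = gray, idx + 1
            cap.release()

        def detect_cuts(video_path, threshold=4.0):
            """Label frame indices as cuts when the average motion vector jumps."""
            cuts, prev_vec = [], None
            for idx, dx, dy in average_motion_vectors(video_path):
                vec = np.array([dx, dy])
                if prev_vec is not None and np.linalg.norm(vec - prev_vec) > threshold:
                    cuts.append(idx)
                prev_vec = vec
            return cuts

    In this reading, a hard cut shows up as a discontinuity in the average motion vector, while smooth camera or object motion changes it gradually; the threshold trades missed cuts against false alarms.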

    Automated video processing and scene understanding for intelligent video surveillance

    Ph.D. dissertation, University of Missouri--Columbia, 2010; advisor: Dr. Zhihai He. Recent advances in key technologies have enabled the deployment of surveillance video cameras on various platforms, and there is an urgent need for advanced computational methods and tools for automated video processing and scene understanding to support the resulting applications. This dissertation concentrates on four tightly coupled tasks. (1) Aerial video registration and moving object detection: a fast and reliable method for global camera motion estimation and video registration in aerial video surveillance. (2) 3-D change detection from moving cameras: a hierarchy of multi-scale image patch descriptors is constructed and changes in the video scene are detected through multi-scale information fusion. (3) Cross-view building matching and retrieval from aerial surveillance videos: buildings are identified and matched between camera views using a semantically rich, sketch-based representation that is invariant under large scale and perspective changes. (4) Collaborative video compression for a UAV surveillance network: a collaborative compression scheme based on distributed video coding. Extensive experimental results demonstrate that the developed suite of tools is efficient and promising for surveillance applications.
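    As a concrete illustration of the registration step that such a pipeline builds on, the hedged sketch below estimates global camera motion between two aerial frames with ORB feature matching and a RANSAC homography, then flags residual differences after warping as candidate moving objects. It is not the dissertation's estimator; the detector choice, the thresholds, and the simple frame differencing are assumptions made for brevity.

        # Sketch of global motion estimation + registration-based change detection.
        # Illustrative only; not the dissertation's actual algorithm or parameters.
        import cv2
        import numpy as np

        def estimate_global_motion(prev_gray, curr_gray, max_features=1000):
            """Return a 3x3 homography mapping curr_gray into prev_gray's frame, or None."""
            orb = cv2.ORB_create(max_features)
            kp1, des1 = orb.detectAndCompute(prev_gray, None)
            kp2, des2 = orb.detectAndCompute(curr_gray, None)
            if des1 is None or des2 is None:
                return None
            matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
            matches = matcher.match(des2, des1)
            if len(matches) < 4:
                return None
            src = np.float32([kp2[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
            dst = np.float32([kp1[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
            H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
            return H

        def moving_object_mask(prev_gray, curr_gray, H, diff_threshold=30):
            """Compensate camera motion, then threshold residual differences."""
            h, w = prev_gray.shape
            warped = cv2.warpPerspective(curr_gray, H, (w, h))
            diff = cv2.absdiff(prev_gray, warped)
            _, mask = cv2.threshold(diff, diff_threshold, 255, cv2.THRESH_BINARY)
            return mask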

    Interaction between high-level and low-level image analysis for semantic video object extraction


    Self-Supervised Relative Depth Learning for Urban Scene Understanding

    As an agent moves through the world, the apparent motion of scene elements is (usually) inversely proportional to their depth. It is natural for a learning agent to associate image patterns with the magnitude of their displacement over time: as the agent moves, faraway mountains barely move, while nearby trees move a lot. This natural relationship between the appearance of objects and their motion is a rich source of information about the world. In this work, we start by training a deep network, using fully automatic supervision, to predict relative scene depth from single images. The relative-depth training images are derived automatically from simple videos of cars moving through a scene, using recent motion segmentation techniques and no human-provided labels. This proxy task of predicting relative depth from a single image induces features in the network that yield large improvements over a network trained from scratch on a set of downstream tasks, including semantic segmentation, joint road segmentation and car detection, and monocular (absolute) depth estimation. The improvement on the semantic segmentation task is greater than that produced by any other automatically supervised method. Moreover, for monocular depth estimation, our unsupervised pre-training even outperforms supervised pre-training with ImageNet. In addition, we demonstrate benefits from learning to predict (unsupervised) relative depth on the specific videos associated with various downstream tasks, adapting to the scenes in those tasks in an unsupervised manner to improve performance. In summary, for semantic segmentation we present state-of-the-art results among methods that do not use supervised pre-training, and for monocular depth estimation we even exceed the performance of supervised ImageNet pre-trained models, achieving results that are comparable with state-of-the-art methods.
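    The geometric cue the paper exploits, that apparent motion falls off with depth, can be made concrete with a small Python/OpenCV sketch that turns optical-flow magnitude into a normalized pseudo relative-depth map. The paper's motion segmentation, network architecture, and training procedure are not reproduced here; the epsilon and the per-frame normalization are assumptions, and the inverse relation only holds approximately (for example, for a roughly translating camera and a static scene).

        # Toy sketch: derive a pseudo relative-depth map from optical-flow magnitude.
        # Illustrative of the cue only; not the paper's training pipeline.
        import cv2
        import numpy as np

        def relative_depth_from_flow(prev_gray, curr_gray, eps=1e-3):
            """Return a per-frame-normalized map where larger values mean farther away."""
            flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                                0.5, 3, 15, 3, 5, 1.2, 0)
            magnitude = np.linalg.norm(flow, axis=-1)
            # Small apparent motion -> large relative depth (faraway mountains barely move).
            rel_depth = 1.0 / (magnitude + eps)
            return rel_depth / rel_depth.max()

    Maps like this can serve as automatically generated targets for a single-image network, which is the spirit of the proxy task described in the abstract.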