Search CORE

221 research outputs found

Efficient MRF Energy Propagation for Video Segmentation via Bilateral Filters

Author: Alatan A. Aydin
Sener Ozan
Ugur Kemal
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/08/2014
Field of study

Segmentation of an object from a video is a challenging task in multimedia applications. Depending on the application, automatic or interactive methods are desired; however, regardless of the application type, efficient computation of video object segmentation is crucial for time-critical applications; specifically, mobile and interactive applications require near real-time efficiencies. In this paper, we address the problem of video segmentation from the perspective of efficiency. We initially redefine the problem of video object segmentation as the propagation of MRF energies along the temporal domain. For this purpose, a novel and efficient method is proposed to propagate MRF energies throughout the frames via bilateral filters without using any global texture, color or shape model. Recently presented bi-exponential filter is utilized for efficiency, whereas a novel technique is also developed to dynamically solve graph-cuts for varying, non-lattice graphs in general linear filtering scenario. These improvements are experimented for both automatic and interactive video segmentation scenarios. Moreover, in addition to the efficiency, segmentation quality is also tested both quantitatively and qualitatively. Indeed, for some challenging examples, significant time efficiency is observed without loss of segmentation quality.Comment: Multimedia, IEEE Transactions on (Volume:16, Issue: 5, Aug. 2014

arXiv.org e-Print Archive

OpenMETU (Middle East Technical University)

Consistent Video Saliency Using Local Gradient Flow Optimization and Global Refinement

Author: Jianbing Shen
Shao Ling
Wenguan Wang
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/07/2015
Field of study

We present a novel spatiotemporal saliency detection method to estimate salient regions in videos based on the gradient flow field and energy optimization. The proposed gradient flow field incorporates two distinctive features: 1) intra-frame boundary information and 2) inter-frame motion information together for indicating the salient regions. Based on the effective utilization of both intra-frame and inter-frame information in the gradient flow field, our algorithm is robust enough to estimate the object and background in complex scenes with various motion patterns and appearances. Then, we introduce local as well as global contrast saliency measures using the foreground and background information estimated from the gradient flow field. These enhanced contrast saliency cues uniformly highlight an entire object. We further propose a new energy function to encourage the spatiotemporal consistency of the output saliency maps, which is seldom explored in previous video saliency methods. The experimental results show that the proposed algorithm outperforms state-of-the-art video saliency detection methods

Northumbria Research Link

Crossref

University of East Anglia digital repository

Audiovisual Saliency Prediction in Uncategorized Video Sequences based on Audio-Video Correlation

Author: Butt Maryam Qamar
Rahman Anis Ur
Publication venue
Publication date: 07/01/2021
Field of study

Substantial research has been done in saliency modeling to develop intelligent machines that can perceive and interpret their surroundings. But existing models treat videos as merely image sequences excluding any audio information, unable to cope with inherently varying content. Based on the hypothesis that an audiovisual saliency model will be an improvement over traditional saliency models for natural uncategorized videos, this work aims to provide a generic audio/video saliency model augmenting a visual saliency map with an audio saliency map computed by synchronizing low-level audio and visual features. The proposed model was evaluated using different criteria against eye fixations data for a publicly available DIEM video dataset. The results show that the model outperformed two state-of-the-art visual saliency models.Comment: 9 pages, 2 figures, 4 table

arXiv.org e-Print Archive

IMPROVING EFFICIENCY AND SCALABILITY IN VISUAL SURVEILLANCE APPLICATIONS

Author: Dondera Radu
Publication venue
Publication date: 01/01/2013
Field of study

We present four contributions to visual surveillance: (a) an action recognition method based on the characteristics of human motion in image space; (b) a study of the strengths of five regression techniques for monocular pose estimation that highlights the advantages of kernel PLS; (c) a learning-based method for detecting objects carried by humans requiring minimal annotation; (d) an interactive video segmentation system that reduces supervision by using occlusion and long term spatio-temporal structure information. We propose a representation for human actions that is based solely on motion information and that leverages the characteristics of human movement in the image space. The representation is best suited to visual surveillance settings in which the actions of interest are highly constrained, but also works on more general problems if the actions are ballistic in nature. Our computationally efficient representation achieves good recognition performance on both a commonly used action recognition dataset and on a dataset we collected to simulate a checkout counter. We study discriminative methods for 3D human pose estimation from single images, which build a map from image features to pose. The main difficulty with these methods is the insufficiency of training data due to the high dimensionality of the pose space. However, real datasets can be augmented with data from character animation software, so the scalability of existing approaches becomes important. We argue that Kernel Partial Least Squares approximates Gaussian Process regression robustly, enabling the use of larger datasets, and we show in experiments that kPLS outperforms two state-of-the-art methods based on GP. The high variability in the appearance of carried objects suggests using their relation to the human silhouette to detect them. We adopt a generate-and-test approach that produces candidate regions from protrusion, color contrast and occlusion boundary cues and then filters them with a kernel SVM classifier on context features. Our method exceeds state of the art accuracy and has good generalization capability. We also propose a Multiple Instance Learning framework for the classifier that reduces annotation effort by two orders of magnitude while maintaining comparable accuracy. Finally, we present an interactive video segmentation system that trades off a small amount of segmentation quality for significantly less supervision than necessary in systems in the literature. While applications like video editing could not directly use the output of our system, reasoning about the trajectories of objects in a scene or learning coarse appearance models is still possible. The unsupervised segmentation component at the base of our system effectively employs occlusion boundary cues and achieves competitive results on an unsupervised segmentation dataset. On videos used to evaluate interactive methods, our system requires less interaction time than others, does not rely on appearance information and can extract multiple objects at the same time

Digital Repository at the University of Maryland

Superpixel Segmentation of Outdoor Webcams to Infer Scene Structure

Author: Tannenbaum Rachel
Publication venue: Washington University Open Scholarship
Publication date: 01/01/2009
Field of study

Understanding an outdoor sceneΓÇÖs 3-D structure has applications in several ∩¼üelds, including surveillance and computer graphics. Scene elementsΓÇÖ time-series brightness gives insight to their geometric orientation; and thus the 3-D structure of the overall scene. Previous works have studied the time-series brightness of individual pixels. However, there are limitations with this approach. Pixels are often quite noisy, and can require a lot of memory. This thesis explores the use of superpixels to address these issues. Superpixels, an approach to image segmentation, over-segment a scene but attempt to ensure that each segment lies on only one scene element. Applying superpixels to webcams reduces the e∩¼Çect of noise on pixelsΓÇÖ time-series brightness, and conserves memory by reducing the number of pixel ΓÇ£entitiesΓÇ¥. This thesis explores methods of solving for a superpixelΓÇÖs surface normal, and demonstrates that the time at which maximum brightness is achieved serves as a basic indicator of geographic orientation

Washington University St. Louis: Open Scholarship