41,017 research outputs found
Improving Streaming Video Segmentation with Early and Mid-Level Visual Processing
Despite recent advances in video segmentation, many opportunities remain to
improve it using a variety of low and mid-level visual cues. We propose
improvements to the leading streaming graph-based hierarchical video
segmentation (streamGBH) method based on early and mid level visual processing.
The extensive experimental analysis of our approach validates the improvement
of hierarchical supervoxel representation by incorporating motion and color
with effective filtering. We also pose and illuminate some open questions
towards intermediate level video analysis as further extension to streamGBH. We
exploit the supervoxels as an initialization towards estimation of dominant
affine motion regions, followed by merging of such motion regions in order to
hierarchically segment a video in a novel motion-segmentation framework which
aims at subsequent applications such as foreground recognition.Comment: WACV accepted pape
Hierarchical Motion Decomposition for Dynamic Scene Parsing
Peer-reviewed paper accepted for presentation at the IEEE International Conference on Image Processing 2016International audienceA number of applications in video analysis rely on a per-frame motion segmentation of the scene as key preprocess-ing step. Moreover, different settings in video production require extracting segmentation masks of multiple moving objects and object parts in a hierarchical fashion. In order to tackle this problem, we propose to analyze and exploit the compositional structure of scene motion to provide a segmen-tation which is not purely driven by local image information. Specifically, we leverage a hierarchical motion-based partition of the scene to capture a mid-level understanding of the dynamic video content. We present experimental results showing the strengths of this approach in comparison to current video segmentation approaches
Recommended from our members
Edge-based motion segmentation
Motion segmentation is the process of dividing video frames into regions which have different motions, providing a cut-out of the moving objects. Such a segmentation is a necessary first stage in many video analysis applications, but providing an accurate, efficient motion segmentation still presents a challenge. This dissertation proposes a novel approach to motion segmentation, using the image edges in a frame. Using edges, a motion can be calculated for each object. Edges provide good motion information, and it is shown that a set of edges, labelled according to the object motion that they obey, is sufficient to completely determine the labelling of the whole frame, up to unresolvable ambiguities. The areas of the frame between edges are divided into regions, grouping together pixels of similar colour, and these regions can each be assigned to different motion layers by reference to the edges. The depth ordering of these layers can also be deduced. A Bayesian framework is presented, which determines the most likely region labelling and depth ordering, given edges labelled with their probability of obeying each of the object motions. An efficient implementation of this framework is presented, initially for segmenting two motions (foreground and background) using two frames. The ExpectationMaximisation algorithm is used to determine the two motions and calculate the label probability for each edge. The frame is then segmented into regions. The best motion labelling for these regions is determined using simulated annealing. Extensions of this simple implementation are then presented. It is demonstrated how, by tracking the edges into further frames, the statistics may be accumulated to provide an even more accurate and robust segmentation. This also allows a complete sequence to be segmented. It is then demonstrated that the framework can be extended to a larger number of motions. A new hierarchical method of initialising the Expectation-Maximisation algorithm is described, which also determines the best number of motions. These techniques have been extensively tested on thirty-four real sequences, covering a wide range of genres. The results demonstrate that the proposed edge-based approach is an accurate and efficient method of obtaining a motion segmentation
Hierarchical morphological segmentation for image sequence coding
This paper deals with a hierarchical morphological segmentation algorithm for image sequence coding. Mathematical morphology is very attractive for this purpose because it efficiently deals with geometrical features such as size, shape, contrast, or connectivity that can be considered as segmentation-oriented features. The algorithm follows a top-down procedure. It first takes into account the global information and produces a coarse segmentation, that is, with a small number of regions. Then, the segmentation quality is improved by introducing regions corresponding to more local information. The algorithm, considering sequences as being functions on a 3-D space, directly segments 3-D regions. A 3-D approach is used to get a segmentation that is stable in time and to directly solve the region correspondence problem. Each segmentation stage relies on four basic steps: simplification, marker extraction, decision, and quality estimation. The simplification removes information from the sequence to make it easier to segment. Morphological filters based on partial reconstruction are proven to be very efficient for this purpose, especially in the case of sequences. The marker extraction identifies the presence of homogeneous 3-D regions. It is based on constrained flat region labeling and morphological contrast extraction. The goal of the decision is to precisely locate the contours of regions detected by the marker extraction. This decision is performed by a modified watershed algorithm. Finally, the quality estimation concentrates on the coding residue, all the information about the 3-D regions that have not been properly segmented and therefore coded. The procedure allows the introduction of the texture and contour coding schemes within the segmentation algorithm. The coding residue is transmitted to the next segmentation stage to improve the segmentation and coding quality. Finally, segmentation and coding examples are presented to show the validity and interest of the coding approach.Peer ReviewedPostprint (published version
Point-wise mutual information-based video segmentation with high temporal consistency
In this paper, we tackle the problem of temporally consistent boundary
detection and hierarchical segmentation in videos. While finding the best
high-level reasoning of region assignments in videos is the focus of much
recent research, temporal consistency in boundary detection has so far only
rarely been tackled. We argue that temporally consistent boundaries are a key
component to temporally consistent region assignment. The proposed method is
based on the point-wise mutual information (PMI) of spatio-temporal voxels.
Temporal consistency is established by an evaluation of PMI-based point
affinities in the spectral domain over space and time. Thus, the proposed
method is independent of any optical flow computation or previously learned
motion models. The proposed low-level video segmentation method outperforms the
learning-based state of the art in terms of standard region metrics
Segmentation-based video coding:temporals links
This paper analyzes the main elements that a segmentation-based video coding approach should be based on so that it can address coding efficiency and content-based functionalities. Such elements can be defined as temporal linking and rate control. The basic features of such elements are discussed and, in both cases, a specific implementation is proposed.Peer ReviewedPostprint (published version
Multi-Cue Structure Preserving MRF for Unconstrained Video Segmentation
Video segmentation is a stepping stone to understanding video context. Video
segmentation enables one to represent a video by decomposing it into coherent
regions which comprise whole or parts of objects. However, the challenge
originates from the fact that most of the video segmentation algorithms are
based on unsupervised learning due to expensive cost of pixelwise video
annotation and intra-class variability within similar unconstrained video
classes. We propose a Markov Random Field model for unconstrained video
segmentation that relies on tight integration of multiple cues: vertices are
defined from contour based superpixels, unary potentials from temporal smooth
label likelihood and pairwise potentials from global structure of a video.
Multi-cue structure is a breakthrough to extracting coherent object regions for
unconstrained videos in absence of supervision. Our experiments on VSB100
dataset show that the proposed model significantly outperforms competing
state-of-the-art algorithms. Qualitative analysis illustrates that video
segmentation result of the proposed model is consistent with human perception
of objects
STV-based Video Feature Processing for Action Recognition
In comparison to still image-based processes, video features can provide rich and intuitive information about dynamic events occurred over a period of time, such as human actions, crowd behaviours, and other subject pattern changes. Although substantial progresses have been made in the last decade on image processing and seen its successful applications in face matching and object recognition, video-based event detection still remains one of the most difficult challenges in computer vision research due to its complex continuous or discrete input signals, arbitrary dynamic feature definitions, and the often ambiguous analytical methods. In this paper, a Spatio-Temporal Volume (STV) and region intersection (RI) based 3D shape-matching method has been proposed to facilitate the definition and recognition of human actions recorded in videos. The distinctive characteristics and the performance gain of the devised approach stemmed from a coefficient factor-boosted 3D region intersection and matching mechanism developed in this research. This paper also reported the investigation into techniques for efficient STV data filtering to reduce the amount of voxels (volumetric-pixels) that need to be processed in each operational cycle in the implemented system. The encouraging features and improvements on the operational performance registered in the experiments have been discussed at the end
Multiresolution hierarchy co-clustering for semantic segmentation in sequences with small variations
This paper presents a co-clustering technique that, given a collection of
images and their hierarchies, clusters nodes from these hierarchies to obtain a
coherent multiresolution representation of the image collection. We formalize
the co-clustering as a Quadratic Semi-Assignment Problem and solve it with a
linear programming relaxation approach that makes effective use of information
from hierarchies. Initially, we address the problem of generating an optimal,
coherent partition per image and, afterwards, we extend this method to a
multiresolution framework. Finally, we particularize this framework to an
iterative multiresolution video segmentation algorithm in sequences with small
variations. We evaluate the algorithm on the Video Occlusion/Object Boundary
Detection Dataset, showing that it produces state-of-the-art results in these
scenarios.Comment: International Conference on Computer Vision (ICCV) 201
- …