Foreground Segmentation in Video Sequences with a Dynamic Background
Segmentation of a moving foreground from video sequences, in the presence of a rapidly changing background, is a difficult problem. In this paper, a novel technique for effective segmentation of the moving foreground from video sequences with a dynamic background is developed. The segmentation problem is treated as a problem of classifying the foreground and background pixels of a video frame using the color components of the pixels as multiple features of the images. The gray levels of the pixels and the hue and saturation components in the HSV representation of the pixels of a frame are used to form a scalar-valued feature image. This feature image, incorporating multiple features of the pixels, is then used to devise a simple classification scheme in the framework of a support vector machine classifier. Unlike some other data classification approaches for foreground segmentation, in which a priori knowledge of the shape and size of the moving foreground is essential, in the proposed method training samples are obtained in an automatic manner. In order to assess the effectiveness of the proposed method, the new scheme is applied to a number of video sequences with a dynamic background and the results are compared with those obtained by using other existing methods. The subjective and objective results show the superiority of the proposed scheme in providing a segmented foreground binary mask that fits the corresponding ground-truth mask more closely than the masks obtained by the other methods.
Segmentation of Moving Objects in Video Sequences with a Dynamic Background
Segmentation of objects from a video sequence is one of the basic operations commonly employed in vision-based systems. The quality of the segmented object has a profound effect on the performance of such systems. Segmentation of an object becomes a challenging problem in situations in which the background scenes of a video sequence are not static or contain the cast shadow of the object. This thesis is concerned with developing cost-effective methods for object segmentation from video sequences having a dynamic background and cast shadows.
A novel technique for the segmentation of the foreground from video sequences with a dynamic background is developed. The segmentation problem is treated as a problem of classifying the foreground and background pixels of the frames of a sequence using the pixel color components as multiple features of the images. The individual features representing the pixel gray levels and the hue and saturation levels are first extracted and then linearly recombined with suitable weights to form a scalar-valued feature image. The multiple features incorporated into this scalar-valued feature image make it possible to devise a simple classification scheme in the framework of a support vector machine classifier. Unlike some other data classification approaches for foreground segmentation, in which a priori knowledge of the shape and size of the moving foreground is essential, in the proposed method training samples are obtained in an automated manner. The proposed technique is shown not to be limited by the number, patterns or dimensions of the objects.
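As a rough illustration of the pipeline both abstracts describe, the sketch below forms a weighted scalar feature image from gray level, hue and saturation, then classifies every pixel with a support vector machine. The weights, the seed interface and all parameter values are illustrative assumptions, not the authors' actual choices:

```python
# Minimal sketch of the described pipeline, for illustration only.
# Requires opencv-python and scikit-learn.
import cv2
import numpy as np
from sklearn.svm import SVC

def feature_image(frame_bgr, weights=(0.5, 0.3, 0.2)):
    """Linearly combine gray level, hue and saturation into one scalar image."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32) / 255.0
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    hue, sat = hsv[..., 0] / 179.0, hsv[..., 1] / 255.0  # OpenCV hue range is 0-179
    w_gray, w_hue, w_sat = weights
    return w_gray * gray + w_hue * hue + w_sat * sat

def segment_frame(frame_bgr, fg_seeds, bg_seeds):
    """Classify every pixel as foreground (1) or background (0) with an SVM.

    fg_seeds, bg_seeds: (rows, cols) index arrays of training pixels; in the
    actual method these samples are derived automatically from the sequence.
    """
    feat = feature_image(frame_bgr)
    X = np.concatenate([feat[fg_seeds], feat[bg_seeds]])[:, None]
    y = np.concatenate([np.ones(len(fg_seeds[0])), np.zeros(len(bg_seeds[0]))])
    clf = SVC(kernel="rbf").fit(X, y)
    # Predicting every pixel is slow for a full frame; fine for a sketch.
    return clf.predict(feat.reshape(-1, 1)).reshape(feat.shape).astype(np.uint8)
```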
The foreground of a video frame is the region of the frame that contains the object as well as its cast shadow. A process of object segmentation generally results in segmenting the entire foreground. Thus, shadow removal from the segmented foreground is essential for object segmentation. A novel, computationally efficient shadow removal technique based on multiple features is proposed. Multiple object masks, each based on a single feature, are constructed and merged together to form a single object mask. The main idea of the proposed technique is that an object pixel is unlikely to be indistinguishable from shadow pixels with respect to all of the features simultaneously; a sketch of this merging step is given below.
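A minimal sketch of that merging step, assuming per-feature shadow tests are supplied by the caller (the tests themselves are placeholders, not the thesis's actual detectors):

```python
# Sketch of the mask-merging idea: a pixel survives as object if at least
# one feature distinguishes it from shadow.
import numpy as np

def remove_shadow(foreground_mask, feature_maps, shadow_tests):
    """Merge per-feature object masks into a single shadow-free object mask.

    feature_maps: list of 2-D arrays (e.g., gray level, hue, saturation).
    shadow_tests: list of functions; shadow_tests[i](feature_maps[i]) returns
                  a boolean map that is True where feature i says "shadow".
    """
    object_mask = np.zeros(foreground_mask.shape, dtype=bool)
    for feat, is_shadow in zip(feature_maps, shadow_tests):
        # Keep the pixel as object if this feature tells it apart from shadow.
        object_mask |= foreground_mask.astype(bool) & ~is_shadow(feat)
    return object_mask
```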
Extensive simulations are performed by applying the proposed and some existing techniques to challenging video sequences for object segmentation and shadow removal. The subjective and objective results demonstrate the effectiveness and superiority of the schemes developed in this thesis.
Human machine collaboration for foreground segmentation in images and videos
Foreground segmentation is defined as the problem of generating pixel-level foreground masks for all the objects in a given image or video. Accurate foreground segmentations in images and videos have several potential applications, such as improving search, training richer object detectors, image synthesis and re-targeting, scene and activity understanding, video summarization, and post-production video editing.
One effective way to solve this problem is human-machine collaboration. The main idea is to let humans guide the segmentation process through some partial supervision. As humans, we are extremely good at perception and can easily identify foreground regions. Computers, on the other hand, lack this capability, but are extremely good at continuously processing large volumes of data at the lowest level of detail with great efficiency. Bringing these complementary strengths together can lead to systems which are accurate and cost-effective at the same time. However, in any such human-machine collaboration system, cost-effectiveness and accuracy are competing goals. While more involvement from humans can certainly lead to higher accuracy, it also leads to increased cost, both in terms of time and money. On the other hand, relying more on machines is cost-effective, but algorithms are still nowhere near human-level performance. Balancing this cost-versus-accuracy trade-off holds the key to the success of such a hybrid system.
In this thesis, I develop foreground segmentation algorithms which effectively and efficiently make use of human guidance for accurately segmenting foreground objects in images and videos. The algorithms developed in this thesis actively reason about the best modalities or interactions through which a user can provide guidance to the system for generating accurate segmentations. At the same time, these algorithms are also capable of prioritizing human guidance on instances where it is most needed. Finally, when structural similarity exists within data (e.g., adjacent frames in a video or similar images in a collection), the algorithms developed in this thesis are capable of propagating information from instances which have received human guidance to those which did not. Together, these characteristics result in substantial savings in human annotation cost while generating high-quality foreground segmentations in images and videos.
In this thesis, I consider three categories of segmentation problems, all of which can greatly benefit from human-machine collaboration. First, I consider the problem of interactive image segmentation. In traditional interactive methods, a human annotator provides a coarse spatial annotation (e.g., a bounding box or freehand outline) around the object of interest to obtain a segmentation. The mode of manual annotation used affects both its accuracy and ease of use. Whereas existing methods assume a fixed form of input no matter the image, in this thesis I propose a data-driven algorithm which learns whether an interactive segmentation method will succeed if initialized with a given annotation mode. This allows us to predict the modality that will be sufficiently strong to yield a high-quality segmentation for a given image, and results in large savings in annotation costs; a sketch of this mode-selection idea follows below. I also propose a novel interactive segmentation algorithm called Click Carving which can accurately segment objects in images and videos using a very simple form of human interaction: point clicks. It outperforms several state-of-the-art methods and requires only a fraction of the human effort in comparison.
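For illustration, a cheapest-sufficient selection rule of the kind described might look like the following sketch; the modes, features, classifier and threshold are hypothetical stand-ins rather than the thesis's actual components:

```python
# Hypothetical sketch: one classifier per annotation mode predicts whether
# segmentation will succeed from image features; pick the cheapest mode
# whose predicted success probability clears a threshold.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

MODES = ["point_click", "bounding_box", "freehand_outline"]  # cheapest first

def train_mode_predictors(image_features, success_labels):
    """Fit one success/failure classifier per annotation mode.

    image_features: (n, d) array; success_labels: dict mode -> (n,) 0/1 array.
    """
    return {m: RandomForestClassifier().fit(image_features, success_labels[m])
            for m in MODES}

def choose_mode(predictors, feats, threshold=0.8):
    """Return the cheapest mode predicted to yield a good segmentation."""
    for mode in MODES:
        if predictors[mode].predict_proba(feats[None])[0, 1] >= threshold:
            return mode
    return MODES[-1]  # fall back to the strongest (most expensive) mode
```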
Second, I consider the problem of segmenting images in a weakly supervised image collection. Here, we are given a collection of images all belonging to the same object category, and the goal is to jointly segment the common object from all the images. For this, I develop a stagewise active approach to segmentation propagation: in each stage, the images that appear most valuable for human annotation are actively determined and labeled by human annotators, and the foreground estimates are then revised in all unlabeled images accordingly. In order to identify images that, once annotated, will propagate well to other examples, I introduce an active selection procedure that operates on the joint segmentation graph over all images. It prioritizes human intervention for those images that are uncertain and influential in the graph, while also being mutually diverse (see the selection sketch below). Building on this, I also introduce the problem of measuring compatibility between image pairs for joint segmentation. I show that restricting the joint segmentation to only compatible image pairs results in improved joint segmentation performance.
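One way such a selection could be sketched, with the scoring terms standing in for the thesis's actual criteria:

```python
# Illustrative greedy selection on the joint segmentation graph: images are
# scored by uncertainty times graph influence, with a diversity penalty for
# similarity to already-selected images. The exact terms are assumptions.
import numpy as np

def select_for_annotation(uncertainty, adjacency, k):
    """Pick k images that are uncertain, influential, and mutually diverse.

    uncertainty: (n,) uncertainty of each image's current foreground estimate.
    adjacency:   (n, n) similarity weights of the joint segmentation graph.
    """
    influence = adjacency.sum(axis=1)  # well-connected images propagate widely
    selected = []
    for _ in range(k):
        score = (uncertainty * influence).astype(float)
        if selected:
            # Down-weight images similar to those already chosen.
            score /= 1.0 + adjacency[:, selected].sum(axis=1)
            score[selected] = -np.inf  # never pick the same image twice
        selected.append(int(np.argmax(score)))
    return selected
```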
Finally, I propose a semi-supervised approach for segmentation propagation in video. Given human supervision in some frames of a video, this information can be propagated through time. The main challenge is that the foreground object may move quickly in the scene while its appearance and shape evolve over time. To address this, I propose a higher-order supervoxel label consistency potential which leverages bottom-up supervoxels to enforce long-range temporal consistency during propagation. I also introduce the notion of generic pixel-level objectness in images and videos by training a deep neural network which uses appearance and motion to automatically assign each pixel a score capturing its likelihood of being "object" or "background" (a toy version of such a network is sketched below). I show that the human guidance in the semi-supervised propagation algorithm can be further augmented with the generic pixel-objectness scores to obtain an even more accurate foreground segmentation in videos.
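The sketch below gives a toy stand-in for such a network; the real model is a much deeper pretrained network, so this tiny architecture is purely an assumption:

```python
# Toy pixel-objectness network: a small fully convolutional model mapping
# appearance (RGB) plus motion (2-channel optical flow) to a per-pixel
# object/background score.
import torch
import torch.nn as nn

class PixelObjectness(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(5, 16, 3, padding=1), nn.ReLU(),  # 3 RGB + 2 flow channels
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 1),                        # per-pixel logit
        )

    def forward(self, rgb, flow):
        x = torch.cat([rgb, flow], dim=1)   # (B, 5, H, W)
        return torch.sigmoid(self.net(x))   # objectness score in [0, 1]

# Usage: scores = PixelObjectness()(rgb_batch, flow_batch)
```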
Throughout, I provide extensive evaluation on challenging datasets and compare with many state-of-the-art methods and other baselines, validating the strengths of the proposed algorithms. The outcomes across several different experiments show that the proposed human-machine collaboration algorithms achieve accurate segmentation of foreground objects in images and videos while saving a large amount of human annotation effort.
Click Carving: Segmenting Objects in Video with Point Clicks
We present a novel form of interactive video object segmentation where a few clicks by the user help the system produce a full spatio-temporal segmentation of the object of interest. Whereas conventional interactive pipelines take the user's initialization as a starting point, we show the value in the system taking the lead even in initialization. In particular, for a given video frame, the system precomputes a ranked list of thousands of possible segmentation hypotheses (also referred to as object region proposals) using image and motion cues. The user then looks at the top-ranked proposals and clicks on the object boundary to carve away erroneous ones. This process iterates (typically 2-3 times), with the system revising the top-ranked proposal set each time, until the user is satisfied with the resulting segmentation mask. Finally, the mask is propagated across the video to produce a spatio-temporal object tube. On three challenging datasets, we provide extensive comparisons with both existing work and simpler alternative methods. In all, the proposed Click Carving approach strikes an excellent balance of accuracy and human effort. It outperforms all similarly fast methods, and is competitive with or better than those requiring 2 to 12 times the effort.
Comment: A preliminary version of the material in this document was filed as University of Texas technical report no. UT AI16-0
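For intuition, the re-ranking at the heart of the interaction loop might be sketched as follows; the worst-click distance score is an illustrative assumption, not the paper's exact ranking function:

```python
# Rough sketch of the carving step: boundary clicks promote proposals whose
# boundaries pass near every click and carve away the rest.
import numpy as np

def rerank_proposals(proposal_boundaries, clicks):
    """Order region proposals by boundary agreement with user clicks.

    proposal_boundaries: list of (m_i, 2) arrays of boundary pixel coordinates.
    clicks:              (c, 2) array of user clicks on the object boundary.
    """
    scores = []
    for boundary in proposal_boundaries:
        # Distance from each click to this proposal's nearest boundary pixel.
        d = np.linalg.norm(boundary[None, :, :] - clicks[:, None, :], axis=2)
        scores.append(d.min(axis=1).max())  # worst click; lower is better
    return np.argsort(scores)               # indices of best proposals first
```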
Online Adaptation of Convolutional Neural Networks for Video Object Segmentation
We tackle the task of semi-supervised video object segmentation, i.e., segmenting the pixels belonging to an object in a video using the ground-truth pixel mask for the first frame. We build on the recently introduced one-shot video object segmentation (OSVOS) approach, which uses a pretrained network and fine-tunes it on the first frame. While achieving impressive performance, at test time OSVOS uses the fine-tuned network in unchanged form and is not able to adapt to large changes in object appearance. To overcome this limitation, we propose Online Adaptive Video Object Segmentation (OnAVOS), which updates the network online using training examples selected based on the confidence of the network and the spatial configuration. Additionally, we add a pretraining step based on objectness, which is learned on PASCAL. Our experiments show that both extensions are highly effective and improve the state of the art on DAVIS to an intersection-over-union score of 85.7%.
Comment: Accepted at BMVC 2017. This version contains minor changes for the camera-ready version.
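A minimal sketch of this confidence-based online update, under the assumptions noted in the comments:

```python
# Hedged sketch of the online-adaptation loop (PyTorch): pixels the network
# already predicts with high confidence become online training examples, and
# the network is briefly fine-tuned before the next frame. Thresholds and the
# update schedule are assumptions, not OnAVOS's exact values; the published
# method also uses the spatial configuration when selecting negatives.
import torch
import torch.nn.functional as F

def adapt_online(model, optimizer, frame, pos_thresh=0.97, neg_thresh=0.05,
                 steps=3):
    """One adaptation step on a single (1, C, H, W) frame tensor."""
    with torch.no_grad():
        prob = torch.sigmoid(model(frame))   # current foreground confidence
    target = (prob > pos_thresh).float()     # confident foreground pixels
    confident = (prob > pos_thresh) | (prob < neg_thresh)
    mask = confident.float()                 # train only on confident pixels
    for _ in range(steps):
        optimizer.zero_grad()
        loss = F.binary_cross_entropy_with_logits(
            model(frame), target, reduction="none")
        loss = (loss * mask).sum() / mask.sum().clamp(min=1)
        loss.backward()
        optimizer.step()
```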