5,829 research outputs found
Background Subtraction with Real-time Semantic Segmentation
Accurate and fast foreground object extraction is very important for object
tracking and recognition in video surveillance. Although many background
subtraction (BGS) methods have been proposed in the recent past, it is still
regarded as a tough problem due to the variety of challenging situations that
occur in real-world scenarios. In this paper, we explore this problem from a
new perspective and propose a novel background subtraction framework with
real-time semantic segmentation (RTSS). Our proposed framework consists of two
components, a traditional BGS segmenter and a real-time semantic
segmenter . The BGS segmenter aims to construct
background models and segments foreground objects. The real-time semantic
segmenter is used to refine the foreground segmentation outputs
as feedbacks for improving the model updating accuracy. and
work in parallel on two threads. For each input frame , the
BGS segmenter computes a preliminary foreground/background
(FG/BG) mask . At the same time, the real-time semantic segmenter
extracts the object-level semantics . Then, some specific
rules are applied on and to generate the final detection
. Finally, the refined FG/BG mask is fed back to update the
background model. Comprehensive experiments evaluated on the CDnet 2014 dataset
demonstrate that our proposed method achieves state-of-the-art performance
among all unsupervised background subtraction methods while operating at
real-time, and even performs better than some deep learning based supervised
algorithms. In addition, our proposed framework is very flexible and has the
potential for generalization
Real-Time Semantic Background Subtraction
Semantic background subtraction SBS has been shown to improve the performance
of most background subtraction algorithms by combining them with semantic
information, derived from a semantic segmentation network. However, SBS
requires high-quality semantic segmentation masks for all frames, which are
slow to compute. In addition, most state-of-the-art background subtraction
algorithms are not real-time, which makes them unsuitable for real-world
applications. In this paper, we present a novel background subtraction
algorithm called Real-Time Semantic Background Subtraction (denoted RT-SBS)
which extends SBS for real-time constrained applications while keeping similar
performances. RT-SBS effectively combines a real-time background subtraction
algorithm with high-quality semantic information which can be provided at a
slower pace, independently for each pixel. We show that RT-SBS coupled with
ViBe sets a new state of the art for real-time background subtraction
algorithms and even competes with the non real-time state-of-the-art ones. Note
that we provide python CPU and GPU implementations of RT-SBS at
https://github.com/cioppaanthony/rt-sbs.Comment: Accepted and Published at ICIP 202
BSUV-Net: a fully-convolutional neural network for background subtraction of unseen videos
Background subtraction is a basic task in computer vision and video processing often applied as a pre-processing step for object tracking, people recognition, etc. Recently, a number of successful background-subtraction algorithms have been proposed, however nearly all of the top-performing ones are supervised. Crucially, their success relies upon the availability of some annotated frames of the test video during training. Consequently, their performance on completely “unseen” videos is undocumented in the literature. In this work, we propose a new, supervised, background subtraction algorithm for unseen videos (BSUV-Net) based on a fully-convolutional neural network. The input to our network consists of the current frame and two background frames captured at different time scales along with their semantic segmentation maps. In order to reduce the chance of overfitting, we also introduce a new data-augmentation technique which mitigates the impact of illumination difference between the background frames and the current frame. On the CDNet-2014 dataset, BSUV-Net outperforms stateof-the-art algorithms evaluated on unseen videos in terms of several metrics including F-measure, recall and precision.Accepted manuscrip
A fully-convolutional neural network for background subtraction of unseen videos
Background subtraction is a basic task in computer vision
and video processing often applied as a pre-processing step
for object tracking, people recognition, etc. Recently, a number of successful background-subtraction algorithms have
been proposed, however nearly all of the top-performing
ones are supervised. Crucially, their success relies upon
the availability of some annotated frames of the test video
during training. Consequently, their performance on completely “unseen” videos is undocumented in the literature.
In this work, we propose a new, supervised, backgroundsubtraction algorithm for unseen videos (BSUV-Net) based
on a fully-convolutional neural network. The input to our
network consists of the current frame and two background
frames captured at different time scales along with their semantic segmentation maps. In order to reduce the chance
of overfitting, we also introduce a new data-augmentation
technique which mitigates the impact of illumination difference between the background frames and the current frame.
On the CDNet-2014 dataset, BSUV-Net outperforms stateof-the-art algorithms evaluated on unseen videos in terms of
several metrics including F-measure, recall and precision.Accepted manuscrip
Occlusion Handling using Semantic Segmentation and Visibility-Based Rendering for Mixed Reality
Real-time occlusion handling is a major problem in outdoor mixed reality
system because it requires great computational cost mainly due to the
complexity of the scene. Using only segmentation, it is difficult to accurately
render a virtual object occluded by complex objects such as trees, bushes etc.
In this paper, we propose a novel occlusion handling method for real-time,
outdoor, and omni-directional mixed reality system using only the information
from a monocular image sequence. We first present a semantic segmentation
scheme for predicting the amount of visibility for different type of objects in
the scene. We also simultaneously calculate a foreground probability map using
depth estimation derived from optical flow. Finally, we combine the
segmentation result and the probability map to render the computer generated
object and the real scene using a visibility-based rendering method. Our
results show great improvement in handling occlusions compared to existing
blending based methods
DALES: Automated Tool for Detection, Annotation, Labelling and Segmentation of Multiple Objects in Multi-Camera Video Streams
In this paper, we propose a new software tool called DALES to extract semantic information
from multi-view videos based on the analysis of their visual content. Our system is fully automatic
and is well suited for multi-camera environment. Once the multi-view video sequences are
loaded into DALES, our software performs the detection, counting, and segmentation of the visual
objects evolving in the provided video streams. Then, these objects of interest are processed
in order to be labelled, and the related frames are thus annotated with the corresponding semantic
content. Moreover, a textual script is automatically generated with the video annotations.
DALES system shows excellent performance in terms of accuracy and computational speed and
is robustly designed to ensure view synchronization
- …