5,829 research outputs found

Background Subtraction with Real-time Semantic Segmentation

Authors: Chen Xiang; Goesele Michael; Kuijper Arjan; Zeng Dongdong; Zhu Ming
Publication date: 12/12/2018

Accurate and fast foreground object extraction is very important for object tracking and recognition in video surveillance. Although many background subtraction (BGS) methods have been proposed in the recent past, it is still regarded as a tough problem due to the variety of challenging situations that occur in real-world scenarios. In this paper, we explore this problem from a new perspective and propose a novel background subtraction framework with real-time semantic segmentation (RTSS). Our proposed framework consists of two components, a traditional BGS segmenter $\mathcal{B}$ and a real-time semantic segmenter $\mathcal{S}$. The BGS segmenter $\mathcal{B}$ aims to construct background models and segment foreground objects. The real-time semantic segmenter $\mathcal{S}$ is used to refine the foreground segmentation outputs as feedback for improving the model updating accuracy. $\mathcal{B}$ and $\mathcal{S}$ work in parallel on two threads. For each input frame $I_t$, the BGS segmenter $\mathcal{B}$ computes a preliminary foreground/background (FG/BG) mask $B_t$. At the same time, the real-time semantic segmenter $\mathcal{S}$ extracts the object-level semantics $S_t$. Then, some specific rules are applied on $B_t$ and $S_t$ to generate the final detection $D_t$. Finally, the refined FG/BG mask $D_t$ is fed back to update the background model. Comprehensive experiments evaluated on the CDnet 2014 dataset demonstrate that our proposed method achieves state-of-the-art performance among all unsupervised background subtraction methods while operating in real time, and even performs better than some deep learning based supervised algorithms. In addition, our proposed framework is very flexible and has the potential for generalization
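
The abstract does not spell out the rules that turn B_t and S_t into D_t, so the following is only a minimal sketch under assumed rules, not the authors' method. The thresholds `tau_bg` and `tau_fg`, the running-average background model, and all function names are hypothetical. Frames are plain nested lists to keep the example dependency-free.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

Frame = List[List[float]]   # grayscale intensities, one inner list per row
Mask = List[List[int]]      # 1 = foreground, 0 = background
Scores = List[List[float]]  # assumed semantic foreground score per pixel, in [0, 1]


def fuse(b_t: Mask, s_t: Scores, tau_bg: float = 0.1, tau_fg: float = 0.9) -> Mask:
    """Hypothetical fusion rule producing D_t from B_t and S_t."""
    d_t = []
    for b_row, s_row in zip(b_t, s_t):
        row = []
        for b, s in zip(b_row, s_row):
            if s <= tau_bg:
                row.append(0)  # semantics confident of background
            elif s >= tau_fg:
                row.append(1)  # semantics confident of foreground
            else:
                row.append(b)  # semantics undecided: keep the BGS decision
        d_t.append(row)
    return d_t


def update_background(model: Frame, frame: Frame, d_t: Mask,
                      alpha: float = 0.05) -> Frame:
    """Assumed running-average update, applied only where D_t is background."""
    return [
        [(1 - alpha) * m + alpha * f if d == 0 else m
         for m, f, d in zip(m_row, f_row, d_row)]
        for m_row, f_row, d_row in zip(model, frame, d_t)
    ]


def process_frame(frame: Frame,
                  bgs: Callable[[Frame], Mask],
                  semantic: Callable[[Frame], Scores]) -> Mask:
    """Run the two segmenters on two threads, then fuse their outputs."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        b_future = pool.submit(bgs, frame)
        s_future = pool.submit(semantic, frame)
        return fuse(b_future.result(), s_future.result())
```

Feeding D_t rather than B_t into `update_background` is what closes the feedback loop: pixels that the semantics flag as foreground are kept out of the background model.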

arXiv.org e-Print Archive

TUbiblio

Real-Time Semantic Background Subtraction

Authors: Braham Marc; Cioppa Anthony; Van Droogenbroeck Marc
Publication date: 27/05/2020

Semantic background subtraction (SBS) has been shown to improve the performance of most background subtraction algorithms by combining them with semantic information derived from a semantic segmentation network. However, SBS requires high-quality semantic segmentation masks for all frames, which are slow to compute. In addition, most state-of-the-art background subtraction algorithms are not real-time, which makes them unsuitable for real-world applications. In this paper, we present a novel background subtraction algorithm called Real-Time Semantic Background Subtraction (denoted RT-SBS), which extends SBS for real-time constrained applications while keeping similar performance. RT-SBS effectively combines a real-time background subtraction algorithm with high-quality semantic information which can be provided at a slower pace, independently for each pixel. We show that RT-SBS coupled with ViBe sets a new state of the art for real-time background subtraction algorithms and even competes with the non-real-time state-of-the-art ones. Note that we provide Python CPU and GPU implementations of RT-SBS at https://github.com/cioppaanthony/rt-sbs. Comment: Accepted and published at ICIP 202
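
The abstract says that semantic information can arrive at a slower pace and is used independently for each pixel, but it does not say how stale semantics are applied to frames in between. One plausible reading, sketched below, is to reuse a pixel's last semantic decision only while its intensity stays close to the value it had when that decision was made. The reuse criterion, the tolerance `tol`, and every name here are assumptions made for illustration and are not taken from the linked repository.

```python
from typing import List, Optional

Frame = List[List[float]]                  # grayscale intensities
Mask = List[List[int]]                     # 1 = foreground, 0 = background
Decisions = List[List[Optional[int]]]      # last semantic decision, None if absent


def combine_with_stale_semantics(bgs_mask: Mask,
                                 frame: Frame,
                                 last_decision: Decisions,
                                 ref_frame: Frame,
                                 tol: float = 10.0) -> Mask:
    """Per-pixel fallback between an old semantic decision and the BGS mask."""
    out = []
    for b_row, f_row, d_row, r_row in zip(bgs_mask, frame, last_decision, ref_frame):
        row = []
        for b, f, d, r in zip(b_row, f_row, d_row, r_row):
            if d is not None and abs(f - r) <= tol:
                row.append(d)  # pixel looks unchanged: reuse the semantic decision
            else:
                row.append(b)  # no decision or pixel changed: trust the BGS output
        out.append(row)
    return out


# Example: one row of three pixels on a frame that has no fresh semantics.
mask = combine_with_stale_semantics(
    bgs_mask=[[1, 1, 0]],
    frame=[[100.0, 150.0, 100.0]],
    last_decision=[[0, 0, None]],
    ref_frame=[[102.0, 100.0, 100.0]],
)
```

In the example, the first pixel keeps its earlier semantic decision, the second has changed too much and falls back to the BGS mask, and the third never received a semantic decision.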

arXiv.org e-Print Archive

Crossref

Open Repository and Bibliography - Liège

BSUV-Net: a fully-convolutional neural network for background subtraction of unseen videos

Authors: Ishwar Prakash; Konrad Janusz; Tezcan M Ozan
Publication date: 14/01/2020

Background subtraction is a basic task in computer vision and video processing often applied as a pre-processing step for object tracking, people recognition, etc. Recently, a number of successful background-subtraction algorithms have been proposed; however, nearly all of the top-performing ones are supervised. Crucially, their success relies upon the availability of some annotated frames of the test video during training. Consequently, their performance on completely “unseen” videos is undocumented in the literature. In this work, we propose a new, supervised, background subtraction algorithm for unseen videos (BSUV-Net) based on a fully-convolutional neural network. The input to our network consists of the current frame and two background frames captured at different time scales along with their semantic segmentation maps. In order to reduce the chance of overfitting, we also introduce a new data-augmentation technique which mitigates the impact of illumination difference between the background frames and the current frame. On the CDNet-2014 dataset, BSUV-Net outperforms state-of-the-art algorithms evaluated on unseen videos in terms of several metrics including F-measure, recall and precision. Accepted manuscrip
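
To make the described network input concrete, the sketch below stacks the current frame and two background frames, each with its semantic segmentation map, into one per-pixel channel vector. It assumes RGB frames and single-channel semantic maps, which gives 12 channels per pixel. The channel order and the function name are hypothetical, and plain nested lists stand in for the array type a real pipeline would use.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[float, float, float]
Image = List[List[RGB]]        # H x W grid of RGB pixels
SegMap = List[List[float]]     # H x W grid of semantic scores (assumed 1 channel)


def stack_inputs(frames: Sequence[Image],
                 seg_maps: Sequence[SegMap]) -> List[List[List[float]]]:
    """Concatenate each frame's RGB values with its semantic score, per pixel."""
    height, width = len(frames[0]), len(frames[0][0])
    stacked = []
    for y in range(height):
        row = []
        for x in range(width):
            channels: List[float] = []
            for frame, seg in zip(frames, seg_maps):
                channels.extend(frame[y][x])   # 3 colour channels
                channels.append(seg[y][x])     # 1 semantic channel
            row.append(channels)
        stacked.append(row)
    return stacked


# Example with 1 x 1 images: long-term background, short-term background, current frame.
x = stack_inputs(
    frames=[[[(1.0, 2.0, 3.0)]], [[(4.0, 5.0, 6.0)]], [[(7.0, 8.0, 9.0)]]],
    seg_maps=[[[0.1]], [[0.2]], [[0.3]]],
)
```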

arXiv.org e-Print Archive

Crossref

Boston University Institutional Repository (OpenBU)

A fully-convolutional neural network for background subtraction of unseen videos

Authors: Ishwar Prakash; Konrad Janusz; Tezcan Mustafa Ozan
Publication date: 01/01/2019

Background subtraction is a basic task in computer vision and video processing often applied as a pre-processing step for object tracking, people recognition, etc. Recently, a number of successful background-subtraction algorithms have been proposed; however, nearly all of the top-performing ones are supervised. Crucially, their success relies upon the availability of some annotated frames of the test video during training. Consequently, their performance on completely “unseen” videos is undocumented in the literature. In this work, we propose a new, supervised, background subtraction algorithm for unseen videos (BSUV-Net) based on a fully-convolutional neural network. The input to our network consists of the current frame and two background frames captured at different time scales along with their semantic segmentation maps. In order to reduce the chance of overfitting, we also introduce a new data-augmentation technique which mitigates the impact of illumination difference between the background frames and the current frame. On the CDNet-2014 dataset, BSUV-Net outperforms state-of-the-art algorithms evaluated on unseen videos in terms of several metrics including F-measure, recall and precision. Accepted manuscrip

Boston University Institutional Repository (OpenBU)

Occlusion Handling using Semantic Segmentation and Visibility-Based Rendering for Mixed Reality

Authors: Fukiage Taiki; Hori Tomoki; Oishi Takeshi; Okamoto Yasuhide; Roxas Menandro
Publication date: 30/07/2017

Real-time occlusion handling is a major problem in outdoor mixed reality systems because it incurs a high computational cost, mainly due to the complexity of the scene. Using only segmentation, it is difficult to accurately render a virtual object occluded by complex objects such as trees, bushes, etc. In this paper, we propose a novel occlusion handling method for real-time, outdoor, and omni-directional mixed reality systems using only the information from a monocular image sequence. We first present a semantic segmentation scheme for predicting the amount of visibility for different types of objects in the scene. We also simultaneously calculate a foreground probability map using depth estimation derived from optical flow. Finally, we combine the segmentation result and the probability map to render the computer-generated object and the real scene using a visibility-based rendering method. Our results show great improvement in handling occlusions compared to existing blending-based methods

arXiv.org e-Print Archive

Crossref

DALES: Automated Tool for Detection, Annotation, Labelling and Segmentation of Multiple Objects in Multi-Camera Video Streams

Authors: Bhat M.; Olszewska Joanna Isabelle

In this paper, we propose a new software tool called DALES to extract semantic information from multi-view videos based on the analysis of their visual content. Our system is fully automatic and is well suited for multi-camera environments. Once the multi-view video sequences are loaded into DALES, our software performs the detection, counting, and segmentation of the visual objects evolving in the provided video streams. Then, these objects of interest are processed in order to be labelled, and the related frames are thus annotated with the corresponding semantic content. Moreover, a textual script is automatically generated with the video annotations. The DALES system shows excellent performance in terms of accuracy and computational speed and is robustly designed to ensure view synchronization

University of Gloucestershire Research Repository