Improving Multiple Object Tracking with Optical Flow and Edge Preprocessing
In this paper, we present a new method for detecting road users in an urban
environment which leads to an improvement in multiple object tracking. Our
method takes a foreground image as input and improves the object detection
and segmentation. This new image can then be used as input to trackers that use
foreground blobs from background subtraction. The first step is to create
foreground images for all the frames in an urban video. Then, starting from the
original blobs of the foreground image, we merge the blobs that are close to
one another and that have similar optical flow. The next step is extracting the
edges of the different objects to detect multiple objects that might be very
close (and be merged in the same blob) and to adjust the size of the original
blobs. At the same time, we use the optical flow to detect occlusion of objects
that are moving in opposite directions. Finally, we make a decision on which
information we keep in order to construct a new foreground image with blobs
that can be used for tracking. The system is validated on four videos of an
urban traffic dataset. Our method improves the recall and precision metrics for
the object detection task compared to the vanilla background subtraction method,
and improves the CLEAR MOT metrics in the tracking task for most videos.
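As a minimal sketch of the blob-merging step, assuming OpenCV connected components, Farneback optical flow, and illustrative distance and angle thresholds (the abstract does not specify any of these), blobs whose centroids are close and whose mean flow vectors point in similar directions could be grouped as follows:

```python
import cv2
import numpy as np

def merge_blobs(fg_mask, prev_gray, curr_gray, max_dist=20.0, max_angle_deg=30.0):
    """Relabel a foreground mask so that nearby blobs with similar
    mean optical flow share a single label (hypothetical thresholds)."""
    # Dense Farneback flow between consecutive grayscale frames.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    n, labels, _, centroids = cv2.connectedComponentsWithStats(fg_mask)

    # Mean flow vector per blob (label 0 is the background).
    mean_flow = np.array([flow[labels == i].mean(axis=0) for i in range(n)])

    # Union-find: group blobs that are close and flow-consistent.
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(1, n):
        for j in range(i + 1, n):
            # Centroid distance is a simple proximity proxy; contour or
            # bounding-box distance would also fit the paper's description.
            dist = np.linalg.norm(centroids[i] - centroids[j])
            cos = np.dot(mean_flow[i], mean_flow[j]) / (
                np.linalg.norm(mean_flow[i]) * np.linalg.norm(mean_flow[j]) + 1e-6)
            angle = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
            if dist < max_dist and angle < max_angle_deg:
                parent[find(i)] = find(j)

    # Blobs in the same group now share the label of their group root.
    return np.vectorize(find)(labels)
```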
Moving Object Detection based on RGBD Information
This thesis targets the Moving Object Detection topic, more specifically
Background Subtraction. We propose two approaches that use color and depth
information to solve the background subtraction problem; the following two
paragraphs give a brief abstract of each approach.
The first approach is a framework for improving traditional Background
Subtraction techniques. The framework is based on two data types, color and
depth: preliminary background segmentations are obtained from the Depth and
RGB channels independently, and a fusion algorithm then combines them into the
final result. Experiments on the SBM-RGBD dataset with four methods (ViBe,
LOBSTER, SuBSENSE, and PAWCS) show that the proposed framework clearly
outperforms the original RGB-based state-of-the-art techniques.
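The fusion algorithm itself is not detailed in this abstract; a minimal rule-based sketch (an illustrative assumption, not the thesis's actual algorithm) is to trust the depth segmentation wherever the sensor returns a valid reading, since depth is unaffected by shadows and color camouflage:

```python
import numpy as np

def fuse_masks(rgb_mask, depth_mask, depth_valid):
    """Fuse RGB- and depth-based foreground masks (uint8, 255 = foreground).

    Heuristic (an assumption, not the thesis's algorithm): where the depth
    sensor returns a valid reading, trust the depth segmentation; elsewhere
    fall back to the RGB segmentation.
    """
    return np.where(depth_valid, depth_mask, rgb_mask).astype(np.uint8)
```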
This dissertation also proposes a novel deep learning model, the Deep Multi-Scale
Network (DMSN), for Background Subtraction. This convolutional neural network
takes RGB color channels and Depth maps as inputs and fuses their semantic and
spatial information. Compared with previous Deep Learning Background
Subtraction techniques, which lose information by relying on RGB channels
alone, our RGBD version overcomes most of their drawbacks, especially on
particularly difficult challenge categories. Further, this study introduces a
new scene-independent evaluation protocol for the SBM-RGBD dataset, dedicated
to Deep Learning methods, to set up a competitive platform that includes more
challenging situations. The proposed method proved effective at solving
background subtraction in complex scenes of varying difficulty. The
experimental results verify that the proposed work outperforms the
state-of-the-art on the SBM-RGBD and GSM datasets.
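The exact DMSN architecture is not given in this abstract; the sketch below shows only the general two-branch RGBD pattern it describes, assuming PyTorch, with layer sizes, depths, and concatenation-based fusion all being illustrative assumptions:

```python
import torch
import torch.nn as nn

class TwoBranchRGBD(nn.Module):
    """Two-branch RGB-D background subtraction network (illustrative sizes)."""
    def __init__(self):
        super().__init__()
        def branch(in_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(inplace=True),
            )
        self.rgb_branch = branch(3)    # semantic cues from color
        self.depth_branch = branch(1)  # spatial cues from depth
        self.head = nn.Sequential(     # fuse branches, predict per-pixel foreground
            nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, 1),
        )

    def forward(self, rgb, depth):
        fused = torch.cat([self.rgb_branch(rgb), self.depth_branch(depth)], dim=1)
        return torch.sigmoid(self.head(fused))  # foreground probability map

# Example: prob = TwoBranchRGBD()(torch.rand(1, 3, 240, 320), torch.rand(1, 1, 240, 320))
```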
Robust 3D Action Recognition through Sampling Local Appearances and Global Distributions
3D action recognition has broad applications in human-computer interaction
and intelligent surveillance. However, recognizing similar actions remains
challenging since previous literature fails to capture motion and shape cues
effectively from noisy depth data. In this paper, we propose a novel two-layer
Bag-of-Visual-Words (BoVW) model, which suppresses noise disturbances and
jointly encodes motion and shape cues. First, background clutter is
removed by a background modeling method that is designed for depth data. Then,
motion and shape cues are jointly used to generate robust and distinctive
spatial-temporal interest points (STIPs): motion-based STIPs and shape-based
STIPs. In the first layer of our model, a multi-scale 3D local steering kernel
(M3DLSK) descriptor is proposed to describe local appearances of cuboids around
motion-based STIPs. In the second layer, a spatial-temporal vector (STV)
descriptor is proposed to describe the spatial-temporal distributions of
shape-based STIPs. Using the BoVW model, motion and shape cues are combined to
form a fused action representation. Our model performs favorably compared with
common STIP detection and description methods. Thorough experiments verify
that our model is effective in distinguishing similar actions and robust to
background clutter, partial occlusions, and pepper noise.
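As a minimal sketch of the final encoding step, assuming scikit-learn's KMeans for codebook learning and hypothetical codebook sizes (extraction of the M3DLSK and STV descriptors is assumed to happen elsewhere), the two cues can be fused by concatenating their BoVW histograms:

```python
import numpy as np
from sklearn.cluster import KMeans

def bovw_histogram(descriptors, codebook):
    """Quantize local descriptors against a learned codebook and return an
    L1-normalized visual-word occurrence histogram."""
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

def encode_action(m3dlsk_desc, stv_desc, motion_codebook, shape_codebook):
    """Fuse motion (M3DLSK) and shape (STV) cues by concatenating their
    BoVW histograms into one action representation."""
    return np.concatenate([bovw_histogram(m3dlsk_desc, motion_codebook),
                           bovw_histogram(stv_desc, shape_codebook)])

# Codebooks are learned once from training descriptors, e.g.:
# motion_codebook = KMeans(n_clusters=1000).fit(all_m3dlsk_descriptors)
# shape_codebook  = KMeans(n_clusters=500).fit(all_stv_descriptors)
```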