
    Improving Multiple Object Tracking with Optical Flow and Edge Preprocessing

    In this paper, we present a new method for detecting road users in an urban environment that leads to an improvement in multiple object tracking. Our method takes a foreground image as input and improves object detection and segmentation. The resulting image can be used as input to trackers that rely on foreground blobs from background subtraction. The first step is to create foreground images for all the frames in an urban video. Then, starting from the original blobs of the foreground image, we merge blobs that are close to one another and have similar optical flow. The next step is to extract the edges of the different objects in order to detect multiple objects that might be very close (and merged into the same blob) and to adjust the size of the original blobs. At the same time, we use the optical flow to detect occlusions between objects moving in opposite directions. Finally, we decide which information to keep in order to construct a new foreground image with blobs that can be used for tracking. The system is validated on four videos of an urban traffic dataset. Our method improves the recall and precision metrics for the object detection task compared to the vanilla background subtraction method and improves the CLEAR MOT metrics in the tracking task for most videos.
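    A minimal sketch of the blob-merging step described above, assuming OpenCV connected components for the blobs and a dense optical flow field (e.g. from cv2.calcOpticalFlowFarneback); the function name, the distance and flow-similarity thresholds, and the "draw a line between centroids" merge rule are illustrative placeholders, not the paper's actual parameters or procedure.

```python
import cv2
import numpy as np

def merge_blobs_by_flow(fg_mask, flow, dist_thresh=20.0, cos_thresh=0.8):
    """Merge nearby foreground blobs whose mean optical-flow directions agree.

    fg_mask: uint8 binary foreground mask from background subtraction.
    flow:    HxWx2 dense optical flow field for the same frame.
    Thresholds are illustrative, not the paper's values.
    """
    num, labels, stats, centroids = cv2.connectedComponentsWithStats(fg_mask)

    # Mean flow vector per blob (label 0 is the background).
    mean_flows = {}
    for i in range(1, num):
        ys, xs = np.nonzero(labels == i)
        mean_flows[i] = flow[ys, xs].mean(axis=0)

    merged = fg_mask.copy()
    for i in range(1, num):
        for j in range(i + 1, num):
            dist = np.linalg.norm(centroids[i] - centroids[j])
            vi, vj = mean_flows[i], mean_flows[j]
            denom = np.linalg.norm(vi) * np.linalg.norm(vj) + 1e-6
            cos_sim = float(np.dot(vi, vj)) / denom
            if dist < dist_thresh and cos_sim > cos_thresh:
                # Connect the two blobs so they become a single component.
                p1 = tuple(int(c) for c in centroids[i])
                p2 = tuple(int(c) for c in centroids[j])
                cv2.line(merged, p1, p2, 255, thickness=3)
    return merged
```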

    Moving Object Detection based on RGBD Information

    This thesis targets Moving Object Detection, more specifically Background Subtraction. We propose two approaches that use color and depth information to solve background subtraction; the following two paragraphs give a brief abstract of each approach. The first contribution is a framework for improving traditional Background Subtraction techniques. This framework is based on two data types, color and depth: preliminary background segmentation results are obtained from the Depth and RGB channels independently and then fused by an algorithm to produce the final result. Experiments on the SBM-RGBD dataset using four methods (ViBe, LOBSTER, SuBSENSE, and PAWCS) show that the proposed framework achieves an impressive performance compared to the original RGB-based state-of-the-art techniques. The second contribution is a novel deep learning model called Deep Multi-Scale Network (DMSN) for Background Subtraction. This convolutional neural network takes RGB color channels and Depth maps as inputs and fuses their semantic and spatial information. Compared with previous Deep Learning Background Subtraction techniques, which are limited by their use of only RGB channels, our RGBD version overcomes most of their drawbacks, especially in particularly challenging scenarios. Further, this study introduces a new scene-independent evaluation protocol for the SBM-RGBD dataset, dedicated to Deep Learning methods, in order to set up a competitive platform that includes more challenging situations. The proposed methods prove effective at solving background subtraction in complex problems at different levels. The experimental results verify that the proposed work outperforms the state of the art on the SBM-RGBD and GSM datasets.
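    A minimal sketch of the mask-fusion idea in the first approach, assuming the RGB-based and depth-based background subtraction results (e.g. from ViBe or SuBSENSE run on each channel independently) are already available as binary masks; the fuse_masks function, the depth_valid map, and the simple "trust depth where it is valid" rule are hypothetical illustrations, not the thesis's actual fusion algorithm.

```python
import numpy as np

def fuse_masks(rgb_mask, depth_mask, depth_valid=None):
    """Fuse independent RGB-based and depth-based foreground masks.

    rgb_mask, depth_mask: uint8 binary foreground masks (255 = foreground).
    depth_valid: boolean map of pixels with reliable depth readings; where
    depth is missing or noisy we fall back to the RGB result.
    """
    rgb_fg = rgb_mask > 0
    depth_fg = depth_mask > 0
    if depth_valid is None:
        depth_valid = np.ones(depth_fg.shape, dtype=bool)
    # Illustrative rule: prefer the depth decision wherever depth is valid.
    fused = np.where(depth_valid, depth_fg, rgb_fg)
    return fused.astype(np.uint8) * 255
```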

    Robust 3D Action Recognition through Sampling Local Appearances and Global Distributions

    3D action recognition has broad applications in human-computer interaction and intelligent surveillance. However, recognizing similar actions remains challenging, since previous work fails to capture motion and shape cues effectively from noisy depth data. In this paper, we propose a novel two-layer Bag-of-Visual-Words (BoVW) model that suppresses noise disturbances and jointly encodes both motion and shape cues. First, background clutter is removed by a background modeling method designed for depth data. Then, motion and shape cues are jointly used to generate robust and distinctive spatial-temporal interest points (STIPs): motion-based STIPs and shape-based STIPs. In the first layer of our model, a multi-scale 3D local steering kernel (M3DLSK) descriptor is proposed to describe the local appearance of cuboids around motion-based STIPs. In the second layer, a spatial-temporal vector (STV) descriptor is proposed to describe the spatial-temporal distribution of shape-based STIPs. Using the BoVW model, motion and shape cues are combined to form a fused action representation. Our model performs favorably compared with common STIP detection and description methods. Thorough experiments verify that our model is effective in distinguishing similar actions and is robust to background clutter, partial occlusions, and pepper noise.
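    A minimal sketch of the fused BoVW encoding, assuming the local motion-cue (M3DLSK-like) and shape-cue (STV-like) descriptors have already been extracted per video and that each layer's codebook is a pre-trained KMeans vocabulary; the function names, vocabulary size, and plain histogram concatenation are illustrative and may differ from the paper's encoding.

```python
import numpy as np
from sklearn.cluster import KMeans

def train_codebook(training_descriptors, n_words=1000):
    """Cluster training descriptors into a visual vocabulary (one per layer)."""
    return KMeans(n_clusters=n_words, n_init=10).fit(training_descriptors)

def bovw_histogram(descriptors, codebook):
    """Quantize local descriptors against a codebook and return a normalized
    visual-word histogram for one layer of the two-layer model."""
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / (hist.sum() + 1e-9)

def fused_action_representation(motion_descs, shape_descs,
                                motion_codebook, shape_codebook):
    """Concatenate the motion-cue and shape-cue histograms into a single
    action descriptor suitable for a standard classifier such as an SVM."""
    h_motion = bovw_histogram(motion_descs, motion_codebook)
    h_shape = bovw_histogram(shape_descs, shape_codebook)
    return np.concatenate([h_motion, h_shape])
```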