42,324 research outputs found

    Rain Removal in Traffic Surveillance: Does it Matter?

    Get PDF
    Varying weather conditions, including rainfall and snowfall, are generally regarded as a challenge for computer vision algorithms. One proposed solution to the challenges induced by rain and snowfall is to artificially remove the rain from images or video using rain removal algorithms. It is the promise of these algorithms that the rain-removed image frames will improve the performance of subsequent segmentation and tracking algorithms. However, rain removal algorithms are typically evaluated on their ability to remove synthetic rain on a small subset of images. Currently, their behavior is unknown on real-world videos when integrated with a typical computer vision pipeline. In this paper, we review the existing rain removal algorithms and propose a new dataset that consists of 22 traffic surveillance sequences under a broad variety of weather conditions that all include either rain or snowfall. We propose a new evaluation protocol that evaluates the rain removal algorithms on their ability to improve the performance of subsequent segmentation, instance segmentation, and feature tracking algorithms under rain and snow. If successful, the de-rained frames of a rain removal algorithm should improve segmentation performance and increase the number of accurately tracked features. The results show that a recent single-frame-based rain removal algorithm increases the segmentation performance by 19.7% on our proposed dataset, but it eventually decreases the feature tracking performance and showed mixed results with recent instance segmentation methods. However, the best video-based rain removal algorithm improves the feature tracking accuracy by 7.72%.Comment: Published in IEEE Transactions on Intelligent Transportation System

    Online real-time crowd behavior detection in video sequences

    Get PDF
    Automatically detecting events in crowded scenes is a challenging task in Computer Vision. A number of offline approaches have been proposed for solving the problem of crowd behavior detection, however the offline assumption limits their application in real-world video surveillance systems. In this paper, we propose an online and real-time method for detecting events in crowded video sequences. The proposed approach is based on the combination of visual feature extraction and image segmentation and it works without the need of a training phase. A quantitative experimental evaluation has been carried out on multiple publicly available video sequences, containing data from various crowd scenarios and different types of events, to demonstrate the effectiveness of the approach

    Lucid Data Dreaming for Video Object Segmentation

    Full text link
    Convolutional networks reach top quality in pixel-level video object segmentation but require a large amount of training data (1k~100k) to deliver such results. We propose a new training strategy which achieves state-of-the-art results across three evaluation datasets while using 20x~1000x less annotated data than competing methods. Our approach is suitable for both single and multiple object segmentation. Instead of using large training sets hoping to generalize across domains, we generate in-domain training data using the provided annotation on the first frame of each video to synthesize ("lucid dream") plausible future video frames. In-domain per-video training data allows us to train high quality appearance- and motion-based models, as well as tune the post-processing stage. This approach allows to reach competitive results even when training from only a single annotated frame, without ImageNet pre-training. Our results indicate that using a larger training set is not automatically better, and that for the video object segmentation task a smaller training set that is closer to the target domain is more effective. This changes the mindset regarding how many training samples and general "objectness" knowledge are required for the video object segmentation task.Comment: Accepted in International Journal of Computer Vision (IJCV

    Demo: real-time indoors people tracking in scalable camera networks

    Get PDF
    In this demo we present a people tracker in indoor environments. The tracker executes in a network of smart cameras with overlapping views. Special attention is given to real-time processing by distribution of tasks between the cameras and the fusion server. Each camera performs tasks of processing the images and tracking of people in the image plane. Instead of camera images, only metadata (a bounding box per person) are sent from each camera to the fusion server. The metadata are used on the server side to estimate the position of each person in real-world coordinates. Although the tracker is designed to suit any indoor environment, in this demo the tracker's performance is presented in a meeting scenario, where occlusions of people by other people and/or furniture are significant and occur frequently. Multiple cameras insure views from multiple angles, which keeps tracking accurate even in cases of severe occlusions in some of the views

    Tracking by Prediction: A Deep Generative Model for Mutli-Person localisation and Tracking

    Full text link
    Current multi-person localisation and tracking systems have an over reliance on the use of appearance models for target re-identification and almost no approaches employ a complete deep learning solution for both objectives. We present a novel, complete deep learning framework for multi-person localisation and tracking. In this context we first introduce a light weight sequential Generative Adversarial Network architecture for person localisation, which overcomes issues related to occlusions and noisy detections, typically found in a multi person environment. In the proposed tracking framework we build upon recent advances in pedestrian trajectory prediction approaches and propose a novel data association scheme based on predicted trajectories. This removes the need for computationally expensive person re-identification systems based on appearance features and generates human like trajectories with minimal fragmentation. The proposed method is evaluated on multiple public benchmarks including both static and dynamic cameras and is capable of generating outstanding performance, especially among other recently proposed deep neural network based approaches.Comment: To appear in IEEE Winter Conference on Applications of Computer Vision (WACV), 201
    • …
    corecore