
    Video object extraction in distributed surveillance systems

    Recently, automated video surveillance and related video processing algorithms have received considerable attention from the research community. Challenges in video surveillance arise from noise, illumination changes, camera motion, splits and occlusions, complex human behavior, and the management of extracted surveillance information for delivery, archiving, and retrieval. Many video surveillance systems focus on video object extraction, while few focus on both the system architecture and video object extraction. We focus on both, integrate them into an end-to-end system, and study the challenges associated with building this system. We propose a scalable, distributed, real-time video surveillance system with a novel architecture, indexing, and retrieval. The system consists of three modules: video workstations for processing, control workstations for monitoring, and a server for management and archiving. The proposed system models object features as temporal Gaussians and achieves a frame rate of 18 frames/second for SIF video with static cameras, reduced network and storage usage, and precise retrieval results. It is more scalable and delivers more balanced distributed performance than recent architectures.

    The first stage of video processing is noise estimation. We propose a method for localizing homogeneity and estimating the additive white Gaussian noise variance, which uses spatially scattered initial seeds and particle filtering techniques to guide their spatial movement towards homogeneous locations from which the estimation is performed. The noise estimation method reduces the number of measurements required by block-based methods while achieving higher accuracy.

    Next, we segment video objects using a background subtraction technique. We generate the background model online for static cameras using a mixture-of-Gaussians background maintenance approach. For moving cameras, we use a global motion estimation method offline to bring neighboring frames into the coordinate system of the current frame and merge them to produce the background model. We track detected objects using a feature-based object tracking method with improved detection and correction of occlusion and split. We detect occlusion and split by identifying sudden variations in the spatio-temporal features of objects. To detect splits, we analyze the temporal behavior of split objects to discriminate between segmentation errors and real separation of objects. Both objective and subjective experimental results show the ability of the proposed algorithm to detect and correct both splits and occlusions of objects.

    For the last stage of video processing, we propose a novel method for detecting vandalism events, based on a proposed definition of vandal behaviors recorded in surveillance video sequences. We monitor changes inside a restricted site containing vandalism-prone objects and declare vandalism when an object is detected leaving the site while there are temporally consistent and significant static changes representing damage, given that the site is normally unchanged after use. The proposed method is tested on sequences showing real and simulated vandal behaviors and achieves a detection rate of 96%. It detects different forms of vandalism such as graffiti and theft.

    The proposed end-to-end video surveillance system aims at realizing the potential of video object extraction in automated surveillance and retrieval by focusing on both video object extraction and the management, delivery, and utilization of the extracted information.
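
    As a concrete illustration of the mixture-of-Gaussians background maintenance step described above, the sketch below uses OpenCV's stock MOG2 subtractor on a static-camera video. It is a minimal stand-in, not the thesis's own model; the video file name and the post-processing parameters are placeholder assumptions.

```python
import cv2

# Mixture-of-Gaussians background model, updated online frame by frame.
# OpenCV's MOG2 stands in for the paper's background maintenance approach;
# "surveillance.avi" and the parameters below are placeholders.
subtractor = cv2.createBackgroundSubtractorMOG2(
    history=500, varThreshold=16, detectShadows=True)

cap = cv2.VideoCapture("surveillance.avi")
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Foreground mask: pixels poorly explained by the background Gaussians.
    fg_mask = subtractor.apply(frame)
    # Drop shadow pixels (marked as 127 by MOG2) and clean up noise.
    _, fg_mask = cv2.threshold(fg_mask, 200, 255, cv2.THRESH_BINARY)
    fg_mask = cv2.morphologyEx(
        fg_mask, cv2.MORPH_OPEN,
        cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5)))
    # Connected components give candidate object blobs for the tracker.
    contours, _ = cv2.findContours(fg_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
cap.release()
```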

    Single-Shot Multi-Person 3D Pose Estimation From Monocular RGB

    We propose a new single-shot method for multi-person 3D pose estimation in general scenes from a monocular RGB camera. Our approach uses novel occlusion-robust pose-maps (ORPM) which enable full-body pose inference even under strong partial occlusions by other people and objects in the scene. ORPM outputs a fixed number of maps which encode the 3D joint locations of all people in the scene. Body part associations allow us to infer 3D pose for an arbitrary number of people without explicit bounding box prediction. To train our approach we introduce MuCo-3DHP, the first large-scale training dataset showing real images of sophisticated multi-person interactions and occlusions. We synthesize a large corpus of multi-person images by compositing images of individual people (with ground truth from multi-view performance capture). We evaluate our method on our new challenging 3D-annotated multi-person test set MuPoTs-3D, where we achieve state-of-the-art performance. To further stimulate research in multi-person 3D pose estimation, we will make our new datasets and associated code publicly available for research purposes.
    Comment: International Conference on 3D Vision (3DV), 2018
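
    The ORPM formulation itself is more involved, but the basic idea of reading joint locations out of a fixed number of maps can be sketched generically: per joint, take the argmax of a confidence map and read off the 3D coordinates stored at that pixel in the corresponding location maps. The array shapes and function name below are assumptions for illustration, not the paper's layout.

```python
import numpy as np

def decode_pose_maps(heatmaps, location_maps):
    """Illustrative decoder. `heatmaps` has shape (J, H, W) with one 2D
    confidence map per joint; `location_maps` has shape (J, 3, H, W),
    storing x/y/z coordinates at every pixel. Both layouts are assumptions
    for this sketch, not the ORPM encoding from the paper."""
    num_joints = heatmaps.shape[0]
    pose_3d = np.zeros((num_joints, 3))
    for j in range(num_joints):
        # Pixel of maximum confidence for joint j.
        y, x = np.unravel_index(np.argmax(heatmaps[j]), heatmaps[j].shape)
        # Read the 3D joint coordinates stored at that pixel.
        pose_3d[j] = location_maps[j, :, y, x]
    return pose_3d
```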

    Algorithms for trajectory integration in multiple views

    This thesis addresses the problem of deriving a coherent and accurate localization of moving objects from partial visual information when data are generated by cameras placed at different view angles with respect to the scene. The framework is built around applications of scene monitoring with multiple cameras. Firstly, we demonstrate how a geometry-based solution exploits the relationships between corresponding feature points across views and improves accuracy in object location. Then, we improve the estimation of objects' locations with geometric transformations that account for lens distortions. Additionally, we study the integration of the partial visual information generated by each individual sensor and its combination into a single frame of observation that considers object association and data fusion. Our approach is fully image-based, relies only on 2D constructs, and does not require any complex computation in 3D space. We exploit the continuity and coherence of objects' motion when crossing cameras' fields of view. Additionally, we work under the assumption of a planar ground plane and a wide baseline (i.e., the cameras' viewpoints are far apart). The main contributions are: i) the development of a framework for distributed visual sensing that accounts for inaccuracies in the geometry of multiple views; ii) the reduction of trajectory mapping errors using a statistical-based homography estimation; iii) the integration of a polynomial method for correcting inaccuracies caused by the cameras' lens distortion; iv) a global trajectory reconstruction algorithm that associates and integrates fragments of trajectories generated by each camera.
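
    As a rough sketch of homography-based trajectory mapping under the planar ground-plane assumption, the snippet below estimates a robust homography between two overlapping views and maps a trajectory fragment from one camera into the other. RANSAC here stands in for the thesis's statistical estimation step, and the point correspondences are placeholder values.

```python
import cv2
import numpy as np

# Corresponding ground-plane points observed in two overlapping views.
# Placeholder values; in practice these come from matched feature points
# or from trajectory fragments seen by both cameras.
pts_cam_a = np.array([[100, 200], [320, 210], [305, 400], [90, 380]],
                     dtype=np.float32)
pts_cam_b = np.array([[400, 150], [620, 160], [600, 350], [385, 330]],
                     dtype=np.float32)

# Robust homography estimation; RANSAC (3-pixel reprojection threshold)
# is an illustrative choice, not the thesis's exact statistical method.
H, inlier_mask = cv2.findHomography(pts_cam_a, pts_cam_b, cv2.RANSAC, 3.0)

# Map a trajectory recorded in camera A into camera B's image plane.
trajectory_a = np.array([[[120, 250]], [[150, 260]], [[180, 275]]],
                        dtype=np.float32)
trajectory_b = cv2.perspectiveTransform(trajectory_a, H)
```

    Lens distortion would be corrected before this mapping (the thesis uses a polynomial method); the homography then relates the two undistorted image planes through the common ground plane.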

    CORe50: a New Dataset and Benchmark for Continuous Object Recognition

    Continuous/lifelong learning of high-dimensional data streams is a challenging research problem. In fact, fully retraining models each time new data become available is infeasible due to computational and storage issues, while naïve incremental strategies have been shown to suffer from catastrophic forgetting. In the context of real-world object recognition applications (e.g., robotic vision), where continuous learning is crucial, very few datasets and benchmarks are available to evaluate and compare emerging techniques. In this work we propose a new dataset and benchmark, CORe50, specifically designed for continuous object recognition, and introduce baseline approaches for different continuous learning scenarios.
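
    A minimal sketch of the setting the benchmark targets: naïve incremental fine-tuning over batches that arrive sequentially, augmented with a small replay buffer as a simple defense against catastrophic forgetting. The loop below is generic PyTorch; the function and argument names are illustrative and do not reflect the CORe50 loader's API.

```python
import random
import torch
from torch import nn

def train_incrementally(model, batches, replay_capacity=500, lr=1e-2):
    """Naive incremental fine-tuning plus a small replay buffer with random
    replacement. `batches` is any iterable of (inputs, labels) tensors
    arriving over time, e.g. the sequential sessions of a continual-learning
    benchmark; all names here are assumptions for this sketch."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    buffer = []  # bounded sample of past (input, label) pairs
    for inputs, labels in batches:
        train_x, train_y = inputs, labels
        if buffer:
            # Rehearsal: mix replayed examples into the current batch to
            # mitigate forgetting of earlier data.
            rx, ry = zip(*random.sample(buffer, min(len(buffer), len(labels))))
            train_x = torch.cat([train_x, torch.stack(rx)])
            train_y = torch.cat([train_y, torch.stack(ry)])
        optimizer.zero_grad()
        loss = loss_fn(model(train_x), train_y)
        loss.backward()
        optimizer.step()
        # Update the buffer from the new data only.
        for x, y in zip(inputs, labels):
            if len(buffer) < replay_capacity:
                buffer.append((x.detach(), y.detach()))
            else:
                buffer[random.randrange(replay_capacity)] = (x.detach(),
                                                             y.detach())
    return model
```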

    Tracking people in crowds by a part matching approach

    The major difficulty in human tracking is the problem posed by challenging occlusions, where the target person is repeatedly and extensively occluded by either the background or another moving object. These types of occlusions may cause significant changes in the person's shape, appearance, or motion, thus making the data association problem extremely difficult to solve. Unlike most of the existing methods for human tracking, which handle occlusions by data association of the complete human body, in this paper we propose a method that tracks people under challenging spatial occlusions based on body part tracking. The human model we propose consists of five body parts with six degrees of freedom, and each part is represented by a rich set of features. The tracking is solved using a layered data association approach: direct comparison between features (feature layer) and subsequent matching between parts of the same bodies (part layer) lead to a final decision for the global match (global layer). Experimental results have confirmed the effectiveness of the proposed method. © 2008 IEEE
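
    The layered association idea can be sketched as follows: compute feature-layer distances per body part, aggregate them per body at the part layer, and resolve the global layer as a one-to-one assignment. The five-part model follows the paper; the Euclidean distance, mean aggregation, and Hungarian assignment below are illustrative choices, not the authors' exact procedure.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

NUM_PARTS = 5  # the paper models a person as five body parts

def match_people(tracked, detected):
    """Layered association sketch. `tracked` and `detected` are arrays of
    shape (n_people, NUM_PARTS, feat_dim) holding per-part feature vectors;
    the feature choice and distance are assumptions for this sketch."""
    n_t, n_d = len(tracked), len(detected)
    cost = np.zeros((n_t, n_d))
    for i in range(n_t):
        for j in range(n_d):
            # Feature layer: per-part distances between persons i and j.
            part_dists = np.linalg.norm(tracked[i] - detected[j], axis=1)
            # Part layer: combine evidence over the five parts of one body.
            cost[i, j] = part_dists.mean()
    # Global layer: optimal one-to-one match between tracks and detections.
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows, cols))
```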