5,753 research outputs found

    CVABS: Moving Object Segmentation with Common Vector Approach for Videos

    Full text link
    Background modelling is a fundamental step for several real-time computer vision applications that requires security systems and monitoring. An accurate background model helps detecting activity of moving objects in the video. In this work, we have developed a new subspace based background modelling algorithm using the concept of Common Vector Approach with Gram-Schmidt orthogonalization. Once the background model that involves the common characteristic of different views corresponding to the same scene is acquired, a smart foreground detection and background updating procedure is applied based on dynamic control parameters. A variety of experiments is conducted on different problem types related to dynamic backgrounds. Several types of metrics are utilized as objective measures and the obtained visual results are judged subjectively. It was observed that the proposed method stands successfully for all problem types reported on CDNet2014 dataset by updating the background frames with a self-learning feedback mechanism.Comment: 12 Pages, 4 Figures, 1 Tabl

    Biologically-inspired robust motion segmentation using mutual information

    Get PDF
    This paper presents a neuroscience inspired information theoretic approach to motion segmentation. Robust motion segmentation represents a fundamental first stage in many surveillance tasks. As an alternative to widely adopted individual segmentation approaches, which are challenged in different ways by imagery exhibiting a wide range of environmental variation and irrelevant motion, this paper presents a new biologically-inspired approach which computes the multivariate mutual information between multiple complementary motion segmentation outputs. Performance evaluation across a range of datasets and against competing segmentation methods demonstrates robust performance

    Automatic object classification for surveillance videos.

    Get PDF
    PhDThe recent popularity of surveillance video systems, specially located in urban scenarios, demands the development of visual techniques for monitoring purposes. A primary step towards intelligent surveillance video systems consists on automatic object classification, which still remains an open research problem and the keystone for the development of more specific applications. Typically, object representation is based on the inherent visual features. However, psychological studies have demonstrated that human beings can routinely categorise objects according to their behaviour. The existing gap in the understanding between the features automatically extracted by a computer, such as appearance-based features, and the concepts unconsciously perceived by human beings but unattainable for machines, or the behaviour features, is most commonly known as semantic gap. Consequently, this thesis proposes to narrow the semantic gap and bring together machine and human understanding towards object classification. Thus, a Surveillance Media Management is proposed to automatically detect and classify objects by analysing the physical properties inherent in their appearance (machine understanding) and the behaviour patterns which require a higher level of understanding (human understanding). Finally, a probabilistic multimodal fusion algorithm bridges the gap performing an automatic classification considering both machine and human understanding. The performance of the proposed Surveillance Media Management framework has been thoroughly evaluated on outdoor surveillance datasets. The experiments conducted demonstrated that the combination of machine and human understanding substantially enhanced the object classification performance. Finally, the inclusion of human reasoning and understanding provides the essential information to bridge the semantic gap towards smart surveillance video systems

    Plenoptic Signal Processing for Robust Vision in Field Robotics

    Get PDF
    This thesis proposes the use of plenoptic cameras for improving the robustness and simplicity of machine vision in field robotics applications. Dust, rain, fog, snow, murky water and insufficient light can cause even the most sophisticated vision systems to fail. Plenoptic cameras offer an appealing alternative to conventional imagery by gathering significantly more light over a wider depth of field, and capturing a rich 4D light field structure that encodes textural and geometric information. The key contributions of this work lie in exploring the properties of plenoptic signals and developing algorithms for exploiting them. It lays the groundwork for the deployment of plenoptic cameras in field robotics by establishing a decoding, calibration and rectification scheme appropriate to compact, lenslet-based devices. Next, the frequency-domain shape of plenoptic signals is elaborated and exploited by constructing a filter which focuses over a wide depth of field rather than at a single depth. This filter is shown to reject noise, improving contrast in low light and through attenuating media, while mitigating occluders such as snow, rain and underwater particulate matter. Next, a closed-form generalization of optical flow is presented which directly estimates camera motion from first-order derivatives. An elegant adaptation of this "plenoptic flow" to lenslet-based imagery is demonstrated, as well as a simple, additive method for rendering novel views. Finally, the isolation of dynamic elements from a static background is considered, a task complicated by the non-uniform apparent motion caused by a mobile camera. Two elegant closed-form solutions are presented dealing with monocular time-series and light field image pairs. This work emphasizes non-iterative, noise-tolerant, closed-form, linear methods with predictable and constant runtimes, making them suitable for real-time embedded implementation in field robotics applications

    Plenoptic Signal Processing for Robust Vision in Field Robotics

    Get PDF
    This thesis proposes the use of plenoptic cameras for improving the robustness and simplicity of machine vision in field robotics applications. Dust, rain, fog, snow, murky water and insufficient light can cause even the most sophisticated vision systems to fail. Plenoptic cameras offer an appealing alternative to conventional imagery by gathering significantly more light over a wider depth of field, and capturing a rich 4D light field structure that encodes textural and geometric information. The key contributions of this work lie in exploring the properties of plenoptic signals and developing algorithms for exploiting them. It lays the groundwork for the deployment of plenoptic cameras in field robotics by establishing a decoding, calibration and rectification scheme appropriate to compact, lenslet-based devices. Next, the frequency-domain shape of plenoptic signals is elaborated and exploited by constructing a filter which focuses over a wide depth of field rather than at a single depth. This filter is shown to reject noise, improving contrast in low light and through attenuating media, while mitigating occluders such as snow, rain and underwater particulate matter. Next, a closed-form generalization of optical flow is presented which directly estimates camera motion from first-order derivatives. An elegant adaptation of this "plenoptic flow" to lenslet-based imagery is demonstrated, as well as a simple, additive method for rendering novel views. Finally, the isolation of dynamic elements from a static background is considered, a task complicated by the non-uniform apparent motion caused by a mobile camera. Two elegant closed-form solutions are presented dealing with monocular time-series and light field image pairs. This work emphasizes non-iterative, noise-tolerant, closed-form, linear methods with predictable and constant runtimes, making them suitable for real-time embedded implementation in field robotics applications

    Articulated human tracking and behavioural analysis in video sequences

    Get PDF
    Recently, there has been a dramatic growth of interest in the observation and tracking of human subjects through video sequences. Arguably, the principal impetus has come from the perceived demand for technological surveillance, however applications in entertainment, intelligent domiciles and medicine are also increasing. This thesis examines human articulated tracking and the classi cation of human movement, rst separately and then as a sequential process. First, this thesis considers the development and training of a 3D model of human body structure and dynamics. To process video sequences, an observation model is also designed with a multi-component likelihood based on edge, silhouette and colour. This is de ned on the articulated limbs, and visible from a single or multiple cameras, each of which may be calibrated from that sequence. Second, for behavioural analysis, we develop a methodology in which actions and activities are described by semantic labels generated from a Movement Cluster Model (MCM). Third, a Hierarchical Partitioned Particle Filter (HPPF) was developed for human tracking that allows multi-level parameter search consistent with the body structure. This tracker relies on the articulated motion prediction provided by the MCM at pose or limb level. Fourth, tracking and movement analysis are integrated to generate a probabilistic activity description with action labels. The implemented algorithms for tracking and behavioural analysis are tested extensively and independently against ground truth on human tracking and surveillance datasets. Dynamic models are shown to predict and generate synthetic motion, while MCM recovers both periodic and non-periodic activities, de ned either on the whole body or at the limb level. Tracking results are comparable with the state of the art, however the integrated behaviour analysis adds to the value of the approach.Overseas Research Students Awards Scheme (ORSAS

    Depth Acquisition from Digital Images

    Get PDF
    Introduction: Depth acquisition from digital images captured with a conventional camera, by analysing focus/defocus cues which are related to depth via an optical model of the camera, is a popular approach to depth-mapping a 3D scene. The majority of methods analyse the neighbourhood of a point in an image to infer its depth, which has disadvantages. A more elegant, but more difficult, solution is to evaluate only the single pixel displaying a point in order to infer its depth. This thesis investigates if a per-pixel method can be implemented without compromising accuracy and generality compared to window-based methods, whilst minimising the number of input images. Method: A geometric optical model of the camera was used to predict the relationship between focus/defocus and intensity at a pixel. Using input images with different focus settings, the relationship was used to identify the focal plane depth (i.e. focus setting) where a point is in best focus, from which the depth of the point can be resolved if camera parameters are known. Two metrics were implemented, one to identify the best focus setting for a point from the discrete input set, and one to fit a model to the input data to estimate the depth of perfect focus of the point on a continuous scale. Results: The method gave generally accurate results for a simple synthetic test scene, with a relatively low number of input images compared to similar methods. When tested on a more complex scene, the method achieved its objectives of separating complex objects from the background by depth, and produced a similar resolution of a complex 3D surface as a similar method which used significantly more input data. Conclusions: The method demonstrates that it is possible to resolve depth on a per-pixel basis without compromising accuracy and generality, and using a similar amount of input data, compared to more traditional window-based methods. In practice, the presented method offers a convenient new option for depth-based image processing applications, as the depth-map is per-pixel, but the process of capturing and preparing images for the method is not too practically cumbersome and could be easily automated unlike other per-pixel methods reviewed. However, the method still suffers from the general limitations of the depth acquisition approach using images from a conventional camera, which limits its use as a general depth acquisition solution beyond specifically depth-based image processing applications
    corecore