472 research outputs found

    Real-time people tracking in a camera network

    Visual tracking is a fundamental key to the recognition and analysis of human behaviour. In this thesis we present an approach to track several subjects using multiple cameras in real time. The tracking framework employs a numerical Bayesian estimator, also known as a particle filter, which has been developed for parallel implementation on a Graphics Processing Unit (GPU). In order to integrate multiple cameras into a single tracking unit we represent the human body by a parametric ellipsoid in a 3D world. The elliptical boundary can be projected rapidly, several hundred times per subject per frame, onto any image for comparison with the image data within a likelihood model. Adding variables to encode visibility and persistence into the state vector, we tackle the problems of distraction and short-period occlusion. However, subjects may also disappear for longer periods due to blind spots between cameras' fields of view. To recognise a desired subject after such a long period, we add coloured texture to the ellipsoid surface, which is learnt and retained during the tracking process. This texture signature improves the recall rate from 60% to 70-80% when compared to state-only data association. Compared to a standard Central Processing Unit (CPU) implementation, there is a significant speed-up ratio

    Video object tracking : contributions to object description and performance assessment

    Doctoral thesis. Electrical and Computer Engineering. Universidade do Porto. Faculdade de Engenharia. 201

    Spatial and temporal background modelling of non-stationary visual scenes

    PhD. The prevalence of electronic imaging systems in everyday life has become increasingly apparent in recent years. Applications are to be found in medical scanning, automated manufacture, and perhaps most significantly, surveillance. Metropolitan areas, shopping malls, and road traffic management all employ and benefit from an unprecedented quantity of video cameras for monitoring purposes. But the high cost and limited effectiveness of employing humans as the final link in the monitoring chain has driven scientists to seek solutions based on machine vision techniques. Whilst the field of machine vision has enjoyed consistent rapid development in the last 20 years, some of the most fundamental issues still remain to be solved in a satisfactory manner. Central to a great many vision applications is the concept of segmentation, and in particular, most practical systems perform background subtraction as one of the first stages of video processing. This involves separation of ‘interesting foreground’ from the less informative but persistent background. But the definition of what is ‘interesting’ is somewhat subjective, and liable to be application specific. Furthermore, the background may be interpreted as including the visual appearance of normal activity of any agents present in the scene, human or otherwise. Thus a background model might be called upon to absorb lighting changes, moving trees and foliage, or normal traffic flow and pedestrian activity, in order to effect what might be termed in ‘biologically-inspired’ vision as pre-attentive selection. This challenge is one of the Holy Grails of the computer vision field, and consequently the subject has received considerable attention.
This thesis sets out to address some of the limitations of contemporary methods of background segmentation by investigating methods of inducing local mutual support amongst pixels in three starkly contrasting paradigms: (1) locality in the spatial domain, (2) locality in the short-term time domain, and (3) locality in the domain of cyclic repetition frequency. Conventional per-pixel models, such as those based on Gaussian Mixture Models, offer no spatial support between adjacent pixels at all. At the other extreme, eigenspace models impose a structure in which every image pixel bears the same relation to every other pixel. But Markov Random Fields permit definition of arbitrary local cliques by construction of a suitable graph, and are used here to facilitate a novel structure capable of exploiting probabilistic local co-occurrence of adjacent Local Binary Patterns. The result is a method exhibiting strong sensitivity to multiple learned local pattern hypotheses, whilst relying solely on monochrome image data. Many background models enforce temporal consistency constraints on a pixel in an attempt to confirm background membership before being accepted as part of the model, and typically some control over this process is exercised by a learning rate parameter. But in busy scenes, a true background pixel may be visible for a relatively small fraction of the time and in a temporally fragmented fashion, thus hindering such background acquisition. However, support in terms of temporal locality may still be achieved by using Combinatorial Optimization to derive short-term background estimates which induce a similar consistency, but are considerably more robust to disturbance. A novel technique is presented here in which the short-term estimates act as ‘pre-filtered’ data from which a far more compact eigen-background may be constructed. Many scenes entail elements exhibiting repetitive periodic behaviour.
Some road junctions employing traffic signals are among these, yet little is to be found amongst the literature regarding the explicit modelling of such periodic processes in a scene. Previous work focussing on gait recognition has demonstrated approaches based on recurrence of self-similarity by which local periodicity may be identified. The present work harnesses and extends this method in order to characterize scenes displaying multiple distinct periodicities by building a spatio-temporal model. The model may then be used to highlight abnormality in scene activity. Furthermore, a Phase Locked Loop technique with a novel phase detector is detailed, enabling such a model to maintain correct synchronization with scene activity in spite of noise and drift of periodicity. This thesis contends that these three approaches are all manifestations of the same broad underlying concept: local support in each of the space, time and frequency domains, and furthermore, that the support can be harnessed practically, as will be demonstrated experimentally

    Tracking moving objects in surveillance video

    The thesis looks at approaches to the detection and tracking of potential objects of interest in surveillance video. The aim was to investigate and develop methods that might be suitable for eventual application through embedded software, running on a fixed-point processor, in analytics capable cameras. The work considers common approaches to object detection and representation, seeking out those that offer the necessary computational economy and the potential to be able to cope with constraints such as low frame rate due to possible limited processor time, or weak chromatic content that can occur in some typical surveillance contexts. The aim is for probabilistic tracking of objects rather than simple concatenation of frame by frame detections. This involves using recursive Bayesian estimation. The particle filter is a technique for implementing such a recursion and so it is examined in the context of both single target and combined multi-target tracking. A detailed examination of the operation of the single target tracking particle filter shows that objects can be tracked successfully using a relatively simple structured grey-scale histogram representation. It is shown that basic components of the particle filter can be simplified without loss in tracking quality. An analysis brings out the relationships between commonly used target representation distance measures and shows that in the context of the particle filter there is little to choose between them. With the correct choice of parameters, the simplest and computationally economic distance measure performs well. The work shows how to make that correct choice. Similarly, it is shown that a simple measurement likelihood function can be used in place of the more ubiquitous Gaussian. The important step of target state estimation is examined. 
The standard weighted mean approach is rejected, a recently proposed maximum a posteriori approach is shown to be not suitable in the context of the work, and a practical alternative is developed. Two methods are presented for tracker initialization. One of them is a simplification of an existing published method, the other is a novel approach. The aim is to detect trackable objects as they enter the scene, extract trackable features, then actively follow those features through subsequent frames. The multi-target tracking problem is then posed as one of management of multiple independent trackers
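
The recursive Bayesian estimation described in this abstract can be illustrated with a minimal bootstrap particle filter. The sketch below is an assumption-laden toy, not the thesis's tracker: it uses a 1-D position state, a Gaussian random-walk motion model, a Gaussian measurement likelihood and systematic resampling, and the names `particle_filter_step` and `track` are hypothetical. It also uses the textbook weighted-mean state estimate, which the abstract says the thesis rejects in favour of an alternative that is not described here.

```python
import math
import random


def particle_filter_step(particles, measurement, process_std, meas_std, rng):
    """One predict-weight-resample cycle for a 1-D position state."""
    # Predict: diffuse each particle with a Gaussian random-walk motion model.
    predicted = [p + rng.gauss(0.0, process_std) for p in particles]

    # Weight: unnormalised Gaussian measurement likelihood (an assumption).
    weights = [math.exp(-0.5 * ((measurement - p) / meas_std) ** 2)
               for p in predicted]
    total = sum(weights)
    if total == 0.0:
        # Every likelihood underflowed; fall back to uniform weights.
        weights = [1.0 / len(predicted)] * len(predicted)
    else:
        weights = [w / total for w in weights]

    # Estimate: textbook weighted mean (not the thesis's preferred estimator).
    estimate = sum(w * p for w, p in zip(weights, predicted))

    # Resample (systematic): keep particles in proportion to their weights.
    n = len(predicted)
    step = 1.0 / n
    pointer = rng.uniform(0.0, step)
    resampled, cumulative, i = [], weights[0], 0
    for _ in range(n):
        while pointer > cumulative and i < n - 1:
            i += 1
            cumulative += weights[i]
        resampled.append(predicted[i])
        pointer += step
    return resampled, estimate


def track(measurements, n_particles=500, seed=0):
    """Run the filter over a list of scalar measurements."""
    rng = random.Random(seed)
    # Hypothetical prior: positions spread uniformly over [0, 10].
    particles = [rng.uniform(0.0, 10.0) for _ in range(n_particles)]
    estimates = []
    for z in measurements:
        particles, est = particle_filter_step(particles, z, 0.5, 1.0, rng)
        estimates.append(est)
    return particles, estimates
```

Calling `track([5.0] * 20)` returns the final particle set and one estimate per frame. A multi-target variant in the spirit of the abstract's last sentence would simply run one such filter per tracked object.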

    Human robot interaction in a crowded environment

    No full text
    Human Robot Interaction (HRI) is the primary means of establishing natural and affective communication between humans and robots. HRI enables robots to act in a way similar to humans in order to assist in activities that are considered to be laborious, unsafe, or repetitive. Vision based human robot interaction is a major component of HRI, with which visual information is used to interpret how human interaction takes place. Common tasks of HRI include finding pre-trained static or dynamic gestures in an image, which involves localising different key parts of the human body such as the face and hands. This information is subsequently used to extract different gestures. After the initial detection process, the robot is required to comprehend the underlying meaning of these gestures [3]. Thus far, most gesture recognition systems can only detect gestures and identify a person in relatively static environments. This is not realistic for practical applications as difficulties may arise from people's movements and changing illumination conditions. Another issue to consider is that of identifying the commanding person in a crowded scene, which is important for interpreting the navigation commands. To this end, it is necessary to associate the gesture to the correct person and automatic reasoning is required to extract the most probable location of the person who has initiated the gesture. In this thesis, we have proposed a practical framework for addressing the above issues. It attempts to achieve a coarse level understanding about a given environment before engaging in active communication. This includes recognizing human robot interaction, where a person has the intention to communicate with the robot. In this regard, it is necessary to differentiate if people present are engaged with each other or their surrounding environment. The basic task is to detect and reason about the environmental context and different interactions so as to respond accordingly.
For example, if individuals are engaged in conversation, the robot should realize it is best not to disturb or, if an individual is receptive to the robot's interaction, it may approach the person. Finally, if the user is moving in the environment, it can analyse further to understand if any help can be offered in assisting this user. The method proposed in this thesis combines multiple visual cues in a Bayesian framework to identify people in a scene and determine potential intentions. For improving system performance, contextual feedback is used, which allows the Bayesian network to evolve and adjust itself according to the surrounding environment. The results achieved demonstrate the effectiveness of the technique in dealing with human-robot interaction in a relatively crowded environment [7]

    Smart video surveillance of pedestrians : fixed, aerial, and multi-camera methods

    Crowd analysis from video footage is an active research topic in the field of computer vision. Crowds can be analysed using different approaches, depending on their characteristics. Furthermore, analysis can be performed from footage obtained through different sources. Fixed CCTV cameras can be used, as well as cameras mounted on moving vehicles. To begin, a literature review is provided, where research works in the fields of crowd analysis, as well as object and people tracking, occlusion handling, multi-view and sensor fusion, and multi-target tracking are analysed and compared, and their advantages and limitations highlighted. Following that, the three contributions of this thesis are presented: in a first study, crowds will be classified based on various cues (e.g. density, entropy), so that the best approaches to further analyse behaviour can be selected; then, some of the challenges of individual target tracking from aerial video footage will be tackled; finally, a study on the analysis of groups of people from multiple cameras is proposed. The analysis covers the movements of people and objects in the scene. The idea is to track as many people as possible within the crowd, to obtain knowledge from their movements as a group, and to classify different types of scenes. An additional contribution of this thesis is a pair of novel datasets: on the one hand, a first set to test the proposed aerial video analysis methods; on the other, a second to validate the third study, that is, with groups of people recorded from multiple overlapping cameras performing different actions

    Unsupervised maritime target detection

    The unsupervised detection of maritime targets in grey scale video is a difficult problem in maritime video surveillance. Most approaches assume that the camera is static and employ pixel-wise background modelling techniques for foreground detection; other methods rely on colour or thermal information to detect targets. These methods fail in real-world situations when the static camera assumption is violated, and colour or thermal data is unavailable. In defence and security applications, prior information and training samples of targets may be unavailable for training a classifier; the learning of a one class classifier for the background may be impossible as well. Thus, an unsupervised online approach that attempts to learn from the scene data is highly desirable. In this thesis, the characteristics of the maritime scene and the ocean texture are exploited for foreground detection. Two fast and effective methods are investigated for target detection. Firstly, online region-based background texture models are explored for describing the appearance of the ocean. This approach avoids the need for frame registration because the model is built spatially rather than temporally. The texture appearance of the ocean is described using Local Binary Pattern (LBP) descriptors. Two models are proposed: one model is a Gaussian Mixture (GMM) and the other, referred to as a Sparse Texture Model (STM), is a set of histogram texture distributions. The foreground detections are optimized using a Graph Cut (GC) that enforces spatial coherence. Secondly, feature tracking is investigated as a means of detecting stable features in an image frame that typically correspond to maritime targets; unstable features are background regions. This approach is a Track-Before-Detect (TBD) concept and it is implemented using a hierarchical scheme for motion estimation, and matching of Scale-Invariant Feature Transform (SIFT) appearance features.
The experimental results show that these approaches are feasible for foreground detection in maritime video when the camera is either static or moving. Receiver Operating Characteristic (ROC) curves were generated for five test sequences and the Area Under the ROC Curve (AUC) was analyzed for the performance of the proposed methods. The texture models, without GC optimization, achieved an AUC of 0.85 or greater on four out of the five test videos. At 50% True Positive Rate (TPR), these four test scenarios had a False Positive Rate (FPR) of less than 2%. With the GC optimization, an AUC of greater than 0.8 was achieved for all the test cases and the FPR was reduced in all cases when compared to the results without the GC. In comparison to the state of the art in background modelling for maritime scenes, our texture model methods achieved the best performance or comparable performance. The two texture models executed at a reasonable processing frame rate. The experimental results for TBD show that one may detect target features using a simple track score based on the track length. At 50% TPR a FPR of less than 4% is achieved for four out of the five test scenarios. These results are very promising for maritime target detection
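
The Local Binary Pattern descriptors mentioned in this abstract can be sketched as follows. This is a minimal illustration of the basic 3x3 LBP operator and a normalised histogram over a region, assuming a grey-scale image stored as a list of rows. The abstract does not state which LBP variant, radius, or bit ordering the thesis uses, so the neighbour ordering and the names `lbp_code` and `lbp_histogram` are assumptions made for this example.

```python
# Neighbour offsets (row, col), clockwise from the top-left pixel.
# The ordering is an assumption; any fixed ordering gives a valid LBP code.
NEIGHBOUR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                     (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, row, col):
    """Return the 8-bit LBP code of the interior pixel at (row, col)."""
    center = image[row][col]
    code = 0
    for bit, (dr, dc) in enumerate(NEIGHBOUR_OFFSETS):
        # A neighbour at least as bright as the centre sets its bit.
        if image[row + dr][col + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """Normalised 256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    rows, cols = len(image), len(image[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            hist[lbp_code(image, r, c)] += 1
    total = sum(hist)
    # Leave the histogram as zeros if the image has no interior pixels.
    return [h / total for h in hist] if total else hist


ramp = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
print(lbp_code(ramp, 1, 1))  # prints 120
```

A region-based texture model of the kind the abstract describes would compare such histograms between image regions; how the thesis builds and updates its GMM and STM models from them is not specified here.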