3,138 research outputs found

    Person re-Identification over distributed spaces and time

    Get PDF
    PhDReplicating the human visual system and cognitive abilities that the brain uses to process the information it receives is an area of substantial scientific interest. With the prevalence of video surveillance cameras a portion of this scientific drive has been into providing useful automated counterparts to human operators. A prominent task in visual surveillance is that of matching people between disjoint camera views, or re-identification. This allows operators to locate people of interest, to track people across cameras and can be used as a precursory step to multi-camera activity analysis. However, due to the contrasting conditions between camera views and their effects on the appearance of people re-identification is a non-trivial task. This thesis proposes solutions for reducing the visual ambiguity in observations of people between camera views This thesis first looks at a method for mitigating the effects on the appearance of people under differing lighting conditions between camera views. This thesis builds on work modelling inter-camera illumination based on known pairs of images. A Cumulative Brightness Transfer Function (CBTF) is proposed to estimate the mapping of colour brightness values based on limited training samples. Unlike previous methods that use a mean-based representation for a set of training samples, the cumulative nature of the CBTF retains colour information from underrepresented samples in the training set. Additionally, the bi-directionality of the mapping function is explored to try and maximise re-identification accuracy by ensuring samples are accurately mapped between cameras. Secondly, an extension is proposed to the CBTF framework that addresses the issue of changing lighting conditions within a single camera. As the CBTF requires manually labelled training samples it is limited to static lighting conditions and is less effective if the lighting changes. This Adaptive CBTF (A-CBTF) differs from previous approaches that either do not consider lighting change over time, or rely on camera transition time information to update. By utilising contextual information drawn from the background in each camera view, an estimation of the lighting change within a single camera can be made. This background lighting model allows the mapping of colour information back to the original training conditions and thus remove the need for 3 retraining. Thirdly, a novel reformulation of re-identification as a ranking problem is proposed. Previous methods use a score based on a direct distance measure of set features to form a correct/incorrect match result. Rather than offering an operator a single outcome, the ranking paradigm is to give the operator a ranked list of possible matches and allow them to make the final decision. By utilising a Support Vector Machine (SVM) ranking method, a weighting on the appearance features can be learned that capitalises on the fact that not all image features are equally important to re-identification. Additionally, an Ensemble-RankSVM is proposed to address scalability issues by separating the training samples into smaller subsets and boosting the trained models. Finally, the thesis looks at a practical application of the ranking paradigm in a real world application. The system encompasses both the re-identification stage and the precursory extraction and tracking stages to form an aid for CCTV operators. Segmentation and detection are combined to extract relevant information from the video, while several combinations of matching techniques are combined with temporal priors to form a more comprehensive overall matching criteria. The effectiveness of the proposed approaches is tested on datasets obtained from a variety of challenging environments including offices, apartment buildings, airports and outdoor public spaces

    Activity understanding and unusual event detection in surveillance videos

    Get PDF
    PhDComputer scientists have made ceaseless efforts to replicate cognitive video understanding abilities of human brains onto autonomous vision systems. As video surveillance cameras become ubiquitous, there is a surge in studies on automated activity understanding and unusual event detection in surveillance videos. Nevertheless, video content analysis in public scenes remained a formidable challenge due to intrinsic difficulties such as severe inter-object occlusion in crowded scene and poor quality of recorded surveillance footage. Moreover, it is nontrivial to achieve robust detection of unusual events, which are rare, ambiguous, and easily confused with noise. This thesis proposes solutions for resolving ambiguous visual observations and overcoming unreliability of conventional activity analysis methods by exploiting multi-camera visual context and human feedback. The thesis first demonstrates the importance of learning visual context for establishing reliable reasoning on observed activity in a camera network. In the proposed approach, a new Cross Canonical Correlation Analysis (xCCA) is formulated to discover and quantify time delayed pairwise correlations of regional activities observed within and across multiple camera views. This thesis shows that learning time delayed pairwise activity correlations offers valuable contextual information for (1) spatial and temporal topology inference of a camera network, (2) robust person re-identification, and (3) accurate activity-based video temporal segmentation. Crucially, in contrast to conventional methods, the proposed approach does not rely on either intra-camera or inter-camera object tracking; it can thus be applied to low-quality surveillance videos featuring severe inter-object occlusions. Second, to detect global unusual event across multiple disjoint cameras, this thesis extends visual context learning from pairwise relationship to global time delayed dependency between regional activities. Specifically, a Time Delayed Probabilistic Graphical Model (TD-PGM) is proposed to model the multi-camera activities and their dependencies. Subtle global unusual events are detected and localised using the model as context-incoherent patterns across multiple camera views. In the model, different nodes represent activities in different decomposed re3 gions from different camera views, and the directed links between nodes encoding time delayed dependencies between activities observed within and across camera views. In order to learn optimised time delayed dependencies in a TD-PGM, a novel two-stage structure learning approach is formulated by combining both constraint-based and scored-searching based structure learning methods. Third, to cope with visual context changes over time, this two-stage structure learning approach is extended to permit tractable incremental update of both TD-PGM parameters and its structure. As opposed to most existing studies that assume static model once learned, the proposed incremental learning allows a model to adapt itself to reflect the changes in the current visual context, such as subtle behaviour drift over time or removal/addition of cameras. Importantly, the incremental structure learning is achieved without either exhaustive search in a large graph structure space or storing all past observations in memory, making the proposed solution memory and time efficient. Forth, an active learning approach is presented to incorporate human feedback for on-line unusual event detection. Contrary to most existing unsupervised methods that perform passive mining for unusual events, the proposed approach automatically requests supervision for critical points to resolve ambiguities of interest, leading to more robust detection of subtle unusual events. The active learning strategy is formulated as a stream-based solution, i.e. it makes decision on-the-fly on whether to request label for each unlabelled sample observed in sequence. It selects adaptively two active learning criteria, namely likelihood criterion and uncertainty criterion to achieve (1) discovery of unknown event classes and (2) refinement of classification boundary. The effectiveness of the proposed approaches is validated using videos captured from busy public scenes such as underground stations and traffic intersections

    Trajectory based video analysis in multi-camera setups

    Get PDF
    PhDThis thesis presents an automated framework for activity analysis in multi-camera setups. We start with the calibration of cameras particularly without overlapping views. An algorithm is presented that exploits trajectory observations in each view and works iteratively on camera pairs. First outliers are identified and removed from observations of each camera. Next, spatio-temporal information derived from the available trajectory is used to estimate unobserved trajectory segments in areas uncovered by the cameras. The unobserved trajectory estimates are used to estimate the relative position of each camera pair, whereas the exit-entrance direction of each object is used to estimate their relative orientation. The process continues and iteratively approximates the configuration of all cameras with respect to each other. Finally, we refi ne the initial configuration estimates with bundle adjustment, based on the observed and estimated trajectory segments. For cameras with overlapping views, state-of-the-art homography based approaches are used for calibration. Next we establish object correspondence across multiple views. Our algorithm consists of three steps, namely association, fusion and linkage. For association, local trajectory pairs corresponding to the same physical object are estimated using multiple spatio-temporal features on a common ground plane. To disambiguate spurious associations, we employ a hybrid approach that utilises the matching results on the image plane and ground plane. The trajectory segments after association are fused by adaptive averaging. Trajectory linkage then integrates segments and generates a single trajectory of an object across the entire observed area. Finally, for activities analysis clustering is applied on complete trajectories. Our clustering algorithm is based on four main steps, namely the extraction of a set of representative trajectory features, non-parametric clustering, cluster merging and information fusion for the identification of normal and rare object motion patterns. First we transform the trajectories into a set of feature spaces on which Meanshift identi es the modes and the corresponding clusters. Furthermore, a merging procedure is devised to re fine these results by combining similar adjacent clusters. The fi nal common patterns are estimated by fusing the clustering results across all feature spaces. Clusters corresponding to reoccurring trajectories are considered as normal, whereas sparse trajectories are associated to abnormal and rare events. The performance of the proposed framework is evaluated on standard data-sets and compared with state-of-the-art techniques. Experimental results show that the proposed framework outperforms state-of-the-art algorithms both in terms of accuracy and robustness
    • …
    corecore