    Review of Person Re-identification Techniques

    Person re-identification across different surveillance cameras with disjoint fields of view has become one of the most interesting and challenging subjects in the area of intelligent video surveillance. Although several methods have been developed and proposed, certain limitations and unresolved issues remain. In all of the existing re-identification approaches, feature vectors are extracted from segmented still images or video frames. Different similarity or dissimilarity measures have been applied to these vectors. Some methods have used simple constant metrics, whereas others have utilised models to obtain optimised metrics. Some have created models based on local colour or texture information, and others have built models based on the gait of people. In general, the main objective of all these approaches is to achieve a higher accuracy rate and lower computational costs. This study summarises several developments in recent literature and discusses the various available methods used in person re-identification. Specifically, their advantages and disadvantages are mentioned and compared. Comment: Published 201

    Person Re-identification by Local Maximal Occurrence Representation and Metric Learning

    Person re-identification is an important technique towards automatic search of a person's presence in a surveillance video. Two fundamental problems are critical for person re-identification: feature representation and metric learning. An effective feature representation should be robust to illumination and viewpoint changes, and a discriminant metric should be learned to match various person images. In this paper, we propose an effective feature representation called Local Maximal Occurrence (LOMO), and a subspace and metric learning method called Cross-view Quadratic Discriminant Analysis (XQDA). The LOMO feature analyzes the horizontal occurrence of local features, and maximizes the occurrence to make a stable representation against viewpoint changes. Besides, to handle illumination variations, we apply the Retinex transform and a scale invariant texture operator. To learn a discriminant metric, we propose to learn a discriminant low dimensional subspace by cross-view quadratic discriminant analysis, and simultaneously, a QDA metric is learned on the derived subspace. We also present a practical computation method for XQDA, as well as its regularization. Experiments on four challenging person re-identification databases, VIPeR, QMUL GRID, CUHK Campus, and CUHK03, show that the proposed method improves the state-of-the-art rank-1 identification rates by 2.2%, 4.88%, 28.91%, and 31.55% on the four databases, respectively. Comment: This paper has been accepted by CVPR 2015. For source codes and extracted features please visit http://www.cbsr.ia.ac.cn/users/scliao/projects/lomo_xqda

    Object detection, recognition and re-identification in video footage

    There have been significant security concerns in recent times; as a result, security cameras have been installed to monitor activities and to prevent crime in most public places. The resulting footage is analysed either through video analytics or through forensic analysis based on human observation. To this end, within the research context of this thesis, a proactive machine vision based military recognition system has been developed to help monitor activities in the military environment. The proposed object detection, recognition and re-identification systems are presented in this thesis. A novel technique for military personnel recognition is presented. Initially, the detected camouflaged personnel are segmented using a GrabCut segmentation algorithm. Since the uniform of camouflaged personnel generally appears similar at the top and the bottom of the body, an image patch is extracted from the segmented foreground image and used as the region of interest. Subsequently, colour and texture features are extracted from each patch and used for classification. A second approach to personnel recognition is proposed through the recognition of the badge on the cap of a military person. A feature matching metric based on Speeded-Up Robust Features (SURF) extracted from the badge on the cap enables recognition of the person's arm of service. A state-of-the-art technique for recognising vehicle types irrespective of their view angle is also presented in this thesis. Vehicles are initially detected and segmented using a Gaussian Mixture Model (GMM) based foreground/background segmentation algorithm. A Canny Edge Detection (CED) stage, followed by morphological operations, is used as a pre-processing stage to help enhance foreground vehicular object detection and segmentation.
Subsequently, Region, Histogram of Oriented Gradients (HOG) and Local Binary Pattern (LBP) features are extracted from the refined foreground vehicle object and used as features for vehicle type recognition. Two different datasets with variant views of front/rear and angle are used and combined for testing the proposed technique. For night-time video analytics and forensics, the thesis presents a novel approach to pedestrian detection and vehicle type recognition. A novel feature acquisition technique, named CENTROG, is proposed for pedestrian detection and vehicle type recognition. Thermal images containing pedestrians and vehicular objects are used to analyse the performance of the proposed algorithms. The video is initially segmented using a GMM based foreground object segmentation algorithm. A CED based pre-processing step is used to enhance segmentation accuracy prior to using Census Transforms for initial feature extraction. HOG features are then extracted from the Census transformed images and used for detection and recognition, respectively, of human and vehicular objects in thermal images. Finally, a novel technique for people re-identification is proposed, based on low-level colour features and mid-level attributes. The low-level colour histogram bin values were normalised to lie between 0 and 1. A publicly available dataset (VIPeR) and a self-constructed dataset have been used in the experiments conducted with 7 clothing attributes and low-level colour histogram features. These 7 attributes are detected with an SVM classifier, using features extracted from 5 different regions of a detected human object. The low-level colour features were extracted from the same regions. These 5 regions are obtained by human object segmentation and subsequent body part sub-division. People are re-identified by computing the Euclidean distance between a probe and the gallery image sets.
The experiments conducted using the SVM classifier and Euclidean distance show that the proposed techniques attained all of the aforementioned goals. The colour and texture features proposed for camouflaged military personnel recognition surpass the state-of-the-art methods. Similarly, experiments show that combining features performed best when recognising vehicles in different views subsequent to initial training based on multiple views. In the same vein, the proposed CENTROG technique performed better than the state-of-the-art CENTRIST technique for both pedestrian detection and vehicle type recognition at night-time using thermal images. Finally, we show that the proposed 7 mid-level attributes and the low-level features result in improved accuracy for people re-identification.
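
The matching step described above, ranking gallery images by the Euclidean distance between normalised colour histograms, can be sketched roughly as follows. This is a minimal illustration and not the thesis's implementation: the function names, the toy 4-bin histograms and the choice of min-max scaling as the normalisation are all assumptions made for the example.

```python
import math

def normalise(hist):
    """Min-max scale bin values into [0, 1] (assumed normalisation scheme)."""
    lo, hi = min(hist), max(hist)
    if hi == lo:
        return [0.0 for _ in hist]
    return [(v - lo) / (hi - lo) for v in hist]

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_gallery(probe, gallery):
    """Return gallery identities sorted by increasing distance to the probe."""
    p = normalise(probe)
    scored = [(euclidean(p, normalise(feat)), ident)
              for ident, feat in gallery.items()]
    return [ident for _, ident in sorted(scored)]

# Hypothetical 4-bin colour histograms for three gallery identities.
gallery = {
    "person_a": [10, 40, 5, 5],
    "person_b": [5, 5, 40, 10],
    "person_c": [12, 38, 6, 4],
}
ranking = rank_gallery([11, 39, 5, 5], gallery)
```

In a real system each feature vector would concatenate the histograms of the 5 body regions, possibly together with the attribute scores; the ranking logic stays the same.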

    Person Re-Identification Techniques for Intelligent Video Surveillance Systems

    Intelligent video surveillance is currently one of the most active research fields in computer vision and machine learning, providing useful tools for surveillance operators and forensic video investigators. Person re-identification is among these tools; it consists of recognizing whether an individual has already been observed over a network of cameras. This tool can also be employed in various applications, e.g., off-line retrieval of all the video sequences showing an individual of interest whose image is given as a query, or on-line pedestrian tracking over multiple cameras. For off-line retrieval applications, one of the goals of person re-identification systems is to support video surveillance operators and forensic investigators in finding an individual of interest in videos acquired by a network of non-overlapping cameras. This is attained by sorting images of previously observed individuals by decreasing similarity to a given probe individual. This task is typically achieved by exploiting clothing appearance, since classical biometric methods such as face recognition are impractical in real-world video surveillance scenarios because of the low quality of the acquired images. Existing clothing appearance descriptors, together with their similarity measures, are mostly aimed at improving ranking quality. These methods usually employ a part-based body model in order to extract an image signature that may be treated independently for different body parts (e.g. torso and legs). While a re-identification model must be robust and discriminative in recognising the individual of interest, processing time may also be crucial for tackling this task in real-world scenarios.
This issue can be viewed from two perspectives: the processing time to construct a model (descriptor generation), which can usually be done off-line, and the processing time to find the correct individual among the acquired video frames (descriptor matching), which is the real-time procedure of a re-identification system. This thesis addresses the issue of processing time for descriptor matching, instead of improving ranking quality, which is also relevant in practical applications involving interaction with human operators. It will be shown how a trade-off between processing time and ranking quality, for any given descriptor, can be achieved through a multi-stage ranking approach inspired by multi-stage approaches to classification problems in the pattern recognition area, here adapted to the re-identification task as a ranking problem. A discussion of design criteria for such multi-stage re-identification systems is therefore presented, and the proposed approach is evaluated on three benchmark data sets using four state-of-the-art descriptors. Additionally, with regard to processing time, typical dimensionality reduction methods are studied as a means of reducing the processing time of a descriptor when a specific person re-identification descriptor generates a high-dimensional feature space. An empirical evaluation is also presented for this case, in which three well-known feature reduction methods are applied to two state-of-the-art descriptors on two benchmark data sets.
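
The multi-stage ranking idea can be illustrated with a rough sketch in which a cheap distance first orders the whole gallery and a more expensive distance re-ranks only the top candidates. The abstract does not give the design details, so the function names, the `keep` cut-offs and both toy distance functions below are assumptions chosen only to make the example runnable.

```python
def multi_stage_rank(probe, gallery, stages, keep):
    """Rank gallery identities with a cascade of distance functions.

    stages: distance functions ordered from cheapest to most expensive.
    keep:   number of candidates passed on after each non-final stage.
    """
    candidates = list(gallery)   # identities still being re-ranked
    tail = []                    # identities dropped by earlier stages
    for i, dist in enumerate(stages):
        candidates.sort(key=lambda g: dist(probe, gallery[g]))
        if i < len(stages) - 1:
            # Dropped candidates keep the order given by the stage that dropped them.
            tail = candidates[keep[i]:] + tail
            candidates = candidates[:keep[i]]
    return candidates + tail

def cheap_distance(p, g):
    """Hypothetical fast stage: compares only the first feature."""
    return abs(p[0] - g[0])

def full_distance(p, g):
    """Hypothetical slow stage: L1 distance over all features."""
    return sum(abs(x - y) for x, y in zip(p, g))

# Toy 2-dimensional descriptors for four gallery identities.
gallery = {"a": (1.0, 9.0), "b": (1.1, 1.0), "c": (5.0, 1.0), "d": (0.9, 1.2)}
result = multi_stage_rank((1.0, 1.0), gallery,
                          stages=[cheap_distance, full_distance], keep=[3])
```

Here the expensive distance is evaluated for 3 of the 4 gallery entries. Lowering `keep` saves more time but raises the risk that the true match is discarded in an early stage, which is the trade-off between processing time and ranking quality that the abstract refers to.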

    Person re-Identification over distributed spaces and time

    Replicating the human visual system and the cognitive abilities that the brain uses to process the information it receives is an area of substantial scientific interest. With the prevalence of video surveillance cameras, a portion of this scientific drive has gone into providing useful automated counterparts to human operators. A prominent task in visual surveillance is that of matching people between disjoint camera views, or re-identification. This allows operators to locate people of interest and to track people across cameras, and it can be used as a precursory step to multi-camera activity analysis. However, due to the contrasting conditions between camera views and their effects on the appearance of people, re-identification is a non-trivial task. This thesis proposes solutions for reducing the visual ambiguity in observations of people between camera views. This thesis first looks at a method for mitigating the effects on the appearance of people under differing lighting conditions between camera views. This thesis builds on work modelling inter-camera illumination based on known pairs of images. A Cumulative Brightness Transfer Function (CBTF) is proposed to estimate the mapping of colour brightness values based on limited training samples. Unlike previous methods that use a mean-based representation for a set of training samples, the cumulative nature of the CBTF retains colour information from underrepresented samples in the training set. Additionally, the bi-directionality of the mapping function is explored to try to maximise re-identification accuracy by ensuring samples are accurately mapped between cameras. Secondly, an extension is proposed to the CBTF framework that addresses the issue of changing lighting conditions within a single camera. As the CBTF requires manually labelled training samples, it is limited to static lighting conditions and is less effective if the lighting changes.
This Adaptive CBTF (A-CBTF) differs from previous approaches that either do not consider lighting change over time, or rely on camera transition time information to update. By utilising contextual information drawn from the background in each camera view, an estimation of the lighting change within a single camera can be made. This background lighting model allows the mapping of colour information back to the original training conditions and thus removes the need for retraining. Thirdly, a novel reformulation of re-identification as a ranking problem is proposed. Previous methods use a score based on a direct distance measure of set features to form a correct/incorrect match result. Rather than offering an operator a single outcome, the ranking paradigm is to give the operator a ranked list of possible matches and allow them to make the final decision. By utilising a Support Vector Machine (SVM) ranking method, a weighting on the appearance features can be learned that capitalises on the fact that not all image features are equally important to re-identification. Additionally, an Ensemble-RankSVM is proposed to address scalability issues by separating the training samples into smaller subsets and boosting the trained models. Finally, the thesis looks at a practical application of the ranking paradigm in a real-world setting. The system encompasses both the re-identification stage and the precursory extraction and tracking stages to form an aid for CCTV operators. Segmentation and detection are combined to extract relevant information from the video, while several combinations of matching techniques are combined with temporal priors to form a more comprehensive overall matching criterion. The effectiveness of the proposed approaches is tested on datasets obtained from a variety of challenging environments including offices, apartment buildings, airports and outdoor public spaces.
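
A brightness transfer function of the cumulative kind can be sketched as cumulative-histogram matching: brightness values from all training samples of each camera are pooled into one cumulative histogram, and each level in camera A is mapped to the level in camera B with the same cumulative mass. This is only an illustrative reading of the abstract; the function names, the single-channel integer brightness values and the 8-level toy data are assumptions, and the thesis's exact formulation may differ.

```python
from bisect import bisect_left
from itertools import accumulate

def cumulative_hist(samples, levels):
    """Pool brightness values of all samples into one normalised cumulative histogram."""
    counts = [0] * levels
    for sample in samples:
        for value in sample:
            counts[value] += 1
    total = sum(counts)  # assumes at least one pixel value is present
    return [c / total for c in accumulate(counts)]

def brightness_transfer(samples_a, samples_b, levels=256):
    """Map each brightness level of camera A to a level of camera B."""
    cdf_a = cumulative_hist(samples_a, levels)
    cdf_b = cumulative_hist(samples_b, levels)
    # Lowest level in B whose cumulative mass reaches that of the level in A.
    return [min(bisect_left(cdf_b, cdf_a[v]), levels - 1) for v in range(levels)]

# Toy training data: camera B sees the same people two levels brighter.
samples_a = [[0, 1, 2], [1, 2, 3]]
samples_b = [[2, 3, 4], [3, 4, 5]]
mapping = brightness_transfer(samples_a, samples_b, levels=8)
```

Because the histograms are summed before the mapping is computed, rather than averaging per-pair mappings, brightness values that occur in only a few training samples still contribute to the pooled distribution.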

    Video foreground extraction for mobile camera platforms

    Foreground object detection is a fundamental task in computer vision with many applications in areas such as object tracking, event identification, and behavior analysis. Most conventional foreground object detection methods work only in stable illumination environments using fixed cameras. In real-world applications, however, it is often the case that the algorithm needs to operate under the following challenging conditions: drastic lighting changes, object shape complexity, moving cameras, low frame capture rates, and low resolution images. This thesis presents four novel approaches for foreground object detection on real-world datasets using cameras deployed on moving vehicles. The first problem addresses passenger detection and tracking tasks for public transport buses, investigating the problem of changing illumination conditions and low frame capture rates. Our approach integrates a stable SIFT (Scale Invariant Feature Transform) background seat modelling method with a human shape model into a weighted Bayesian framework to detect passengers. To deal with the problem of tracking multiple targets, we employ the Reversible Jump Markov Chain Monte Carlo tracking algorithm. Using the SVM classifier, the appearance transformation models capture changes in the appearance of the foreground objects across two consecutive frames under low frame rate conditions. In the second problem, we present a system for pedestrian detection involving scenes captured by a mobile bus surveillance system. It integrates scene localization, foreground-background separation, and pedestrian detection modules into a unified detection framework. The scene localization module performs a two-stage clustering of the video data. In the first stage, SIFT Homography is applied to cluster frames in terms of their structural similarity, and the second stage further clusters these aligned frames according to consistency in illumination.
This produces clusters of images that differ in viewpoint and lighting. A kernel density estimation (KDE) technique for colour and gradient is then used to construct background models for each image cluster, which are further used to detect candidate foreground pixels. Finally, using a hierarchical template matching approach, pedestrians can be detected. In addition to the second problem, we present three direct pedestrian detection methods that extend the HOG (Histogram of Oriented Gradients) techniques (Dalal and Triggs, 2005) and provide a comparative evaluation of these approaches. The three approaches include: a) a new histogram feature, formed by the weighted sum of both the gradient magnitude and the filter responses from a set of elongated Gaussian filters (Leung and Malik, 2001) corresponding to the quantised orientation, which we refer to as the Histogram of Oriented Gradient Banks (HOGB) approach; b) the codebook based HOG feature with a branch-and-bound (efficient subwindow search) algorithm (Lampert et al., 2008); and c) the codebook based HOGB approach. In the third problem, a unified framework that combines 3D and 2D background modelling is proposed to detect scene changes using a camera mounted on a moving vehicle. The 3D scene is first reconstructed from a set of videos taken at different times. The 3D background modelling identifies inconsistent scene structures as foreground objects. For the 2D approach, foreground objects are detected using the spatio-temporal MRF algorithm. Finally, the 3D and 2D results are combined using morphological operations. The significance of this research is that it provides basic frameworks for automatic large-scale mobile surveillance applications and facilitates many higher-level applications such as object tracking and behaviour analysis.

    Visual Analysis in Traffic & Re-identification
