114 research outputs found

    Object detection, recognition and re-identification in video footage

    There has been a significant number of security concerns in recent times; as a result, security cameras have been installed to monitor activities and to prevent crimes in most public places. The resulting footage is analysed either through video analytics or through forensic analysis operations based on human observation. To this end, within the research context of this thesis, a proactive machine vision based military recognition system has been developed to help monitor activities in the military environment. The proposed object detection, recognition and re-identification systems are presented in this thesis. A novel technique for military personnel recognition is presented. Initially the detected camouflaged personnel are segmented using a GrabCut segmentation algorithm. Since, in general, a camouflaged person's uniform appears similar at the top and the bottom of the body, an image patch is extracted from the segmented foreground image and used as the region of interest. Subsequently the colour and texture features are extracted from each patch and used for classification. A second approach for personnel recognition is proposed through the recognition of the badge on the cap of a military person. A feature matching metric based on Speeded Up Robust Features (SURF) extracted from the badge on a person's cap enables recognition of the person's arm of service. A state-of-the-art technique for recognising vehicle types irrespective of their view angle is also presented in this thesis. Vehicles are initially detected and segmented using a Gaussian Mixture Model (GMM) based foreground/background segmentation algorithm. A Canny Edge Detection (CED) stage, followed by morphological operations, is used as a pre-processing stage to help enhance foreground vehicular object detection and segmentation. 
Subsequently, region, Histogram of Oriented Gradients (HOG) and Local Binary Pattern (LBP) features are extracted from the refined foreground vehicle object and used as features for vehicle type recognition. Two different datasets with variant views of front/rear and angle are used and combined for testing the proposed technique. For night-time video analytics and forensics, the thesis presents a novel approach to pedestrian detection and vehicle type recognition. A novel feature acquisition technique named CENTROG is proposed for pedestrian detection and vehicle type recognition in this thesis. Thermal images containing pedestrians and vehicular objects are used to analyse the performance of the proposed algorithms. The video is initially segmented using a GMM based foreground object segmentation algorithm. A CED based pre-processing step is used to enhance segmentation accuracy prior to using Census Transforms for initial feature extraction. HOG features are then extracted from the Census transformed images and used for detection and recognition, respectively, of human and vehicular objects in thermal images. Finally, a novel technique for people re-identification is proposed in this thesis based on using low-level colour features and mid-level attributes. The low-level colour histogram bin values were normalised to the range 0 to 1. A publicly available dataset (VIPeR) and a self-constructed dataset have been used in the experiments conducted with 7 clothing attributes and low-level colour histogram features. These 7 attributes are detected using features extracted from 5 different regions of a detected human object using an SVM classifier. The low-level colour features were extracted from the regions of a detected human object. These 5 regions are obtained by human object segmentation and subsequent body part sub-division. People are re-identified by computing the Euclidean distance between a probe and the gallery image sets. 
The experiments conducted using an SVM classifier and Euclidean distance have shown that the proposed techniques attained all of the aforementioned goals. The colour and texture features proposed for camouflaged military personnel recognition surpass the state-of-the-art methods. Similarly, experiments show that combining features performed best when recognising vehicles in different views subsequent to initial training based on multi-views. In the same vein, the proposed CENTROG technique performed better than the state-of-the-art CENTRIST technique for both pedestrian detection and vehicle type recognition at night-time using thermal images. Finally, we show that the proposed 7 mid-level attributes and the low-level features result in improved accuracy for people re-identification.
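
The re-identification step described in this abstract (colour histogram values normalised to the range 0 to 1, combined with attribute outputs, and matched by Euclidean distance between a probe and a gallery) can be illustrated with a minimal sketch. This is not the thesis implementation: the min-max normalisation, the simple concatenation of histogram and attribute values, and the names `normalise`, `build_feature` and `rank_gallery` are assumptions made for illustration only.

```python
import math


def normalise(hist):
    """Min-max scale histogram bin values into [0, 1] (assumed scheme)."""
    lo, hi = min(hist), max(hist)
    if hi == lo:
        return [0.0 for _ in hist]
    return [(v - lo) / (hi - lo) for v in hist]


def build_feature(hist, attributes):
    """Concatenate a normalised colour histogram with attribute scores.

    `attributes` stands in for the outputs of attribute classifiers
    (hypothetical values here, e.g. 0/1 per clothing attribute).
    """
    return normalise(hist) + list(attributes)


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_gallery(probe, gallery):
    """Return gallery identities ordered from closest to farthest."""
    scored = [(euclidean(probe, feat), ident) for ident, feat in gallery.items()]
    return [ident for _, ident in sorted(scored)]


# Toy example with made-up histograms and two made-up attributes.
gallery = {
    "person_a": build_feature([10, 0, 5], [1, 0]),
    "person_b": build_feature([0, 10, 5], [0, 1]),
}
probe = build_feature([9, 1, 5], [1, 0])
ranking = rank_gallery(probe, gallery)
```

In this toy setup the first entry of `ranking` is the gallery identity whose feature vector lies nearest to the probe; a real system would build the features from segmented body regions as the abstract describes.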

    Review of Person Re-identification Techniques

    Person re-identification across different surveillance cameras with disjoint fields of view has become one of the most interesting and challenging subjects in the area of intelligent video surveillance. Although several methods have been developed and proposed, certain limitations and unresolved issues remain. In all of the existing re-identification approaches, feature vectors are extracted from segmented still images or video frames. Different similarity or dissimilarity measures have been applied to these vectors. Some methods have used simple constant metrics, whereas others have utilised models to obtain optimised metrics. Some have created models based on local colour or texture information, and others have built models based on the gait of people. In general, the main objective of all these approaches is to achieve a higher accuracy rate and lower computational costs. This study summarises several developments in recent literature and discusses the various available methods used in person re-identification. Specifically, their advantages and disadvantages are mentioned and compared. Comment: Published 201

    IRIM at TRECVID 2011: Semantic Indexing and Instance Search

    12 pages - TRECVID workshop notebook papers/slides available at http://www-nlpir.nist.gov/projects/tvpubs/tv.pubs.org.html. International audience. The IRIM group is a consortium of French teams working on Multimedia Indexing and Retrieval. This paper describes its participation in the TRECVID 2011 semantic indexing and instance search tasks. For the semantic indexing task, our approach uses a six-stage processing pipeline for computing scores for the likelihood of a video shot to contain a target concept. These scores are then used for producing a ranked list of images or shots that are the most likely to contain the target concept. The pipeline is composed of the following steps: descriptor extraction, descriptor optimization, classification, fusion of descriptor variants, higher-level fusion, and re-ranking. We evaluated a number of different descriptors and tried different fusion strategies. The best IRIM run has a Mean Inferred Average Precision of 0.1387, which ranked us 5th out of 19 participants. For the instance search task, we used both object based query and frame based query. We formulated the query in a standard way, either as a comparison of the visual signature of an object with parts of DB frames or as a comparison of the visual signatures of query and DB frames. To produce visual signatures we also used two approaches: the first one is the baseline Bag-Of-Visual-Words (BOVW) model based on the SURF interest point descriptor; the second approach is a Bag-Of-Regions (BOR) model that extends the traditional notion of BOVW vocabulary not only to keypoint-based descriptors but to region based descriptors.
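
As a rough illustration of the baseline Bag-Of-Visual-Words idea mentioned in this abstract, the sketch below quantises local descriptors against a fixed vocabulary and returns a normalised word histogram as the visual signature. It is a generic textbook formulation, not the IRIM pipeline: the vocabulary is assumed to be given (for example from clustering), the descriptors are plain lists of numbers rather than real SURF output, and `bovw_signature` is a hypothetical name.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bovw_signature(descriptors, vocabulary):
    """Build an L1-normalised histogram of nearest visual words.

    Each descriptor votes for the closest vocabulary entry; the counts
    are then divided by the number of descriptors.
    """
    counts = [0] * len(vocabulary)
    for d in descriptors:
        nearest = min(
            range(len(vocabulary)),
            key=lambda k: squared_distance(d, vocabulary[k]),
        )
        counts[nearest] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [c / total for c in counts]


# Toy 2-D "descriptors" and a two-word vocabulary (made-up values).
vocabulary = [[0, 0], [10, 10]]
signature = bovw_signature([[1, 1], [9, 9], [8, 10]], vocabulary)
```

Two such signatures can then be compared with any histogram distance; which comparison the authors used is not stated in the abstract.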

    Human object annotation for surveillance video forensics

    A system that can automatically annotate surveillance video in a manner useful for locating a person with a given description of clothing is presented. Each human is annotated based on two appearance features: primary colors of clothes and the presence of text/logos on clothes. The annotation occurs after a robust foreground extraction stage employing a modified Gaussian mixture model-based approach. The proposed pipeline consists of a preprocessing stage where color appearance of an image is improved using a color constancy algorithm. In order to annotate color information for human clothes, we use the color histogram feature in HSV space and find local maxima to extract dominant colors for different parts of a segmented human object. To detect text/logos on clothes, we begin with the extraction of connected components of enhanced horizontal, vertical, and diagonal edges in the frames. These candidate regions are classified as text or nontext on the basis of their local energy-based shape histogram features. Further, to detect humans, a novel technique has been proposed that uses contourlet transform-based local binary pattern (CLBP) features. In the proposed method, we extract the uniform direction invariant LBP feature descriptor for contourlet transformed high-pass subimages from vertical and diagonal directional bands. In the final stage, extracted CLBP descriptors are classified by a trained support vector machine. Experimental results illustrate the superiority of our method on large-scale surveillance video data
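
The dominant-colour step described in this abstract (a colour histogram in HSV space followed by a search for local maxima) might look roughly like the following. This is a simplified sketch under stated assumptions: only the hue channel is binned, the bin count of 12 and the `min_fraction` threshold are arbitrary, and the function names are hypothetical rather than taken from the paper.

```python
import colorsys


def hue_histogram(pixels, bins=12):
    """Count RGB pixels (0-255 per channel) per hue bin."""
    hist = [0] * bins
    for r, g, b in pixels:
        h, _s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        hist[min(int(h * bins), bins - 1)] += 1
    return hist


def dominant_bins(hist, min_fraction=0.1):
    """Return indices of hue bins that are local maxima of the histogram.

    Hue is circular, so the first and last bins are treated as neighbours.
    Peaks holding less than `min_fraction` of all pixels are ignored.
    """
    total = sum(hist)
    n = len(hist)
    peaks = []
    for i, count in enumerate(hist):
        left, right = hist[(i - 1) % n], hist[(i + 1) % n]
        if count > 0 and count >= left and count > right and count >= min_fraction * total:
            peaks.append(i)
    return peaks


# Toy "clothing region": six orange-red pixels and four blue pixels.
region = [(255, 64, 0)] * 6 + [(0, 64, 255)] * 4
hist = hue_histogram(region)
peaks = dominant_bins(hist)
```

Applying the same two functions to each body part of a segmented person would give one set of dominant hue bins per part, which could then be mapped to colour names for annotation.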

    A Latent Clothing Attribute Approach for Human Pose Estimation

    As a fundamental technique that concerns several vision tasks such as image parsing, action recognition and clothing retrieval, human pose estimation (HPE) has been extensively investigated in recent years. To achieve accurate and reliable estimation of the human pose, it is well recognized that clothing attributes are useful and should be utilized properly. Most previous approaches, however, require manual annotation of the clothing attributes and are therefore very costly. In this paper, we propose and explore a latent clothing attribute approach for HPE. Unlike previous approaches, our approach models the clothing attributes as latent variables and thus requires no explicit labeling of the clothing attributes. The inference of the latent variables is accomplished by utilizing the framework of latent structured support vector machines (LSSVM). We employ the strategy of alternating direction to train the LSSVM model: in each iteration, one kind of variable (e.g., human pose or clothing attribute) is fixed and the others are optimized. Our extensive experiments on two real-world benchmarks show the state-of-the-art performance of our proposed approach. Comment: accepted to ACCV 2014, preceding work http://arxiv.org/abs/1404.492

    Efficient Pedestrian Detection in Urban Traffic Scenes

    Pedestrians are important participants in urban traffic environments, and thus form an interesting category of objects for autonomous cars. Automatic pedestrian detection is an essential task for protecting pedestrians from collision. In this thesis, we investigate and develop novel approaches by interpreting spatial and temporal characteristics of pedestrians, in three different aspects: shape, cognition and motion. The distinctive upright human body shape, especially the geometry of the head and shoulder area, is the characteristic that most clearly discriminates pedestrians from other object categories. Inspired by the success of Haar-like features for detecting human faces, which also exhibit a uniform shape structure, we propose to design particular Haar-like features for pedestrians. Tailored to a pre-defined statistical pedestrian shape model, Haar-like templates with multiple modalities are designed to describe local differences of the shape structure. Cognition theories aim to explain how human visual systems process input visual signals in an accurate and fast way. By emulating the center-surround mechanism in human visual systems, we design multi-channel, multi-direction and multi-scale contrast features, and boost them to respond to the appearance of pedestrians. In this way, our detector can be considered a top-down saliency system. In the last part of this thesis, we exploit the temporal characteristics of moving pedestrians and then employ motion information for feature design, as well as for region of interest (ROI) selection. Motion segmentation on optical flow fields enables us to select those blobs most probably containing moving pedestrians; a combination of Histogram of Oriented Gradients (HOG) and motion self difference features further enables robust detection. We test our three approaches on image and video data captured in urban traffic scenes, which are rather challenging due to dynamic and complex backgrounds. 
The achieved results demonstrate that our approaches reach and surpass state-of-the-art performance, and can also be employed for other applications, such as indoor robotics or public surveillance.
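
For readers unfamiliar with Haar-like features, which this abstract builds on, the sketch below shows the generic mechanism: an integral image allows any rectangle sum to be read in constant time, and a two-rectangle feature is the difference between two such sums. This is the standard face-detection-style formulation, not the pedestrian-specific templates designed in the thesis; the function names and the toy image are assumptions for illustration.

```python
def integral_image(img):
    """Return a (h+1) x (w+1) summed-area table for a 2-D list of numbers."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of pixels in a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_vertical(ii, top, left, height, width):
    """Haar-like feature: upper-half sum minus lower-half sum.

    `height` is assumed to be even so both halves have the same size.
    """
    half = height // 2
    upper = rect_sum(ii, top, left, half, width)
    lower = rect_sum(ii, top + half, left, half, width)
    return upper - lower


# Toy 4x2 image: dark upper half, bright lower half.
image = [[1, 1], [1, 1], [5, 5], [5, 5]]
table = integral_image(image)
feature = two_rect_vertical(table, 0, 0, 4, 2)
```

A strongly negative or positive response indicates an intensity edge between the two halves, which is the kind of local shape difference a detector can learn to associate with, for example, a head and shoulder outline.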

    Pedestrian Attribute Recognition: A Survey

    Recognizing pedestrian attributes is an important task in the computer vision community because it plays an important role in video surveillance. Many algorithms have been proposed to handle this task. The goal of this paper is to review existing works that use traditional methods or are based on deep learning networks. Firstly, we introduce the background of pedestrian attribute recognition (PAR, for short), including the fundamental concepts of pedestrian attributes and the corresponding challenges. Secondly, we introduce existing benchmarks, including popular datasets and evaluation criteria. Thirdly, we analyse the concepts of multi-task learning and multi-label learning, and also explain the relations between these two learning algorithms and pedestrian attribute recognition. We also review some popular network architectures which have been widely applied in the deep learning community. Fourthly, we analyse popular solutions for this task, such as attribute-group and part-based approaches, etc. Fifthly, we show some applications which take pedestrian attributes into consideration and achieve better performance. Finally, we summarize this paper and give several possible research directions for pedestrian attribute recognition. The project page of this paper can be found at the following website: https://sites.google.com/view/ahu-pedestrianattributes/. Comment: Check our project page for High Resolution version of this survey: https://sites.google.com/view/ahu-pedestrianattributes