
    Local quality-based matching of faces for watchlist screening applications

    Video surveillance systems are often deployed by security organizations for enhanced security and situational awareness. A key application is watchlist screening, where target individuals are enrolled in a still-to-video face recognition (FR) system using single still images captured a priori under controlled conditions. Watchlist screening is very challenging: the system must provide accurate decisions and timely recognition from only a limited number of reference faces per enrolled individual, an issue known as the "Single Sample Per Person" (SSPP) problem. Moreover, uncontrolled factors such as variations in illumination, pose, and occlusion are unavoidable in real-world video surveillance and degrade FR performance. Another major problem in such applications is camera interoperability: there is a large gap, in quality and resolution, between the camera used to capture the still reference images and the camera producing the surveillance footage, which hinders classification and so decreases the system's performance. Controlled, uniform lighting is essential for facial captures that support good recognition; in practice, however, captures are often poorly illuminated, which severely degrades performance, so it is important to build an FR system that is invariant to illumination changes. The first part of this thesis investigates different illumination normalization (IN) techniques applied at the pre-processing stage of still-to-video FR, and compares them to pinpoint the technique best suited to illumination invariance. In addition, patch-based local matching extracts facial features from different regions, which offers more discriminative information and copes with occlusion, so local matching is adopted in the still-to-video FR system. This requires a careful examination of how the IN techniques are applied. Two approaches were studied: a global approach, which normalizes the whole image before local matching, and a local approach, which first divides the image into non-overlapping patches and then applies each IN technique to each patch individually (see the sketch below). The experiments show that Tan and Triggs (TT) and Multi-Scale Weberfaces are likely to offer the best illumination invariance for the still-to-video FR system, and that applying these outperforming IN techniques locally on each patch improves FR performance compared to the global approach.
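    As a concrete illustration, the following Python sketch contrasts the two strategies using a simplified Tan and Triggs pipeline (gamma correction, difference-of-Gaussians filtering, contrast equalization). This is not the thesis code; the parameter values and the 4x4 patch grid are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def tan_triggs(img, gamma=0.2, sigma0=1.0, sigma1=2.0, alpha=0.1, tau=10.0):
    """Simplified Tan & Triggs normalization: gamma correction,
    difference-of-Gaussians filtering, two-stage contrast
    equalization, and a final tanh compression."""
    x = img.astype(np.float64) ** gamma
    x = gaussian_filter(x, sigma0) - gaussian_filter(x, sigma1)
    x /= (np.mean(np.abs(x) ** alpha) + 1e-8) ** (1.0 / alpha)
    x /= (np.mean(np.minimum(np.abs(x), tau) ** alpha) + 1e-8) ** (1.0 / alpha)
    return tau * np.tanh(x / tau)

def normalize_global(img):
    # Global approach: normalize the whole image, then match patches.
    return tan_triggs(img)

def normalize_local(img, grid=(4, 4)):
    # Local approach: split into non-overlapping patches and
    # normalize each patch independently before matching.
    h, w = img.shape
    out = np.zeros_like(img, dtype=np.float64)
    ys = np.linspace(0, h, grid[0] + 1, dtype=int)
    xs = np.linspace(0, w, grid[1] + 1, dtype=int)
    for y0, y1 in zip(ys, ys[1:]):
        for x0, x1 in zip(xs, xs[1:]):
            out[y0:y1, x0:x1] = tan_triggs(img[y0:y1, x0:x1])
    return out
```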
    An FR system performs well when the training data and the operational data come from the same distribution; unfortunately, this does not hold in still-to-video FR. The training data are still, high-quality, high-resolution, frontal images, whereas the test data are low-quality, low-resolution video frames with varying head pose, so the two do not share the same distribution. To address this domain shift, the second part of this thesis presents a new dynamic regional weighting technique that exploits unsupervised domain adaptation and quality-based contextual information. The main contribution is the assignment of dynamic weights specific to a camera domain, replacing the static, predefined manner of assigning weights. To assess the impact of applying local weights dynamically, results are compared against a baseline with no weights and against a static weighting technique. This context-based approach is shown to improve the system's performance over both the dataset-dependent static weighting and the unweighted baseline. The experiments are conducted and validated on the ChokePoint dataset, and the performance of the still-to-video FR system is evaluated using Receiver Operating Characteristic (ROC) and Precision-Recall (PR) curve analysis.
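    The quality-driven weighting can be pictured with the minimal sketch below. The quality proxy (gradient energy) and the fusion rule are assumptions for illustration only; the thesis derives its weights from camera-domain contextual quality via unsupervised domain adaptation.

```python
import numpy as np

def patch_quality(patch):
    """Stand-in quality score: mean gradient energy (sharpness).
    An assumed proxy, chosen here for illustration."""
    gy, gx = np.gradient(patch.astype(np.float64))
    return float(np.sqrt(gx ** 2 + gy ** 2).mean())

def fuse_patch_scores(patch_scores, probe_patches, static_weights=None):
    """Combine per-patch match scores into one face-match score.
    Passing static_weights reproduces static weighting; uniform
    weights give the unweighted baseline; otherwise the weights are
    recomputed dynamically for each probe frame."""
    scores = np.asarray(patch_scores, dtype=np.float64)
    if static_weights is not None:
        w = np.asarray(static_weights, dtype=np.float64)
    else:
        w = np.array([patch_quality(p) for p in probe_patches])
    w = w / (w.sum() + 1e-12)
    return float(np.dot(w, scores))
```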

    Feature fusion at the local region using localized maximum-margin learning for scene categorization

    In visual recognition tasks such as scene categorization, representing an image by local features (e.g., the bag-of-visual-words (BOVW) model and the bag-of-contextual-visual-words (BOCVW) model) has become one of the most popular and successful approaches. In this paper, we propose a method that uses localized maximum-margin learning to fuse different types of features during BOCVW modeling for eventual scene classification. The proposed method fuses multiple features at the stage when the best contextual visual word is selected to represent a local region (hard assignment) or when the probabilities of the candidate contextual visual words used to represent the unknown region are estimated (soft assignment). The merits of the proposed method are that (1) errors caused by the ambiguity of a single feature when assigning local regions to contextual visual words can be corrected, or the probabilities of the candidate contextual visual words can be estimated more accurately; and (2) it offers a more flexible way of fusing these features by determining the similarity metric locally through localized maximum-margin learning. The proposed method has been evaluated experimentally and the results indicate its effectiveness. © 2011 Elsevier Ltd. All rights reserved.
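    A minimal sketch may help picture the fusion step: per-feature distances to the candidate contextual visual words are combined with weights (here a fixed vector `beta`; in the paper these are determined locally by maximum-margin learning, which the sketch omits), then soft assignment follows from a softmax and hard assignment from the arg-max.

```python
import numpy as np

def soft_assign(region_feats, vocab_feats, beta, temperature=1.0):
    """Fuse similarities from several feature types and return
    assignment probabilities over the candidate visual words."""
    fused = np.zeros(len(vocab_feats[0]))
    for b, rf, vf in zip(beta, region_feats, vocab_feats):
        d = np.linalg.norm(vf - rf, axis=1)   # distance to each word
        fused += b * (-d)                     # weighted similarity
    p = np.exp((fused - fused.max()) / temperature)  # stable softmax
    return p / p.sum()

# toy usage: two feature types, a 5-word codebook
rng = np.random.default_rng(0)
region = [rng.normal(size=16), rng.normal(size=8)]
vocab = [rng.normal(size=(5, 16)), rng.normal(size=(5, 8))]
probs = soft_assign(region, vocab, beta=[0.7, 0.3])
word = int(np.argmax(probs))  # hard assignment takes the arg-max
```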

    Pedestrian Attribute Recognition: A Survey

    Recognizing pedestrian attributes is an important task in the computer vision community because it plays a key role in video surveillance, and many algorithms have been proposed to handle it. The goal of this paper is to review existing works, covering both traditional methods and deep-learning-based approaches. Firstly, we introduce the background of pedestrian attribute recognition (PAR, for short), including the fundamental concepts of pedestrian attributes and the corresponding challenges. Secondly, we introduce existing benchmarks, including popular datasets and evaluation criteria. Thirdly, we analyse the concepts of multi-task learning and multi-label learning, explain how these two learning paradigms relate to pedestrian attribute recognition, and review some popular network architectures that have been widely applied in the deep learning community. Fourthly, we analyse popular solutions for this task, such as attribute grouping, part-based methods, \emph{etc}. Fifthly, we show some applications that take pedestrian attributes into consideration and achieve better performance. Finally, we summarize the paper and give several possible research directions for pedestrian attribute recognition. The project page of this paper can be found at \url{https://sites.google.com/view/ahu-pedestrianattributes/}.
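    Since the survey frames PAR as a multi-label learning problem, a minimal PyTorch sketch of that formulation (not code from the paper; the feature dimension and attribute count are assumptions) is one sigmoid output per attribute, trained with binary cross-entropy:

```python
import torch
import torch.nn as nn

class PARHead(nn.Module):
    """Multi-label head: one logit per pedestrian attribute,
    placed on top of pooled backbone features."""
    def __init__(self, feat_dim=2048, num_attrs=35):
        super().__init__()
        self.fc = nn.Linear(feat_dim, num_attrs)

    def forward(self, feats):
        return self.fc(feats)  # raw logits, one per attribute

# toy usage: features from a CNN backbone, binary attribute labels
head = PARHead()
logits = head(torch.randn(8, 2048))
labels = torch.randint(0, 2, (8, 35)).float()
loss = nn.BCEWithLogitsLoss()(logits, labels)  # multi-label BCE
```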

    Enhancing person annotation for personal photo management using content and context based technologies

    Rapid technological growth and the decreasing cost of photo capture mean that we are all taking more digital photographs than ever before. However, the lack of technology for automatically organising personal photo archives has left many users with poorly annotated photos, causing them great frustration when such photo collections are browsed or searched at a later time. As a result, there has recently been significant research interest in technologies for supporting effective annotation. This thesis addresses an important sub-problem of the broad annotation problem, namely "person annotation" associated with personal digital photo management. Solutions to this problem are provided using content analysis tools in combination with context data within the experimental photo management framework called "MediAssist". Readily available image metadata, such as location and date/time, are captured from digital cameras with in-built GPS functionality, and thus provide knowledge about when and where the photos were taken. Such information is then used to identify the "real-world" events corresponding to certain activities in the photo capture process. The problem of enabling effective person annotation is formulated in such a way that both "within-event" and "cross-event" relationships of persons' appearances are captured. The research reported in the thesis is built upon a firm foundation of content-based analysis technologies, namely face detection, face recognition, and body-patch matching, together with data fusion. Two annotation models are investigated in this thesis, namely progressive and non-progressive. The effectiveness of each model is evaluated against varying proportions of initial annotation, and against the type of initial annotation, based on individual and combined face, body-patch, and person-context information sources. The results reported in the thesis strongly validate the use of multiple information sources for person annotation whilst emphasising the advantage of event-based photo analysis in real-life photo management systems.
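    The event-based grouping and multi-source fusion described above can be pictured with the sketch below. The two-hour gap threshold and the fusion weights are assumptions for illustration; MediAssist also uses GPS location, which this sketch omits.

```python
from datetime import datetime, timedelta

def group_into_events(photos, gap=timedelta(hours=2)):
    """Split a time-ordered photo list into events: a new event
    starts when the capture-time gap exceeds the threshold."""
    events, current = [], [photos[0]]
    for prev, cur in zip(photos, photos[1:]):
        if cur["time"] - prev["time"] > gap:
            events.append(current)
            current = []
        current.append(cur)
    events.append(current)
    return events

def fuse(face_score, body_score, context_score, w=(0.5, 0.3, 0.2)):
    """Weighted-sum fusion of face, body-patch, and person-context
    evidence for one candidate identity (weights are assumed)."""
    return w[0] * face_score + w[1] * body_score + w[2] * context_score

# toy usage: three photos -> two events
photos = [{"time": datetime(2008, 6, 1, 10, 0)},
          {"time": datetime(2008, 6, 1, 10, 20)},
          {"time": datetime(2008, 6, 2, 9, 0)}]
print([len(e) for e in group_into_events(photos)])  # [2, 1]
```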