8 research outputs found

    Towards a Self-Sufficient Face Verification System

    Funded for open-access publication: Universidade da Coruña/CISUG. [Abstract] The absence of prior collaborative manual enrolment is a significant handicap when designing a face verification system for face re-identification. In this scenario, the system must learn the target identity incrementally, using data from the video stream during the operational authentication phase, so manual labelling cannot be assumed beyond the first few frames. Moreover, even the most advanced methods trained on large-scale, unconstrained datasets suffer performance degradation when they are not adapted to the specific context. This work proposes an adaptive face verification system for the continuous re-identification of a target identity within the framework of incremental unsupervised learning. Our Dynamic Ensemble of SVM is capable of incorporating non-labelled information to improve the performance of any model, even when its initial performance is modest. The proposal uses the self-training approach and is compared against other classification techniques within this same approach. Results show promising behaviour in terms of both knowledge acquisition and impostor robustness. This work has received financial support from the Spanish government (project TIN2017-90135-R MINECO (FEDER)), from the Consellaría de Cultura, Educación e Ordenación Universitaria (accreditations 2016–2019, EDG431G/01 and ED431G/08, and reference competitive groups 2017–2020, ED431C 2017/04), and from the European Regional Development Fund (ERDF). Eric López-López has received financial support from the Xunta de Galicia and the European Union (European Social Fund – ESF).

    Recent advances in video analytics for rail network surveillance for security, trespass and suicide prevention— a survey

    Railway network systems are by design open and accessible to people, but this presents challenges in the prevention of events such as terrorism, trespass, and suicide fatalities. With the rapid advancement of machine learning, numerous computer vision methods have been developed for closed-circuit television (CCTV) surveillance systems for the purpose of managing public spaces. These methods are built on multiple types of sensors and are designed to automatically detect static objects and unexpected events, monitor people, and prevent potential dangers. This survey focuses on recently developed CCTV surveillance methods for rail networks, discusses the challenges they face and their advantages and disadvantages, and outlines a vision for future railway surveillance systems. State-of-the-art methods for object detection and behaviour recognition applied to rail network surveillance systems are introduced, and the ethics of handling personal data and of using automated systems are also considered.

    A class of biometric cryptosystems based on convolutional neural networks (original title: Jedna klasa biometrijskog kriptosistema zasnovanog na konvolucionim neuronskim mrežama)

    This doctoral dissertation proposes a new fingerprint biometric cryptosystem based on a fuzzy commitment scheme and deep convolutional neural networks. The central contribution of the work is a new approach to the automatic extraction of fixed-length features from fingerprints, based entirely on convolutional neural networks. Through the proposed quantization of features with two-bit encoding, biometric templates are mapped into the binary domain, which enables the use of XOR biometrics and the development of a biometric cryptosystem that can be used for key management (key-release) or for template protection. The problem of variability in biometric data is mitigated by applying a BCH error-correcting code that operates at the block level, which makes it resistant to known statistical attacks. The proposed biometric cryptosystem can manage keys with a length of 265 bits, which meets the needs of modern cryptographic systems, with an acceptable EER of 1%. Evaluation of the experimental results confirms a significant improvement over other biometric cryptosystems and over systems that compare fingerprints on the basis of their texture.
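
    The XOR-based binding of a key to a binarized template described in this abstract can be illustrated with a minimal sketch. It is not the dissertation's implementation: a simple repetition code stands in for the BCH code, the quantization thresholds, Gray-code mapping and SHA-256 key check are assumptions, and all function names are hypothetical.

```python
import hashlib

REPEAT = 5  # assumed repetition factor; a stand-in for the BCH code in the abstract


def quantize_two_bits(features, thresholds=(-0.5, 0.0, 0.5)):
    """Map each real-valued feature to two bits (assumed Gray-coded levels)."""
    gray = [(0, 0), (0, 1), (1, 1), (1, 0)]
    bits = []
    for value in features:
        level = sum(value > t for t in thresholds)  # 0..3
        bits.extend(gray[level])
    return bits


def encode(key_bits):
    """Repetition code: repeat every key bit REPEAT times."""
    return [b for b in key_bits for _ in range(REPEAT)]


def decode(code_bits):
    """Majority vote over each group of REPEAT bits."""
    return [int(sum(code_bits[i:i + REPEAT]) > REPEAT // 2)
            for i in range(0, len(code_bits), REPEAT)]


def xor(a, b):
    return [x ^ y for x, y in zip(a, b)]


def digest(bits):
    return hashlib.sha256(bytes(bits)).hexdigest()


def commit(key_bits, template_bits):
    """Bind the key to the template: store only helper data and a key hash."""
    return xor(encode(key_bits), template_bits), digest(key_bits)


def release(helper, key_hash, probe_bits):
    """Return the key if the probe is close enough to the template, else None."""
    candidate = decode(xor(helper, probe_bits))
    return candidate if digest(candidate) == key_hash else None
```

    In this sketch the template must contain `len(key_bits) * REPEAT` bits. A probe that differs from the enrolled template in a few bits still releases the key, because the error-correcting code absorbs the difference; a very different probe decodes to a wrong key and fails the hash check.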

    Learning from imbalanced data in face re-identification using ensembles of classifiers

    Face re-identification is a video surveillance application in which systems for video-to-video face recognition are designed using faces of individuals captured from video sequences, and seek to recognize those individuals when they appear in archived or live videos captured over a network of video cameras. Video-based face recognition applications encounter challenges due to variations in capture conditions such as pose and illumination. Two further challenges arise in this application: (1) the imbalanced data distributions between the face captures of the individuals to be re-identified and those of other individuals, and (2) the varying degree of imbalance during operations with respect to the design data. Learning from imbalanced data is challenging in general, due in part to the bias of most two-class classification systems towards correct classification of the majority (negative, or non-target) class (face images/frames captured from individuals not to be re-identified) over the minority (positive, or target) class (face images/frames captured from the individual to be re-identified), because most two-class classification systems are intended to be used under balanced data conditions. Several techniques have been proposed in the literature to learn from imbalanced data. They either use data-level techniques to rebalance the data (by under-sampling the majority class, up-sampling the minority class, or both) for training classifiers, or use algorithm-level methods to guide the learning process (with or without cost-sensitive approaches) so that the bias towards correct classification of the majority class is neutralized. Ensemble techniques such as Bagging and Boosting algorithms have been shown to utilize these methods efficiently to address imbalance.
However, these techniques face several issues: (1) some informative samples may be neglected by random under-sampling, and adding synthetic positive samples through up-sampling adds to training complexity; (2) cost factors must be known in advance or found; (3) classification systems are often optimized and compared using performance measures (such as accuracy) that are unsuitable for imbalance problems; and (4) most learning algorithms are designed and tested on a fixed imbalance level of data, which may differ from operational scenarios. The objective of this thesis is to design specialized classifier ensembles that address the issue of imbalance in the face re-identification application and, as sub-goals, avoid the above-mentioned issues. In addition, achieving an efficient classifier ensemble requires a learning algorithm to design and combine component classifiers that hold a suitable diversity-accuracy trade-off. To reach the objective of the thesis, four major contributions are made, presented in three chapters and summarized in the following. In Chapter 3, a new application-based sampling method is proposed to group samples for under-sampling in order to improve the diversity-accuracy trade-off between classifiers of the ensemble. The proposed sampling method takes advantage of the fact that, in face re-identification applications, facial regions of the same person appearing in a camera field of view may be regrouped based on their trajectories found by a face tracker. A partitional Bagging ensemble method is proposed that accounts for possible variations in the imbalance level of the operational data by combining classifiers that are trained on different imbalance levels. In this method, all samples are used for training classifiers, and information loss is therefore avoided.
In Chapter 4, a new ensemble learning algorithm called Progressive Boosting (PBoost) is proposed that progressively inserts uncorrelated groups of samples into a Boosting procedure to avoid losing information while generating a diverse pool of classifiers. From one iteration to the next, the PBoost algorithm accumulates these uncorrelated groups of samples into a set that grows gradually in size and imbalance. This algorithm is more sophisticated than the one proposed in Chapter 3 because, instead of being trained on this set, the base classifiers are trained on balanced subsets sampled from it and validated on the whole set. The base classifiers are therefore more accurate, while robustness to imbalance is not jeopardized. In addition, sample selection is based on the weights assigned to samples, which correspond to their importance. The computational complexity of PBoost is also lower than that of Boosting ensemble techniques in the literature for learning from imbalanced data, because not all of the base classifiers are validated on all negative samples. A new loss factor is also proposed for use in PBoost to avoid biasing performance towards the negative class. Using this loss factor, the weight update of samples and the contribution of each classifier to final predictions are set according to the ability of classifiers to recognize both classes. Comparing the performance of the classifier systems in Chapters 3 and 4 requires an evaluation space that compares classifiers in terms of a suitable performance metric over all of their decision thresholds, different imbalance levels of test data, and different preferences between classes. The F-measure is often used to evaluate two-class classifiers on imbalanced data, and no global evaluation space was available in the literature for this measure.
Therefore, in Chapter 5, a new global evaluation space for the F-measure is proposed that is analogous to the cost curves for expected cost. In this space, a classifier is represented as a curve that shows its performance over all of its decision thresholds and a range of possible imbalance levels for the desired preference of true positive rate to precision. These properties are missing in ROC and precision-recall spaces. This space also allows us to empirically improve the performance of specialized ensemble learning methods for imbalance under a given operating condition. Through validation, the base classifiers are combined using a modified version of the iterative Boolean combination algorithm in which the selection criterion is the F-measure instead of AUC, and the combination is carried out for each operating condition. The proposed approaches in this thesis were validated and compared using synthetic data and videos from the Faces In Action and COX datasets that emulate face re-identification applications. Results show that the proposed techniques outperform state-of-the-art techniques over different levels of imbalance and overlap between classes.
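
To make concrete why the F-measure of a fixed classifier depends on the imbalance level of the test data, the following minimal sketch computes F-beta from an operating point (true positive rate, false positive rate) and an assumed proportion of positives. It illustrates the dependency only and is not the evaluation space proposed in the thesis; the function and parameter names are hypothetical.

```python
def f_measure(tpr, fpr, positive_prior, beta=1.0):
    """F_beta of an operating point (tpr, fpr) when a fraction
    `positive_prior` of the test data belongs to the positive class."""
    tp = positive_prior * tpr            # expected true-positive mass
    fp = (1.0 - positive_prior) * fpr    # expected false-positive mass
    fn = positive_prior * (1.0 - tpr)    # expected false-negative mass
    b2 = beta * beta
    denom = (1.0 + b2) * tp + b2 * fn + fp
    return (1.0 + b2) * tp / denom if denom > 0 else 0.0


def best_f_per_prior(roc_points, priors, beta=1.0):
    """For each prior, the best F_beta over all given decision thresholds."""
    return [max(f_measure(t, f, p, beta) for t, f in roc_points)
            for p in priors]


# Hypothetical ROC operating points of one classifier.
roc = [(0.0, 0.0), (0.6, 0.02), (0.8, 0.1), (0.95, 0.4), (1.0, 1.0)]
curve = best_f_per_prior(roc, priors=[0.5, 0.1, 0.01])
```

The same operating point scores a lower F-measure as positives become rarer, because false positives from the growing negative class reduce precision while the true positive rate is unchanged.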

    Face recognition in video surveillance from a single reference sample through domain adaptation

    Face recognition (FR) has received significant attention during the past decades in many applications, such as law enforcement, forensics, access controls, information security and video surveillance (VS), due to its covert and non-intrusive nature. FR systems specialized for VS seek to accurately detect the presence of target individuals of interest over a distributed network of video cameras under uncontrolled capture conditions. Therefore, recognizing faces of target individuals in such environment is a challenging problem because the appearance of faces varies due to changes in pose, scale, illumination, occlusion, blur, etc. The computational complexity is also an important consideration because of the growing number of cameras, and the processing time of state-of-the-art face detection, tracking and matching algorithms. In this thesis, adaptive systems are proposed for accurate still-to-video FR, where a single (or very few) reference still or a mug-shot is available to design a facial model for the target individual. This is a common situation in real-world watch-list screening applications due to the cost and feasibility of capturing reference stills, and managing facial models over time. The limited number of reference stills can adversely affect the robustness of facial models to intra-class variations, and therefore the performance of still-to-video FR systems. Moreover, a specific challenge in still-to-video FR is the shift between the enrollment domain, where high-quality reference faces are captured under controlled conditions from still cameras, and the operational domain, where faces are captured with video cameras under uncontrolled conditions. To overcome the challenges of such single sample per person (SSPP) problems, 3 new systems are proposed for accurate still-to-video FR that are based on multiple face representations and domain adaptation. In particular, this thesis presents 3 contributions. 
These contributions are described in more detail below. In Chapter 3, a multi-classifier framework is proposed for robust still-to-video FR based on multiple and diverse face representations of a single reference face still. During enrollment of a target individual, the single reference face still is modeled using an ensemble of SVM classifiers based on different patches and face descriptors. Multiple feature extraction techniques are applied to patches isolated in the reference still to generate a diverse SVM pool that provides robustness to common nuisance factors (e.g., variations in illumination and pose). The estimation of discriminant feature subsets, classifier parameters, decision thresholds, and ensemble fusion functions is achieved using the high-quality reference still and a large number of faces captured in lower-quality video of non-target individuals in the scene. During operations, the most competent subset of SVMs is dynamically selected according to capture conditions. Finally, a head-face tracker gradually regroups faces captured from different people appearing in a scene, while each individual-specific ensemble performs face matching. The accumulation of matching scores per face track leads to robust spatio-temporal FR when accumulated ensemble scores surpass a detection threshold. Experimental results obtained with the Chokepoint and COX-S2V datasets show a significant improvement in performance with respect to reference systems, especially when individual-specific ensembles (1) are designed using exemplar-SVMs rather than one-class SVMs, and (2) exploit score-level fusion of local SVMs (trained using features extracted from each patch), rather than using either decision-level or feature-level fusion with a global SVM (trained by concatenating features extracted from patches).
In Chapter 4, an efficient multi-classifier system (MCS) is proposed for accurate still-to-video FR based on multiple face representations and domain adaptation (DA). An individual-specific ensemble of exemplar-SVM (e-SVM) classifiers is designed to improve robustness to intra-class variations. During enrollment of a target individual, an ensemble is used to model the single reference still, where multiple face descriptors and random feature subspaces allow a diverse pool of patch-wise classifiers to be generated. To adapt these ensembles to the operational domains, e-SVMs are trained using labeled face patches extracted from the reference still versus patches extracted from cohort and other non-target stills, mixed with unlabeled patches extracted from the corresponding face trajectories captured with surveillance cameras. During operations, the most competent classifiers for a given probe face are dynamically selected and weighted based on internal criteria determined in the feature space of the e-SVMs. This chapter also investigates the impact of using different training schemes for DA, as well as of the validation set of non-target faces extracted from stills and video trajectories of unknown individuals in the operational domain. The results indicate that the proposed system can surpass state-of-the-art accuracy with significantly lower computational complexity. In Chapter 5, a deep convolutional neural network (CNN) is proposed to cope with the discrepancies between facial regions of interest (ROIs) isolated in still and video faces for robust still-to-video FR. To that end, a face-flow autoencoder CNN called FFA-CNN is trained using both still and video ROIs in supervised end-to-end multi-task learning. A novel loss function containing a weighted combination of pixel-wise, symmetry-wise and identity-preserving losses is introduced to optimize the network parameters.
The proposed FFA-CNN incorporates a reconstruction network and a fully-connected classification network, where the former reconstructs a well-illuminated frontal ROI with neutral expression from a pair of low-quality non-frontal video ROIs, and the latter is used to compare the still and video representations to provide matching scores. Integrating the proposed weighted loss function with a supervised end-to-end training approach thus yields high-quality frontal faces and discriminative face representations that are similar for the same identity. Simulation results obtained on the challenging COX Face DB confirm the effectiveness of the proposed FFA-CNN, which achieves convincing performance compared to current state-of-the-art CNN-based FR systems.
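
The per-track score accumulation described for Chapter 3 can be sketched roughly as follows. This is an illustration under assumptions rather than the thesis's system: patch-wise SVM scores are fused by a plain average, accumulation is a sum over a sliding window of recent frames, and the names `fuse_scores`, `detect_tracks`, `window` and `threshold` are hypothetical.

```python
from collections import defaultdict, deque


def fuse_scores(patch_scores):
    """Score-level fusion (assumed here to be a plain average of patch SVM scores)."""
    return sum(patch_scores) / len(patch_scores)


def detect_tracks(frames, threshold, window=3):
    """Accumulate fused scores per face track and flag tracks whose
    accumulated score over the last `window` frames exceeds `threshold`.

    `frames` is an iterable of (track_id, patch_scores) pairs in time order.
    """
    recent = defaultdict(lambda: deque(maxlen=window))
    detected = set()
    for track_id, patch_scores in frames:
        recent[track_id].append(fuse_scores(patch_scores))
        if sum(recent[track_id]) > threshold:
            detected.add(track_id)
    return detected


# Hypothetical stream: track "a" matches the target, track "b" does not.
stream = [
    ("a", [0.9, 0.7]), ("b", [0.1, 0.3]),
    ("a", [0.8, 0.8]), ("b", [0.2, 0.2]),
    ("a", [0.9, 0.9]),
]
hits = detect_tracks(stream, threshold=2.0)
```

Accumulating over a track means a single noisy frame neither triggers nor blocks a detection on its own; only consistently high scores along the trajectory do.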
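
A weighted combination of pixel-wise, symmetry-wise and identity-preserving terms, as mentioned for FFA-CNN, might look like the minimal sketch below. The exact form of each term and the weights are not given in the abstract, so everything here is an assumption: mean absolute error for the pixel term, left-right mirror difference for the symmetry term, mean squared embedding distance for the identity term, and hypothetical weights `w_pix`, `w_sym`, `w_id`. Images are plain nested lists to keep the example dependency-free.

```python
def pixel_loss(recon, target):
    """Mean absolute difference between reconstructed and target images."""
    diffs = [abs(r - t)
             for row_r, row_t in zip(recon, target)
             for r, t in zip(row_r, row_t)]
    return sum(diffs) / len(diffs)


def symmetry_loss(recon):
    """Mean absolute difference between the image and its horizontal mirror."""
    diffs = [abs(a - b) for row in recon for a, b in zip(row, reversed(row))]
    return sum(diffs) / len(diffs)


def identity_loss(emb_recon, emb_still):
    """Mean squared distance between two identity embeddings."""
    return sum((a - b) ** 2 for a, b in zip(emb_recon, emb_still)) / len(emb_recon)


def total_loss(recon, target, emb_recon, emb_still,
               w_pix=1.0, w_sym=0.1, w_id=0.5):
    """Weighted sum of the three terms (weights are illustrative only)."""
    return (w_pix * pixel_loss(recon, target)
            + w_sym * symmetry_loss(recon)
            + w_id * identity_loss(emb_recon, emb_still))
```

In a real training setup these terms would be computed on tensors inside a deep learning framework so that gradients can flow to the network parameters; the sketch only shows how the three terms combine into one scalar.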