28 research outputs found

    Performance Generalization in Biometric Authentication Using Joint User-Specific and Sample Bootstraps

    Get PDF
    Biometric authentication performance is often depicted by a DET curve. We show that this curve depends on the choice of samples available, the demographic composition and the number of users specific to a database. We propose a two-step bootstrap procedure to take into account these three sources of variability, extending Bolle et al.'s bootstrap subset technique. Preliminary experiments on the NIST2005 and XM2VTS benchmark databases are encouraging: for example, the average result across all 24 systems evaluated on NIST2005 indicates that one can predict, with more than 75% DET coverage, an unseen DET curve with 8 times more users. Furthermore, our findings suggest that with more data available, the confidence intervals become smaller and hence more useful.
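
    As a rough illustration of the two-step idea (resample users first, then each selected user's scores), the sketch below generates bootstrap replicates from which a family of DET curves, and hence a confidence region, can be derived. It is a minimal sketch under assumed data structures; the scores_by_user layout and function names are illustrative, not the authors' code.

        import numpy as np

        def two_step_bootstrap(scores_by_user, n_boot=1000, rng=None):
            """Two-step bootstrap: over users, then over each user's scores.

            scores_by_user: dict mapping user id -> 1-D array of match scores.
            Returns n_boot resampled score arrays, from which DET curves
            (and their confidence region) can be computed.
            """
            rng = np.random.default_rng(rng)
            users = list(scores_by_user)
            replicates = []
            for _ in range(n_boot):
                # Step 1: resample users with replacement (user-level variability).
                chosen = rng.choice(len(users), size=len(users), replace=True)
                sample = []
                for u in chosen:
                    s = scores_by_user[users[u]]
                    # Step 2: resample that user's scores with replacement
                    # (sample-level variability).
                    sample.append(rng.choice(s, size=len(s), replace=True))
                replicates.append(np.concatenate(sample))
            return replicates

    In practice one would resample genuine and impostor scores per user separately, compute a DET curve from each replicate, and take pointwise quantiles across replicates to form the confidence region whose coverage is measured.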

    Estimating the Confidence Interval of Expected Performance Curve in Biometric Authentication Using Joint Bootstrap

    Get PDF
    Evaluating biometric authentication performance is a complex task because the performance depends on the user set size, its composition and the choice of samples. We propose to reduce the performance dependency on these three factors by deriving appropriate confidence intervals. In this study, we focus on deriving a confidence region based on the recently proposed Expected Performance Curve (EPC). An EPC differs from the conventional DET or ROC curve because an EPC assumes that the test class-conditional (client and impostor) score distributions are unknown, and this includes the choice of the decision threshold at various operating points. Instead, an EPC selects thresholds on the training set and applies them to the test set. The proposed technique is useful, for example, to quote realistic upper and lower bounds of the decision cost function used in the NIST annual speaker evaluation. Our findings, based on the 24 systems submitted to the NIST2005 evaluation, show that the confidence region obtained from our proposed algorithm can correctly predict the performance of an unseen database with two times more users, with an average coverage of 95% (over all 24 systems). Coverage is the proportion of the unseen EPC covered by the derived confidence interval.
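
    The EPC construction described above (thresholds picked on a development set, errors reported on a disjoint evaluation set) can be sketched as follows. This is a generic EPC recipe under assumed score arrays, not the authors' exact implementation:

        import numpy as np

        def epc(dev_imp, dev_cli, eva_imp, eva_cli, alphas=np.linspace(0, 1, 11)):
            """Expected Performance Curve: thresholds chosen on the development
            set, error reported on the evaluation set."""
            def far_frr(imp, cli, thr):
                far = np.mean(imp >= thr)   # impostors wrongly accepted
                frr = np.mean(cli < thr)    # clients wrongly rejected
                return far, frr

            candidates = np.unique(np.concatenate([dev_imp, dev_cli]))
            hters = []
            for a in alphas:
                # Pick the threshold minimising the weighted error on the dev set.
                costs = [a * far_frr(dev_imp, dev_cli, t)[0]
                         + (1 - a) * far_frr(dev_imp, dev_cli, t)[1]
                         for t in candidates]
                thr = candidates[int(np.argmin(costs))]
                far, frr = far_frr(eva_imp, eva_cli, thr)
                hters.append((far + frr) / 2)  # HTER on the unseen test set
            return alphas, np.array(hters)

    Bootstrapping the score sets jointly, as the abstract proposes, and recomputing this curve per replicate yields the confidence region around the EPC.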

    Generalizing DET Curves Across Application Scenarios

    Full text link

    Multi-system Biometric Authentication: Optimal Fusion and User-Specific Information

    Get PDF
    Verifying a person's identity claim by combining multiple biometric systems (fusion) is a promising solution to identity theft and automatic access control. This thesis contributes to the state of the art of multimodal biometric fusion by improving the understanding of fusion and by enhancing fusion performance using information specific to a user. One problem to deal with in score-level fusion is combining system outputs of different types. Two statistically sound representations of scores are probability and log-likelihood ratio (LLR). While they are equivalent in theory, LLR is much more useful in practice because its distribution can be approximated by a Gaussian distribution, which makes it useful for analyzing the problem of fusion. Furthermore, its score statistics (mean and covariance) conditioned on the claimed user identity can be better exploited. Our first contribution is to estimate the fusion performance given the class-conditional score statistics and a particular fusion operator/classifier. Thanks to the score statistics, we can predict fusion performance with reasonable accuracy, identify conditions which favor a particular fusion operator, study the joint phenomenon of combining system outputs with different degrees of strength and correlation, and possibly correct the adverse effect of bias (due to the score-level mismatch between training and test sets) on fusion. While in practice the class-conditional Gaussian assumption does not always hold, the estimated performance is found to be acceptable. Our second contribution is to exploit user-specific prior knowledge by limiting the class-conditional Gaussian assumption to each user. We exploit this hypothesis in two strategies. In the first strategy, we combine a user-specific fusion classifier with a user-independent fusion classifier by means of two LLR scores, which are then weighted to obtain a single output. We show that combining both user-specific and user-independent LLR outputs always yields better performance than using the better of the two alone. In the second strategy, we propose a statistic called the user-specific F-ratio, which measures the discriminative power of a given user based on the Gaussian assumption. Although similar class separability measures exist, e.g., the Fisher ratio for a two-class problem and the d-prime statistic, the F-ratio is more suitable because it is related to the Equal Error Rate in closed form. The F-ratio is used in the following applications: a user-specific score normalization procedure, a user-specific criterion to rank users, and a user-specific fusion operator that selectively considers a subset of systems for fusion. The resultant fusion operator leads to a statistically significant performance improvement with respect to state-of-the-art fusion approaches. Even though the applications are different, the proposed methods share the following common advantages. Firstly, they are robust to deviations from the Gaussian assumption. Secondly, they are robust to scarce training data thanks to Bayesian adaptation. Finally, they consider both client and impostor information simultaneously.
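
    For reference, under the class-conditional Gaussian assumption the F-ratio mentioned above admits a closed-form link to the Equal Error Rate. The sketch below shows one common formulation of this result for Gaussian client/impostor scores; the exact notation may differ from the thesis.

        from math import erf, sqrt

        def f_ratio(mu_client, mu_impostor, sd_client, sd_impostor):
            """F-ratio: class separation under the Gaussian assumption."""
            return (mu_client - mu_impostor) / (sd_client + sd_impostor)

        def eer_from_f_ratio(f):
            """Closed-form EER implied by the Gaussian assumption:
            EER = 1/2 - 1/2 * erf(F / sqrt(2))."""
            return 0.5 - 0.5 * erf(f / sqrt(2))

        # Example: a well-separated user (high F-ratio -> low predicted EER).
        print(eer_from_f_ratio(f_ratio(2.0, -1.0, 1.0, 1.0)))  # ~0.067

    The closed form is what makes the F-ratio usable as a ranking criterion and as a selection score for choosing which systems to include in fusion.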

    Electronic capture and analysis of fraudulent behavioral patterns : an application to identity fraud

    Get PDF
    The objective of this research was to find a transparent and secure solution for mitigating identity fraud and to identify the critical factors that determine the solution's acceptance. Identity fraud is identified as a key problem, with total losses exceeding fifty-two billion dollars (Javelin Strategy and Research 2005). A common denominator in most identity-fraud-prone transactions is the use of a keypad; hence this research focuses on keypad data entry and proposes a biometric solution. Three studies develop, evaluate and investigate the feasibility of this solution. The first study was done in three stages: stage one investigated the technical feasibility of the biometric keypad, stage two evaluated the keypad under different field conditions, and stage three investigated acceptable user parameters. A key shortcoming of current authentication methods is the use of external identifiers that, unlike biometric patterns, are prone to theft. A biometric keypad that supplements the present external identifiers was proposed, prototyped and evaluated. The results demonstrated that a biometric keypad can be a feasible medium-performance solution. The addition of pressure and higher typing speeds were found to enhance discrimination accuracy, while typing patterns were found to vary with elapsed time, which led to deterioration in accuracy. The second study interviewed executives with experience in the introduction of new technologies, with the objective of identifying and ranking critical factors that are important in the adoption of new biometrics. Performance, ease of use and trust-privacy issues were the most cited factors. A biometric acceptance model was formulated and five hypotheses were proposed from these interviews and prior research. Executives rated the keypad's ease of use high in comparison to other biometric approaches but were concerned about its accuracy. The third study was a user attitude survey whose objective was to validate the formulated biometric acceptance model and acquire data on acceptable usage parameters. The proposed biometric model was validated and the proposed hypotheses were supported. Acceptable error rates and training times indicated that the biometric keypad would be more complex to engineer. The dissertation concludes by summarizing the contributions and limitations of the three studies, followed by several suggestions for future research.
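
    To make the first study's keypad concrete: keystroke biometrics typically starts from dwell times (how long a key is held) and flight times (the gap between consecutive keys), optionally augmented with pressure. The sketch below extracts these standard timing features; the event layout is illustrative, not the dissertation's implementation.

        def keystroke_features(events):
            """events: list of (key, press_time, release_time) tuples in seconds.
            Returns dwell times (key held down) and flight times (gap between
            keys), the standard timing features in keystroke biometrics."""
            dwell = [r - p for _, p, r in events]
            flight = [events[i + 1][1] - events[i][2]
                      for i in range(len(events) - 1)]
            return dwell, flight

        # Example: timing pattern for a 4-digit PIN entry.
        pin = [("1", 0.00, 0.09), ("7", 0.31, 0.42),
               ("3", 0.66, 0.74), ("9", 1.01, 1.12)]
        print(keystroke_features(pin))

    A verifier then compares these feature vectors against the user's enrolled typing profile, which is why the reported drift of typing patterns over elapsed time degrades accuracy.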

    BIOMETRIC TECHNOLOGIES FOR AMBIENT INTELLIGENCE

    Get PDF
    Ambient Intelligence (AmI) refers to an environment capable of recognizing and responding to the presence of different individuals in a seamless, unobtrusive and often invisible way. In this environment, people are surrounded by intelligent, intuitive interfaces that are embedded in all kinds of objects. The goals of AmI are to provide greater user-friendliness, more efficient service support, user empowerment, and support for human interactions. Examples of AmI scenarios are smart cities, smart homes, smart offices, and smart hospitals. In AmI, biometric systems are enabling technologies for designing personalized services for individuals or groups of people. Biometrics is the science of establishing the identity of an individual or a class of people based on the physical or behavioral attributes of the person. Common applications include security checks, border control, access control to physical places, and authentication to electronic devices. In AmI, biometric technologies should work in uncontrolled and less-constrained conditions with respect to traditional biometric technologies. Furthermore, in many application scenarios, covert and non-cooperative techniques may be required. In these non-ideal conditions, the biometric samples frequently present poor quality, and state-of-the-art biometric technologies can perform unsatisfactorily. There are two possible ways to improve the applicability and diffusion of biometric technologies in AmI. The first consists in designing novel biometric technologies that are robust to samples acquired in noisy and non-ideal conditions. The second consists in designing novel multimodal biometric approaches that take advantage of all the sensors placed in a generic environment in order to achieve high recognition accuracy and to perform continuous or periodic authentication in an unobtrusive manner. The first goal of this thesis is to design innovative, less-constrained biometric systems that improve the quality of human-machine interaction in different AmI environments with respect to state-of-the-art technologies. The second goal is to design novel approaches to improve the applicability and integration of heterogeneous biometric systems in AmI. In particular, the thesis considers technologies based on fingerprint, face, voice, and multimodal biometrics. This thesis presents the following innovative research studies:
    • a method for text-independent speaker identification in AmI applications;
    • a method for age estimation from non-ideal samples acquired in AmI scenarios;
    • a privacy-compliant cohort normalization technique to increase the accuracy of already deployed biometric systems;
    • a technology-independent multimodal fusion approach to combine heterogeneous traits in AmI scenarios;
    • a multimodal continuous authentication approach for AmI applications.
    The designed novel biometric technologies have been tested on different biometric datasets (both public and collected in our laboratory) simulating the acquisitions performed in AmI applications. Results proved the feasibility of the studied approaches and showed that the studied methods effectively increase the accuracy, applicability, and usability of biometric technologies in AmI with respect to the state of the art.
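
    As a pointer to how the cohort normalization contribution listed above typically operates, the sketch below applies a T-norm-style normalization, centring a raw match score on the statistics of scores computed against a cohort of similar templates. It illustrates only the general cohort idea, not the thesis' privacy-compliant variant.

        import numpy as np

        def cohort_normalize(raw_score, cohort_scores):
            """T-norm-style cohort normalization: centre and scale a raw
            match score by the statistics of scores against a cohort of
            similar (non-claimed) templates."""
            mu = np.mean(cohort_scores)
            sd = np.std(cohort_scores) + 1e-9  # guard against zero variance
            return (raw_score - mu) / sd

    Normalized scores from heterogeneous matchers become directly comparable, which is also what makes technology-independent multimodal fusion tractable.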

    CONTACTLESS FINGERPRINT BIOMETRICS: ACQUISITION, PROCESSING, AND PRIVACY PROTECTION

    Get PDF
    Biometrics is defined by the International Organization for Standardization (ISO) as "the automated recognition of individuals based on their behavioral and biological characteristics". Examples of distinctive features evaluated by biometrics, called biometric traits, are behavioral characteristics like the signature, gait, voice, and keystroke, and biological characteristics like the fingerprint, face, iris, retina, hand geometry, palmprint, ear, and DNA. Biometric recognition is the process of establishing the identity of a person, and can be performed in two modalities: verification and identification. The verification modality evaluates whether the identity declared by an individual corresponds to the acquired biometric data. In the identification modality, by contrast, the recognition application has to determine a person's identity by comparing the acquired biometric data with the information related to a set of individuals. Compared with traditional techniques used to establish the identity of a person, biometrics offers greater confidence that the authenticated individual is not being impersonated. Traditional techniques, in fact, are based on surrogate representations of identity, like tokens, smart cards, and passwords, which, unlike biometric traits, can easily be stolen or copied. This characteristic has permitted a wide diffusion of biometrics in different scenarios, like physical access control, government applications, forensic applications, and logical access control to data, networks, and services. Most biometric applications, also called biometric systems, require the acquisition process to be performed in a highly controlled and cooperative manner. In order to obtain good-quality biometric samples, the acquisition procedures of these systems require that users perform deliberate actions, assume determinate poses, and stay still for a period of time. Limitations regarding the applicative scenarios can also be present, for example the necessity of specific light and environmental conditions. Examples of biometric technologies that traditionally require constrained acquisitions are based on the face, iris, fingerprint, and hand characteristics. Traditional face recognition systems require users to take a neutral pose and stay still for a period of time; moreover, the acquisitions are based on a frontal camera and performed in controlled light conditions. Iris acquisitions are usually performed at a distance of less than 30 cm from the camera, and require that the user assume a defined pose and stay still watching the camera; moreover, they use near-infrared illumination techniques, which can be perceived as dangerous for the health. Fingerprint recognition systems and systems based on hand characteristics require that users touch the sensor surface, applying a proper and uniform pressure. The contact with the sensor is often perceived as unhygienic and/or associated with a police procedure. Such constrained acquisition techniques can drastically reduce the usability and social acceptance of biometric technologies, thereby decreasing the number of possible applicative contexts in which biometric systems could be used.
    In traditional fingerprint recognition systems, usability and user acceptance are not the only drawbacks of the acquisition procedure: the contact of the finger with the sensor platen introduces a security weakness, since a latent fingerprint is released on the touched surface; dirt on the surface of the finger can reduce the accuracy of the recognition process; and different pressures applied to the sensor platen can introduce non-linear distortions and low-contrast regions in the captured samples. Other crucial aspects that influence the social acceptance of biometric systems concern privacy and the risks related to misuse of the biometric information acquired, stored and transmitted by the systems. One of the most important perceived risks is that people consider the acquisition of biometric traits an exact, permanent record of their activities and behaviors, and the idea that biometric systems can guarantee 100% recognition accuracy is very common. Other perceived risks include the use of the collected biometric data for malicious purposes, for tracing all the activities of individuals, or for operating proscription lists. In order to increase the usability and social acceptance of biometric systems, researchers are studying less-constrained biometric recognition techniques based on different biometric traits, for example face recognition systems in surveillance applications, iris recognition techniques based on images captured at a great distance and on the move, and contactless technologies based on fingerprint and hand characteristics. Other recent studies aim to reduce the real and perceived privacy risks, and consequently increase the social acceptance of biometric technologies. In this context, many studies regard methods that perform the identity comparison in the encrypted domain in order to prevent possible theft and misuse of biometric data. The objective of this thesis is to research approaches able to increase the usability and social acceptance of biometric systems by performing less-constrained and highly accurate biometric recognition in a privacy-compliant manner. In particular, approaches designed for high-security contexts are studied in order to improve the existing technologies adopted in border control, investigative, and governmental applications. Approaches based on low-cost hardware configurations are also researched, with the aim of increasing the number of possible applicative scenarios of biometric systems. Privacy compliance is considered a crucial aspect in all the studied applications. Fingerprint is specifically considered in this thesis, since this biometric trait is characterized by high distinctiveness and durability, is the most widely studied trait in the literature, and is adopted in a wide range of applicative contexts. The studied contactless biometric systems are based on one or more CCD cameras, can use two-dimensional or three-dimensional samples, and include privacy protection methods. The main goal of these systems is to perform accurate and privacy-compliant recognition in less-constrained applicative contexts with respect to traditional fingerprint biometric systems. Other important goals are the use of a wider fingerprint area with respect to traditional techniques, compatibility with existing databases, usability, social acceptance, and scalability.
    The main contribution of this thesis is the realization of novel biometric systems based on contactless fingerprint acquisition. In particular, different techniques for every step of the recognition process, based on two-dimensional and three-dimensional samples, have been researched. Novel techniques for the privacy protection of fingerprint data have also been designed. The studied approaches are multidisciplinary, since their design and realization involved optical acquisition systems, multiple-view geometry, image processing, pattern recognition, computational intelligence, statistics, and cryptography. The implemented biometric systems and algorithms have been applied to different biometric datasets describing a heterogeneous set of applicative scenarios. Results proved the feasibility of the studied approaches. In particular, the realized contactless biometric systems have been compared with traditional fingerprint recognition systems, obtaining positive results in terms of accuracy, usability, user acceptability, scalability, and security. Moreover, the developed techniques for the privacy protection of fingerprint biometric systems showed satisfactory performance in terms of security, accuracy, speed, and memory usage.
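
    The privacy protection methods themselves are only summarized above; one widely used family in the fingerprint-template-protection literature is cancelable templates built from user-specific random projections, sketched below as a generic illustration (not necessarily the scheme developed in this thesis).

        import numpy as np

        def cancelable_template(feature_vec, user_seed, out_dim=64):
            """Project a fingerprint feature vector through a user-specific
            random matrix and binarise the result. Matching is done in the
            projected domain, and the template can be revoked simply by
            changing the seed. Generic cancelable-biometrics sketch."""
            rng = np.random.default_rng(user_seed)
            proj = rng.standard_normal((out_dim, feature_vec.shape[0]))
            return np.sign(proj @ feature_vec)  # protected binary template

    The key property is that the stored template reveals neither the original features nor anything reusable across applications, addressing the theft and cross-matching risks discussed above.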

    Probabilistic models for multi-view semi-supervised learning and coding

    Get PDF
    Thesis (Ph.D.) -- Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009. Cataloged from the PDF version of the thesis. Includes bibliographical references (p. 146-160). This thesis investigates the problem of classification from multiple noisy sensors or modalities. Examples include speech and gesture interfaces and multi-camera distributed sensor networks. Reliable recognition in such settings hinges upon the ability to learn accurate classification models in the face of limited supervision and to cope with the relatively large amount of potentially redundant information transmitted by each sensor or modality (i.e., view). We investigate and develop novel multi-view learning algorithms capable of learning from semi-supervised noisy sensor data, of automatically adapting to new users and working conditions, and of performing distributed feature selection on bandwidth-limited sensor networks. We propose probabilistic models built upon multi-view Gaussian Processes (GPs) for solving this class of problems, and demonstrate our approaches on audio-visual speech and gesture, and multi-view object classification problems. Multi-modal tasks are good candidates for multi-view learning, since each modality provides a potentially redundant view to the learning algorithm. On audio-visual speech unit classification, and user agreement recognition using spoken utterances and head gestures, we demonstrate that multi-modal co-training can be used to learn from only a few labeled examples in one or both of the audio-visual modalities. We also propose a co-adaptation algorithm, which adapts existing audio-visual classifiers to a particular user or noise condition by leveraging the redundancy in the unlabeled data. Existing methods typically assume constant per-channel noise models. In contrast, we develop co-training algorithms that are able to learn from noisy sensor data corrupted by complex per-sample noise processes, e.g., occlusion, common to multi-sensor classification problems. We propose a probabilistic heteroscedastic approach to co-training that simultaneously discovers the amount of noise on a per-sample basis while solving the classification task. This results in accurate performance in the presence of occlusion or other complex noise processes. We also investigate an extension of this idea for supervised multi-view learning, where we develop a Bayesian multiple kernel learning algorithm that can learn a local weighting over each view of the input space. We additionally consider the problem of distributed object recognition or indexing from multiple cameras, where the computational power available at each camera sensor is limited and communication between cameras is prohibitively expensive. In this scenario, it is desirable to avoid sending redundant visual features from multiple views. Traditional supervised feature selection approaches are inapplicable, as the class label is unknown at each camera. In this thesis, we propose an unsupervised multi-view feature selection algorithm based on a distributed coding approach. With our method, a Gaussian Process model of the joint view statistics is used at the receiver to obtain a joint encoding of the views without directly sharing information across encoders. We demonstrate our approach on recognition and indexing tasks with multi-view image databases and show that our method compares favorably to an independent encoding of the features from each camera. by C. Mario Christoudias, Ph.D.
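
    As a concrete reference point for the co-training results described above, the sketch below implements the classic two-view co-training loop (Blum-Mitchell style) with off-the-shelf scikit-learn classifiers. The thesis' Gaussian Process variants are more elaborate; all names here are illustrative, and the labeled set is assumed to contain both classes.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        def co_train(Xa, Xb, y, labeled, rounds=10, k=5):
            """Minimal two-view co-training sketch. Xa, Xb: the two views;
            y: labels (read only on the labeled mask); labeled: boolean
            mask of initially labeled examples."""
            labeled, y = labeled.copy(), y.copy()
            clf_a, clf_b = LogisticRegression(), LogisticRegression()
            for _ in range(rounds):
                clf_a.fit(Xa[labeled], y[labeled])
                clf_b.fit(Xb[labeled], y[labeled])
                pool = np.flatnonzero(~labeled)
                if len(pool) == 0:
                    break
                # Each view labels its k most confident unlabeled points,
                # which the other view then learns from in the next round.
                for clf, X in ((clf_a, Xa), (clf_b, Xb)):
                    proba = clf.predict_proba(X[pool])
                    top = pool[np.argsort(proba.max(axis=1))[-k:]]
                    y[top] = clf.predict(X[top])
                    labeled[top] = True
                    pool = np.flatnonzero(~labeled)
                    if len(pool) == 0:
                        break
            return clf_a, clf_b

    The heteroscedastic extension in the thesis replaces this hard confidence cutoff with a learned per-sample noise estimate, so occluded or corrupted samples contribute less.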

    Learning from imbalanced data in face re-identification using ensembles of classifiers

    Get PDF
    Face re-identification is a video surveillance application where systems for video-to-video face recognition are designed using faces of individuals captured from video sequences, and seek to recognize them when they appear in archived or live videos captured over a network of video cameras. Video-based face recognition applications encounter challenges due to variations in capture conditions such as pose, illumination, etc. Two further challenges in this application are: 1) the imbalanced data distributions between the face captures of the individuals to be re-identified and those of other individuals; and 2) the varying degree of imbalance during operations w.r.t. the design data. Learning from imbalanced data is challenging in general, due in part to the tendency of most two-class classification systems to classify the majority (negative, or non-target) class (face images/frames captured from individuals not to be re-identified) better than the minority (positive, or target) class (face images/frames captured from the individual to be re-identified), because most two-class classification systems are intended for balanced data conditions. Several techniques have been proposed in the literature to learn from imbalanced data, which either use data-level techniques to rebalance data (by under-sampling the majority class, up-sampling the minority class, or both) before training classifiers, or use algorithm-level methods to guide the learning process (with or without cost-sensitive approaches) such that the bias towards the majority class is neutralized. Ensemble techniques such as Bagging and Boosting algorithms have been shown to utilize these methods efficiently to address imbalance. However, these techniques face several issues: (1) informative samples may be neglected by random under-sampling, while adding synthetic positive samples through up-sampling adds to training complexity; (2) cost factors must be known in advance or estimated; (3) classification systems are often optimized and compared using performance measures (like accuracy) that are unsuitable for imbalance problems; and (4) most learning algorithms are designed and tested on a fixed imbalance level of data, which may differ from operational scenarios. The objective of this thesis is to design specialized classifier ensembles that address the issue of imbalance in the face re-identification application while avoiding the above-mentioned issues from the literature. In addition, an efficient classifier ensemble requires a learning algorithm that designs and combines component classifiers with a suitable diversity-accuracy trade-off. To reach the objective of the thesis, four major contributions are made, presented in three chapters and summarized in the following. In Chapter 3, a new application-based sampling method is proposed to group samples for under-sampling in order to improve the diversity-accuracy trade-off between classifiers of the ensemble. The proposed sampling method exploits the fact that, in face re-identification applications, facial regions of the same person appearing in a camera's field of view may be regrouped based on their trajectories found by a face tracker. A partitional Bagging ensemble method is proposed that accounts for possible variations in the imbalance level of the operational data by combining classifiers trained on different imbalance levels.
    In this method, all samples are used for training classifiers, and information loss is therefore avoided. In Chapter 4, a new ensemble learning algorithm called Progressive Boosting (PBoost) is proposed that progressively inserts uncorrelated groups of samples into a Boosting procedure to avoid losing information while generating a diverse pool of classifiers. From one iteration to the next, the PBoost algorithm accumulates these uncorrelated groups of samples into a set that grows gradually in size and imbalance. This algorithm is more sophisticated than the one proposed in Chapter 3 because, instead of training the base classifiers on this set, the base classifiers are trained on balanced subsets sampled from this set and validated on the whole set. Therefore, the base classifiers are more accurate while the robustness to imbalance is not jeopardized. In addition, sample selection is based on weights assigned to samples according to their importance. The computational complexity of PBoost is also lower than that of Boosting ensemble techniques in the literature for learning from imbalanced data, because not all of the base classifiers are validated on all negative samples. A new loss factor is also proposed for use in PBoost to avoid biasing performance towards the negative class. Using this loss factor, the weight update of samples and each classifier's contribution to final predictions are set according to the ability of classifiers to recognize both classes. In comparing the performance of the classifier systems in Chapters 3 and 4, an evaluation space is needed that compares classifiers in terms of a suitable performance metric over all of their decision thresholds, different imbalance levels of test data, and different preferences between classes. The F-measure is often used to evaluate two-class classifiers on imbalanced data, yet no global evaluation space was available in the literature for this measure. Therefore, in Chapter 5, a new global evaluation space for the F-measure is proposed that is analogous to the cost curves for expected cost. In this space, a classifier is represented as a curve that shows its performance over all of its decision thresholds and a range of possible imbalance levels, for the desired preference of true positive rate to precision. These properties are missing in the ROC and precision-recall spaces. This space also allows us to empirically improve the performance of specialized ensemble learning methods for imbalance under a given operating condition. Through validation, the base classifiers are combined using a modified version of the iterative Boolean combination algorithm, in which the selection criterion is the F-measure instead of AUC, and the combination is carried out for each operating condition. The proposed approaches in this thesis were validated and compared using synthetic data and videos from the Faces In Action and COX datasets, which emulate face re-identification applications. Results show that the proposed techniques outperform state-of-the-art techniques over different levels of imbalance and overlap between classes.
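
    To ground Chapter 3's idea, the sketch below shows a plain under-sampling Bagging ensemble in which every base classifier sees all positive samples plus a random, equal-sized subset of negatives (assuming negatives outnumber positives). The thesis' partitional variant instead groups negatives by face trajectory and trains across imbalance levels; the names and the random grouping here are illustrative.

        import numpy as np
        from sklearn.tree import DecisionTreeClassifier

        def balanced_bagging(X, y, n_estimators=10, rng=None):
            """Bagging for imbalance: each base classifier is trained on all
            positives plus a random subset of negatives of the same size."""
            rng = np.random.default_rng(rng)
            pos, neg = np.flatnonzero(y == 1), np.flatnonzero(y == 0)
            ensemble = []
            for _ in range(n_estimators):
                sub = rng.choice(neg, size=len(pos), replace=False)
                idx = np.concatenate([pos, sub])
                ensemble.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
            return ensemble

        def predict_scores(ensemble, X):
            # Average the base classifiers' positive-class probabilities.
            return np.mean([c.predict_proba(X)[:, 1] for c in ensemble], axis=0)

    Replacing the random negative subsets with trajectory-based groups, and varying the subset-to-positive ratio per classifier, yields ensembles whose members cover different operational imbalance levels, which is the property the partitional method exploits.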