
    K-Nearest Oracles Borderline Dynamic Classifier Ensemble Selection

    Dynamic Ensemble Selection (DES) techniques aim to select locally competent classifiers for the classification of each new test sample. Most DES techniques estimate the competence of classifiers using a given criterion over the region of competence of the test sample (its nearest neighbors in the validation set). The K-Nearest Oracles Eliminate (KNORA-E) DES technique selects all classifiers that correctly classify all samples in the region of competence of the test sample, if such classifiers exist; otherwise, it removes from the region of competence the sample that is furthest from the test sample, and the process repeats. When the region of competence has samples of different classes, KNORA-E can shrink it until only samples of a single class remain, leading to the selection of locally incompetent classifiers that assign every sample in the region of competence to the same class. In this paper, we propose two DES techniques: K-Nearest Oracles Borderline (KNORA-B) and K-Nearest Oracles Borderline Imbalanced (KNORA-BI). KNORA-B is a DES technique based on KNORA-E that reduces the region of competence but maintains at least one sample from each class that is in the original region of competence. KNORA-BI is a variation of KNORA-B for imbalanced datasets that reduces the region of competence but maintains at least one minority class sample if there is any in the original region of competence. Experiments are conducted comparing the proposed techniques with 19 DES techniques from the literature using 40 datasets. The proposed techniques achieved interesting results, with KNORA-BI outperforming state-of-the-art techniques. Comment: Paper accepted for publication on IJCNN 201
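
The selection rules described in this abstract can be sketched in a few lines of Python. This is a minimal illustration based only on the abstract, not the authors' implementation: the function names `knora_e` and `knora_b`, the boolean matrix `correct` (one row per neighbor, nearest first; one column per classifier), and the fallback of returning the whole pool when no classifier qualifies are all assumptions made for the example.

```python
from typing import List, Sequence


def knora_e(correct: Sequence[Sequence[bool]]) -> List[int]:
    """KNORA-E sketch. correct[i][j] is True when classifier j labels
    neighbor i correctly; neighbors are sorted nearest first."""
    n_classifiers = len(correct[0]) if correct else 0
    # Shrink the region of competence from the furthest neighbor inwards.
    for k in range(len(correct), 0, -1):
        selected = [j for j in range(n_classifiers)
                    if all(correct[i][j] for i in range(k))]
        if selected:
            return selected
    # Assumption: fall back to the whole pool if no classifier qualifies.
    return list(range(n_classifiers))


def knora_b(correct: Sequence[Sequence[bool]],
            labels: Sequence[int]) -> List[int]:
    """KNORA-B sketch: same idea, but a neighbor is removed only if every
    class of the original region stays represented."""
    n_classifiers = len(correct[0]) if correct else 0
    kept = list(range(len(correct)))
    original_classes = set(labels)
    while True:
        selected = [j for j in range(n_classifiers)
                    if all(correct[i][j] for i in kept)]
        if selected:
            return selected
        # Find the furthest neighbor whose removal keeps all classes.
        removable = None
        for i in reversed(kept):
            remaining = {labels[m] for m in kept if m != i}
            if remaining == original_classes:
                removable = i
                break
        if removable is None:
            # Assumption: fall back to the whole pool.
            return list(range(n_classifiers))
        kept.remove(removable)


# Hypothetical region of competence: four neighbors, two classifiers.
# Classifier 0 always predicts class 0; classifier 1 is right on the
# nearest neighbor and on the only class-1 neighbor.
labels = [0, 0, 1, 0]
correct = [
    [True, True],
    [True, False],
    [False, True],
    [True, False],
]
print(knora_e(correct))          # [0]
print(knora_b(correct, labels))  # [1]
```

In this toy case KNORA-E shrinks the region to the two nearest class-0 neighbors and selects the classifier that always predicts class 0, while the KNORA-B sketch keeps the class-1 neighbor and selects the other classifier, which is the behavior the abstract motivates.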

    Fault Injection Analytics: A Novel Approach to Discover Failure Modes in Cloud-Computing Systems

    Cloud computing systems fail in complex and unexpected ways due to unforeseen combinations of events and interactions between hardware and software components. Fault injection is an effective means to bring out these failures in a controlled environment. However, fault injection experiments produce massive amounts of data, and manually analyzing these data is inefficient and error-prone, as the analyst can miss severe failure modes that are as yet unknown. This paper introduces a new paradigm (fault injection analytics) that applies unsupervised machine learning to execution traces of the injected system, to ease the discovery and interpretation of failure modes. We evaluated the proposed approach in the context of fault injection experiments on the OpenStack cloud computing platform, where we show that the approach can accurately identify failure modes with a low computational cost. Comment: IEEE Transactions on Dependable and Secure Computing; 16 pages. arXiv admin note: text overlap with arXiv:1908.1164

    A signature meta-verifier and its application to the registration of political organizations in Peru

    In Peru, to register as a political organization, a list of adherents (signature sheets) must be submitted, which is verified by the Registro Nacional de Identificación y Estado Civil using visual comparison. The problem is that this technique is entirely manual and prone to human error, aggravated by short validation deadlines and high demand during election periods. As a result, signature verification is not carried out exhaustively, and signatures whose authenticity has not been fully verified are accepted. Consequently, some political organizations are obtaining registration in the ROP with forged signatures, which are later reported in the media, generating distrust among citizens. This research proposes the development of a signature meta-verifier, which compares the patterns of the questioned signature against genuine signatures to determine its authenticity. The proposal includes the use of new features and a verification engine composed of two modules: the first checks whether the questioned signature is a forgery, and the second performs a more detailed verification of the signatures that were not detected as forgeries by the first module. The results show that the proposed meta-verifier achieves a precision of 93.3%, which is quite high compared with results reported in the literature, using only 3 genuine signatures for training. Perú. Ministerio de la Producción. Programa Nacional de Innovación para la Competitividad y Productividad (Innóvate Perú) Tesi

    Multi-classifier systems for off-line signature verification

    Handwritten signatures are behavioural biometric traits that are known to incorporate a considerable amount of intra-class variability. The Hidden Markov Model (HMM) has been successfully employed in many off-line signature verification (SV) systems due to the sequential nature and variable size of the signature data. In particular, the left-to-right topology of HMMs is well adapted to the dynamic characteristics of occidental handwriting, in which the hand movements are always from left to right. As with most generative classifiers, HMMs require a considerable amount of training data to achieve a high level of generalization performance. Unfortunately, the number of signature samples available to train an off-line SV system is very limited in practice. Moreover, only random forgeries are employed to train the system, which must in turn discriminate between genuine samples and random, simple and skilled forgeries during operations. These last two forgery types are not available during the training phase. The approaches proposed in this Thesis employ the concept of multi-classifier systems (MCS) based on HMMs to learn signatures at several levels of perception. By extracting a high number of features, a pool of diversified classifiers can be generated using random subspaces, which overcomes the problem of having a limited amount of training data. Based on the multi-hypotheses principle, a new approach for combining classifiers in the ROC space is proposed. A technique to repair concavities in ROC curves allows for overcoming the problem of having a limited amount of genuine samples, and, especially, for evaluating the performance of biometric systems more accurately. A second important contribution is the proposal of a hybrid generative-discriminative classification architecture. The use of HMMs as feature extractors in the generative stage followed by Support Vector Machines (SVMs) as classifiers in the discriminative stage allows for a better design not only of the genuine class, but also of the impostor class. Moreover, this approach provides more robust learning than a traditional HMM-based approach when a limited amount of training data is available. The last contribution of this Thesis is the proposal of two new strategies for the dynamic selection (DS) of ensembles of classifiers. Experiments performed with the PUCPR and GPDS signature databases indicate that the proposed DS strategies achieve a higher level of performance in off-line SV than other reference DS and static selection (SS) strategies from the literature.

    Face recognition using infrared vision

    Over the course of the last decade, infrared (IR) and particularly thermal IR imaging based face recognition has emerged as a promising complement to conventional, visible spectrum based approaches, which continue to struggle when applied in the real world. While inherently insensitive to visible spectrum illumination changes, IR images introduce specific challenges of their own, most notably sensitivity to factors which affect facial heat emission patterns, e.g., emotional state, ambient temperature, etc. In addition, facial expression and pose changes are more difficult to correct in IR images because they are less rich in high frequency details, which are an important cue for fitting any deformable model. In this thesis we describe a novel method which addresses these major challenges. Specifically, to normalize for pose and facial expression changes we generate a synthetic frontal image of a face in a canonical, neutral facial expression from an image of the face in an arbitrary pose and facial expression. This is achieved by piecewise affine warping which follows active appearance model (AAM) fitting. This is the first work which explores the use of an AAM on thermal IR images; we propose a pre-processing step which enhances details in thermal images, making AAM convergence faster and more accurate. To overcome the problem of thermal IR image sensitivity to the exact pattern of facial temperature emissions we describe a representation based on reliable anatomical features. In contrast to previous approaches, our representation is not binary; rather, our method accounts for the reliability of the extracted features. This makes the proposed representation much more robust both to pose and scale changes. The effectiveness of the proposed approach is demonstrated on the largest public database of thermal IR images of faces, on which it achieves satisfying recognition performance and significantly outperforms previously described methods. The proposed approach has also demonstrated satisfying performance on subsets of the largest video database in the world, gathered in our laboratory, which will be made publicly available free of charge in the future. The reader should note that due to the anatomically based nature of the feature extraction method in our system, we anticipate high robustness to some challenging factors such as temperature changes. However, we were not able to investigate this in depth due to the limits which exist in gathering realistic databases. Gathering the largest video database covering some of these challenging factors is another contribution of this research.

    Financial Fraud Detection and Data Mining of Imbalanced Databases using State Space Machine Learning

    Risky decisions made by humans exhibit characteristics common to each decision. The related systems experience repeated abuse by risky humans and their actions collude to form a systemic behavioural set. Financial fraud is an example of such risky behaviour. Fraud detection models have drawn attention since the financial crisis of 2008 because of the frequency and size of frauds and the technological advances leading to financial market manipulation. Statistical methods dominate industrial fraud detection systems at banks, insurance companies and financial marketplaces. Most efforts thus far, in both the academic literature and industrial settings, have focused on anomaly detection problems and simple rules. There are unsolved issues in modeling the behaviour of risky agents in real-world financial markets using machine learning. This research studies the challenges posed by fraud detection, including the problem of imbalanced class distributions, and investigates the use of Reinforcement Learning (RL) to model risky human behaviour. Models have been developed to transform the relevant financial data into a state-space system. Reinforcement Learning agents uncover the decision-making processes of risky humans and derive an optimal path of behaviour at the end of the learning process. States are weighted by risk and then classified as positive (risky) or negative (not-risky). The positive samples are composed of features that represent the hidden information underlying the risky behaviour. Reinforcement Learning is implemented as unsupervised and supervised models. The unsupervised learning agent searches for risky behaviour without any previous knowledge of the data; it is not “trained” on data with true class labels. Instead, the RL learner relates samples through experience. The supervised learner is trained on a proportion (e.g. 90%) of the data with class labels. It derives a policy of optimal actions to be taken at each state during the training stage. One policy is selected from several learning agents and then the model is exposed to the remaining proportion (e.g. 10%) of data for classification. RL is hybridized with a Hidden Markov Model (HMM) in the supervised learning model to impose a probabilistic framework around the risky agent’s behaviour. We first study an insider trading example to demonstrate how learning algorithms can mimic risky agents. The classification power of the model is further demonstrated by applying it to a real-world based database for debit card transaction fraud. We then apply the models to two problems found in Statistics Canada databases: heart disease detection and female labour force participation. All models are evaluated using appropriate measures for imbalanced class problems: “sensitivity” and “false positive rate”. Sensitivity measures the number of correctly classified positive samples (e.g. fraud) as a proportion of all positive samples in the data. The false positive rate counts the number of negative samples classified as positive as a proportion of all negative samples in the data. The intent is to maximize sensitivity and minimize the false positive rate. All models show high sensitivity rates while exhibiting low false positive rates. These two metrics are ideal for industrial implementation because of high levels of identification at a low cost. Fraud detection rate is the focus, with detection rates of 75-85% proving that RL is a superior method for data mining of imbalanced databases. By solving the problem of hidden information, this research can facilitate the detection of risky human behaviour and prevent it from happening.
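
The two evaluation measures defined in this abstract can be computed directly from true and predicted labels. The sketch below is illustrative only: the function name `imbalance_metrics`, the `positive` parameter, and the toy labels are assumptions for the example and are not taken from the thesis.

```python
from typing import Sequence, Tuple


def imbalance_metrics(y_true: Sequence[int],
                      y_pred: Sequence[int],
                      positive: int = 1) -> Tuple[float, float]:
    """Return (sensitivity, false positive rate).

    sensitivity         = TP / (TP + FN), share of positives that are caught
    false positive rate = FP / (FP + TN), share of negatives that are flagged
    """
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one positive and one negative sample")
    return tp / (tp + fn), fp / (fp + tn)


# Hypothetical imbalanced toy data: 4 positive (e.g. fraud), 8 negative.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(imbalance_metrics(y_true, y_pred))  # (0.75, 0.125)
```

Unlike plain accuracy, neither measure is dominated by the majority class: each is normalized by the size of its own class, which is why the pair suits imbalanced problems.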