5 research outputs found

    An Immune Clonal Selection Algorithm for Synthetic Signature Generation

    The collection of signature data for system development and evaluation generally requires significant time and effort. To overcome this problem, this paper proposes a detector-generation-based clonal selection algorithm for synthetic signature set generation. The goal of synthetic signature generation is to improve the performance of signature verification by providing more training samples. Our method uses the clonal selection algorithm to maintain the diversity of the overall set and avoid sparse feature distributions. The algorithm first generates detectors with a segmented r-continuous-bits matching rule and a P-receptor editing strategy to provide a wider search space. The clonal selection algorithm is then used to expand and optimize the overall signature set. We demonstrate the effectiveness of our clonal selection algorithm, and the experiments show that adding the synthetic training samples improves the performance of signature verification.
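    The clone-mutate-select loop behind such methods can be sketched as follows. This is a minimal, generic clonal selection algorithm, not the paper's detector-generation variant with its r-continuous-bits matching rule; the `fitness` argument, population sizes, and mutation schedule are illustrative assumptions.

```python
import random

def clonal_selection(fitness, dim=8, pop_size=10, n_clones=5, generations=50, seed=0):
    """Generic clonal selection loop: clone, hypermutate, select.

    `fitness` maps a candidate (a list of bits) to a score to maximize.
    Better-ranked candidates receive gentler mutation (affinity maturation).
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        clones = []
        for rank, antibody in enumerate(pop):
            # Mutation rate grows with rank: worse antibodies mutate more.
            rate = (rank + 1) / (dim * pop_size)
            for _ in range(n_clones):
                clones.append([bit ^ (rng.random() < rate) for bit in antibody])
        # Elitist selection over parents plus clones.
        pop = sorted(pop + clones, key=fitness, reverse=True)[:pop_size]
    return pop[0]

# Example: "one-max", maximize the number of 1-bits.
best = clonal_selection(sum, dim=8)
```

    The rank-dependent mutation rate is what distinguishes clonal selection from a plain genetic algorithm: good antibodies are refined locally while poor ones explore more widely, which is one way to maintain diversity in the pool.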

    Improved Class Statistics Estimation for Sparse Data Problems in Offline Signature Verification

    No full text

    Stochastic Tools for Network Security: Anonymity Protocol Analysis and Network Intrusion Detection

    With the rapid development of the Internet and the sharp increase in network crime, network security has become very important and has received a lot of attention. In this dissertation, we model security issues as stochastic systems. This allows us to find weaknesses in existing security systems and propose new solutions. Exploring the vulnerabilities of existing security tools can prevent cyber-attacks from taking advantage of system weaknesses. We consider The Onion Router (Tor), one of the most popular anonymity systems in use today, and show how to detect a protocol tunnelled through Tor. A hidden Markov model (HMM) is used to represent the protocol. Hidden Markov models are statistical models of sequential data, such as network traffic, and are an effective tool for pattern analysis. New, flexible and adaptive security schemes are needed to cope with emerging security threats. We propose a hybrid network security scheme that includes intrusion detection systems (IDSs) and honeypots scattered throughout the network, combining the advantages of the two security technologies. A honeypot is an activity-based network security system, which can be the logical supplement to the passive detection policies used by IDSs. This integration forces us to balance security performance against cost by scheduling device activities for the proposed system. By formulating the scheduling problem as a decentralized partially observable Markov decision process (DEC-POMDP), decisions are made in a distributed manner at each device without requiring centralized control. When using an HMM, it is important to ensure that it accurately represents both the data used to train the model and the underlying process. Current methods assume that the observations used to construct an HMM completely represent the underlying process. It is often the case that the training data size is not large enough to adequately capture all statistical dependencies in the system.
It is therefore important to know the level of statistical significance with which the constructed model represents the underlying process, not just the training set. We present a method to determine whether the observation data and the constructed model fully express the underlying process at a given level of statistical significance. We apply this approach to detecting the existence of protocols tunnelled through Tor. While HMMs are a powerful tool for representing patterns that allow for uncertainty, they cannot be used for system control. The partially observable Markov decision process (POMDP) is a useful choice for controlling stochastic systems. As a combination of two Markov models, POMDPs combine the strength of the HMM (capturing dynamics that depend on unobserved states) with that of the Markov decision process (MDP) (taking the decision aspect into account). Decision making under uncertainty is used in many areas of business and science; here we use it for security tools. We propose three approximation methods for discrete-time infinite-horizon POMDPs. One of the main contributions of our work is a high-quality approximate solution for finite-space POMDPs with the average cost criterion, and its extension to DEC-POMDPs. The solution of the first algorithm is built from the observable portion of the system when the underlying MDP operates optimally. The other two methods presented here can be classified as policy-based approximation schemes, in which we formulate POMDP planning as a quadratically constrained linear program (QCLP) that defines an optimal controller of a desired size. This representation allows a wide range of powerful nonlinear programming (NLP) algorithms to be used to solve POMDPs. Simulation results for a set of benchmark problems illustrate the effectiveness of the proposed methods. We show how this tool can be used to design a network security framework.
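    The HMM likelihood computation underlying this kind of protocol detection can be sketched with the standard forward algorithm. The two-state toy model below is an illustrative assumption, not a model of Tor traffic.

```python
def hmm_likelihood(obs, pi, A, B):
    """Forward algorithm: returns P(obs | model) for a discrete-emission HMM.

    pi[i]   : probability of starting in state i
    A[i][j] : probability of a transition from state i to state j
    B[i][k] : probability that state i emits symbol k
    """
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]        # initialization
    for symbol in obs[1:]:                                  # induction
        alpha = [B[j][symbol] * sum(alpha[i] * A[i][j] for i in range(n))
                 for j in range(n)]
    return sum(alpha)                                       # termination

# Hypothetical two-state model: state 0 favours symbol 0, state 1 favours
# symbol 1, and both states are "sticky" (they tend to persist).
pi = [0.5, 0.5]
A = [[0.9, 0.1], [0.1, 0.9]]
B = [[0.8, 0.2], [0.2, 0.8]]
likelihood = hmm_likelihood([0, 0, 1], pi, A, B)
```

    In a detection setting, likelihoods like this (usually computed in log space for numerical stability) are compared against a threshold or confidence interval to decide whether traffic matches the modelled protocol.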

    Pattern Recognition for Command and Control Data Systems

    To analyze real-world events, researchers collect observation data from an underlying process and construct models to represent the observed situation. In this work, we consider issues that affect the construction and usage of a specific type of model. Markov models are commonly used because their combination of discrete states and stochastic transitions is suited to applications with both deterministic and stochastic components. Hidden Markov Models (HMMs) are a class of Markov model commonly used in pattern recognition. We first demonstrate how to construct HMMs using only the observation data, and no a priori information, by extending a previously developed approach from J.P. Crutchfield and C.R. Shalizi. We also show how to determine, with a given level of statistical confidence, whether or not the model fully encapsulates the underlying process. Once models are constructed from observation data, they are used to identify other types of observations. Traditional approaches consider the maximum likelihood that a model matches the observation, solving a classification problem. We present a new method using confidence intervals and receiver operating characteristic (ROC) curves. Our method solves a detection problem by determining whether observation data matches zero, one, or more than one model. To detect the occurrence of a behavior in observation data, one must consider the amount of data required. We consider behaviors to be 'serial Markovian' when the behavior can change from one model to another at any time. When analyzing observation data, considering too much data induces high delay and can lead to confusion in the system if multiple behaviors are observed in the data stream. If too little data is used, the system has a high false positive rate and is unable to correctly detect behaviors. We demonstrate the effectiveness of all methods using illustrative examples and consumer behavior data.
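    The detection idea, accepting every model whose score clears a threshold rather than forcing a single classification, can be sketched as follows. The first-order Markov chain models and the threshold value are illustrative assumptions standing in for the HMMs and confidence intervals of the work.

```python
import math

def matching_models(obs, models, threshold):
    """Return the names of all models whose average log-likelihood per symbol
    clears the threshold. This is detection, not classification: the
    observation may match zero, one, or several models."""
    matches = []
    for name, trans in models.items():
        log_lik = sum(math.log(trans[a][b]) for a, b in zip(obs, obs[1:]))
        if log_lik / (len(obs) - 1) >= threshold:
            matches.append(name)
    return matches

# Two hypothetical first-order Markov chains over symbols {0, 1}:
# "sticky" tends to repeat the last symbol, "alternating" tends to flip it.
models = {
    "sticky":      [[0.9, 0.1], [0.1, 0.9]],
    "alternating": [[0.1, 0.9], [0.9, 0.1]],
}
detected = matching_models([0, 0, 0, 0, 1, 1, 1], models, threshold=-0.7)
```

    Normalizing the log-likelihood by the number of transitions makes the score comparable across observation windows of different lengths, which matters when tuning the window-size trade-off between detection delay and false positives described above.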

    Multi-classifier systems for off-line signature verification

    Handwritten signatures are behavioural biometric traits that are known to incorporate a considerable amount of intra-class variability. The Hidden Markov Model (HMM) has been successfully employed in many off-line signature verification (SV) systems due to the sequential nature and variable size of the signature data. In particular, the left-to-right topology of HMMs is well adapted to the dynamic characteristics of occidental handwriting, in which the hand movements are always from left to right. As with most generative classifiers, HMMs require a considerable amount of training data to achieve a high level of generalization performance. Unfortunately, the number of signature samples available to train an off-line SV system is very limited in practice. Moreover, only random forgeries are employed to train the system, which must in turn discriminate between genuine samples and random, simple and skilled forgeries during operation. These last two forgery types are not available during the training phase. The approaches proposed in this Thesis employ the concept of multi-classifier systems (MCS) based on HMMs to learn signatures at several levels of perception. By extracting a high number of features, a pool of diversified classifiers can be generated using random subspaces, which overcomes the problem of having a limited amount of training data. Based on the multi-hypotheses principle, a new approach for combining classifiers in the ROC space is proposed. A technique for repairing concavities in ROC curves allows for overcoming the problem of having a limited number of genuine samples and, especially, for evaluating the performance of biometric systems more accurately. A second important contribution is the proposal of a hybrid generative-discriminative classification architecture.
The use of HMMs as feature extractors in the generative stage, followed by Support Vector Machines (SVMs) as classifiers in the discriminative stage, allows for a better design not only of the genuine class, but also of the impostor class. Moreover, this approach provides more robust learning than a traditional HMM-based approach when a limited amount of training data is available. The last contribution of this Thesis is the proposal of two new strategies for the dynamic selection (DS) of ensembles of classifiers. Experiments performed with the PUCPR and GPDS signature databases indicate that the proposed DS strategies achieve a higher level of performance in off-line SV than other reference DS and static selection (SS) strategies from the literature.
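    The random subspace idea, training each pool member on a random subset of the extracted features and combining the members by majority vote, can be sketched as follows. The nearest-centroid members and the toy data are illustrative assumptions standing in for the HMM classifiers and signature features of the Thesis.

```python
import random

def train_subspace_pool(X, y, n_classifiers=7, subspace_dim=2, seed=0):
    """Random-subspace pool: each member sees only a random subset of the
    features, which diversifies the pool even with few training samples."""
    rng = random.Random(seed)
    pool = []
    for _ in range(n_classifiers):
        feats = rng.sample(range(len(X[0])), subspace_dim)
        centroids = {}
        for label in set(y):
            pts = [[x[f] for f in feats] for x, lab in zip(X, y) if lab == label]
            centroids[label] = [sum(col) / len(pts) for col in zip(*pts)]
        pool.append((feats, centroids))
    return pool

def predict(pool, x):
    """Majority vote over the pool; each member classifies by nearest centroid
    in its own feature subspace."""
    votes = {}
    for feats, centroids in pool:
        proj = [x[f] for f in feats]
        label = min(centroids,
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(proj, centroids[c])))
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

# Toy 4-feature data: class 0 near the origin, class 1 near (5, 5, 5, 5).
X = [[0, 1, 0, 1], [1, 0, 1, 0], [5, 4, 5, 6], [6, 5, 4, 5]]
y = [0, 0, 1, 1]
pool = train_subspace_pool(X, y)
```

    Because each member is trained on a low-dimensional projection, the effective ratio of training samples to feature dimensions improves, which is the mechanism the abstract invokes for coping with scarce genuine signatures.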