27 research outputs found

    Evolving rules for document classification

    We describe a novel method for using Genetic Programming to create compact classification rules based on combinations of N-grams (character strings). Genetic programs acquire fitness by producing rules that are effective classifiers, in terms of precision and recall, when evaluated against a set of training documents. We describe a set of functions and terminals and provide results from a classification task using the Reuters-21578 dataset. We also suggest that, because the induced rules are meaningful to a human analyst, they may have a number of other uses beyond classification and provide a basis for text mining applications.
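    The fitness measure described above can be sketched as follows. The rule shape (a flat AND/OR over character n-grams standing in for a GP tree) and the F1 combination of precision and recall are illustrative assumptions, not the paper's exact function and terminal set:

```python
# Toy sketch: score an n-gram classification rule by precision/recall (F1).
# The rule encoding below is an invented stand-in for an evolved GP tree.

def ngrams(text, n=3):
    """Set of character n-grams occurring in the text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def rule_matches(rule, grams):
    # rule: ("AND" | "OR", [ngram, ...])
    op, terms = rule
    hits = [t in grams for t in terms]
    return all(hits) if op == "AND" else any(hits)

def fitness(rule, docs, labels, n=3):
    tp = fp = fn = 0
    for text, label in zip(docs, labels):
        pred = rule_matches(rule, ngrams(text, n))
        if pred and label:
            tp += 1
        elif pred and not label:
            fp += 1
        elif not pred and label:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)  # F1 score

docs = ["wheat prices rose sharply", "oil futures fell", "wheat harvest up"]
labels = [True, False, True]  # topic: grain
rule = ("OR", ["whe", "gra"])
print(fitness(rule, docs, labels))  # 1.0: the rule matches exactly the grain docs
```

    A human analyst can read such a rule directly ("contains 'whe' or 'gra'"), which is what makes the induced rules reusable beyond classification.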

    Automated Design of Network Security Metrics

    Many abstract security measurements are based on characteristics of a graph that represents the network. These are typically simple and quick to compute but are often of little practical use in making real-world predictions. Practical network security is often measured using simulation or real-world exercises. These approaches better represent realistic outcomes but can be costly and time-consuming. This work aims to combine the strengths of these two approaches, developing efficient heuristics that accurately predict attack success. Hyper-heuristic machine learning techniques, trained on network attack simulation data, are used to produce novel graph-based security metrics. These low-cost metrics serve as an approximation for simulation when measuring network security in real time. The approach is tested and verified using a simulation based on activity from an actual large enterprise network. The results demonstrate the potential of using hyper-heuristic techniques to rapidly evolve and react to emerging cybersecurity threats.
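    The kind of cheap graph-based metric described above might look like the sketch below. The feature set and weights are invented for illustration; in the paper, the combination is evolved automatically against attack-simulation data rather than hand-written:

```python
# Illustrative sketch of a low-cost graph metric of the kind a
# hyper-heuristic search might evolve as a proxy for attack success.

def degree(adj, v):
    return len(adj[v])

def evolved_metric(adj):
    # Hypothetical evolved formula: mean degree plus a weighted count of
    # high-degree "hub" nodes, as a rough proxy for attack reachability.
    n = len(adj)
    mean_deg = sum(degree(adj, v) for v in adj) / n
    hubs = sum(1 for v in adj if degree(adj, v) > mean_deg)
    return mean_deg + 0.5 * hubs

# Toy network: adjacency sets keyed by host name.
net = {
    "web": {"db", "app"},
    "db": {"web", "app"},
    "app": {"web", "db", "hr"},
    "hr": {"app"},
}
print(evolved_metric(net))  # 2.5
```

    Computing such a formula is linear in the size of the graph, which is what makes it usable in real time where a full attack simulation is not.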

    N-gram analysis of 970 microbial organisms reveals presence of biological language models

    Background: It has been suggested previously that genome and proteome sequences show characteristics typical of natural-language texts, such as "signature-style" word usage indicative of authors or topics, and that the algorithms originally developed for natural language processing may therefore be applied to genome sequences to draw biologically relevant conclusions. Following this approach of 'biological language modeling', statistical n-gram analysis has been applied to the comparative analysis of whole proteome sequences of 44 organisms. It was shown that a few particular amino acid n-grams are found in abundance in one organism but occur very rarely in other organisms, thereby serving as genome signatures. At that time, proteomes of only 44 organisms were available, limiting the generality of this hypothesis. Today nearly 1,000 genome sequences and corresponding translated sequences are available, making it feasible to test the existence of biological language models across the evolutionary tree.
    Results: We studied whole proteome sequences of 970 microbial organisms using n-gram frequencies and cross-perplexity, employing the Biological Language Modeling Toolkit and the Patternix Revelio toolkit. Genus-specific signatures were observed even in a simple unigram distribution. Taking the statistical n-gram model of one organism as a reference and computing the cross-perplexity of all other microbial proteomes against it, cross-perplexity was found to be predictive of branch distance in the phylogenetic tree. For example, a 4-gram model of the proteome of Shigella flexneri 2a, which belongs to the class Gammaproteobacteria, showed a self-perplexity of 15.34, while the cross-perplexity of other organisms was in the range 15.59 to 29.5 and was proportional to their branching distance from S. flexneri in the evolutionary tree. The organisms of this genus, which happen to be pathotypes of E. coli, also have the closest perplexity values to E. coli.
    Conclusion: Whole proteome sequences of microbial organisms have been shown to contain particular n-gram sequences that are abundant in one organism but occur very rarely in others, thereby serving as proteome signatures. Further, it has been shown that perplexity, a statistical measure of similarity of n-gram composition, can be used to predict evolutionary distance within a genus in the phylogenetic tree.
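    The cross-perplexity comparison above can be sketched in its simplest (unigram) form. The sequences and the add-one smoothing choice below are illustrative toys, not the study's 4-gram models or real proteomes:

```python
# Minimal sketch: cross-perplexity between amino-acid unigram models.
# A sequence fits its own model better than a compositionally
# different one, giving a lower (self-)perplexity.

import math
from collections import Counter

def unigram_model(seq, alphabet):
    counts = Counter(seq)
    total = len(seq) + len(alphabet)  # add-one (Laplace) smoothing
    return {a: (counts[a] + 1) / total for a in alphabet}

def cross_perplexity(model, seq):
    # 2 ** (average negative log2-probability of seq under the model)
    logp = sum(math.log2(model[a]) for a in seq) / len(seq)
    return 2 ** (-logp)

alphabet = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
ref = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # toy "reference proteome"
other = "GAVLGAVLGAVLGAVLGAVL"              # toy Gly/Ala-rich comparison

m = unigram_model(ref, alphabet)
self_pp = cross_perplexity(m, ref)
cross_pp = cross_perplexity(m, other)
print(self_pp < cross_pp)  # True: the reference fits its own model better
```

    Ranking many organisms by their cross-perplexity against one reference model is what lets the measure act as a proxy for branch distance in the tree.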

    Cyber Security Research Frameworks For Coevolutionary Network Defense


    Adaptive Information Filtering: improvement of the matching technique and derivation of the evolutionary algorithm

    Adaptive Information Filtering is concerned with filtering information streams in changing environments. The changes may occur both on the transmission side (the nature of the streams can change) and on the reception side (the interests of a user can change). This report details the progress made on a prototype Adaptive Information Filtering system based on weighted trigram analysis and evolutionary computation. The main improvements to the algorithms employed by the system concern the computation of the distance between weighted trigram vectors and a further analysis of the two-pool evolutionary algorithm. We tested the new prototype system on the Reuters-21578 text categorization test collection.
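    The distance computation the report refines can be sketched as below: documents become weighted trigram frequency vectors, compared with a vector-space distance. Cosine distance is used here as one plausible choice; the report's exact weighting scheme is not reproduced:

```python
# Sketch: represent documents as trigram frequency vectors and compare
# them with cosine distance (0 = identical direction, 1 = no overlap).

import math
from collections import Counter

def trigram_vector(text):
    text = text.lower()
    return Counter(text[i:i + 3] for i in range(len(text) - 2))

def cosine_distance(u, v):
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return 1.0 - dot / (nu * nv)

a = trigram_vector("grain prices rose")
b = trigram_vector("grain price rises")
c = trigram_vector("oil rig explosion")
print(cosine_distance(a, b) < cosine_distance(a, c))  # True: related texts are closer
```

    In a filtering setting, an evolved profile vector would be compared against each incoming document this way, with the distance threshold deciding what reaches the user.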

    Evolving Multi-Level Graph Partitioning Algorithms

    Optimal graph partitioning is a foundational problem in computer science and appears in many different applications. Multi-level graph partitioning is a state-of-the-art method of efficiently approximating high-quality graph partitions. In this work, genetic programming techniques are used to evolve new multi-level graph partitioning heuristics that are tailored to specific applications. Results are presented using these evolved partitioners on traditional random graph models as well as a real-world computer network data set. These results demonstrate an improvement in the quality of the partitions produced over current state-of-the-art methods.
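    The objective a multi-level partitioner optimizes, and the refinement step applied at each uncoarsening level, can be sketched as follows. This greedy single-pass move rule is a simplified stand-in, not the evolved heuristics from the paper:

```python
# Sketch: edge cut of a two-way partition, plus one greedy refinement
# pass that moves any vertex with more external than internal neighbors.

def edge_cut(adj, side):
    # Count each cross-partition edge once (u < v orders the endpoints).
    return sum(1 for u in adj for v in adj[u] if u < v and side[u] != side[v])

def refine(adj, side):
    # One pass, no balance constraint -- real refiners also track balance.
    for u in adj:
        external = sum(1 for v in adj[u] if side[v] != side[u])
        internal = len(adj[u]) - external
        if external > internal:
            side[u] = 1 - side[u]
    return side

adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
side = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 1, "f": 1}
print(edge_cut(adj, side))  # 2
side = refine(adj, side)
print(edge_cut(adj, side))  # 1: moving "c" across repairs the bad split
```

    A multi-level scheme repeatedly coarsens the graph, partitions the small version, then applies refinement like this while projecting the partition back to the original graph.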

    DCAFE: A Distributed Cyber Security Automation Framework for Experiments

    Cyber security has quickly become an overwhelming challenge for governments, businesses, private organizations, and individuals. In an increasingly connected world, the trend is for resources to be accessible from anywhere at any time. Greater access to resources implies more targets and potentially a larger attack surface, which makes securing systems more difficult. Automated and semi-automated solutions are needed to keep up with the deluge of modern threats, but designing such systems requires a distributed architecture to support development and testing. Several such architectures exist, but most focus only on providing a platform for running cyber security experiments as opposed to automating experiment processes. In response to this need, we have built a distributed framework based on software agents which can manage system roles, automate data collection, analyze results, and run new experiments without human intervention. The contribution of this work is the creation of a model for experiment automation and control in a distributed system environment, and this paper provides a detailed description of our framework based on that model.

    Coevolutionary Agent-Based Network Defense Lightweight Event System (CANDLES)

    Predicting an adversary's capabilities, intentions, and probable vectors of attack is in general a complex and arduous task. Cyber space is particularly vulnerable to unforeseen attacks, as most computer networks have a large, complex, opaque attack surface and are therefore extremely difficult to analyze. Abstract adversarial models, which capture the pertinent features needed for analysis, can reduce the complexity sufficiently to make analysis feasible. Game theory allows for mathematical analysis of adversarial models; however, its scalability limitations restrict its use to simple, abstract models. Computational game theory is focused on scaling classical game theory to large, complex systems capable of modeling real-world environments; one promising approach is coevolution, where each player's fitness depends on its adversaries. In this paper, we propose the Coevolutionary Agent-based Network Defense Lightweight Event System (CANDLES), a framework designed to coevolve attacker and defender agent strategies and to evaluate potential solutions with a custom, abstract computer network defense simulation. Through a qualitative analysis of the result data, we provide a proof of concept for the applicability of coevolution in planning for, and defending against, novel attacker strategies in computer network security.
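    The coevolutionary idea above, where each population's fitness is defined only relative to the other, can be sketched minimally as follows. The strategy encoding and payoff are toy assumptions, nothing like the paper's network-defense simulation:

```python
# Minimal coevolution sketch: attackers and defenders evolve against
# each other, so neither side has a fixed fitness function.

import random

random.seed(1)

def payoff(attacker, defender):
    # Toy engagement: attacker scores one point per targeted node
    # the defender left unguarded.
    return len(set(attacker) - set(defender))

def evolve(pop, score, rate=0.3):
    # Keep the better half, refill with mutated copies of the survivors.
    ranked = sorted(pop, key=score, reverse=True)
    survivors = ranked[: len(pop) // 2]
    children = [
        [g if random.random() > rate else random.randrange(10) for g in s]
        for s in survivors
    ]
    return survivors + children

# Strategies are lists of targeted / guarded node ids (0-9).
attackers = [[random.randrange(10) for _ in range(3)] for _ in range(6)]
defenders = [[random.randrange(10) for _ in range(3)] for _ in range(6)]

for generation in range(20):
    attackers = evolve(attackers, lambda a: sum(payoff(a, d) for d in defenders))
    defenders = evolve(defenders, lambda d: -sum(payoff(a, d) for a in attackers))

best = max(attackers, key=lambda a: sum(payoff(a, d) for d in defenders))
print(best)
```

    Because each side's score is measured against the current opposing population, successful strategies on one side create selection pressure on the other, which is the arms-race dynamic the framework exploits.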