13 research outputs found

    Implémentations logicielle et matérielle de l'algorithme Aho-Corasick pour la détection d'intrusions

    ABSTRACT This work proposes effective methods and architectures for the implementation of the Aho-Corasick algorithm. This algorithm can be used for pattern matching in network-based intrusion detection systems such as Snort. Two versions are proposed: a software version and a hardware version. The first develops a software implementation in C/C++ for general-purpose processors; for this, new implementations of the algorithm that account for memory resources and the processor's sequential execution are proposed. The second develops an architecture in VHDL for a specialized processor on FPGA; for this, new architectures of the algorithm that account for the available computing resources, the memory resources, and the fine-grained parallelism inherent to FPGAs are proposed. Furthermore, a comparison with a modified software version is performed. In both cases, the performance and cost trade-offs of selecting different node data structures in memory are analyzed. A selection of parameters is used to maximize a performance objective function that combines cycle count, memory usage, and the system's clock frequency. The parameters determine which of two or three types of node data structures (depending on the version) is selected for each node of the state machine. For the validation phase, test cases with diverse data were used to ensure that the algorithm operates properly, and the contents of the Snort 2.9.7 rules were used. The state machine was built from around 26×10³ patterns, all extracted from these rules, and comprises around 381×10³ nodes. The main contribution of this work is to show that it is possible, through architecture exploration, to choose parameters that yield an optimal memory × time product. For the software version, memory consumption is reduced from 407 MB to 21 MB, a reduction of about 20× compared with the single node-type worst case. For the hardware version, memory consumption is reduced from 11 MB to 4 MB, about 3× less than the modified software version. Throughput increases from 300 Mbps with the modified software version to 400 Mbps with the hardware version.
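As a point of reference, the automaton that both versions implement can be sketched in a few lines of Python. This is an illustrative simplification only, not the thesis's C/C++ or VHDL implementations, which additionally select among node data structures per node:

```python
from collections import deque

def build_automaton(patterns):
    # Trie with goto transitions, failure links, and per-node output lists.
    goto, fail, out = [{}], [0], [[]]
    for pat in patterns:
        node = 0
        for ch in pat:
            if ch not in goto[node]:
                goto.append({}); fail.append(0); out.append([])
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node].append(pat)
    # Breadth-first pass: each node's failure link points to its longest
    # proper suffix that is also a trie prefix (the classic construction).
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for ch, nxt in goto[node].items():
            queue.append(nxt)
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] += out[fail[nxt]]  # inherit matches ending at the suffix
    return goto, fail, out

def search(text, goto, fail, out):
    # Single pass over the text; failure links avoid restarting per position.
    node, hits = 0, []
    for i, ch in enumerate(text):
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        for pat in out[node]:
            hits.append((i - len(pat) + 1, pat))
    return hits
```

The single linear scan regardless of pattern count is what makes the algorithm attractive for inspecting traffic against tens of thousands of Snort rule strings.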

    Energy Efficient Hardware Accelerators for Packet Classification and String Matching

    This thesis focuses on the design of new algorithms and energy efficient high throughput hardware accelerators that implement packet classification and fixed string matching. These computationally heavy and memory intensive tasks are used by networking equipment to inspect all packets at wire speed. The constant growth in Internet usage has made them increasingly difficult to implement at core network line speeds. Packet classification is used to sort packets into different flows by comparing their headers to a list of rules. A flow is used to decide a packet’s priority and the manner in which it is processed. Fixed string matching is used to inspect a packet’s payload to check if it contains any strings associated with known viruses, attacks or other harmful activities. The contributions of this thesis towards the area of packet classification are hardware accelerators that allow packet classification to be implemented at core network line speeds when classifying packets using rulesets containing tens of thousands of rules. The hardware accelerators use modified versions of the HyperCuts packet classification algorithm. An adaptive clocking unit is also presented that dynamically adjusts the clock speed of a packet classification hardware accelerator so that its processing capacity matches the processing needs of the network traffic. This keeps dynamic power consumption to a minimum. Contributions made towards the area of fixed string matching include a new algorithm that builds a state machine that is used to search for strings with the aid of default transition pointers. The use of default transition pointers keeps memory consumption low, allowing state machines capable of searching for thousands of strings to be small enough to fit in the on-chip memory of devices such as FPGAs. A hardware accelerator is also presented that uses these state machines to search through the payloads of packets for strings at core network line speeds.
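The memory saving from default transition pointers can be illustrated with back-of-the-envelope arithmetic. The node sizes and average fan-out below are assumptions chosen for illustration, not figures from the thesis:

```python
def dense_table_bytes(num_nodes, alphabet=256, ptr_bytes=4):
    # Full next-state table: one state pointer per input symbol per node.
    return num_nodes * alphabet * ptr_bytes

def default_ptr_bytes(num_nodes, avg_labeled=2, ptr_bytes=4, sym_bytes=1):
    # Sparse node: a few labeled (symbol, pointer) transitions plus a single
    # default transition pointer followed when no label matches.
    return num_nodes * (avg_labeled * (sym_bytes + ptr_bytes) + ptr_bytes)

# For a 10,000-node machine the sparse layout is roughly 73x smaller under
# these assumptions, which is what lets it fit in on-chip FPGA memory.
```

The trade-off is that a lookup may follow several default pointers before finding a labeled transition, so the layout exchanges a bounded number of extra memory accesses for a large reduction in footprint.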

    Computational Analysis of T Cell Receptor Repertoire and Structure

    The human adaptive immune system has evolved to provide a sophisticated response to a vast body of pathogenic microbes and toxic substances. The primary mediators of this response are T and B lymphocytes. Antigenic peptides presented at the surface of infected cells by major histocompatibility complex (MHC) molecules are recognised by T cell receptors (TCRs) with exceptional specificity. This specificity arises from the enormous diversity in TCR sequence and structure generated through an imprecise process of somatic gene recombination that takes place during T cell development. Quantification of the TCR repertoire through the analysis of data produced by high-throughput RNA sequencing allows for a characterisation of the immune response to disease over time and between patients, and the development of methods for diagnosis and therapeutic design. The latest version of the software package Decombinator extracts and quantifies the TCR repertoire with improved accuracy and compatibility with complementary experimental protocols and external computational tools. The software has been extended for analysis of fragmented short-read data from single cells, comparing favourably with two alternative tools. The development of cell-based therapeutics and vaccines is incomplete without an understanding of molecular level interactions. The breadth of TCR diversity and cross-reactivity presents a barrier for comprehensive structural resolution of the repertoire by traditional means. Computational modelling of TCR structures and TCR-pMHC complexes provides an efficient alternative. Four general-purpose protein-protein docking platforms were compared in their ability to accurately model TCR-pMHC complexes. Each platform was evaluated against an expanded benchmark of docking test cases and in the context of varying additional information about the binding interface. Continual innovation in structural modelling techniques sets the stage for novel automated tools for TCR design. A prototype platform has been developed, integrating structural modelling and an optimisation routine, to engineer desirable features into TCR and TCR-pMHC complex models.

    T-cell receptor repertoire sequencing in health and disease

    The adaptive immune systems of jawed vertebrates are based upon lymphocytes bearing a huge variety of antigen receptors. Produced by somatic DNA recombination, these receptors are clonally expressed on T- and B-lymphocytes, where they are used to help detect and control infections and help maintain regular bodily function. Full understanding of various aspects of the immune system relies upon accurate measurement of the individual receptors that make up these repertoires. In order to obtain such data, protocols were developed to permit unbiased amplification, high-throughput deep-sequencing, and error-correcting bioinformatic analysis of T-cell receptor sequences. These techniques have been applied to peripheral blood samples to further characterise aspects of the TCR repertoire of healthy individuals, such as V(D)J TCR gene usage and pairing distributions. A large number of sequences are also found to be shared across multiple individuals, including sequences matching receptors belonging to known and proposed T-cell subsets making use of invariant rearrangements. The resolution provided also permitted detection of low-frequency recombination events that use unexpected gene segments, or contained alternative splicing events. Deep-sequencing was further used to study the effect of HIV infection, and subsequent antiretroviral therapy, upon the TCR repertoire. HIV-patient repertoires are typified by marked clonal inequality and perturbed population structures, relative to healthy controls. The data presented support a model in which HIV infection drives expansion of a subset of CD8+ clones, which, in combination with the virally-mediated loss of CD4+ cells, is responsible for driving repertoires towards an idiosyncratic population with low diversity. Moreover, these altered repertoire features do not significantly recover after three months of therapy. Deep-sequencing therefore presents opportunities to investigate the properties of TCR repertoires both in health and disease, which could be useful when analysing a wide variety of immune phenomena.
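Clonal inequality of the kind described here is commonly summarised with a diversity index. A minimal sketch using normalised Shannon entropy, one standard choice rather than necessarily the metric used in this work:

```python
import math

def shannon_diversity(clone_counts):
    # Normalised Shannon entropy of clone abundances: 1.0 means a perfectly
    # even repertoire; values near 0 mean a few expanded clones dominate.
    total = sum(clone_counts)
    probs = [c / total for c in clone_counts if c > 0]
    if len(probs) <= 1:
        return 0.0
    h = -sum(p * math.log(p) for p in probs)
    return h / math.log(len(probs))
```

Under this measure, the expanded, skewed repertoires described for HIV patients would score markedly lower than the more even repertoires of healthy controls.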

    Computational approaches to the analysis of the T cell receptor repertoire

    The T cell receptor (TCR) repertoire has the potential to be a highly personalised biomarker of historic or current immune challenges, and may hold clinically relevant information. This thesis reviews aspects of the measurement and analysis of the TCR repertoire, including approaches to obtaining high-throughput sequencing data and using these data to investigate features of the repertoire in health and disease. The thesis then considers three topics related to computational and experimental analysis of the TCR repertoire. First, this thesis explores a technical challenge in obtaining accurate quantitative TCR repertoire sequence data, observing substantial heterogeneity in the PCR amplification step essential for most current high-throughput sequencing protocols. An important conclusion of this chapter is that single molecule barcoding before amplification is essential to obtain robust quantification of clone abundances from sequence data. The second chapter considers the challenges of producing an effective TCR repertoire which can provide broad coverage of potential pathogens while maintaining tolerance to self-peptides. A computational model is explored which incorporates a linear programming representation of peripheral tolerance, with dendritic cells acting as the central agents reshaping the T cell population. The model is shown to maintain a population with restricted responsiveness to self-peptides while retaining a diverse and cross-reactive repertoire. In the final results chapter, TCR repertoire data from immunised mice is used to demonstrate that within a simplified animal model of immune response, the antigen responsive CDR3βs are almost completely private. However, exploration of the protein sequences of the antigen associated CDR3βs suggests that there may be amino acid motifs defining the antigen response. Overall, this thesis demonstrates the application of computational and modelling approaches to address questions regarding the TCR repertoire, facilitating interpretation of high-throughput sequencing data and providing insight into maintenance of diversity in the peripheral T cell population.
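The conclusion about single molecule barcoding can be illustrated with a sketch of unique molecular identifier (UMI) deduplication. The function and data shapes here are hypothetical, not the thesis's actual pipeline: counting distinct barcodes per clone, rather than raw reads, removes the bias introduced by heterogeneous PCR amplification.

```python
from collections import defaultdict

def clone_abundance(reads):
    # reads: iterable of (umi, cdr3) pairs. PCR duplicates of one input
    # molecule share a UMI, so the number of distinct UMIs per CDR3
    # approximates the pre-amplification molecule count.
    umis = defaultdict(set)
    for umi, cdr3 in reads:
        umis[cdr3].add(umi)
    return {cdr3: len(s) for cdr3, s in umis.items()}
```

A clone amplified tenfold more efficiently than another still contributes one count per original molecule, which is why barcoding before amplification yields robust abundance estimates.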

    Proceedings of the 26th International Symposium on Theoretical Aspects of Computer Science (STACS'09)

    The Symposium on Theoretical Aspects of Computer Science (STACS) is held alternately in France and in Germany. The conference of February 26-28, 2009, held in Freiburg, is the 26th in this series. Previous meetings took place in Paris (1984), Saarbrücken (1985), Orsay (1986), Passau (1987), Bordeaux (1988), Paderborn (1989), Rouen (1990), Hamburg (1991), Cachan (1992), Würzburg (1993), Caen (1994), München (1995), Grenoble (1996), Lübeck (1997), Paris (1998), Trier (1999), Lille (2000), Dresden (2001), Antibes (2002), Berlin (2003), Montpellier (2004), Stuttgart (2005), Marseille (2006), Aachen (2007), and Bordeaux (2008).