    Efficient Rule Management Algorithms in Software-Defined Networks

    In software-defined networks (SDN), the filtering requirements of critical applications often vary with flow changes and security policies. SDN addresses this issue with a flexible software abstraction that allows a network policy to be modified and deployed simultaneously and conveniently on flow-based switches. As the number of entries in rule sets and the volume of data crossing the network each second both grow, it remains crucial to minimize the number of entries and to accelerate the lookup process. At the same time, attacks on the Internet have reached a high level, and their number keeps increasing, which enlarges blacklists and the number of firewall rules. The limited storage capacity of these devices requires this space to be managed efficiently. In the first part of this thesis, our primary goal is to find a simple representation of filtering rules that yields more compact rule tables, which are therefore easier to manage, while keeping their semantics unchanged. The rules should also be constructed with reasonably efficient algorithms. This new representation can add flexibility and efficiency to the deployment of security policies, since the generated rules are easier to manage. A complementary approach to rule compression is to enforce access-control policies with multiple smaller switch tables spread over the network. However, most existing approaches of this kind replicate rules heavily, or even modify packet headers so that a packet does not match a rule again in the next switch. The second part of this thesis introduces new techniques to decompose and distribute filtering rule sets over a given network topology. We also introduce an update strategy to handle changes in network policy and topology.
In addition, we exploit the structure of series-parallel graphs to solve the rule placement problem efficiently, in tractable time, for networks of all sizes.
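One simple way to shrink a rule table without changing its semantics is to merge sibling prefixes that carry the same action. The sketch below is only an illustration under stated assumptions, not the representation proposed in the thesis: it assumes a hypothetical table of pairwise-disjoint prefixes written as bit strings, each mapped to an action, and it repeatedly replaces two sibling prefixes that share an action with their common parent.

```python
from typing import Dict


def merge_siblings(rules: Dict[str, str]) -> Dict[str, str]:
    """Merge sibling prefixes with equal actions until no merge applies.

    Assumes (hypothetically) that `rules` maps pairwise-disjoint prefixes,
    written as bit strings such as "010", to actions. Siblings "p0" and "p1"
    together cover exactly "p", and no other disjoint prefix can overlap
    them, so replacing both by "p" keeps every lookup result unchanged.
    """
    table = dict(rules)
    changed = True
    while changed:
        changed = False
        # Visit longer prefixes first so that merges can cascade upwards.
        for prefix in sorted(table, key=len, reverse=True):
            if not prefix or prefix not in table:
                continue  # skip the empty prefix and entries already merged away
            sibling = prefix[:-1] + ("1" if prefix[-1] == "0" else "0")
            if table.get(sibling) == table[prefix]:
                action = table.pop(prefix)
                table.pop(sibling)
                table[prefix[:-1]] = action
                changed = True
    return table


rules = {"00": "deny", "01": "deny", "10": "permit", "110": "deny", "111": "deny"}
print(merge_siblings(rules))  # the 5 entries shrink to 3
```

Ordered firewall rule lists with overlapping entries need more care than this, since merging is only safe here because the input prefixes are disjoint.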

    Design and Evaluation of High-Performance Software Packet Classification Systems

    Packet classification consists of matching packet headers against a set of predefined rules and performing the action(s) associated with the matched rule(s). As a key technology in the data plane of network devices, packet classification is widely deployed in network applications and services such as firewalls, load balancing, and VPNs. It has been studied extensively over the past two decades, and traditional packet classification methods usually rely on specialized hardware. With the development of data center networking, software-defined networking, and application-aware networking, packet classification methods for multi-core and many-core processor platforms have become a new research interest. This dissertation studies packet classification in three main aspects: an algorithm design framework, analysis of rule-set features, and algorithm implementation and optimization. We review multiple proposed algorithms and present a decision-tree-based algorithm design framework. The framework decomposes various existing packet classification algorithms into combinations of different types of “meta-methods”, revealing the connections between them. Based on this framework, we combine “meta-methods” from different algorithms and propose two new algorithms, HyperSplit-op and HiCuts-op. Experimental results show that HiCuts-op uses 2 to 20 times less memory and makes 10% fewer memory accesses than HiCuts, while HyperSplit-op uses 2 to 200 times less memory and makes 10% to 30% fewer memory accesses than HyperSplit. We also explore the connections between rule-set features and the performance of various algorithms. We find that the “coverage uniformity” of a rule set has a significant impact on classification speed, and that the size of the “orthogonal structure” rules usually determines the memory size of the algorithms.
Based on these two observations, we propose a memory consumption model and a method for quantifying coverage uniformity. Using these two tools, we propose a new multi-decision-tree algorithm, SmartSplit, and an algorithm policy framework, AutoPC. Compared to EffiCuts, SmartSplit achieves around a 2.9x speedup and up to a 10x reduction in memory size. For a given rule set, AutoPC can automatically recommend a “right” algorithm. Compared to using a single algorithm on all rule sets, AutoPC is on average 3.8 times faster. We also analyze the connection between prefix length and the update overhead of IP lookup algorithms. We observe that long prefixes always result in more memory accesses with the Tree Bitmap algorithm, while short prefixes always result in a large update overhead in DIR-24-8. By combining the two algorithms, we propose a hybrid algorithm, SplitLookup, that reduces the update overhead. Experimental results show that the hybrid algorithm needs two orders of magnitude fewer memory accesses when updating short prefixes, while its lookup speed remains close to that of DIR-24-8. We implement and optimize multiple algorithms on multi-core and many-core platforms. For IP lookup, we implement two typical algorithms, DIR-24-8 and Tree Bitmap, and present several optimization tricks for them. For multi-dimensional packet classification, we implement HyperCuts and HiCuts and their variants, such as Adaptive Binary Cuttings, EffiCuts, HiCuts-op, and HyperSplit-op. SplitLookup achieves up to 40 Gbps throughput on the TILEPro64 many-core processor, and HiCuts-op and HyperSplit-op achieve 10 to 20 Gbps throughput on a single core of an Intel processor. In general, our study reveals the connections between algorithmic tricks and rule-set features.
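To make the decision-tree idea concrete, here is a minimal sketch of a binary-split classifier in the spirit of HyperSplit, which splits one field at a time at a rule endpoint. It is a simplified illustration, not HyperSplit-op or HiCuts-op: the split heuristic (the field with the most distinct endpoints, cut at its median endpoint), the leaf threshold `binth`, and all names are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple

Range = Tuple[int, int]  # inclusive [low, high] interval on one header field


@dataclass
class Rule:
    ranges: List[Range]  # one range per field
    action: str


@dataclass
class Node:
    rules: List[Rule] = field(default_factory=list)  # only used at leaves
    dim: int = -1     # field tested at an internal node
    point: int = 0    # values <= point go left, others go right
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def _overlaps(r: Range, s: Range) -> bool:
    return r[0] <= s[1] and s[0] <= r[1]


def _covers(r: Range, s: Range) -> bool:
    return r[0] <= s[0] and s[1] <= r[1]


def build(rules: List[Rule], region: List[Range], binth: int = 2) -> Node:
    """Build a tree; `rules` is in priority order (index 0 = highest)."""
    live = [r for r in rules if all(map(_overlaps, r.ranges, region))]
    if live and all(map(_covers, live[0].ranges, region)):
        return Node(rules=live[:1])  # the top rule matches everything in this region
    if len(live) <= binth:
        return Node(rules=live)
    best: Optional[Tuple[int, List[int]]] = None
    for dim, (lo, hi) in enumerate(region):
        # Candidate split points: rule endpoints strictly inside the region.
        points = {p for r in live for p in (r.ranges[dim][0] - 1, r.ranges[dim][1]) if lo <= p < hi}
        if points and (best is None or len(points) > len(best[1])):
            best = (dim, sorted(points))
    if best is None:
        return Node(rules=live)
    dim, points = best
    point = points[len(points) // 2]
    left_region, right_region = list(region), list(region)
    left_region[dim] = (region[dim][0], point)
    right_region[dim] = (point + 1, region[dim][1])
    return Node(dim=dim, point=point,
                left=build(live, left_region, binth),
                right=build(live, right_region, binth))


def classify(node: Node, packet: Sequence[int]) -> Optional[str]:
    while node.left is not None and node.right is not None:
        node = node.left if packet[node.dim] <= node.point else node.right
    for rule in node.rules:  # linear search at the leaf; the first match wins
        if all(lo <= v <= hi for v, (lo, hi) in zip(packet, rule.ranges)):
            return rule.action
    return None


rules = [
    Rule([(0, 3), (0, 15)], "a"),
    Rule([(2, 9), (4, 7)], "b"),
    Rule([(0, 15), (0, 15)], "default"),
]
tree = build(rules, [(0, 15), (0, 15)])
print(classify(tree, (3, 5)))  # prints a: rule "a" outranks rule "b"
```

Each split strictly shrinks the region, so construction terminates, and a leaf only needs the rules that overlap its region because no other rule can contain a packet that reaches it.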
The results of this dissertation provide insight into new algorithm design and guidelines for efficient algorithm implementation.
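The update asymmetry behind SplitLookup can be seen in a scaled-down version of the DIR-24-8 layout. The sketch below is hypothetical: it uses 8-bit addresses with a 4-bit first-level index (instead of 32-bit addresses with a 24-bit index), assumes prefixes are inserted in order of increasing length, and counts table writes to show that a short prefix must be copied into many first-level slots. In the full-size layout, a /8 prefix would touch 2^16 first-level entries.

```python
from typing import List, Optional

ADDR_BITS = 8                   # toy address width (IPv4 uses 32)
L1_BITS = 4                     # first-level index width (DIR-24-8 uses 24)
L2_BITS = ADDR_BITS - L1_BITS   # bits resolved by a second-level chunk


class ToyDir:
    """Two-level direct-indexed table; a hypothetical scaled-down DIR-24-8."""

    def __init__(self) -> None:
        size = 1 << L1_BITS
        self.hop: List[Optional[int]] = [None] * size    # next hop per first-level slot
        self.chunk: List[Optional[int]] = [None] * size  # index into self.chunks, if any
        self.chunks: List[List[Optional[int]]] = []
        self.writes = 0  # number of table entries written by updates

    def insert(self, prefix: int, length: int, next_hop: int) -> None:
        """Insert prefix/length, with the prefix given as a full-width address.

        Assumes prefixes arrive sorted by increasing length, so a longer
        prefix always overwrites the entries of a shorter one it overlaps.
        """
        if length <= L1_BITS:
            start = prefix >> L2_BITS
            for slot in range(start, start + (1 << (L1_BITS - length))):
                self.hop[slot] = next_hop
                self.writes += 1
            return
        slot = prefix >> L2_BITS
        if self.chunk[slot] is None:
            # Expand the slot into a chunk that inherits its current next hop.
            self.chunks.append([self.hop[slot]] * (1 << L2_BITS))
            self.chunk[slot] = len(self.chunks) - 1
            self.writes += 1 << L2_BITS
        entries = self.chunks[self.chunk[slot]]
        start = prefix & ((1 << L2_BITS) - 1)
        for i in range(start, start + (1 << (ADDR_BITS - length))):
            entries[i] = next_hop
            self.writes += 1

    def lookup(self, addr: int) -> Optional[int]:
        slot = addr >> L2_BITS
        if self.chunk[slot] is None:
            return self.hop[slot]  # one memory access
        return self.chunks[self.chunk[slot]][addr & ((1 << L2_BITS) - 1)]  # two accesses


table = ToyDir()
table.insert(0b0000_0000, 1, 1)  # short /1 prefix: fills 8 first-level slots
print(table.writes)              # 8
table.insert(0b1010_0000, 4, 2)
table.insert(0b1010_1100, 6, 3)  # long /6 prefix: a 16-entry chunk, then 4 writes
print(table.lookup(0b1010_1101), table.lookup(0b1010_0001), table.lookup(0b0111_0000))  # 3 2 1
```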

    A Ternary Unification Framework for optimizing TCAM-based packet classification systems

    Codes for Load Balancing in TCAMs: Size Analysis

    Traffic splitting is a required functionality in networks, for example for load balancing over paths or servers, or according to the source's access restrictions. The capacities of the servers (or the number of users with particular access restrictions) determine the sizes of the parts into which traffic should be split. A recent approach implements traffic splitting within the ternary content addressable memory (TCAM), which is often available in switches. It is important to reduce the amount of memory allocated to this task, since TCAMs are power-hungry and are often also needed for other tasks such as classification and routing. Recent works suggested algorithms to compute a smallest implementation of a given partition in the longest prefix match (LPM) model. In this paper we analyze properties of such minimal representations and prove lower and upper bounds on their size. The upper bounds hold for general TCAMs, and we also prove an additional lower bound for general TCAMs. We also analyze the expected size of a representation for uniformly random ordered partitions. We show that the expected representation size of a random partition is at least half the size for the worst-case partition, and is linear in the number of parts and in the logarithm of the size of the address space.
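The longest prefix match model can be illustrated with a small checker. In this hypothetical sketch, a table maps prefixes (bit strings, with the empty string as the default entry) to targets, and `realizes_partition` checks that the table sends consecutive address blocks of the requested sizes to the parts in order. For a 3-bit address space and part sizes 1, 6, 1, splitting each part into disjoint prefixes needs 6 entries (`000`; `001`, `01*`, `10*`, `110`; `111`), whereas the overlapping LPM table below needs only 3. The checker brute-forces the address space, so it only suits small widths, and it does not compute minimal representations.

```python
from typing import Dict, List


def lpm_lookup(table: Dict[str, str], addr: int, width: int) -> str:
    """Return the target of the longest prefix in `table` that matches `addr`."""
    bits = format(addr, f"0{width}b")
    for length in range(width, -1, -1):
        target = table.get(bits[:length])
        if target is not None:
            return target
    raise KeyError(f"no entry covers address {addr}")


def realizes_partition(table: Dict[str, str], parts: List[str],
                       sizes: List[int], width: int) -> bool:
    """Check that addresses 0..2**width-1 map, in order, to parts of the given sizes."""
    expected = [part for part, size in zip(parts, sizes) for _ in range(size)]
    if len(expected) != 1 << width:
        return False
    try:
        return all(lpm_lookup(table, a, width) == expected[a] for a in range(1 << width))
    except KeyError:
        return False


table = {"000": "A", "": "B", "111": "C"}  # "" is the default (all-wildcard) entry
print(realizes_partition(table, ["A", "B", "C"], [1, 6, 1], width=3))  # True
```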

    An algorithmic approach to OpenFlow ruleset transformation

    In an ideal development cycle for an OpenFlow application, a developer designs a pipeline to suit the application's needs and installs rules into that pipeline, and the application then runs on any OpenFlow switch, whether software- or hardware-based. A network operator deploying this application would assess the network's requirements, such as bandwidth, port density, and flow table size, and purchase OpenFlow hardware that meets them. In reality, this level of interoperability does not exist, because many OpenFlow switches are built on a fixed-function pipeline. Fixed-function pipelines limit the matches and actions available to rules depending on the table, but in doing so they make more efficient use of expensive hardware resources such as TCAM. This thesis investigates improving OpenFlow device interoperability by developing a method to rewrite existing rulesets for new, complex fixed-function pipelines. It also develops tools to assess and verify the interoperability and equivalence of OpenFlow rulesets and pipelines. This thesis developed a library and tools for working with descriptions of fixed-function pipelines, specifically the Table Type Pattern description; this library provides a method to check whether an existing ruleset is compatible with a new pipeline. Additionally, this thesis designed and implemented a pragmatic approach to check whether the forwarding behaviour of two OpenFlow 1.3 rulesets is equivalent. Equivalence checking provides a tool to verify that an OpenFlow application rewritten to program a new pipeline maintains the correct forwarding behaviour. Finally, this thesis investigates the problem of algorithmically rewriting an existing OpenFlow ruleset, programmed by an existing application, to fit a different fixed-function pipeline. Solving this problem allows an OpenFlow application to be written once and run on any OpenFlow switch.
This research aimed to solve this problem comprehensively, without relying on the target pipeline supporting features such as OpenFlow metadata. This thesis developed and implemented a general method to convert an OpenFlow 1.3 ruleset to a complex constrained fixed-function pipeline.
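Equivalence checking can be illustrated, for the single-table case only, by exhaustive enumeration over a tiny header space. This is a naive sketch, not the pragmatic approach developed in the thesis: rules are hypothetical `(priority, match, action)` tuples whose matches are per-field value/mask pairs, a missing field is a wildcard, priorities are assumed unique among overlapping rules, and an unmatched packet is assumed to be dropped.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

Match = Dict[str, Tuple[int, int]]  # field -> (value, mask); absent field = wildcard
Rule = Tuple[int, Match, str]       # (priority, match, action)


def lookup(rules: List[Rule], packet: Dict[str, int]) -> str:
    """Action of the highest-priority matching rule; 'drop' on a table miss (assumed)."""
    best: Optional[Tuple[int, str]] = None
    for priority, match, action in rules:
        if all((packet[f] & mask) == (value & mask) for f, (value, mask) in match.items()):
            if best is None or priority > best[0]:
                best = (priority, action)
    return best[1] if best is not None else "drop"


def find_difference(a: List[Rule], b: List[Rule],
                    widths: Dict[str, int]) -> Optional[Dict[str, int]]:
    """Enumerate every header over the given field widths; return a counterexample or None."""
    names = sorted(widths)
    for values in product(*(range(1 << widths[n]) for n in names)):
        packet = dict(zip(names, values))
        if lookup(a, packet) != lookup(b, packet):
            return packet
    return None


widths = {"in_port": 2, "vlan": 1}
a = [(10, {"in_port": (1, 0x3)}, "output:2"), (0, {}, "drop")]
b = [(5, {"in_port": (1, 0x3), "vlan": (0, 0)}, "output:2")]  # mask 0: vlan is a wildcard
c = [(5, {"in_port": (1, 0x3)}, "output:3")]
print(find_difference(a, b, widths))  # None: same forwarding behaviour
print(find_difference(a, c, widths))  # {'in_port': 1, 'vlan': 0}
```

Enumeration grows exponentially with header width, which is why real checkers reason symbolically over header spaces instead; multi-table pipelines would also need to model goto-table and metadata.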

    Techniques for High Performance Matching

    With the growing demands of big data applications, improving high-performance computing (HPC) has become an essential industry task. High-performance matching is a critical performance path for HPC communications because it significantly affects both computing and networking performance. This dissertation focuses on improving high-performance matching in HPC networks to keep up with the increasingly heavy demands of evolving applications, and tackles the matching problem from both the computational and the network perspectives. On the one hand, the Message Passing Interface (MPI) is the de facto standard for communication between parallel processes in an HPC network [1]. MPI has delivered excellent performance for large-scale scientific applications on petascale systems. Beyond petascale, exascale systems are evolving to run even larger applications, in which job sizes increase dramatically. This trend enlarges the message queues and degrades MPI message-matching performance, so with the growing requirements of big data applications, MPI message matching is a critical performance path for HPC communications. On the other hand, with the rapid development of network techniques and the fast-growing size of network applications, users seek more advanced, secure, and diverse network services. An HPC cluster comprises multiple interconnected nodes in a switched network. By integrating software-defined networking (SDN) technology into the HPC network, both computational and network resources can be allocated efficiently according to the applications' requirements. SDN switches are therefore deployed in HPC networks to support high-performance, differentiated network services and to meet diverse user needs, such as firewalling, load balancing, and quality of service [2].
In an SDN switch, packet classification, a core switch function, classifies incoming packets into flows according to the rules generated in the control plane. Packet classification therefore becomes a critical performance path for the HPC network. First, this dissertation presents GenMatcher, a generic, software-only arbitrary matching framework for fast and efficient packet classification searches. The goal is to represent arbitrary rules with efficient prefix-based tries. To generate efficient trie groupings and expansions that support all arbitrary rules, we propose a clustering-based grouping algorithm that groups rules based on their bit-level similarities. Our algorithm generates near-optimal trie groupings with low configuration times and provides significantly higher match throughput than prior techniques. Experiments with synthetic traffic show that our method can achieve a 58.9X speedup compared to the baseline on a single-core processor under a given memory constraint [3]. Second, to further improve on GenMatcher, this dissertation proposes GenSMatcher, an efficient Single Instruction Multiple Data (SIMD) and cache-friendly arbitrary matching framework. GenSMatcher adopts a trie node with a fixed high fanout and a span that varies per node depending on the data distribution. The trie node layout leverages the cache and modern processor features such as SIMD instructions. To support arbitrary matching, we represent each arbitrary rule by three fields, value, mask, and priority, and propose the GenSMatcher extraction algorithm, which processes wildcard bits so that wildcards can appear at arbitrary positions in a rule. Finally, we attach an array of wildcard entries to the leaf entries; it stores the wildcard rules and guarantees correct matching results. Experiments show that GenSMatcher outperforms GenMatcher on large rule sets and key sets in search time, insertion time, and memory cost.
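The value/mask/priority representation of arbitrary rules can be shown with a linear-scan matcher. This is only a reference baseline under assumptions, not GenSMatcher's trie: a mask bit of 1 means the bit must match, 0 means wildcard, wildcards may sit at any bit position, and a larger priority number is assumed to win.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class TernaryRule:
    value: int
    mask: int      # 1 = care bit, 0 = wildcard bit (any position)
    priority: int  # assumption: a larger number wins


def match(rules: Iterable[TernaryRule], key: int) -> Optional[TernaryRule]:
    """Return the highest-priority rule whose care bits equal the key's bits."""
    best: Optional[TernaryRule] = None
    for rule in rules:
        if (key & rule.mask) == (rule.value & rule.mask):
            if best is None or rule.priority > best.priority:
                best = rule
    return best


rules = [
    TernaryRule(value=0b1010, mask=0b1111, priority=1),  # exactly 1010
    TernaryRule(value=0b1000, mask=0b1001, priority=2),  # 1xx0: non-prefix wildcards
]
print(match(rules, 0b1110))  # matches only the second rule
```

The second rule cannot be written as a single prefix because its wildcards sit in the middle of the key, which is the case that trie-based matchers must handle through expansion or wildcard side entries.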
Specifically, with 5M rules, our method achieves a 2.7X speedup in search time, the insertion time is about 7.3 seconds (a 1.38X speedup), and memory cost is reduced by up to 6.17X. Third, to preserve MPI's message-ordering semantics while providing high-performance tag matching for large applications, this dissertation introduces a new hybrid data structure and matching mechanism that reduces the matching time in the posted receive queue (PRQ) and the unexpected message queue (UMQ). The hybrid data structures are composed of tries and hash maps. We evaluate our mechanism on microbenchmarks and existing MPI applications with different numbers of processes. Experiments with a synthetic message flow show that our method can achieve a 20X search-time speedup compared to the baseline on a single-core processor. For the PICSARlite application, we integrated our Hybrid mechanism and the Intel mechanism into the MPICH library and evaluated their performance on the Ada cluster of Texas A&M University, which has 793 general compute nodes. The results show that our proposed Hybrid mechanism achieves up to a 1.55X speedup compared to the MPICH library's method.
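MPI matching must return the earliest posted receive that matches an incoming message, even when receives use wildcards for the source or tag. The hypothetical sketch below shows one way a hash map can speed this up without breaking that ordering rule: exact `(source, tag)` receives go into per-key FIFO buckets, wildcard receives go into a separate list, and global sequence numbers decide between the two candidates. It is not the dissertation's trie-and-hash-map design, the unexpected message queue is not modeled, and `ANY` merely stands in for MPI_ANY_SOURCE and MPI_ANY_TAG.

```python
from collections import defaultdict, deque
from itertools import count
from typing import Deque, Dict, List, Optional, Tuple

ANY = None  # hypothetical stand-in for MPI_ANY_SOURCE / MPI_ANY_TAG


class PostedReceiveQueue:
    """Posted receives split into exact-key FIFOs and a wildcard list."""

    def __init__(self) -> None:
        self._seq = count()  # global posting order
        self._exact: Dict[Tuple[int, int], Deque[Tuple[int, str]]] = defaultdict(deque)
        self._wild: List[Tuple[int, Optional[int], Optional[int], str]] = []

    def post(self, source: Optional[int], tag: Optional[int], handle: str) -> None:
        seq = next(self._seq)
        if source is ANY or tag is ANY:
            self._wild.append((seq, source, tag, handle))
        else:
            self._exact[(source, tag)].append((seq, handle))

    def match(self, source: int, tag: int) -> Optional[str]:
        """Remove and return the earliest posted receive matching (source, tag)."""
        bucket = self._exact.get((source, tag))
        exact = bucket[0] if bucket else None  # oldest exact receive, found in O(1)
        wild_idx = None
        for i, (_, s, t, _) in enumerate(self._wild):  # wildcard list is in posting order
            if (s is ANY or s == source) and (t is ANY or t == tag):
                wild_idx = i
                break
        if exact is None and wild_idx is None:
            return None  # a real MPI library would queue the message as unexpected
        if wild_idx is None or (exact is not None and exact[0] < self._wild[wild_idx][0]):
            bucket.popleft()
            return exact[1]
        return self._wild.pop(wild_idx)[3]


q = PostedReceiveQueue()
q.post(0, 5, "r1")
q.post(ANY, 5, "r2")
q.post(0, 5, "r3")
print([q.match(0, 5) for _ in range(4)])  # ['r1', 'r2', 'r3', None]
```

Comparing sequence numbers is what keeps the wildcard receive `r2` ahead of the later exact receive `r3`, as MPI's ordering rule requires.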

    Representation of Classification Functions by Head-Tail Expressions

    Doctoral dissertation, Kyushu Institute of Technology. Degree No. 291 (情工博甲第291号). Date of conferral: March 25, 2014 (Heisei 26).
Contents: 1 Introduction; 2 Preliminary; 3 Generating Prefix Sum-of-Products Expressions for Interval Functions; 4 Derivation of Head-Tail Expressions for Interval Functions; 5 Head-Tail Expressions for Single-Field Classification Functions; 6 Head-Tail Expressions for Multi-Field Classification Functions; 7 Conclusion and Future Work; Acknowledgements; List of Publications
Packet classification is used in various network applications such as firewalls, access control lists, and network address translators. It uses ternary content addressable memories (TCAMs) to perform high-speed packet forwarding. However, TCAMs dissipate much power and are expensive, so reducing TCAM size is crucial. First, this thesis derives the prefix sum-of-products expression (PreSOP) for an interval function and the number of products in that PreSOP. It then derives $\Psi(n, \tau_p)$, the number of $n$-variable interval functions that can be represented with $\tau_p$ products, and shows that more than 99.9% of the $n$-variable interval functions can be represented with $\lceil \frac{3}{2}n - 1 \rceil$ products when $n$ is sufficiently large. These results are useful for fast PreSOP generators and for estimating the size of TCAMs for packet classification. Second, this thesis shows a method to represent interval functions by head-tail expressions. Head-tail expressions represent greater-than functions $GT(n : A)$, less-than functions $LT(n : B)$, and interval functions $IN_0(n : A, B)$ more efficiently than sum-of-products expressions, where $n$ denotes the number of bits needed to represent the largest value in the interval $(A, B)$. The thesis proves that a head-tail expression (HT) represents an interval function with at most $n$ words in a TCAM realization. It also gives the average numbers of factors needed to represent interval functions by HTs for up to $n = 16$, obtained by computer simulation.
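A PreSOP for an interval function can be produced by the standard range-to-prefix expansion, which covers the interval from A to B with maximal aligned power-of-two blocks; each block becomes one product (one TCAM word) whose trailing bits are don't-cares. The sketch below is this textbook expansion, not necessarily the thesis's FP generator. For an n-bit field it uses at most 2(n-1) products, a worst case reached for example by the interval from 1 to 2^n - 2.

```python
from typing import List


def range_to_prefixes(low: int, high: int, n: int) -> List[str]:
    """Cover [low, high] (n-bit values) with prefixes such as '01*'.

    Each step takes the largest aligned power-of-two block that starts at
    `low` and fits inside the remaining range; each block is one product
    (one TCAM word) whose trailing bits are don't-cares.
    """
    words = []
    while low <= high:
        size = low & -low if low else 1 << n  # largest block aligned at `low`
        while size > high - low + 1:
            size >>= 1                         # shrink until it fits
        free = size.bit_length() - 1           # number of don't-care bits
        head = format(low >> free, f"0{n - free}b") if free < n else ""
        words.append(head + "*" * free)
        low += size
    return words


print(range_to_prefixes(1, 14, 4))  # ['0001', '001*', '01**', '10**', '110*', '1110']
```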
It also conjectures that, for sufficiently large $n$, the average number of factors needed to represent $n$-variable interval functions by HTs is at most $\frac{2}{3}n - \frac{5}{9}$. Experimental results also show that, for $n \geq 10$, HTs require on average at least 20% fewer factors than MSOPs to represent interval functions. Third, this thesis presents a method to generate head-tail expressions for single-field classification functions. It first introduces a fast PreSOP generator (FP), which generates products from the bit patterns of the interval endpoints, and then a direct head-tail expression generator (DHT). Experimental results show that DHT generates much smaller TCAMs than FP; the proposed algorithm is useful as a simplified TCAM generator for packet classification. Finally, this thesis shows methods to simplify TCAM rules for packet classification. The first method partitions the rules into groups so that all rules in a group share the same source address, destination address, and protocol, and then simplifies the rules in each group by removing redundant rules. A computer program was developed to simplify rules across groups. Experimental results show that this method reduces the size of rules up to 57% of the original specification for ACL5 rules, 73% for ACL3 rules, and 87% for all rules; it is useful for reducing TCAM size for packet classification. The second method reduces the number of TCAM words for multi-field classification functions by using head-tail expressions. It presents MFHT, an $O(r^2)$ algorithm, where $r$ is the number of rules, that generates simplified TCAMs for two-field classification functions. Experimental results show that MFHT reduces the number of words by 58% for random rules and by 52% for ACL and FW rules. Moreover, MFHT is fast. These methods are useful for simplifying TCAMs for packet classification.
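The first simplification method removes redundant rules within a group. One simple kind of redundancy, sketched below under assumptions, is a shadowed rule: in a first-match list, a rule whose port range is entirely covered by the union of higher-priority ranges can never be reached, so deleting it leaves the classification unchanged. The single range field and the `(range, action)` rule format are hypothetical, and the thesis's program may detect further kinds of redundancy.

```python
from typing import List, Tuple

Range = Tuple[int, int]       # inclusive port range
PortRule = Tuple[Range, str]  # (port range, action), highest priority first


def _merge(ranges: List[Range]) -> List[Range]:
    """Merge overlapping or adjacent integer ranges into sorted disjoint ones."""
    merged: List[Range] = []
    for lo, hi in sorted(ranges):
        if merged and lo <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def remove_shadowed(rules: List[PortRule]) -> List[PortRule]:
    """Drop rules whose range is covered by the union of earlier (higher-priority) ranges."""
    kept: List[PortRule] = []
    for (lo, hi), action in rules:
        # Merged ranges are disjoint and non-adjacent, so coverage means one contains [lo, hi].
        covered = any(a <= lo and hi <= b for a, b in _merge([r for r, _ in kept]))
        if not covered:
            kept.append(((lo, hi), action))
    return kept


group = [((0, 10), "permit"), ((11, 20), "deny"), ((5, 15), "deny"), ((21, 30), "permit")]
print(remove_shadowed(group))  # ((5, 15), "deny") is unreachable and is dropped
```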

    Adaptive Conflict-Free Optimization of Rule Sets for Network Security Packet Filtering Devices
