
    Conception et évaluation des systÚmes logiciels de classifications de paquets haute-performance (Design and evaluation of high-performance software packet classification systems)

    Packet classification consists of matching packet headers against a set of pre-defined rules and performing the action(s) associated with the matched rule(s). As a key technology in the data plane of network devices, packet classification is widely deployed in network applications and services such as firewalling, load balancing, and VPNs, and it has been studied extensively over the past two decades. Traditional packet classification methods rely on specialized hardware; with the development of data-center networking, software-defined networking, and application-aware networking, methods targeting multi-/many-core processor platforms have become a new research focus. This dissertation studies packet classification from three main angles: the algorithm design framework, rule-set feature analysis, and algorithm implementation and optimization.

    We first review a range of existing algorithms and present a decision-tree-based algorithm design framework. The framework decomposes existing packet classification algorithms into combinations of different "meta-methods", revealing the connections between them. Based on this framework, we recombine "meta-methods" from different algorithms and propose two new algorithms, HyperSplit-op and HiCuts-op. Experimental results show that HiCuts-op uses 2~20x less memory and about 10% fewer memory accesses than HiCuts, while HyperSplit-op uses 2~200x less memory and 10%~30% fewer memory accesses than HyperSplit.

    We then explore the connections between rule-set features and algorithm performance. We find that the "coverage uniformity" of a rule-set has a significant impact on classification speed, while the size of its "orthogonal structure" rules usually determines an algorithm's memory footprint. Based on these two observations, we propose a memory consumption model and a quantitative measure of coverage uniformity. Using these two tools, we propose a new multi-decision-tree algorithm, SmartSplit, and an algorithm recommendation framework, AutoPC. Compared to the EffiCuts algorithm, SmartSplit achieves around a 2.9x speedup and up to a 10x reduction in memory size. For a given rule-set, AutoPC automatically recommends the right algorithm for that rule-set; compared to running a single algorithm on all rule-sets, AutoPC is on average 3.8 times faster.

    We also analyze the connection between prefix length and update overhead in IP lookup algorithms. We observe that long prefixes lead to more memory accesses with the Tree Bitmap algorithm, while short prefixes lead to a large update overhead with DIR-24-8. By combining the two algorithms, we propose a hybrid algorithm, SplitLookup, that reduces the update overhead. Experimental results show that SplitLookup needs two orders of magnitude fewer memory accesses when updating short prefixes, while its lookup speed remains close to that of DIR-24-8. Finally, we implement and optimize multiple algorithms on multi-/many-core platforms. For IP lookup, we implement two typical algorithms, DIR-24-8 and Tree Bitmap, and present several optimizations for them. For multi-dimensional packet classification, we implement HiCuts/HyperCuts and their variants, such as Adaptive Binary Cuttings, EffiCuts, HiCuts-op, and HyperSplit-op.
    The SplitLookup algorithm achieves up to 40 Gbps throughput on the TILEPro64 many-core processor, and HiCuts-op and HyperSplit-op achieve 10~20 Gbps throughput on a single core of an Intel processor. Overall, our study reveals the connections between algorithmic techniques and rule-set features; the results provide insight for the design of new algorithms and guidelines for efficient algorithm implementation.
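    The abstract above names the algorithms but shows no code. As a rough, hypothetical sketch of the matching problem that decision-tree methods such as HiCuts, HyperSplit, and their -op variants accelerate, the C fragment below classifies a packet by a naive linear scan over five-tuple rules; the field layout, struct names, and the single example rule are illustrative assumptions, not the dissertation's data structures.

    /* Minimal, hypothetical sketch of multi-field packet classification:
     * a naive linear scan over prefix/range rules.  Decision-tree methods
     * such as HiCuts and HyperSplit exist to avoid this O(n) scan; the
     * field layout and the rule below are illustrative only. */
    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    struct header {                      /* classic 5-tuple */
        uint32_t src_ip, dst_ip;
        uint16_t src_port, dst_port;
        uint8_t  proto;
    };

    struct rule {
        uint32_t src_ip, src_mask;       /* source prefix      */
        uint32_t dst_ip, dst_mask;       /* destination prefix */
        uint16_t sport_lo, sport_hi;     /* source port range  */
        uint16_t dport_lo, dport_hi;     /* dest. port range   */
        uint8_t  proto, proto_wild;      /* exact or wildcard  */
        int      action;                 /* e.g. permit/deny   */
    };

    /* Return the action of the first (highest-priority) matching rule,
     * or -1 if no rule matches. */
    static int classify(const struct header *h, const struct rule *r, size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            if ((h->src_ip & r[i].src_mask) != r[i].src_ip) continue;
            if ((h->dst_ip & r[i].dst_mask) != r[i].dst_ip) continue;
            if (h->src_port < r[i].sport_lo || h->src_port > r[i].sport_hi) continue;
            if (h->dst_port < r[i].dport_lo || h->dst_port > r[i].dport_hi) continue;
            if (!r[i].proto_wild && h->proto != r[i].proto) continue;
            return r[i].action;
        }
        return -1;
    }

    int main(void)
    {
        /* One made-up rule: permit TCP from anywhere to 10.0.0.0/8, port 80. */
        struct rule rules[] = {{
            .src_ip = 0,           .src_mask = 0,            /* any source */
            .dst_ip = 0x0A000000u, .dst_mask = 0xFF000000u,  /* 10.0.0.0/8 */
            .sport_lo = 0,  .sport_hi = 65535,
            .dport_lo = 80, .dport_hi = 80,
            .proto = 6, .proto_wild = 0, .action = 1,
        }};
        struct header h = { 0xC0A80001u, 0x0A010203u, 40000, 80, 6 };
        printf("action = %d\n", classify(&h, rules, 1));      /* prints 1 */
        return 0;
    }

    A decision-tree algorithm replaces this linear scan by recursively cutting the multi-dimensional field space so that only a handful of candidate rules remain at each leaf, which is where the memory-size versus memory-access trade-offs discussed above arise.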

    Longest Prefix Match in High-Speed Networks

    This thesis deals with Longest Prefix Matching (LPM), a time-critical operation in packet forwarding. To achieve 100 Gbps throughput, the operation has to be implemented in hardware and the forwarding table has to fit into on-chip memory, which is limited in size. Current LPM algorithms either need large amounts of memory to store IPv6 forwarding tables or cannot easily be implemented in hardware. We therefore analyzed available IPv6 forwarding tables and several LPM algorithms and, based on this analysis, propose a new algorithm with very low memory demands for IPv4/IPv6 lookups. To the best of our knowledge, the proposed algorithm has lower memory requirements than existing LPM algorithms. Moreover, it is suitable for IP lookup in 100 Gbps networks, which we demonstrate with a new pipelined hardware architecture achieving 140 Gbps throughput.
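    The abstract does not describe the proposed algorithm's data structure, so the C sketch below only illustrates the baseline operation it accelerates: longest-prefix matching on an uncompressed binary trie. Structure and function names are illustrative; compressed schemes such as Tree Bitmap improve on exactly this layout to cut memory size and memory accesses.

    /* Background sketch only: longest-prefix match on an uncompressed
     * binary trie, the baseline that compressed schemes such as Tree
     * Bitmap (and the algorithm proposed above) improve upon.  Names
     * and layout are illustrative, not the thesis's data structure. */
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    struct trie_node {
        struct trie_node *child[2];
        int next_hop;                    /* -1 if no prefix ends here */
    };

    static struct trie_node *node_new(void)
    {
        struct trie_node *n = calloc(1, sizeof *n);
        if (!n) abort();
        n->next_hop = -1;
        return n;
    }

    /* Insert an IPv4 prefix (value plus length in bits) with its next hop. */
    static void trie_insert(struct trie_node *root, uint32_t prefix,
                            int len, int next_hop)
    {
        struct trie_node *n = root;
        for (int i = 0; i < len; i++) {
            int bit = (prefix >> (31 - i)) & 1;
            if (!n->child[bit])
                n->child[bit] = node_new();
            n = n->child[bit];
        }
        n->next_hop = next_hop;
    }

    /* Walk the trie along the address bits, remembering the deepest
     * (i.e. longest) matching prefix seen so far. */
    static int trie_lookup(const struct trie_node *root, uint32_t addr)
    {
        int best = -1;
        const struct trie_node *n = root;
        for (int i = 0; n && i < 32; i++) {
            if (n->next_hop >= 0) best = n->next_hop;
            n = n->child[(addr >> (31 - i)) & 1];
        }
        if (n && n->next_hop >= 0) best = n->next_hop;   /* handle /32 */
        return best;
    }

    int main(void)
    {
        struct trie_node *root = node_new();
        trie_insert(root, 0x0A000000u,  8, 1);   /* 10.0.0.0/8  -> hop 1 */
        trie_insert(root, 0x0A010000u, 16, 2);   /* 10.1.0.0/16 -> hop 2 */
        printf("%d\n", trie_lookup(root, 0x0A010203u)); /* longest match: 2 */
        printf("%d\n", trie_lookup(root, 0x0A7F0001u)); /* falls back to: 1 */
        return 0;
    }

    In hardware, such a walk is typically pipelined with one trie level (or multi-bit stride) per stage, which is how pipelined architectures like the one above can sustain one lookup result per clock cycle.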

    Compiling dataflow graphs into hardware

    Conventional computers are programmed by supplying a sequence of instructions that perform the desired task. A reconfigurable processor is "programmed" by specifying the interconnections between hardware components, thereby creating a "hardwired" system to do the particular task. For some applications, such as image processing, reconfigurable processors can produce dramatic execution speedups. However, programming a reconfigurable processor is essentially a hardware design discipline, making programming difficult for application programmers who are only familiar with software design techniques. To bridge this gap, a programming language called SA-C (Single Assignment C, pronounced "sassy") has been designed for programming reconfigurable processors. The process involves two main steps: first, the SA-C compiler analyzes the input source code and produces a hardware-independent intermediate representation of the program, called a dataflow graph (DFG); second, this DFG is combined with hardware-specific information to create the final configuration. This dissertation describes the design and implementation of a system that performs the DFG-to-hardware translation. The DFG is broken up into three sections: the data generators, the inner loop body, and the data collectors. The second of these, the inner loop body, is used to create a computational structure that is unique to each program. The other two sections are implemented using prebuilt modules, parameterized for the particular problem. Finally, a "glue module" is created to connect the various pieces into a complete interconnection specification. The dissertation also explores optimizations that can be applied while processing the DFG to improve performance. A technique for pipelining the inner loop body is described that uses an estimation tool for the propagation delay of the nodes within the dataflow graph. A scheme is also described that identifies subgraphs within the dataflow graph that can be replaced with lookup tables; the lookup tables provide a faster implementation than random logic in some instances.
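    The lookup-table optimization mentioned above can be illustrated in software terms. The hypothetical C sketch below tabulates a small pure function of an 8-bit input once, so that each later evaluation of the "subgraph" costs a single table read instead of recomputation; the function itself is made up for illustration and is not SA-C compiler output.

    /* Hypothetical software analogue of the lookup-table optimization
     * described above: a pure function of an 8-bit value is tabulated
     * once, so each later evaluation costs one table read instead of
     * recomputation.  The function is made up; this is not SA-C output. */
    #include <stdint.h>
    #include <stdio.h>

    /* Example "subgraph": a contrast stretch followed by a threshold. */
    static uint8_t subgraph(uint8_t pixel)
    {
        int stretched = (pixel - 32) * 2;
        if (stretched < 0)   stretched = 0;
        if (stretched > 255) stretched = 255;
        return (uint8_t)(stretched > 128 ? 255 : stretched);
    }

    int main(void)
    {
        uint8_t lut[256];

        /* "Compile time": evaluate the subgraph for every possible input. */
        for (int v = 0; v < 256; v++)
            lut[v] = subgraph((uint8_t)v);

        /* "Run time": each pixel now costs a single lookup. */
        const uint8_t image[4] = { 10, 90, 100, 200 };
        for (int i = 0; i < 4; i++)
            printf("%d -> %d\n", image[i], lut[image[i]]);
        return 0;
    }

    On an FPGA the precomputed table would typically map to a block RAM or LUT primitive in place of the synthesized arithmetic and comparison logic, which, as the abstract notes, is faster in some instances.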

    Peer to Peer Information Retrieval: An Overview

    Peer-to-peer technology is widely used for file sharing. In the past decade a number of prototype peer-to-peer information retrieval systems have been developed. Unfortunately, none of these have seen widespread real-world adoption and thus, in contrast with file sharing, information retrieval is still dominated by centralised solutions. In this paper we provide an overview of the key challenges for peer-to-peer information retrieval and the work done so far. We want to stimulate and inspire further research to overcome these challenges. This will open the door to the development and large-scale deployment of real-world peer-to-peer information retrieval systems that rival existing centralised client-server solutions in terms of scalability, performance, user satisfaction and freedom.