
    A Survey of Hashing Techniques for High Performance Computing

    Hashing is a well-known and widely used technique for providing O(1) access to large files on secondary storage and to tables in memory. Hashing techniques were introduced in the early 1960s. Historically, the term hash function denotes a function that compresses an input string of arbitrary length to a string of fixed length. Hashing also finds applications in other fields such as fuzzy matching, error checking, authentication, cryptography, and networking. As routing tables have grown in size, hashing techniques have been applied to provide faster access to them, and more recently hashing has found application in hardware transactional memory. Motivated by these newly emerged applications of hashing, in this paper we present a survey of hashing techniques, starting from traditional hashing methods and placing greater emphasis on recent developments. We also provide a brief explanation of hardware hashing and a brief introduction to transactional memory.
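
    As an illustration of the basic technique this survey builds on, the following minimal sketch, ours rather than the paper's, implements a hash table with separate chaining; the multiplicative string hash and the table size are arbitrary choices for the example.

```python
class ChainedHashTable:
    """Minimal hash table with separate chaining: O(1) expected lookup."""

    def __init__(self, num_buckets=1024):
        self.buckets = [[] for _ in range(num_buckets)]

    def _hash(self, key: str) -> int:
        # Horner-style multiplicative hash: compresses an arbitrary-length
        # string to a fixed-size bucket index.
        h = 0
        for ch in key:
            h = (h * 31 + ord(ch)) & 0xFFFFFFFF
        return h % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._hash(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite existing entry
                return
        bucket.append((key, value))

    def get(self, key):
        for k, v in self.buckets[self._hash(key)]:
            if k == key:
                return v
        return None

table = ChainedHashTable()
table.put("10.0.0.0/8", "eth0")
print(table.get("10.0.0.0/8"))  # -> eth0
```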

    Mémoires associatives algorithmiques pour l'opération de recherche du plus long préfixe sur FPGA

    Field-Programmable Gate Arrays (FPGAs) are becoming ubiquitous in data centers. First introduced to accelerate indexing services and machine learning tasks, FPGAs are now also used to accelerate networking operations, including the longest prefix match (LPM) operation. This operation is used for packet routing and as a building block in programmable data planes. However, for the two use cases considered, the LPM operation is inefficiently implemented on FPGAs. In this thesis, we demonstrate that the performance of the LPM operation on FPGAs can be substantially improved using an algorithmic approach, in which the LPM operation is implemented with a data structure. The results presented also let us reflect on a broader question: should the FPGA architecture be specialized for networking applications?

    First, we present SHIP, a data structure tailored to routing IPv6 packets in the Internet. SHIP exploits prefix characteristics to build a compact data structure that can be efficiently mapped to FPGAs. SHIP uses a "divide and conquer" approach to bin prefixes into groups of small cardinality that share similar characteristics; a hybrid trie-tree data structure then encodes the prefixes held in each group, adapting the encoding to their characteristics. Implemented on FPGAs, SHIP improves the memory efficiency of the LPM operation over state-of-the-art solutions while supporting a packet throughput greater than 100 Gb/s.

    Second, we show how to efficiently implement the LPM operation for a programmable data plane on FPGAs. In this case, contrary to packet routing in the Internet, no prior knowledge of the stored prefixes can be exploited. We therefore present a framework comprising a data structure that is efficient independently of the characteristics of the prefixes it holds, along with methods to map that data structure efficiently to FPGAs. A B-tree, extended to support the LPM operation, is used for its low algorithmic complexity. We present a method to allocate at compile time the minimum amount of resources required by the B-tree to encode a set of prefixes, independently of their characteristics, and methods that select the B-tree parameters to increase post-implementation memory efficiency and generate the corresponding hardware architecture. Evaluated over several scenarios, this solution supports a packet throughput greater than 100 Gb/s while improving performance over the state of the art.
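
    To make the operation concrete, here is a minimal software sketch of LPM over a binary trie; it illustrates only the semantics of the lookup, not SHIP's hybrid trie-tree or the extended B-tree that the thesis maps to FPGAs.

```python
class TrieNode:
    __slots__ = ("children", "next_hop")

    def __init__(self):
        self.children = [None, None]  # one child per bit value
        self.next_hop = None          # set if a prefix ends here

class BinaryTrie:
    """Bit-level trie supporting insert and longest-prefix-match lookup."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, prefix_bits: str, next_hop: str):
        node = self.root
        for bit in prefix_bits:
            idx = int(bit)
            if node.children[idx] is None:
                node.children[idx] = TrieNode()
            node = node.children[idx]
        node.next_hop = next_hop

    def lookup(self, addr_bits: str):
        node, best = self.root, None
        for bit in addr_bits:
            node = node.children[int(bit)]
            if node is None:
                break
            if node.next_hop is not None:
                best = node.next_hop  # remember the deepest match so far
        return best

fib = BinaryTrie()
fib.insert("10", "A")     # prefix 10*
fib.insert("1011", "B")   # prefix 1011*
print(fib.lookup("10110001"))  # -> B (the longest match wins)
```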

    Data Structures and Algorithms for Scalable NDN Forwarding

    Named Data Networking (NDN) is a recently proposed general-purpose network architecture that aims to address the limitations of the Internet Protocol (IP) while maintaining its strengths. NDN takes an information-centric approach, focusing on named data rather than computer addresses. In NDN, content is identified by its name, and each NDN packet carries a name that specifies the content it is fetching or delivering. Since there are no source and destination addresses in an NDN packet, it is forwarded based on a lookup of its name in the forwarding plane, which consists of the Forwarding Information Base (FIB), Pending Interest Table (PIT), and Content Store (CS). In addition, as an in-network caching element, a scalable Repository (Repo) design is needed to provide large-scale, long-term content storage in NDN networks. Scalable NDN forwarding is a challenge: compared to the well-understood approaches to IP forwarding, NDN forwarding performs lookups on packet names, whose variable and unbounded lengths increase the lookup complexity; the lookup tables are larger than in IP, requiring more memory; and NDN forwarding has a read-write data plane, requiring per-packet updates at line rates. Designing and evaluating a scalable NDN forwarding node architecture is a major effort within the overall NDN research agenda. The goal of this dissertation is to demonstrate that scalable NDN forwarding is feasible with the proposed data structures and algorithms. First, we propose a FIB lookup design based on binary search of hash tables that provides a reliable longest name prefix lookup performance baseline for future NDN research. We have demonstrated 10 Gbps forwarding throughput with 256-byte packets and one billion synthetic forwarding rules, each containing up to seven name components. Second, we explore data structures and algorithms to optimize the FIB design based on the specific characteristics of real-world forwarding datasets. Third, we propose a fingerprint-only PIT design that reduces the memory requirements in core routers. Lastly, we discuss Content Store design issues and demonstrate that an NDN Repo implementation can leverage many existing databases and storage systems to improve performance.
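
    The FIB design above performs longest name prefix lookup over per-length hash tables. The sketch below is a simplified illustration rather than the dissertation's design: it scans the tables from longest to shortest, whereas the actual design binary-searches them, which requires additional marker entries to remain correct.

```python
# Hypothetical FIB with one hash table per name length (component count).
# Scanning from longest to shortest has the same matching semantics as
# the paper's binary search of hash tables, but needs more lookups.
fib = {
    1: {("com",): "face1"},
    2: {("com", "example"): "face2"},
    3: {("com", "example", "video"): "face3"},
}

def longest_name_prefix_match(components):
    for length in range(len(components), 0, -1):
        table = fib.get(length)
        if table:
            face = table.get(tuple(components[:length]))
            if face:
                return face
    return None  # no matching FIB entry

print(longest_name_prefix_match(("com", "example", "video", "chunk0")))
# -> face3
```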

    A Survey on Data Plane Programming with P4: Fundamentals, Advances, and Applied Research

    With traditional networking, users can configure control plane protocols to match their specific network configuration, but without the ability to fundamentally change the underlying algorithms. With SDN, users may provide their own control plane, which can control network devices through their data plane APIs. Programmable data planes allow users to define their own data plane algorithms for network devices, including appropriate data plane APIs which may be leveraged by user-defined SDN control. Thus, programmable data planes and SDN offer great flexibility for network customization, be it for specialized commercial appliances, e.g., in 5G or data center networks, or for rapid prototyping in industrial and academic research. Programming Protocol-independent Packet Processors (P4) has emerged as the most widespread abstraction, programming language, and concept for data plane programming. It is developed and standardized by an open community, and it is supported by various software and hardware platforms. In this paper, we survey the literature from 2015 to 2020 on data plane programming with P4. Our survey covers 497 references, of which 367 are scientific publications. We organize our work into two parts. In the first part, we give an overview of data plane programming models, the programming language, architectures, compilers, targets, and data plane APIs. We also consider research efforts to advance P4 technology. In the second part, we analyze a large body of literature considering P4-based applied research. We categorize 241 research papers into different application domains, summarize their contributions, and extract prototypes, target platforms, and source code availability. (Comment: Submitted to IEEE Communications Surveys and Tutorials (COMS) on 2021-01-2)
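
    To make the match-action abstraction underlying P4 concrete, here is a toy Python sketch of a single match-action table; the header fields, actions, and entries are invented for the example and do not correspond to any particular P4 program.

```python
# Toy match-action stage: exact match on one header field selects an
# action with parameters, loosely mimicking a single P4 table apply().
def drop(pkt):
    return None

def forward(pkt, port):
    pkt["egress_port"] = port
    return pkt

table = {
    # match key (destination address) -> (action, action parameters)
    "10.0.0.1": (forward, {"port": 1}),
    "10.0.0.2": (forward, {"port": 2}),
}
default_action = (drop, {})

def apply_table(pkt):
    action, params = table.get(pkt["dst"], default_action)
    return action(pkt, **params)

print(apply_table({"dst": "10.0.0.1"}))   # forwarded out port 1
print(apply_table({"dst": "192.0.2.9"}))  # None: dropped by default action
```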

    Linux XIA: an interoperable meta network architecture to crowdsource the future Internet

    With the growing number of proposed clean-slate redesigns of the Internet, the need for a medium that enables all stakeholders to participate in the realization, evaluation, and selection of these designs is increasing. We believe that the missing catalyst is a meta network architecture that welcomes most, if not all, clean-slate designs on a level playing field, lowers deployment barriers, and leaves the final evaluation to the broader community. This paper presents Linux XIA, a native implementation of XIA [12] in the Linux kernel, as a candidate. We first describe Linux XIA in terms of its architectural realizations and algorithmic contributions. We then demonstrate how to port several distinct and unrelated network architectures onto Linux XIA. Finally, we provide a hybrid evaluation of Linux XIA at three levels of abstraction in terms of its ability to: evolve and foster interoperation of new architectures, embed disparate architectures inside the implementation's framework, and maintain forwarding performance comparable to that of the legacy TCP/IP implementation. Given this evaluation, we substantiate a previously unsupported claim of XIA: that it readily supports and enables network evolution, collaboration, and interoperability, traits we view as central to the success of any future Internet architecture. This research was supported by the National Science Foundation under awards CNS-1040800, CNS-1345307, and CNS-1347525.

    A Computational Approach to Packet Classification

    Multi-field packet classification is a crucial component in modern software-defined data center networks. To achieve high throughput and low latency, state-of-the-art algorithms strive to fit the rule lookup data structures into on-die caches; however, they do not scale well with the number of rules. We present a novel approach, NuevoMatch, which improves the memory scaling of existing methods. A new data structure, the Range Query Recursive Model Index (RQ-RMI), is the key component that enables NuevoMatch to replace most of the accesses to main memory with model inference computations. We describe an efficient training algorithm that guarantees the correctness of RQ-RMI-based classification. The use of RQ-RMI allows the rules to be compressed into model weights that fit into the hardware cache. Further, it takes advantage of the growing support for fast neural network processing in modern CPUs, such as wide vector instructions, achieving a rate of tens of nanoseconds per lookup. Our evaluation using 500K multi-field rules from the standard ClassBench benchmark shows a geometric mean compression factor of 4.9x, 8x, and 82x, and an average performance improvement of 2.4x, 2.6x, and 1.6x in throughput compared to CutSplit, NeuroCuts, and TupleMerge, all state-of-the-art algorithms. (Comment: To appear in SIGCOMM 2020)
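
    The core idea, replacing a lookup data structure with a learned model plus a bounded local search, can be illustrated with a toy sketch. It fits a single linear model over sorted range-start keys, whereas RQ-RMI is a multi-stage model with trained error guarantees; all data below is invented for the example.

```python
import bisect

# Toy "learned index" over sorted, non-overlapping range-start keys.
# A linear model predicts the position of the matching range; a small
# search window around the prediction corrects the model's error.
starts = [0, 100, 205, 300, 410, 520, 640, 700, 815, 930]
rules = [f"rule{i}" for i in range(len(starts))]

# Fit position ~= a*key + b by least squares over (key, index) pairs.
n = len(starts)
mean_x = sum(starts) / n
mean_y = (n - 1) / 2
a = (sum((x - mean_x) * (i - mean_y) for i, x in enumerate(starts))
     / sum((x - mean_x) ** 2 for x in starts))
b = mean_y - a * mean_x
max_err = max(abs(i - (a * x + b)) for i, x in enumerate(starts))

def classify(key):
    guess = round(a * key + b)
    lo = max(0, guess - int(max_err) - 1)
    hi = min(n, guess + int(max_err) + 2)
    # Bounded local search: last range start <= key within [lo, hi).
    i = bisect.bisect_right(starts, key, lo, hi) - 1
    return rules[i] if i >= 0 else None

print(classify(417))  # -> rule4 (the range starting at key 410)
```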

    Conception et évaluation des systèmes logiciels de classifications de paquets haute-performance

    Packet classification consists of matching packet headers against a set of pre-defined rules and performing the action(s) associated with the matched rule(s). As a key technology in the data plane of network devices, packet classification has been widely deployed in many network applications and services, such as firewalling, load balancing, and VPNs, and it has been extensively studied in the past two decades. Traditional packet classification methods are usually based on specialized hardware; with the development of data center networking, software-defined networking, and application-aware networking, packet classification methods based on multi-/many-core processor platforms have become a new research interest. This dissertation studies packet classification in three main aspects: algorithm design frameworks, rule-set feature analysis, and algorithm implementation and optimization.

    We review multiple proposed algorithms and present a decision-tree-based algorithm design framework. The framework decomposes various existing packet classification algorithms into combinations of different types of "meta-methods", revealing the connections between different algorithms. Based on this framework, we combine "meta-methods" from different algorithms and propose two new algorithms, HyperSplit-op and HiCuts-op. The experimental results show that HiCuts-op uses 2-20x less memory and 10% fewer memory accesses than HiCuts, while HyperSplit-op uses 2-200x less memory and 10%-30% fewer memory accesses than HyperSplit.

    We also explore the connections between rule-set features and the performance of various algorithms. We find that the "coverage uniformity" of the rule-set has a significant impact on classification speed, and that the number of "orthogonal structure" rules usually determines the memory size of the algorithms. Based on these two observations, we propose a memory consumption model and a quantified measure of coverage uniformity. Using these two tools, we propose a new multi-decision-tree algorithm, SmartSplit, and an algorithm selection framework, AutoPC. Compared to the EffiCuts algorithm, SmartSplit achieves around 2.9x speedup and up to 10x memory size reduction. For a given rule-set, AutoPC can automatically recommend a suitable algorithm; compared to using a single algorithm on all rule-sets, AutoPC is on average 3.8 times faster.

    We further analyze the connection between prefix length and update overhead for IP lookup algorithms. We observe that long prefixes always result in more memory accesses with the Tree Bitmap algorithm, while short prefixes always result in large update overhead in DIR-24-8. By combining the two algorithms, we propose a hybrid algorithm, SplitLookup, that reduces the update overhead. Experimental results show that the hybrid algorithm performs two orders of magnitude fewer memory accesses when updating short prefixes, while its lookup speed remains close to that of DIR-24-8.

    Finally, we implement and optimize multiple algorithms on multi-/many-core platforms. For IP lookup, we implement two typical algorithms, DIR-24-8 and Tree Bitmap, and present several optimization techniques for both. For multi-dimensional packet classification, we implement HyperCuts, HiCuts, and variants of these two algorithms, such as Adaptive Binary Cuttings, EffiCuts, HiCuts-op, and HyperSplit-op. The SplitLookup algorithm achieves up to 40 Gbps throughput on the TILEPro64 many-core processor, and HiCuts-op and HyperSplit-op achieve 10 to 20 Gbps throughput on a single core of an Intel processor. Overall, our study reveals the connections between algorithmic techniques and rule-set features, providing insight for new algorithm designs and guidelines for efficient algorithm implementation.
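
    As background, the DIR-24-8 scheme discussed above can be sketched in a few lines: a first-level table indexed by the top 24 address bits answers most lookups in one memory access, and prefixes longer than /24 spill into a second-level table indexed by the remaining 8 bits. The sketch below simplifies the encoding and assumes shorter prefixes are inserted before longer ones.

```python
import ipaddress

EXT = "->long"  # sentinel: lookup continues in the second-level table

tbl24 = {}     # top 24 bits -> next hop, or EXT
tbl_long = {}  # (top 24 bits, last 8 bits) -> next hop

def add_route(prefix: str, next_hop: str) -> None:
    """Insert a route. Sketch assumption: shorter prefixes are added first."""
    net = ipaddress.ip_network(prefix)
    base = int(net.network_address)
    blk = base >> 8
    if net.prefixlen <= 24:
        for b in range(blk, blk + (net.num_addresses >> 8)):
            tbl24[b] = next_hop  # expand over every /24 block covered
    else:
        if tbl24.get(blk) is not EXT:
            # First long prefix in this block: seed all 256 slots with the
            # covering short route, then divert tbl24 to the second level.
            for low in range(256):
                tbl_long[(blk, low)] = tbl24.get(blk)
            tbl24[blk] = EXT
        for low in range(base & 0xFF, (base & 0xFF) + net.num_addresses):
            tbl_long[(blk, low)] = next_hop

def lookup(addr: str):
    a = int(ipaddress.ip_address(addr))
    entry = tbl24.get(a >> 8)
    if entry is EXT:
        return tbl_long[(a >> 8, a & 0xFF)]  # second memory access
    return entry

add_route("10.1.0.0/16", "A")
add_route("10.1.2.128/25", "B")
print(lookup("10.1.2.200"), lookup("10.1.3.7"))  # -> B A
```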

    Deux défis des Réseaux Logiciels : Relayage par le Nom et Vérification des Tables

    The Internet changed the lives of network users: not only does it affect users' habits, but it is also increasingly shaped by network users' behavior. Several new services have been introduced during the past decades (e.g., file sharing, video streaming, cloud computing) to meet users' expectations. As a consequence, although the Internet infrastructure provides a good best-effort service for exchanging information in a point-to-point fashion, this is not the principal service that today's users request. Current networks require some major architectural changes in order to follow the upcoming requirements, but the experience of the past decades shows that bringing new features to the existing infrastructure may be slow.

    In this thesis, we identify two main aspects of the Internet's evolution: a "behavioral" aspect, which refers to a change in the way users interact with the network, and a "structural" aspect, related to the evolution problem from an architectural point of view. The behavioral perspective states that there is a mismatch between the usage of the network and the actual functions it provides: while network devices implement the simple primitives of sending and receiving generic packets, users are really interested in different primitives, such as retrieving or consuming content. The structural perspective suggests that the slow evolution of the Internet infrastructure lies in its architectural design, which has been shown to be hard to upgrade. On the one hand, to meet the new network usage, the research community proposed the Named Data Networking (NDN) paradigm, which brings content-based functionality to network devices. On the other hand, Software-Defined Networking (SDN) can be adopted to simplify the architectural evolution and shorten the upgrade time thanks to its centralized software control plane, at the cost of a higher network complexity that can easily introduce bugs. SDN verification is a novel research direction aiming to check the consistency and safety of network configurations through formal or empirical validation.

    The thesis consists of two parts. In the first part, we focus on the behavioral aspect by presenting the design and evaluation of "Caesar", a content router that advances the state of the art by implementing content-based functionality that can coexist with real network environments; two chapters describe its architecture and two of its main modules, forwarding and request traceability. In the second part, we target the diagnosis of network misconfigurations and present a framework for the analysis of the network topology and forwarding tables, which can be used to detect the presence of forwarding loops in real time and in real network environments; the improvements of the proposed algorithm over the state of the art are discussed, and the manuscript concludes with a summary of the main results and a presentation of ongoing and future work.
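
    The loop-detection problem addressed in the second part can be illustrated with a naive sketch: given per-destination forwarding tables, follow next hops and flag a repeated node. The thesis contributes a more efficient algorithmic framework than this walk; the topology below is invented for the example.

```python
# Naive forwarding-loop check: follow next hops for one destination and
# report a loop if any node repeats.
forwarding = {
    # node -> {destination: next hop}
    "r1": {"d": "r2"},
    "r2": {"d": "r3"},
    "r3": {"d": "r1"},  # misconfiguration: sends traffic for d back to r1
}

def find_loop(start, dest):
    path, seen = [start], {start}
    node = start
    while True:
        nxt = forwarding.get(node, {}).get(dest)
        if nxt is None:
            return None          # traffic is delivered or dropped: no loop
        if nxt in seen:
            return path + [nxt]  # loop found: nxt is already on the path
        path.append(nxt)
        seen.add(nxt)
        node = nxt

print(find_loop("r1", "d"))  # -> ['r1', 'r2', 'r3', 'r1']
```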

    Linux XIA: an interoperable meta network architecture

    With the growing number of clean-slate redesigns of the Internet, the need for a medium that enables all stakeholders to participate in the realization, evaluation, and selection of these designs is increasing. We believe that the missing catalyst is a meta network architecture that welcomes most, if not all, clean-slate designs on a level playing field, lowers deployment barriers, and leaves the final evaluation to the broader community. This thesis presents the eXpressive Internet (Meta) Architecture (XIA), itself a clean-slate design, as well as Linux XIA, a native implementation of XIA in the Linux kernel, as a candidate. As a meta network architecture, XIA is highly flexible, leaving stakeholders to choose an expressive set of network principals to instantiate a given network architecture within the XIA framework. Central to XIA is its novel, non-linear network addressing format, from which key architectural features derive, such as evolvability, intrinsically secure identifiers, and a low degree of principal isolation. XIP, the network layer protocol of XIA, forwards packets by navigating these structured addresses and delegating decision-making and packet processing to the appropriate principals. Taken together, these mechanisms work in tandem to support a broad spectrum of interoperable principals. We demonstrate how to port four distinct and unrelated network architectures onto Linux XIA, none of which were designed for interoperability with this platform. We then show that, notwithstanding this flexibility, Linux XIA's forwarding performance remains comparable to that of the more mature legacy TCP/IP stack implementation. Moreover, the ported architectures, namely IP, Serval, NDN, and ANTS, empower us to present a deployment plan for XIA, to explore design variations of the ported architectures that were impossible in their original form due to the requirement of self-sufficiency that a standalone network architecture bears, and to substantiate the claim that XIA readily supports and enables network evolution. Our work highlights the benefits of specializing network designs that XIA affords, and comprises instructive examples for the network researcher interested in design and implementation for future interoperability.
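
    As a toy illustration of forwarding over a non-linear (DAG-based) address, consider the sketch below: each node of the address is a typed identifier, and a router tries a node's outgoing edges in priority order, falling back to later edges when it has no route for an earlier one. The principal types, identifiers, and routing table are invented for the example, and real XIP processing is considerably richer.

```python
# Toy sketch of forwarding over a DAG-style address: each address node is
# a typed identifier (principal type, id), and its outgoing edges are
# listed highest priority first.
address = {
    "entry": [("CID", "content42"), ("AD", "domain7")],
    ("CID", "content42"): [],               # intended destination
    ("AD", "domain7"): [("HID", "host9")],  # fallback path
    ("HID", "host9"): [],
}

# This router has no route for the content (CID) principal.
routes = {("AD", "domain7"): "port3", ("HID", "host9"): "port5"}

def next_hop(addr, current="entry"):
    # Try the current node's edges in priority order; forward toward the
    # first identifier this router can route, otherwise give up.
    for xid in addr[current]:
        if xid in routes:
            return routes[xid], xid
    return None

print(next_hop(address))  # -> ('port3', ('AD', 'domain7'))
```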