
    Fast Packet Processing on High Performance Architectures

    The rapid growth of the Internet and the fast emergence of new network applications have brought great challenges and complex issues in deploying high-speed, QoS-guaranteed IP networks. For this reason, packet classification and network intrusion detection have assumed a key role in modern communication networks in order to provide QoS and security. In this thesis we describe a number of the most advanced solutions to these tasks. We introduce NetFPGA and Network Processors as reference platforms for both the design and the implementation of the solutions and algorithms described in this thesis. The rise in link capacity reduces the time available to network devices for packet processing. For this reason, we show different solutions which, either by heuristics and randomization or by smart construction of state machines, allow IP lookup, packet classification and deep packet inspection to be performed quickly in real devices based on high-speed platforms such as NetFPGA or Network Processors.

    A Multi-Layer Key-Value Cache Architecture Using In-NIC and In-Kernel Caches


    Data Structures and Algorithms for Scalable NDN Forwarding

    Named Data Networking (NDN) is a recently proposed general-purpose network architecture that aims to address the limitations of the Internet Protocol (IP), while maintaining its strengths. NDN takes an information-centric approach, focusing on named data rather than computer addresses. In NDN, the content is identified by its name, and each NDN packet has a name that specifies the content it is fetching or delivering. Since there are no source and destination addresses in an NDN packet, it is forwarded based on a lookup of its name in the forwarding plane, which consists of the Forwarding Information Base (FIB), Pending Interest Table (PIT), and Content Store (CS). In addition, as an in-network caching element, a scalable Repository (Repo) design is needed to provide large-scale long-term content storage in NDN networks. Scalable NDN forwarding is a challenge. Compared to the well-understood approaches to IP forwarding, NDN forwarding performs lookups on packet names, which have variable and unbounded lengths, increasing the lookup complexity. The lookup tables are larger than in IP, requiring more memory space. Moreover, NDN forwarding has a read-write data plane, requiring per-packet updates at line rates. Designing and evaluating a scalable NDN forwarding node architecture is a major effort within the overall NDN research agenda. The goal of this dissertation is to demonstrate that scalable NDN forwarding is feasible with the proposed data structures and algorithms. First, we propose a FIB lookup design based on the binary search of hash tables that provides a reliable longest name prefix lookup performance baseline for future NDN research. We have demonstrated 10 Gbps forwarding throughput with 256-byte packets and one billion synthetic forwarding rules, each containing up to seven name components. Second, we explore data structures and algorithms to optimize the FIB design based on the specific characteristics of real-world forwarding datasets. Third, we propose a fingerprint-only PIT design that reduces the memory requirements in the core routers. Lastly, we discuss the Content Store design issues and demonstrate that the NDN Repo implementation can leverage many of the existing databases and storage systems to improve performance.
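
    To make the first contribution concrete, the following is a minimal software sketch of longest name prefix lookup by binary search of hash tables, in the spirit of the FIB design described above; the marker/fallback handling and all identifiers are simplified assumptions for illustration, not the dissertation's actual data structures.

```python
# Sketch: FIB longest name prefix lookup via binary search of hash tables.
# One hash table per prefix length (number of name components); "marker"
# entries, each carrying a fallback next hop, steer the binary search.

_MISS = object()

class NameFib:
    def __init__(self, max_len=7):
        self.max_len = max_len
        self.entries = {}                      # prefix tuple -> next hop
        self.tables = None                     # filled in by build()

    def insert(self, prefix, next_hop):
        self.entries[tuple(prefix)] = next_hop

    def _best_real_match(self, prefix):
        for l in range(len(prefix), 0, -1):
            if prefix[:l] in self.entries:
                return self.entries[prefix[:l]]
        return None

    def build(self):
        self.tables = [dict() for _ in range(self.max_len + 1)]
        for prefix, nh in self.entries.items():
            self.tables[len(prefix)][prefix] = nh
        # Place markers along each entry's binary-search path; each marker
        # stores the best matching shorter prefix as a fallback answer.
        for prefix in self.entries:
            lo, hi, n = 1, self.max_len, len(prefix)
            while lo <= hi:
                mid = (lo + hi) // 2
                if mid < n:
                    self.tables[mid].setdefault(prefix[:mid],
                                                self._best_real_match(prefix[:mid]))
                    lo = mid + 1
                elif mid > n:
                    hi = mid - 1
                else:
                    break

    def longest_prefix_match(self, name):
        name, best = tuple(name), None
        lo, hi = 1, self.max_len
        while lo <= hi:
            mid = (lo + hi) // 2
            hit = self.tables[mid].get(name[:mid], _MISS) if mid <= len(name) else _MISS
            if hit is _MISS:
                hi = mid - 1                   # nothing this long: try shorter
            else:
                if hit is not None:
                    best = hit                 # real entry or marker fallback
                lo = mid + 1                   # a longer match may still exist
        return best
```

    For example, after insert(('ndn', 'edu', 'wustl'), 'if0') and build(), a lookup of ('ndn', 'edu', 'wustl', 'video') returns 'if0' using roughly log2(7) hash-table probes instead of one probe per possible component count.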

    Stateful Data Plane Abstractions for Software-Defined Networks and Their Applications

    Software-Defined Networking (SDN) enables programmability in the network. Unfortunately, current SDN limits programmability to the control plane. Operators cannot program data plane algorithms such as load balancing, congestion control, failure detection, etc. These capabilities are usually baked into the switch via dedicated hardware, as they need to run at line rate, i.e. 10-100 Gbit/s on 10-100 ports. In this work, we present two data plane abstractions for stateful packet processing, namely OpenState and OPP. These abstractions allow operators to program data plane tasks that involve stateful processing. OpenState is an extension to OpenFlow that permits the definition of forwarding rules as finite state machines. OPP is a more flexible abstraction that generalizes OpenState by adding computational capabilities, opening the way to the programming of more advanced data plane algorithms. Both OpenState and OPP are amenable to high-performance hardware implementations using commodity hardware switch components. However, both abstractions are based on a problematic design choice: the use of a feedback loop in the processing pipeline. This loop, if not adequately controlled, can harm the consistency of the state operations. Memory locking approaches can be used to prevent inconsistencies, at the expense of throughput. We present simulation results on real traffic traces showing that feedback loops of several clock cycles can be supported with little or no performance degradation, even with near-worst-case traffic workloads. To further prove the benefits of a stateful programmable data plane, we present two novel applications: Spider and FDPA. Spider detects and reacts to network failures at data plane timescales, i.e. micro/nanoseconds, even in the case of remote failures. By using OpenState, Spider provides functionality equivalent to legacy control plane protocols such as BFD and MPLS Fast Reroute, but without the need for a control plane. That is, both detection and rerouting happen entirely in the data plane. FDPA allows a switch to enforce approximate fair bandwidth sharing among many TCP-like senders. Most of the mechanisms proposed to solve this problem are based on complex scheduling algorithms, whose implementation becomes very expensive at today's line rates. FDPA, which is based on OPP, trades scheduling complexity for per-user state. FDPA works by dynamically assigning users to a few (3-4) priority queues, where a user's priority is chosen based on its sending rate history.
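
    As a rough illustration of the OpenState abstraction, the sketch below expresses a forwarding policy as a finite state machine over per-flow state, using the classic port-knocking example; the table contents and the pure-software form are assumptions for illustration, not the hardware pipeline.

```python
# Sketch of OpenState-style stateful forwarding: a state table keeps per-flow
# state, and an XFSM table maps (state, event) to (action, next_state).
# Example policy: knock on ports 1111, 2222, 3333 to unlock SSH (port 22).

DEFAULT = "DEFAULT"

# XFSM table: (current_state, tcp_dst_port) -> (action, next_state)
XFSM = {
    (DEFAULT, 1111):   ("drop",    "STAGE_1"),
    ("STAGE_1", 2222): ("drop",    "STAGE_2"),
    ("STAGE_2", 3333): ("drop",    "OPEN"),
    ("OPEN", 22):      ("forward", "OPEN"),      # SSH allowed after the knock
}

state_table = {}   # flow key (here: source IP) -> current state

def process_packet(src_ip, dst_port):
    state = state_table.get(src_ip, DEFAULT)
    # Any packet not matching the XFSM is dropped and resets the flow's state
    # (a simplification of the real table's default transitions).
    action, next_state = XFSM.get((state, dst_port), ("drop", DEFAULT))
    state_table[src_ip] = next_state             # per-packet state update
    return action
```

    In the abstraction itself, both tables live inside the switch pipeline and are read and written per packet at line rate, which is precisely where the feedback-loop consistency issue discussed above arises.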

    Hardware acceleration for power efficient deep packet inspection

    The rapid growth of the Internet has led to a massive spread of malicious attacks such as viruses and malware, making the safety of online activity a major concern. The use of Network Intrusion Detection Systems (NIDS) is an effective method to safeguard the Internet. One key procedure in NIDS is Deep Packet Inspection (DPI). DPI can examine the contents of a packet and take actions on the packet based on predefined rules. In this thesis, DPI is mainly discussed in the context of security applications; however, DPI can also be used for bandwidth management and network surveillance. DPI inspects the whole packet payload, and due to this and the complexity of the inspection rules, DPI algorithms consume significant amounts of resources, including time, memory and energy. The aim of this thesis is to design hardware-accelerated methods for memory- and energy-efficient high-speed DPI. The patterns in packet payloads, especially complex patterns, can be efficiently represented by regular expressions, which can be translated into Deterministic Finite Automata (DFA). DFA algorithms are fast but consume very large amounts of memory with certain kinds of regular expressions. In this thesis, memory-efficient algorithms are proposed based on transition compression of the DFAs. In this work, Bloom filters are used to implement DPI on an FPGA for hardware acceleration, with the design of a parallel architecture. Furthermore, aiming at a balance of power and performance, an energy-efficient adaptive Bloom filter is designed with the capability of adjusting the number of active hash functions according to the current workload. In addition, a method is given for implementation on both two-stage and multi-stage platforms. Nevertheless, false positives still prevent the Bloom filter from more extensive use; a cache-based counting Bloom filter is presented in this work to eliminate false positives for fast and precise matching. Finally, as future work, models will be built for routers and DPI in order to estimate the effect of power savings and to analyze the latency impact of dynamically adapting frequency to the current traffic. In addition, a low-power DPI system will be designed with a single or multiple DPI engines. Results and evaluation of the low-power DPI model and system will be produced in future work.
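
    One plausible software reading of the adaptive Bloom filter idea is sketched below: items are always inserted with all k hash functions, while lookups may probe only a subset of them, trading false-positive rate for fewer probes; the sizing, hashing and the policy mapping workload to active hash functions are assumptions, not the thesis design.

```python
import hashlib

class AdaptiveBloomFilter:
    """Sketch only: inserts always set all k positions; lookups may check only
    the first k_active of them, so no false negatives are introduced, at the
    cost of a higher false-positive rate when fewer bits are probed."""

    def __init__(self, m_bits=1 << 20, k=8):
        self.m, self.k, self.k_active = m_bits, k, k
        self.bits = bytearray(m_bits // 8)

    def _positions(self, item, count):
        for i in range(count):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item, self.k):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        return all(self.bits[pos // 8] >> (pos % 8) & 1
                   for pos in self._positions(item, self.k_active))

    def set_active(self, k_active):
        # The policy mapping observed workload to k_active is left open here;
        # a hardware design would power down the corresponding hash units.
        self.k_active = max(1, min(self.k, k_active))
```

    Querying with fewer active hash functions models powering down hash units and their memory ports under light load, which is one way to read the power/performance trade-off described above.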

    Reducing the Cost of Operating a Datacenter Network

    Datacenters are a significant capital expense for many enterprises, yet they are difficult to design, manage, and maintain. The initial design of a datacenter network tends to follow vendor guidelines, but subsequent upgrades and expansions to it are mostly ad hoc, with equipment being upgraded piecemeal after its amortization period runs out and equipment acquisition tied to budget cycles rather than changes in workload. These networks are also brittle and inflexible: they tend to be manually managed and cannot perform dynamic traffic engineering. The high-level goal of this dissertation is to reduce the total cost of owning a datacenter by improving its network. To achieve this, we make the following contributions. First, we develop an automated, theoretically well-founded approach to planning cost-effective datacenter upgrades and expansions. Second, we propose a scalable traffic management framework for datacenter networks. Together, we show that these contributions can significantly reduce the cost of operating a datacenter network. To design cost-effective network topologies, especially as the network expands over time, updated equipment must coexist with legacy equipment, which makes the network heterogeneous. However, heterogeneous high-performance network designs are not well understood. Our first step, therefore, is to develop the theory of heterogeneous Clos topologies. Using our theory, we propose an optimization framework, called LEGUP, which designs a heterogeneous Clos network to implement in a new or legacy datacenter. Although effective, LEGUP imposes a certain amount of structure on the network. To deal with situations where this is infeasible, our second contribution is a framework, called REWIRE, which uses optimization to design unstructured DCN topologies. Our results indicate that these unstructured topologies have up to 100-500% more bisection bandwidth than a fat-tree of the same dollar cost. Our third contribution is two frameworks for datacenter network traffic engineering. Because of the multiplicity of end-to-end paths in DCN fabrics, such as Clos networks and the topologies designed by REWIRE, careful traffic engineering is needed to maximize throughput. This requires timely detection of elephant flows (flows that carry large amounts of data) and management of those flows. Previously proposed approaches incur high monitoring overheads, consume significant switch resources, or have long detection times. We make two proposals for elephant flow detection. First, in the Mahout framework, we suggest that such flows be detected by observing the end hosts' socket buffers, which provide efficient visibility of flow behavior. Second, in the DevoFlow framework, we add efficient stats-collection mechanisms to network switches. Using simulations and experiments, we show that these frameworks reduce traffic engineering overheads by at least an order of magnitude while still providing near-optimal performance.
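
    The end-host detection idea can be sketched as follows: a flow is flagged as an elephant once the bytes queued in its TCP socket send buffer exceed a threshold. The threshold value, the Linux-only SIOCOUTQ ioctl, and the absence of any packet marking are simplifying assumptions for illustration, not the Mahout implementation.

```python
# Sketch: detect elephant flows at the end host by observing socket send
# buffers (Linux-only; illustrative of the idea rather than the framework).

import fcntl, socket, struct, termios

ELEPHANT_THRESHOLD = 128 * 1024   # assumed threshold: 128 KiB of backlogged data

def queued_bytes(sock: socket.socket) -> int:
    """Bytes currently sitting in the kernel send queue for this socket."""
    buf = fcntl.ioctl(sock.fileno(), termios.TIOCOUTQ, struct.pack("I", 0))
    return struct.unpack("I", buf)[0]

def is_elephant(sock: socket.socket) -> bool:
    # Mahout would now signal the decision in-band (e.g. by marking outgoing
    # packets) so switches can manage the flow; here we only report it.
    return queued_bytes(sock) >= ELEPHANT_THRESHOLD
```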

    Network-Compute Co-Design for Distributed In-Memory Computing

    The booming popularity of online services is rapidly raising the demands on modern datacenters. In order to cope with the data deluge, growing user bases, and tight quality-of-service constraints, service providers deploy massive datacenters with tens to hundreds of thousands of servers, keeping petabytes of latency-critical data memory resident. Such data distribution and the multi-tiered nature of the software used by feature-rich services result in frequent inter-server communication and remote memory access over the network. Hence, networking takes center stage in datacenters. In response to growing internal datacenter network traffic, networking technology is rapidly evolving. Lean user-level protocols, like RDMA, and high-performance fabrics have started making their appearance, dramatically reducing datacenter-wide network latency and offering unprecedented per-server bandwidth. At the same time, the end of Dennard scaling is grinding processor performance improvements to a halt. The net result is a growing mismatch between the per-server network and compute capabilities: it will soon be difficult for a server processor to utilize all of its available network bandwidth. Restoring balance between network and compute capabilities requires tighter co-design of the two. The network interface (NI) is of particular interest, as it lies on the boundary of network and compute. In this thesis, we focus on the design of an NI for a lightweight RDMA-like protocol and its full integration with modern manycore server processors. The NI capabilities scale with both the increasing network bandwidth and the growing number of cores on modern server processors. Leveraging our architecture's integrated NI logic, we introduce new functionality at the network endpoints that yields performance improvements for distributed systems. Such additions include new network operations with stronger semantics tailored to common application requirements, and integrated logic for balancing network load across a modern processor's multiple cores. We make the case that exposing richer, end-to-end semantics to the NI is a unique enabler for optimizations that can reduce software complexity and remove significant load from the processor, contributing towards maintaining balance between the two valuable resources of network and compute. Overall, network-compute co-design is an approach that addresses the challenges associated with the emerging technological mismatch of compute and networking capabilities, yielding significant performance improvements for distributed memory systems.
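
    As a purely hypothetical illustration of one of the ideas above, NI logic that spreads incoming requests across a manycore server's cores might look like the join-shortest-queue dispatcher below; the policy and every identifier are assumptions for illustration, not the thesis's actual mechanism.

```python
# Sketch: steer each incoming network request to the core with the shortest
# backlog, mimicking NI-integrated load balancing across cores in software.

from collections import deque

class NILoadBalancer:
    def __init__(self, num_cores):
        self.queues = [deque() for _ in range(num_cores)]

    def dispatch(self, request):
        # pick the core whose per-core request queue is currently shortest
        target = min(range(len(self.queues)), key=lambda c: len(self.queues[c]))
        self.queues[target].append(request)
        return target

    def complete(self, core_id):
        # a core pops its next request when it finishes the current one
        return self.queues[core_id].popleft() if self.queues[core_id] else None
```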

    Model-driven Security Engineering for FPGAs

    The thesis provides an analysis and adaptation of appropriate security methods from the software domain into the FPGA world and combines them with formal verification methods and machine learning techniques. The deployment of appropriate defense mechanisms requires intelligence about the threat agents, especially their motivation and capabilities. FPGA-based designs are, like any other IT system, exposed to different threat agents throughout the system's lifetime, urging the need for a suitable and adaptable security strategy. The systematic analysis of the design, based on the STRIDE concept, provides valuable insight into the threats and the mandated counter mechanisms. Minimizing the attack surface is one essential step to create a resilient design. Conventional access control paradigms can model access control rules in FPGA designs and thereby restrict the exposure of sensitive elements to untrustworthy ones; the choice of a suitable paradigm depends on the complexity and security requirements of the design. A method to formalize the FPGA security challenge is presented: FPGASECML is a domain-specific language, suitable for dataflow-centric threat modeling as well as the formal definition of an enforceable security policy. The formal description of the FPGA architecture and the security policy promotes a precise definition of the assets and their possible, allowed, and prohibited interactions. Formalization removes ambiguity from the threat model while providing a blueprint for the implementation. Model transformations allow the application of dedicated and proven tools to answer specific questions while minimizing the workload for the user. Model checking can be applied to verify whether, and to a certain degree when, a design complies with the stated security policy. Transferring the architecture into a suitable model and the security policy into verifiable logic properties can be, as demonstrated in the thesis, automated, simplifying the process and mitigating one source of error. Reinforcement learning, a machine learning method, can identify potential weaknesses and the steps an attacker may take to exploit them; the approach presented uses a Markov Decision Process in combination with a Q-learning algorithm. Some of the methods presented here may also be applicable in other domains.
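
    To make the reinforcement-learning step concrete, here is a minimal tabular Q-learning sketch over a toy attack-graph MDP; the states, actions, rewards, and hyperparameters are invented for illustration and are not the thesis's FPGA model.

```python
import random

# Toy MDP (illustrative only): states are attacker footholds, actions are
# attack steps, rewards reflect how valuable the reached foothold is.
MDP = {
    "outside":    {"probe_jtag": ("debug_port", 1.0), "probe_net": ("outside", 0.0)},
    "debug_port": {"read_bitstream": ("secrets", 10.0), "give_up": ("outside", 0.0)},
    "secrets":    {},                          # terminal: asset compromised
}

def q_learning(episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.2, max_steps=50):
    Q = {(s, a): 0.0 for s, acts in MDP.items() for a in acts}
    for _ in range(episodes):
        state = "outside"
        for _ in range(max_steps):
            if not MDP[state]:                 # terminal state reached
                break
            actions = list(MDP[state])
            action = (random.choice(actions) if random.random() < epsilon
                      else max(actions, key=lambda a: Q[(state, a)]))
            nxt, reward = MDP[state][action]
            best_next = max((Q[(nxt, a)] for a in MDP[nxt]), default=0.0)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = nxt
    return Q   # high Q-values along a path expose a likely attack sequence
```

    Inspecting the learned Q-values reveals the most rewarding attack path, which a designer can then check against the stated security policy.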

    A Randomized Scheme for IP Lookup at Wire Speed on NetFPGA

    Because of the rapid growth of both traffic and link capacity, the time budget to perform IP address lookup on a packet continues to decrease while routers' lookup tables keep growing. Therefore, new lookup algorithms and new hardware platforms are required to perform fast IP lookup. This paper presents a new scheme on top of the NetFPGA board which takes advantage of parallel queries made on perfect hash functions. Such functions are built by using a very compact and fast data structure called Blooming Trees, thus allowing the vast majority of memory accesses to involve only small and fast on-chip memories.
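
    A simplified software rendering of the parallel-query idea follows; ordinary dictionaries stand in for the per-length perfect hash functions that the Blooming Tree construction provides, and the sequential loop stands in for probes that the hardware design issues in parallel.

```python
# Sketch: longest prefix match by querying one table per prefix length and
# keeping the longest length that reports a hit (hardware probes all lengths
# in parallel; here the loop simply starts from the longest).

PREFIX_LENGTHS = range(32, 0, -1)

class ParallelLPM:
    def __init__(self):
        # one table per prefix length; key = the /L prefix bits of an IPv4 address
        self.tables = {length: {} for length in PREFIX_LENGTHS}

    @staticmethod
    def _prefix(addr, length):
        return addr >> (32 - length)

    def insert(self, addr, length, next_hop):
        self.tables[length][self._prefix(addr, length)] = next_hop

    def lookup(self, addr):
        # conceptually all lengths are queried at once; the longest hit wins
        for length in PREFIX_LENGTHS:
            hit = self.tables[length].get(self._prefix(addr, length))
            if hit is not None:
                return hit
        return None
```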