806 research outputs found

    Techniques for Processing TCP/IP Flow Content in Network Switches at Gigabit Line Rates

    The growth of the Internet has enabled it to become a critical component used by businesses, governments and individuals. While most of the traffic on the Internet is legitimate, a proportion of the traffic includes worms, computer viruses, network intrusions, computer espionage, security breaches and illegal behavior. This rogue traffic causes computer and network outages, reduces network throughput, and costs governments and companies billions of dollars each year. This dissertation investigates the problems associated with TCP stream processing in high-speed networks. It describes an architecture that simplifies the processing of TCP data streams in these environments and presents a hardware circuit capable of TCP stream processing on multi-gigabit networks for millions of simultaneous network connections. Live Internet traffic is analyzed using this new TCP processing circuit

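    Design, implementation and evaluation of longest prefix match support for IPv4/IPv6 in multi-architecture programmable data planes (Projeto, implementação e avaliação do suporte de casamento com prefixo mais longo para IPv4/IPv6 em planos de dados programáveis multi-arquitetura)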

    Advisor: Christian Rodolfo Esteve Rothenberg. Dissertation (master's), Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação. Abstract: New trends in dataplane programmability within Software Defined Networking (SDN) focus on bringing multi-platform support with a high definition of the information that is processed by the dataplane pipeline. However, some challenges remain, such as the need for a programmable dataplane or a protocol-independent programming abstraction. The Programming Protocol-Independent Packet Processors (P4) Domain Specific Language (DSL) is an emerging trend for expressing how packets are processed by the dataplane of a programmable network platform. In parallel, the OpenDataPlane (ODP) project creates an open-source, cross-platform set of Application Programming Interfaces (APIs) designed for the networking data plane. The Multi-Architecture Compiler System for Abstract Dataplanes (MACSAD) is an approach to converge P4 and ODP in a conventional compilation process, achieving portability of dataplane applications without affecting the target performance improvements. MACSAD can integrate the ODP API and P4, bringing them together and defining a programmable dataplane across multiple targets in a unified compiler system. This work aims to add IPv4/IPv6 Longest Prefix Match (LPM) support to MACSAD, integrated with the ODP APIs and P4 programmability, delivering high-performance dataplane capabilities. The proposed LPM support for MACSAD combines the lookup algorithm and the ODP API library with MACSAD table support to create a complete forwarding base used in the LPM process. The IPv4 implementation adapts the current ODP lookup algorithm to work with MACSAD. The IPv6 lookup implementation, currently not supported by ODP, is an extension of the IPv4 support, developed using the same algorithm adapted to a 128-bit key. Both IPv4 and IPv6 lookups use a binary tree base to perform the LPM lookup. For the performance evaluation of the LPM support, we use the Network Function Performance Analyzer (NFPA) traffic generator tool, which allows different types of traffic to be generated across MACSAD. As a side contribution, we developed and released as open source the BB-Gen packet crafter tool. Experimental results show that it is possible to reach a throughput of 10G with packet sizes of 512 bytes and above. Master's degree. Computer Engineering. Master in Electrical Engineering.
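The abstract describes longest prefix match over a binary tree, with the IPv6 case using the same algorithm on a 128-bit key. The following is a minimal sketch of that general idea, not the MACSAD or ODP implementation: the `BinaryTrie` class, its node layout, and the next-hop labels are all hypothetical names chosen for illustration.

```python
import ipaddress


class BinaryTrie:
    """Bit-at-a-time binary trie for longest prefix match (illustrative only)."""

    def __init__(self, key_bits):
        self.key_bits = key_bits        # 32 for IPv4, 128 for IPv6
        self.root = [None, None, None]  # [child for bit 0, child for bit 1, next hop]

    def insert(self, prefix, next_hop):
        net = ipaddress.ip_network(prefix)
        key = int(net.network_address)
        node = self.root
        # Walk (and create) one node per prefix bit, most significant bit first.
        for i in range(net.prefixlen):
            bit = (key >> (self.key_bits - 1 - i)) & 1
            if node[bit] is None:
                node[bit] = [None, None, None]
            node = node[bit]
        node[2] = next_hop

    def lookup(self, address):
        key = int(ipaddress.ip_address(address))
        node, best = self.root, self.root[2]
        for i in range(self.key_bits):
            bit = (key >> (self.key_bits - 1 - i)) & 1
            node = node[bit]
            if node is None:
                break
            if node[2] is not None:
                best = node[2]          # remember the longest match seen so far
        return best


v4 = BinaryTrie(32)
v4.insert("0.0.0.0/0", "default")
v4.insert("10.0.0.0/8", "A")
v4.insert("10.1.0.0/16", "B")

v6 = BinaryTrie(128)                    # same algorithm, 128-bit key
v6.insert("2001:db8::/32", "X")
v6.insert("2001:db8:1::/48", "Y")
```

Only the key width changes between the two address families, which is why a single class can serve both tables in this sketch.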

    Compiler-driven data layout transformations for network applications

    This work approaches the little-studied topic of compiler optimisations directed at network applications. It starts by investigating whether there are any fundamental differences between application domains that justify the development and tuning of domain-specific compiler optimisations. It shows an automated approach that is capable of identifying domain-specific workload characterisations and presenting them in a readily interpretable format based on decision trees. The generated workload profiles summarise key resource utilisation issues and enable compiler engineers to address the highlighted bottlenecks. By applying this methodology to data-intensive network infrastructure applications, it shows that data organisation is the key obstacle to overcome in order to achieve high performance. It therefore proposes and evaluates three specialised data transformations (structure splitting, array regrouping, and software caching) against the industrial EEMBC networking benchmarks and real-world data sets. It demonstrates on the one hand that speedups of up to 2.62 can be achieved, but on the other that no single solution performs equally well across different network traffic scenarios. Hence, to address this issue, an adaptive software caching scheme for high-frequency route lookup operations is introduced and its effectiveness evaluated once more against the EEMBC networking benchmarks and real-world data sets, achieving speedups of up to 3.30 and 2.27. The results clearly demonstrate that adaptive data organisation schemes are necessary to ensure optimal performance under varying network loads. Finally, this research addresses another issue introduced by data transformations such as array regrouping and software caching, i.e. the need for static analysis to allow efficient resource allocation. This thesis proposes a static code analyser that allows the automatic resource analysis of source code containing lists and tree structures. 
The tool applies a combination of amortised analysis and separation logic methodology to real code and is able to evaluate type and resource usage of existing data structures, which can be used to compute global resource consumption values for full data intensive network applications
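To make the notion of an adaptive software cache for route lookups concrete, here is a small sketch under stated assumptions. It is not the scheme evaluated in the thesis: the class name `AdaptiveRouteCache`, the direct-mapped layout, the window size, and the hit-rate threshold are all hypothetical. The cache sits in front of a slower lookup function and switches itself off when a window of traffic shows too few hits to pay for the cache overhead.

```python
class AdaptiveRouteCache:
    """Direct-mapped software cache that disables itself on low hit rates."""

    def __init__(self, slow_lookup, slots=256, window=1000, min_hit_rate=0.2):
        self.slow_lookup = slow_lookup   # the full route lookup being cached
        self.slots = slots
        self.window = window             # number of lookups per evaluation window
        self.min_hit_rate = min_hit_rate
        self.table = [None] * slots      # each slot holds (destination, result)
        self.enabled = True
        self.hits = 0
        self.seen = 0

    def lookup(self, dst):
        if not self.enabled:
            return self.slow_lookup(dst)  # bypass: caching was not paying off
        slot = dst % self.slots
        entry = self.table[slot]
        self.seen += 1
        if entry is not None and entry[0] == dst:
            self.hits += 1
            value = entry[1]
        else:
            value = self.slow_lookup(dst)
            self.table[slot] = (dst, value)
        if self.seen == self.window:
            # Adapt once per window; this sketch never re-enables the cache.
            if self.hits / self.seen < self.min_hit_rate:
                self.enabled = False
            self.hits = self.seen = 0
        return value


calls = []

def slow(dst):
    calls.append(dst)
    return dst & 0xFF                    # stand-in for a real route lookup

cache = AdaptiveRouteCache(slow, slots=4, window=8, min_hit_rate=0.5)
repeated = [cache.lookup(5) for _ in range(8)]   # one flow, highly cacheable
calls_after_repeated = len(calls)
enabled_after_repeated = cache.enabled
for d in range(100, 108):                        # scattered traffic, all misses
    cache.lookup(d)
```

A production scheme would also need a way to turn caching back on when traffic becomes cacheable again; that is left out here to keep the sketch short.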

    Beehive: an FPGA-based multiprocessor architecture

    In recent years, to keep pace with Moore's law, hardware and software designers have increasingly focused their efforts on exploiting instruction-level parallelism. Software simulation has been essential for studying computer architecture because of its flexibility and low cost. However, users of software simulators must choose between high performance and high-fidelity emulation. This project presents an FPGA-based multiprocessor architecture to speed up multiprocessor architecture research and ease parallel software simulation.

    A null convention logic based platform for high speed low energy IP packet forwarding

    By 2020, it is predicted that there will be over 5 billion people and 38.5 billion Internet-of-Things devices on the Internet. The data generated by all these users and devices will have to be transported quickly and efficiently. Routers forming the backbone of this Internet already support multiple 100 Gbps ports, meaning that they would have to perform upwards of 200 million destination address lookups per second in the packet forwarding block that lies in the router ‘data-path’. At the same time, there is also a huge demand to make the network infrastructure more energy efficient. The work presented in this thesis is motivated by the observation that traditional synchronous digital systems will have increasing difficulty keeping up with these conflicting demands. Further, with reducing device geometries, extremes in “process, voltage and temperature” (PVT) variability will undermine reliable synchronous operation. It is expected that asynchronous design techniques will be able to overcome many of these problems and offer a means of lowering energy while maintaining high throughput and low latency. This thesis investigates existing address lookup algorithms and explores the possibility of combining various approaches to improve energy efficiency without affecting lookup performance. A quasi delay-insensitive asynchronous methodology - Null Convention Logic (NCL) - is then applied to this combined design. Techniques that take advantage of the characteristics of the design methodology and the lookup algorithm to further improve the area, energy and latency characteristics are also analysed. The IP address lookup scheme utilised here is a recent algorithmic approach that uses compact binary-tries and was selected for its high memory efficiency and throughput. The design is pipelined, and the prefix information is stored in large RAMs. A Boolean synchronous implementation of the algorithm is simulated to provide an initial performance benchmark. 
It is observed that during the address lookup process nearly 68% of the trie accesses are to nodes that contained no prefix information. Bloom filter structures that use non-cryptographic hashes and single-bit memory are introduced into the address lookup process to prevent these unnecessary accesses, thereby reducing the energy consumption. Three non-cryptographic hashing algorithms (CRC32, Jenkins and Murmur) are also analysed for their suitability in Bloom filters, and the CRC32 is found to offer the most suitable trade-off between complexity and performance. As a first step to applying the NCL design methodology, NCL implementations of the hashing algorithms are created and evaluated. A significant finding from these experiments is that, unlike Boolean systems, latency and throughput in NCL systems are only loosely coupled. An example Jenkins hash implementation with eight pipeline stages and a cycle time of 3.2 ns exhibits a total latency of 6 ns, whereas an equivalent synchronous implementation with a similar clock period exhibits a latency of 25.6 ns. Further investigations reveal that completion detection circuits within the NCL pipelines impair throughput significantly. Two enhancements to the NCL circuit library aimed particularly at optimising NCL completion detection are proposed and analysed. These are shown to enable completion detection circuits to be built with the same delay but with 30% smaller area and about 75% lower peak current compared to the conventional approach using gates from the standard NCL library. An NCL SRAM structure is also proposed to augment the conventional 6-T cell array with circuits to generate the handshaking signals for managing the NCL data flow. Additionally, a dedicated column of cells called the Null-storage column is added, which indicates if a particular address in the RAM stores no Data, i.e., it is in its Null state. 
This additional hardware imposes a small area overhead of about 10% but allows accesses to Null locations to be completed in 50% less time and consume 40% less energy than accesses to valid Data locations. An experimental NCL-based address lookup system is then designed that includes all of the developed NCL modules. Statistical delay models derived from circuit-level simulations of individual modules are used to emulate realistic circuit delay variability in the behavioural modules written in Verilog. Simulations of the assembled system demonstrate that, unlike what was observed with the synchronous design, with NCL the design that does not employ Bloom filters, but only the Null-storage column RAMs for prefix storage, exhibits the smallest area on the chip and also consumes the least energy per address lookup. It is concluded that, to derive maximum benefit from an asynchronous design approach, it is necessary to carefully select the architectural blocks that combine the peculiarities of the implemented algorithm with the capabilities of the NCL design methodology.
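The abstract places Bloom filters built from single-bit memory and a non-cryptographic hash (CRC32 was found most suitable) in front of the trie RAM, so that accesses to nodes without prefix information can be skipped. The software sketch below illustrates that gating idea only; it is not the hardware design from the thesis. Deriving several bit positions from one CRC32 value by double hashing, the filter size, and the names `BloomFilter` and `lookup_with_filter` are assumptions made for this example.

```python
import zlib


class BloomFilter:
    """Single-bit-per-cell Bloom filter indexed by CRC32 (illustrative only)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Assumption: derive k positions from one CRC32 by double hashing.
        crc = zlib.crc32(key)
        h1, h2 = crc & 0xFFFF, (crc >> 16) | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for p in self._positions(key):
            self.bits[p >> 3] |= 1 << (p & 7)

    def might_contain(self, key):
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(key))


def lookup_with_filter(node_id, bloom, ram, stats):
    """Read the prefix RAM only if the filter says the node may hold a prefix."""
    key = node_id.to_bytes(4, "big")
    if not bloom.might_contain(key):
        stats["skipped"] += 1            # definite miss: no RAM access needed
        return None
    stats["ram_reads"] += 1              # hit or false positive: read the RAM
    return ram.get(node_id)


ram = {7: "10.0.0.0/8"}                  # hypothetical node id -> stored prefix
bloom = BloomFilter()
bloom.add((7).to_bytes(4, "big"))
stats = {"skipped": 0, "ram_reads": 0}
found = lookup_with_filter(7, bloom, ram, stats)
absent = lookup_with_filter(8, bloom, ram, stats)
```

A Bloom filter never reports a stored key as absent, so skipped accesses are always safe; false positives only cost an unnecessary RAM read.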

    Algorithms and VLSI architectures for parametric additive synthesis

    A parametric additive synthesis approach to sound synthesis is advantageous because it can model sounds on a large scale, unlike the classical sinusoidal additive synthesis paradigms. It is known that a large body of naturally occurring sounds are resonant in character and thus fit the concept well. This thesis is concerned with the computational optimisation of a superclass of formant synthesis which extends the sinusoidal parameters with a spread parameter known as bandwidth. Here a modified formant algorithm is introduced which can be traced back to work done at IRCAM, Paris. When impulse driven, a filter-based approach to modelling a formant limits the computational workload. It is assumed that the filter's coefficients are fixed at initialisation, thus avoiding interpolation which can cause the filter to become chaotic. A filter which is more complex than a second-order section is required. Temporal resolution of an impulse generator is achieved by using a two-stage polyphase decimator which drives many filterbanks. Each filterbank describes one formant and is composed of sub-elements which allow variation of the formant’s parameters. A resource manager is discussed to overcome the possibility of all sub-banks operating in unison. All filterbanks for one voice are connected in series to the impulse generator and their outputs are summed and scaled accordingly. An explorative study of number systems for DSP algorithms and their architectures is presented. I invented a new theoretical mechanism for multi-level logic based DSP. Its aims are to reduce the number of transistors and to increase their functionality. A review of synthesis algorithms and VLSI architectures is presented as a case study comparing a filter-based bit-serial generator and a CORDIC-based sinusoidal generator. They are both of similar size, but the latter is always guaranteed to be stable.
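The case study mentions a CORDIC-based sinusoidal generator. As a rough illustration of the standard rotation-mode CORDIC iteration, and not of the thesis's VLSI design, the sketch below computes sine and cosine from additions and scalings by powers of two. It uses floating point for readability, whereas hardware would use fixed-point shifts; the iteration count and the function name `cordic_sin_cos` are assumptions.

```python
import math

ITERATIONS = 32
# Elementary rotation angles atan(2^-i), normally stored in a small lookup table.
ANGLES = [math.atan(2.0 ** -i) for i in range(ITERATIONS)]
# Each micro-rotation stretches the vector; pre-scale by the inverse total gain.
GAIN = 1.0
for angle in ANGLES:
    GAIN *= math.cos(angle)


def cordic_sin_cos(theta):
    """Return (sin, cos) of theta for theta in [-pi/2, pi/2] radians."""
    x, y, z = GAIN, 0.0, theta
    for i in range(ITERATIONS):
        d = 1.0 if z >= 0 else -1.0      # rotate towards the residual angle
        scale = 2.0 ** -i                # a right shift by i bits in hardware
        x, y = x - d * y * scale, y + d * x * scale
        z -= d * ANGLES[i]
    return y, x


sine, cosine = cordic_sin_cos(0.5)
```

Because every step is a bounded rotation of a fixed-length vector, the output amplitude cannot grow without limit, which is consistent with the abstract's remark that the CORDIC generator is always stable.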

    Firewall Policy Diagram: Novel Data Structures and Algorithms for Modeling, Analysis, and Comprehension of Network Firewalls

    Firewalls, network devices, and the access control lists that manage traffic are very important components of modern networking from a security and regulatory perspective. When computers were first connected, they were communicating with trusted peers and nefarious intentions were neither recognized nor important. However, as the reach of networks expanded, systems could no longer be certain whether the peer could be trusted or that their intentions were good. Therefore, a couple of decades ago, near the widespread adoption of the Internet, a new network device became a very important part of the landscape, i.e., the firewall with the access control list (ACL) router. These devices became the sentries to an organization's internal network, still allowing some communication; however, in a controlled and audited manner. It was during this time that the widespread expansion of the firewall spawned significant research into the science of deterministically controlling access, as fast as possible. However, the success of the firewall in securing the enterprise led to an ever increasing complexity in the firewall as the networks became more inter-connected. Over time, the complexity has continued to increase, yielding a difficulty in understanding the allowed access of a particular device. As a result of this success, firewalls are one of the most important devices used in network security. They provide the protection between networks that only wish to communicate over an explicit set of channels, expressed through the protocols, traveling over the network. These explicit channels are described and implemented in a firewall using a set of rules, where the firewall implements the will of the organization through these rules, also called a firewall policy. 
In small test environments and networks, firewall policies may be easy to comprehend and understand; however, in real world organizations these devices and policies must be capable of handling large amounts of traffic traversing hundreds or thousands of rules in a particular policy. Added to that complexity is the tendency of a policy to grow substantially more complex over time; and the result is often unintended mistakes in comprehending the complex policy, possibly leading to security breaches. Therefore, the need for an organization to unerringly and deterministically understand what traffic is allowed through a firewall, while being presented with hundreds or thousands of rules and routes, is imperative. In addition to the local security policy represented in a firewall, the modern firewall and filtering router involve more than simply deciding if a packet should pass through a security policy. Routing decisions through multiple network interfaces involving vendor-specific constructs such as zones, domains, virtual routing tables, and multiple security policies have become the more common type of device found in the industry today. In the past, network devices were separated by functional area (ACL, router, switch, etc.). The more recent trend has been for these capabilities to converge and blend creating a device that goes far beyond the straight-forward access control list. This dissertation investigates the comprehension of traffic flow through these complex devices by focusing on the following research topics:
- Expands on how a security policy may be processed by decoupling the original rules from the policy, and instead allow a holistic understanding of the solution space being represented. This means taking a set of constraints on access (i.e., firewall rules), synthesizing them into a model that represents an accept and deny space that can be quickly and accurately analyzed.
- Introduces a new set of data structures and algorithms collectively referred to as a Firewall Policy Diagram (FPD). A structure that is capable of modeling Internet Protocol version 4 packet (IPv4) solution space in memory efficient, mathematically set-based entities. Using the FPD we are capable of answering difficult questions such as: what access is allowed by one policy over another, what is the difference in spaces, and how to efficiently parse the data structure that represents the large search space. The search space can be as large as 2^88; representing the total values available to the source IP address (2^32), destination IP address (2^32), destination port (2^16), and protocol (2^8). The fields represent the available bits of an IPv4 packet as defined by the Open Systems Interconnection (OSI) model. Notably, only the header fields that are necessary for this research are taken into account and not every available IPv4 header value.
- Presents a concise, precise, and descriptive language called Firewall Policy Query Language (FPQL) as a mechanism to explore the space. FPQL is a Backus Normal Form (Backus-Naur Form) (BNF) compatible notation for a query language to do just that sort of exploration. It looks to translate concise representations of what the end user needs to know about the solution space, and extract the information from the underlying data structures.
- Finally, this dissertation presents a behavioral model of the capabilities found in firewall type devices and a process for taking vendor-specific nuances to a common implementation. This includes understanding interfaces, routes, rules, translation, and policies; and modeling them in a consistent manner such that the many different vendor implementations may be compared to each other.
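The idea of turning ordered rules into an accept space, and then asking what one policy allows over another, can be illustrated on a deliberately tiny domain. The sketch below is not the Firewall Policy Diagram: it enumerates every packet by brute force, which is only feasible because the toy fields are a few bits wide, whereas the FPD exists precisely because the real space of 32 + 32 + 16 + 8 = 88 bits cannot be enumerated. The field widths, rule format, and function name `accept_space` are assumptions.

```python
from itertools import product

# Toy field widths in bits: source, destination, destination port, protocol.
TOY_BITS = (2, 2, 2, 1)
REAL_BITS = (32, 32, 16, 8)              # the fields named in the abstract


def accept_space(rules, default="deny"):
    """Return the set of packets accepted by an ordered, first-match rule list.

    Each rule is (action, ranges), where ranges holds one inclusive (low, high)
    pair per field.
    """
    space = set()
    for packet in product(*(range(1 << bits) for bits in TOY_BITS)):
        action = default
        for rule_action, ranges in rules:
            if all(lo <= v <= hi for v, (lo, hi) in zip(packet, ranges)):
                action = rule_action     # first matching rule decides
                break
        if action == "accept":
            space.add(packet)
    return space


ANY2, ANY1 = (0, 3), (0, 1)
# Policy A blocks source 0, then accepts anything sent to destination 1.
policy_a = [
    ("deny", ((0, 0), ANY2, ANY2, ANY1)),
    ("accept", (ANY2, (1, 1), ANY2, ANY1)),
]
# Policy B accepts anything sent to destination 1.
policy_b = [("accept", (ANY2, (1, 1), ANY2, ANY1))]

space_a = accept_space(policy_a)
space_b = accept_space(policy_b)
only_in_b = space_b - space_a            # access B allows that A does not
```

Once policies are sets, comparing them reduces to set difference, which is the kind of question the abstract says the FPD answers efficiently at full scale.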

    Interconnect technologies for very large spiking neural networks

    In the scope of this thesis, a neural event communication architecture has been developed for use in an accelerated neuromorphic computing system and with a packet-based high performance interconnection network. Existing neuromorphic computing systems mostly use highly customised interconnection networks, directly routing single spike events to their destination. In contrast, the approach of this thesis uses a general purpose packet-based interconnection network and accumulates multiple spike events at the source node into larger network packets destined to common destinations. This is required to optimise the payload efficiency, given relatively large packet headers as compared to the size of neural spike events. Theoretical considerations are made about the efficiency of different event aggregation strategies. Thereby, important factors are the number of occurring event network-destinations and their relative frequency, as well as the number of available accumulation buffers. Based on the concept of Markov Chains, an analytical method is developed and used to evaluate these aggregation strategies. Additionally, some of these strategies are stochastically simulated in order to verify the analytical method and evaluate them beyond its applicability. Based on the results of this analysis, an optimisation strategy is proposed for the mapping of neural populations onto interconnected neuromorphic chips, as well as the joint assignment of event network-destinations to a set of accumulation buffers. During this thesis, such an event communication architecture has been implemented on the communication FPGAs in the BrainScaleS-2 accelerated neuromorphic computing system. Thereby, its usability can be scaled beyond single chip setups. For this, the EXTOLL network technology is used to transport and route the aggregated neural event packets with high bandwidth and low latency. 
At the FPGA, a network bandwidth of up to 12 Gbit/s is usable at a maximum payload efficiency of 94 %. The latency measured in this thesis ranges between 1.6 μs and 2.3 μs across the network between two neuron circuits on separate chips. This latency is mostly dominated by the path from the neuromorphic chip across the communication FPGA into the network and back on the receiving side. As the EXTOLL network hardware itself is clocked at a much higher frequency than the FPGAs, the latency is expected to grow by only approximately 75 ns for each additional hop through the network. To allow the arrival timestamps transmitted with every spike event to be interpreted globally, the system time counters on the FPGAs are synchronised across the network. For this, the global interrupt mechanism implemented in the EXTOLL hardware is characterised and used within this thesis. With this, a synchronisation accuracy of ±40 ns was measured. Finally, the successful emulation of a neural signal propagation model, distributed across two BrainScaleS-2 chips and FPGAs, is demonstrated using the implemented event communication architecture and the described synchronisation mechanism.
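The aggregation idea in this abstract, collecting several spike events per destination into one packet so that the header cost is shared, can be sketched as follows. This is not the BrainScaleS-2 or EXTOLL implementation: the class `EventAggregator`, the policy of evicting the fullest buffer when none is free, and the byte sizes used in the efficiency example are all hypothetical.

```python
class EventAggregator:
    """Accumulate events per destination in a limited set of buffers."""

    def __init__(self, num_buffers, capacity, send):
        self.num_buffers = num_buffers   # number of accumulation buffers
        self.capacity = capacity         # events per packet before a flush
        self.send = send                 # callback: send(destination, events)
        self.buffers = {}                # destination -> pending events

    def push(self, destination, event):
        if destination not in self.buffers and len(self.buffers) == self.num_buffers:
            # No free buffer: flush the fullest one (one possible strategy).
            victim = max(self.buffers, key=lambda d: len(self.buffers[d]))
            self.flush(victim)
        pending = self.buffers.setdefault(destination, [])
        pending.append(event)
        if len(pending) == self.capacity:
            self.flush(destination)

    def flush(self, destination):
        events = self.buffers.pop(destination, [])
        if events:
            self.send(destination, events)


def payload_efficiency(num_events, event_bytes, header_bytes):
    """Fraction of a packet's bytes that carry event payload."""
    payload = num_events * event_bytes
    return payload / (payload + header_bytes)


sent = []
agg = EventAggregator(num_buffers=2, capacity=3,
                      send=lambda dest, events: sent.append((dest, list(events))))
for dest, event in [(1, "a"), (1, "b"), (2, "c"), (3, "d"), (3, "e"), (3, "f")]:
    agg.push(dest, event)
```

With an assumed 4-byte event and 16-byte header, a single-event packet is 20 % payload while a 12-event packet is 75 % payload, which shows why the number of events accumulated per packet dominates the achievable efficiency.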

    A Peer-to-Peer Network Framework Utilising the Public Mobile Telephone Network

    P2P (Peer-to-Peer) technologies are well established and have now become accepted as a mainstream networking approach. However, the explosion of participating users has not been replicated within the mobile networking domain. Until recently the lack of suitable hardware and wireless network infrastructure to support P2P activities was perceived as contributing to the problem. This has changed with the ready availability of handsets having ample processing resources and utilising an almost ubiquitous mobile telephone network. Coupled with this has been a proliferation of software applications written for the more capable 'smartphone' handsets. P2P systems have not naturally integrated and evolved into the mobile telephone ecosystem in the way that 'client-server' operating techniques have. However, as the number of clients for a particular mobile application increases, providing the 'server side' data storage infrastructure becomes more onerous. P2P systems offer mobile telephone applications a way to circumvent this data storage issue by dispersing the data across a network of the participating users' handsets. The main goal of this work was to produce a P2P Application Framework that supports developers in creating mobile telephone applications that use distributed storage. Effort was assigned to determining appropriate design requirements for a mobile handset based P2P system. Some of these requirements are related to the limitations of the host hardware, such as power consumption. Others relate to the network upon which the handsets operate, such as connectivity. The thesis reviews current P2P technologies to assess which was viable to form the technology foundations for the framework. The aim was not to re-invent a P2P system design, but rather to adopt an existing one for mobile operation. 
Built upon the foundations of a prototype application, the P2P framework resulting from modifications and enhancements grants access via a simple API (Applications Programmer Interface) to a subset of Nokia 'smartphone' devices. Unhindered operation across all mobile telephone networks is possible through a proprietary application implementing NAT (Network Address Translation) traversal techniques. Recognising that handsets operate with limited resources, further optimisation of the P2P framework was also investigated. Energy consumption was chosen as a parameter for further examination because of its impact on handset participation time. This work has shown that operating applications in conjunction with a P2P data storage framework, connected via the mobile telephone network, is technically feasible. It also shows that opportunity remains for further research to realise the full potential of this data storage technique.