
    MorphIC: A 65-nm 738k-Synapse/mm² Quad-Core Binary-Weight Digital Neuromorphic Processor with Stochastic Spike-Driven Online Learning

    Recent trends in the field of neural network accelerators investigate weight quantization as a means to increase the resource- and power-efficiency of hardware devices. As full on-chip weight storage is necessary to avoid the high energy cost of off-chip memory accesses, memory reduction requirements for weight storage pushed toward the use of binary weights, which were demonstrated to have a limited accuracy reduction on many applications when quantization-aware training techniques are used. In parallel, spiking neural network (SNN) architectures are explored to further reduce power when processing sparse event-based data streams, while on-chip spike-based online learning appears as a key feature for applications constrained in power and resources during the training phase. However, designing power- and area-efficient spiking neural networks still requires the development of specific techniques in order to leverage on-chip online learning on binary weights without compromising the synapse density. In this work, we demonstrate MorphIC, a quad-core binary-weight digital neuromorphic processor embedding a stochastic version of the spike-driven synaptic plasticity (S-SDSP) learning rule and a hierarchical routing fabric for large-scale chip interconnection. The MorphIC SNN processor embeds a total of 2k leaky integrate-and-fire (LIF) neurons and more than two million plastic synapses for an active silicon area of 2.86 mm² in 65-nm CMOS, achieving a high density of 738k synapses/mm². MorphIC demonstrates an order-of-magnitude improvement in the area-accuracy tradeoff on the MNIST classification task compared to previously-proposed SNNs, while having no penalty in the energy-accuracy tradeoff. Comment: This document is the paper as accepted for publication in the IEEE Transactions on Biomedical Circuits and Systems journal (2019); the fully-edited paper is available at https://ieeexplore.ieee.org/document/876400
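
    As a purely illustrative companion to this abstract, the sketch below shows, in Python, a leaky integrate-and-fire neuron with binary weights and a crude stochastic spike-driven weight update. All parameter values and the update heuristic are assumptions chosen for readability; they are not taken from the MorphIC design or from the S-SDSP rule as implemented on the chip.

        import random

        class BinaryLIFNeuron:
            """Toy leaky integrate-and-fire neuron with binary (0/1) synaptic weights.
            All constants are illustrative assumptions, not values from the MorphIC chip."""

            def __init__(self, n_inputs, leak=0.9, threshold=4.0, p_update=0.1):
                self.weights = [random.randint(0, 1) for _ in range(n_inputs)]
                self.v = 0.0              # membrane potential
                self.leak = leak          # multiplicative leak per time step
                self.threshold = threshold
                self.p_update = p_update  # probability of applying a plasticity update

            def step(self, input_spikes):
                """Integrate one vector of binary input spikes; return True on an output spike."""
                self.v = self.leak * self.v + sum(w * s for w, s in zip(self.weights, input_spikes))
                if self.v >= self.threshold:
                    self.v = 0.0          # reset after firing
                    return True
                return False

            def stochastic_update(self, input_spikes, potentiate):
                """Crude stand-in for stochastic spike-driven plasticity: on each presynaptic
                spike, flip the binary weight toward 1 (potentiation) or 0 (depression)
                with a small probability."""
                target = 1 if potentiate else 0
                for i, s in enumerate(input_spikes):
                    if s and random.random() < self.p_update:
                        self.weights[i] = target

        # Example: drive one neuron with random spike trains.
        neuron = BinaryLIFNeuron(n_inputs=8)
        for _ in range(20):
            spikes = [random.randint(0, 1) for _ in range(8)]
            fired = neuron.step(spikes)
            neuron.stochastic_update(spikes, potentiate=fired)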

    Efficient Rule Management Algorithms in Software-Defined Networks

    In software-defined networks (SDN), the filtering requirements for critical applications often vary according to flow changes and security policies. SDN addresses this issue with a flexible software abstraction, allowing simultaneous and convenient modification and implementation of a network policy on flow-based switches. With the increase in the number of entries in rule sets and the volume of data that traverses the network each second, it is crucial to minimize the number of entries and accelerate the lookup process. At the same time, the number of attacks on the Internet keeps growing, which inflates the size of blacklists and the number of rules in firewalls; their limited storage capacity requires efficient management of that space. In the first part of this thesis, our primary goal is to find a simple representation of filtering rules that enables more compact rule tables, and is thus easier to manage, while keeping their semantics unchanged. This representation should also be obtainable with reasonably efficient algorithms. It can add flexibility and efficiency in deploying security policies, since the generated rules are easier to manage. A complementary approach to rule compression is to use multiple smaller switch tables to enforce access-control policies in the network. However, most existing schemes replicate rules significantly, or even modify packet headers to prevent a packet from matching a rule again at the next switch. The second part of this thesis introduces new techniques to decompose and distribute filtering rule sets over a given network topology. We also introduce an update strategy to handle changes in network policy and topology. In addition, we exploit the structure of series-parallel graphs to solve the rule placement problem efficiently, in tractable time, for networks of all sizes.
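
    To make the decomposition idea concrete, the following hypothetical Python sketch splits a list of deny rules (with a default-allow policy) across two switches on a path, so that the union of the smaller tables enforces the same policy as the original list. The rule format and the greedy splitting heuristic are illustrative assumptions, not the algorithms proposed in the thesis.

        import ipaddress

        # Illustrative deny rules (default action: allow). The two-field format is an assumption.
        RULES = [
            ("10.0.0.0/8", "0.0.0.0/0"),
            ("0.0.0.0/0", "192.168.1.0/24"),
            ("172.16.0.0/12", "203.0.113.0/24"),
            ("198.51.100.0/24", "0.0.0.0/0"),
        ]

        def matches(rule, src, dst):
            src_net, dst_net = (ipaddress.ip_network(p) for p in rule)
            return (ipaddress.ip_address(src) in src_net and
                    ipaddress.ip_address(dst) in dst_net)

        def distribute(rules, n_switches):
            """Greedy split: assign each deny rule to the currently smallest table.
            Because every packet traverses all switches on the path, and deny rules are
            order-independent under a default-allow policy, the union of the tables
            enforces the same deny set as the original list."""
            tables = [[] for _ in range(n_switches)]
            for rule in rules:
                min(tables, key=len).append(rule)
            return tables

        def path_decision(tables, src, dst):
            for table in tables:
                if any(matches(r, src, dst) for r in table):
                    return "deny"
            return "allow"

        tables = distribute(RULES, n_switches=2)
        print(path_decision(tables, "10.1.2.3", "8.8.8.8"))   # deny
        print(path_decision(tables, "8.8.4.4", "8.8.8.8"))    # allow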

    Models, Algorithms, and Architectures for Scalable Packet Classification

    The growth and diversification of the Internet imposes increasing demands on the performance and functionality of network infrastructure. Routers, the devices responsible for the switching and directing of traffic in the Internet, are being called upon to not only handle increased volumes of traffic at higher speeds, but also impose tighter security policies and provide support for a richer set of network services. This dissertation addresses the searching tasks performed by Internet routers in order to forward packets and apply network services to packets belonging to defined traffic flows. As these searching tasks must be performed for each packet traversing the router, the speed and scalability of the solutions to the route lookup and packet classification problems largely determine the realizable performance of the router, and hence the Internet as a whole. Despite the energetic attention of the academic and corporate research communities, there remains a need for search engines that scale to support faster communication links, larger route tables and filter sets and increasingly complex filters. The major contributions of this work include the design and analysis of a scalable hardware implementation of a Longest Prefix Matching (LPM) search engine for route lookup, a survey and taxonomy of packet classification techniques, a thorough analysis of packet classification filter sets, the design and analysis of a suite of performance evaluation tools for packet classification algorithms and devices, and a new packet classification algorithm that scales to support high-speed links and large filter sets classifying on additional packet fields.
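
    Since route lookup is central to this dissertation, the following minimal binary-trie sketch of Longest Prefix Matching in Python may help fix ideas; it is a didactic reference implementation, not the hardware search engine described above.

        import ipaddress

        class TrieNode:
            __slots__ = ("children", "next_hop")
            def __init__(self):
                self.children = [None, None]   # 0-branch and 1-branch
                self.next_hop = None           # set when a prefix ends at this node

        class LPMTrie:
            """Didactic binary trie for IPv4 longest prefix matching."""

            def __init__(self):
                self.root = TrieNode()

            def insert(self, prefix, next_hop):
                net = ipaddress.ip_network(prefix)
                bits = format(int(net.network_address), "032b")[: net.prefixlen]
                node = self.root
                for b in bits:
                    i = int(b)
                    if node.children[i] is None:
                        node.children[i] = TrieNode()
                    node = node.children[i]
                node.next_hop = next_hop

            def lookup(self, address):
                bits = format(int(ipaddress.ip_address(address)), "032b")
                node, best = self.root, None
                for b in bits:
                    if node.next_hop is not None:
                        best = node.next_hop   # remember the longest match seen so far
                    node = node.children[int(b)]
                    if node is None:
                        break
                else:
                    if node.next_hop is not None:
                        best = node.next_hop
                return best

        trie = LPMTrie()
        trie.insert("0.0.0.0/0", "default")
        trie.insert("10.0.0.0/8", "A")
        trie.insert("10.1.0.0/16", "B")
        print(trie.lookup("10.1.2.3"))   # B
        print(trie.lookup("10.2.3.4"))   # A
        print(trie.lookup("8.8.8.8"))    # default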

    Design and Evaluation of High-Performance Software Packet Classification Systems

    Packet classification consists of matching packet headers against a set of pre-defined rules and performing the action(s) associated with the matched rule(s). As a key technology in the data plane of network devices, packet classification has been widely deployed in many network applications and services, such as firewalling, load balancing, and VPNs. Packet classification has been extensively studied in the past two decades. Traditional packet classification methods are usually based on specialized hardware. With the development of data-center networking, software-defined networking, and application-aware networking technology, packet classification methods based on multi-/many-core processor platforms are becoming a new research interest. In this dissertation, packet classification is studied mainly from three angles: the algorithm design framework, rule-set feature analysis, and algorithm implementation and optimization. We review multiple proposed algorithms and present a decision-tree-based algorithm design framework. The framework decomposes various existing packet classification algorithms into combinations of different types of "meta-methods", revealing the connections between different algorithms. Based on this framework, we combine "meta-methods" from different algorithms and propose two new algorithms, HyperSplit-op and HiCuts-op. Experimental results show that HiCuts-op uses 2-20x less memory and 10% fewer memory accesses than HiCuts, while HyperSplit-op uses 2-200x less memory and 10%-30% fewer memory accesses than HyperSplit. We also explore the connections between rule-set features and the performance of various algorithms. We find that the "coverage uniformity" of the rule-set has a significant impact on classification speed, and that the size of the "orthogonal structure" in the rule-set usually determines an algorithm's memory footprint. Based on these two observations, we propose a memory consumption model and a quantitative measure of coverage uniformity. Using these two tools, we propose a new multi-decision-tree algorithm, SmartSplit, and an algorithm selection framework, AutoPC. Compared to the EffiCuts algorithm, SmartSplit achieves around a 2.9x speedup and up to a 10x memory reduction. For a given rule-set, AutoPC can automatically recommend a suitable algorithm; compared to using a single algorithm on all rule-sets, AutoPC is on average 3.8 times faster. We also analyze the connection between prefix length and update overhead for IP lookup algorithms. We observe that long prefixes result in more memory accesses with the Tree Bitmap algorithm, while short prefixes incur a large update overhead with DIR-24-8. By combining the two algorithms, we propose a hybrid algorithm, SplitLookup, that reduces the update overhead. Experimental results show that the hybrid algorithm requires two orders of magnitude fewer memory accesses when updating short prefixes, while its lookup speed remains close to that of DIR-24-8. Finally, we implement and optimize multiple algorithms on multi-/many-core platforms. For IP lookup, we implement two typical algorithms, DIR-24-8 and Tree Bitmap, and present several optimizations for both. For multi-dimensional packet classification, we implement HyperCuts/HiCuts and variants of these two algorithms, such as Adaptive Binary Cuttings, EffiCuts, HiCuts-op, and HyperSplit-op.
The SplitLookup algorithm achieves up to 40 Gbps throughput on the TILEPro64 many-core processor, and HiCuts-op and HyperSplit-op achieve 10-20 Gbps on a single core of an Intel processor. Overall, our study reveals the connections between algorithmic techniques and rule-set features. The results in this dissertation provide insight for new algorithm design and guidelines for efficient algorithm implementation.
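
    As a rough illustration of the DIR-24-8 trade-off discussed above, the sketch below keeps a flat first-stage table indexed by the leading address bits: looking up a short prefix is a single array read, but updating it rewrites many slots, which is the update overhead that SplitLookup targets. The structure is deliberately simplified (an 8-bit first stage instead of 24, a plain dict instead of second-level tables), and all names are illustrative.

        import ipaddress

        FIRST_STAGE_BITS = 8   # the real DIR-24-8 uses 24 bits; scaled down so the demo runs instantly

        class Dir24_8Like:
            """Toy sketch of a DIR-24-8-style lookup structure.

            Prefixes no longer than FIRST_STAGE_BITS are expanded into every first-stage slot
            they cover, so their lookup is a single array read, but an update rewrites
            2^(FIRST_STAGE_BITS - prefixlen) slots. Longer prefixes go into a plain dict here,
            standing in for the real per-slot second-level tables."""

            def __init__(self):
                self.table = [None] * (1 << FIRST_STAGE_BITS)
                self.long_prefixes = {}

            def update(self, prefix, next_hop):
                net = ipaddress.ip_network(prefix)
                if net.prefixlen <= FIRST_STAGE_BITS:
                    base = int(net.network_address) >> (32 - FIRST_STAGE_BITS)
                    span = 1 << (FIRST_STAGE_BITS - net.prefixlen)
                    for slot in range(base, base + span):   # many writes when the prefix is short
                        self.table[slot] = next_hop          # naive: later updates simply overwrite
                else:
                    self.long_prefixes[prefix] = next_hop    # cheap update, slower lookup below

            def lookup(self, address):
                addr = ipaddress.ip_address(address)
                # Check long prefixes first: they are always more specific than first-stage entries.
                for prefix, hop in sorted(self.long_prefixes.items(),
                                          key=lambda kv: -ipaddress.ip_network(kv[0]).prefixlen):
                    if addr in ipaddress.ip_network(prefix):
                        return hop
                return self.table[int(addr) >> (32 - FIRST_STAGE_BITS)]

        rib = Dir24_8Like()
        rib.update("0.0.0.0/0", "default")   # rewrites all 256 first-stage slots at this scale
        rib.update("10.0.0.0/8", "A")        # touches a single slot
        rib.update("192.0.2.0/24", "B")      # longer than 8 bits: stored in the overflow dict
        print(rib.lookup("10.1.2.3"), rib.lookup("192.0.2.5"), rib.lookup("8.8.8.8"))   # A B default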

    Fast Packet Processing on High Performance Architectures

    The rapid growth of the Internet and the fast emergence of new network applications have brought great challenges and complex issues in deploying high-speed, QoS-guaranteed IP networks. For this reason, packet classification and network intrusion detection have assumed a key role in modern communication networks in order to provide QoS and security. In this thesis we describe a number of the most advanced solutions to these tasks. We introduce the NetFPGA and Network Processors as reference platforms both for the design and the implementation of the solutions and algorithms described in this thesis. The rise in link capacity reduces the time available to network devices for packet processing. For this reason, we present different solutions which, either through heuristics and randomization or through careful construction of state machines, make IP lookup, packet classification and deep packet inspection fast on real devices based on high-speed platforms such as the NetFPGA or Network Processors.
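
    As an example of the state-machine approach to deep packet inspection mentioned above, here is a compact Aho-Corasick-style multi-pattern matcher in Python; it is a software illustration of the principle only, not the NetFPGA or Network Processor implementations discussed in the thesis.

        from collections import deque

        def build_automaton(patterns):
            """Build a small Aho-Corasick-style automaton: a trie over all signatures
            plus failure links, so a payload is scanned once for every signature."""
            goto, fail, out = [{}], [0], [set()]
            for pat in patterns:
                state = 0
                for ch in pat:
                    if ch not in goto[state]:
                        goto.append({}); fail.append(0); out.append(set())
                        goto[state][ch] = len(goto) - 1
                    state = goto[state][ch]
                out[state].add(pat)
            queue = deque(goto[0].values())          # depth-1 states keep fail = 0
            while queue:
                s = queue.popleft()
                for ch, nxt in goto[s].items():
                    queue.append(nxt)
                    f = fail[s]
                    while f and ch not in goto[f]:
                        f = fail[f]
                    fail[nxt] = goto[f].get(ch, 0)
                    out[nxt] |= out[fail[nxt]]       # inherit matches ending at the fail state
            return goto, fail, out

        def scan(payload, automaton):
            """Return (offset, signature) pairs for every signature found in the payload."""
            goto, fail, out = automaton
            state, hits = 0, []
            for i, ch in enumerate(payload):
                while state and ch not in goto[state]:
                    state = fail[state]
                state = goto[state].get(ch, 0)
                for pat in out[state]:
                    hits.append((i - len(pat) + 1, pat))
            return hits

        automaton = build_automaton(["evil", "attack", "tack"])
        print(scan("a harmless packet with an attack payload", automaton))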

    Addressing TCAM limitations in an SDN-based pub/sub system

    Content-based publish/subscribe is a popular paradigm that enables the asynchronous exchange of events between decoupled applications and is used in a wide range of domains. Hence, extensive research has been conducted on efficient large-scale pub/sub systems. A more recent development is content-based pub/sub systems that utilize software-defined networking (SDN) in order to implement event filtering in the network layer. By installing content filters in the ternary content-addressable memory (TCAM) of switches, these systems are able to perform event filtering and forwarding at line rate. While offering great performance, TCAM is also expensive, power hungry, and limited in size. However, current SDN-based pub/sub systems do not address these limitations and thus use TCAM excessively. This thesis therefore provides techniques for constraining TCAM usage in such systems. The proposed methods enforce concrete flow limits without dropping any events by selectively merging content filters into coarser-grained filters. The proposed algorithms leverage information about filter properties, traffic statistics, event distribution, and global filter state in order to minimize the increase in unnecessary traffic introduced by merges. The approach is twofold: a local enforcement algorithm ensures that the flow limit of a particular switch is never violated, and it is complemented by a periodically executed global optimization algorithm that tries to find a flow configuration across all switches that minimizes the increase in unnecessary traffic, given the current set of advertisements and subscriptions. For both classes, two algorithms with different properties are outlined. The proposed algorithms are integrated into the PLEROMA middleware and evaluated thoroughly in a real SDN testbed as well as in a large-scale network emulation. The evaluations demonstrate the effectiveness of the approaches under diverse and realistic workloads; in some cases, the number of flows can be reduced by more than 70% while increasing the false positive rate by less than 1%.
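
    The following toy Python sketch illustrates what merging content filters into coarser ones can mean when filters are encoded as ternary bit-strings (as in TCAM entries): the merge wildcards every disagreeing bit, and the extra traffic it admits can be weighted by an event distribution, loosely mirroring how candidate merges might be scored. The encoding, names, and scoring are assumptions, not the PLEROMA implementation.

        def merge_filters(f1, f2):
            """Merge two ternary content filters (strings over '0', '1', '*') into a single
            coarser filter by wildcarding every bit position where they disagree."""
            return "".join(a if a == b else "*" for a, b in zip(f1, f2))

        def matches(flt, header):
            """True if a concrete header bit-string satisfies the ternary filter."""
            return all(f in ("*", b) for f, b in zip(flt, header))

        def extra_traffic(merged, originals, event_distribution):
            """Probability mass of events that the merged filter forwards although none of
            the original filters would have: the 'unnecessary traffic' cost of a merge."""
            return sum(freq for header, freq in event_distribution.items()
                       if matches(merged, header) and not any(matches(o, header) for o in originals))

        f1, f2 = "1010", "1001"
        merged = merge_filters(f1, f2)                               # -> "10**"
        dist = {"1010": 0.5, "1001": 0.3, "1000": 0.1, "1011": 0.1}  # hypothetical event frequencies
        print(merged, extra_traffic(merged, [f1, f2], dist))         # 10** 0.2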

    Scalability and Resilience Analysis of Software-Defined Networking

    Software-Defined Networking (SDN) is a novel architecture for communication networks that has been developed to ease the introduction of new network services and functions. It leverages the separation of the data plane and the control plane to allow network services to be deployed solely in software. Although SDN provides great flexibility, its applicability in communication networks raises several questions with regard to scalability and resilience against network failures, concerns that are not prevalent in current decentralized network architectures. In this thesis, we address scalability and resilience issues with regard to unicast and multicast traffic for SDN-based networks. We propose a new compression method for inter-domain routing tables to address hardware limitations of current SDN switches and analyze its effectiveness. We propose various resilience methods for SDN and identify their key performance indicators in the context of carrier-grade and datacenter networks; we discuss the advantages and disadvantages of these proposals and their appropriate use cases. Finally, we propose a scalable and resilient software-defined multicast architecture, study its effectiveness, and show its feasibility using a prototype implementation.
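
    For intuition about routing table compression, the sketch below applies a very simple aggregation rule in Python: sibling prefixes with the same next hop are repeatedly merged into their parent. This is a generic illustration in the spirit of classic aggregation schemes such as ORTC, not the compression method proposed in the thesis.

        import ipaddress

        def aggregate_once(routes):
            """One pass of a simple aggregation rule: if two sibling prefixes (same length,
            differing only in their last bit) share a next hop, replace them by their parent.
            This keeps forwarding behaviour in this simplified setting; optimal schemes are
            considerably more involved."""
            out = dict(routes)
            for prefix, hop in routes.items():
                net = ipaddress.ip_network(prefix)
                if net.prefixlen == 0 or prefix not in out:
                    continue
                parent = net.supernet()
                sibling = next(s for s in parent.subnets() if s != net)
                if out.get(str(sibling)) == hop:
                    del out[prefix]
                    del out[str(sibling)]
                    out[str(parent)] = hop
            return out

        def aggregate(routes):
            while True:
                smaller = aggregate_once(routes)
                if len(smaller) == len(routes):
                    return smaller
                routes = smaller

        table = {
            "10.0.0.0/25": "eth1",
            "10.0.0.128/25": "eth1",
            "10.0.1.0/24": "eth1",
            "192.0.2.0/24": "eth2",
        }
        print(aggregate(table))   # -> {'192.0.2.0/24': 'eth2', '10.0.0.0/23': 'eth1'}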

    A Modular Approach to Adaptive Reactive Streaming Systems

    The latest generations of FPGA devices offer large resource counts that provide the headroom to implement large-scale and complex systems. However, there are increasing challenges for the designer, not just because of pure size and complexity, but also in harnessing effectively the flexibility and programmability of the FPGA. A central issue is the need to integrate modules from diverse sources to promote modular design and reuse. Further, the capability to perform dynamic partial reconfiguration (DPR) of FPGA devices means that implemented systems can be made reconfigurable, allowing components to be changed during operation. However, use of DPR typically requires low-level planning of the system implementation, adding to the design challenge. This dissertation presents ReShape: a high-level approach for designing systems by interconnecting modules, which gives a ‘plug and play’ look and feel to the designer, is supported by tools that carry out implementation and verification functions, and is carried through to support system reconfiguration during operation. The emphasis is on the inter-module connections and abstracting the communication patterns that are typical between modules – for example, the streaming of data that is common in many FPGA-based systems, or the reading and writing of data to and from memory modules. ShapeUp is also presented as the static precursor to ReShape. In both, the details of wiring and signaling are hidden from view, via metadata associated with individual modules. ReShape allows system reconfiguration at the module level, by supporting type checking of replacement modules and by managing the overall system implementation, via metadata associated with its FPGA floorplan. The methodology and tools have been implemented in a prototype for a broad domain-specific setting – networking systems – and have been validated on real telecommunications design projects
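
    As a loose illustration of metadata-driven type checking of replacement modules, the Python sketch below compares the port metadata of a candidate module against the slot it would occupy. The metadata fields and the compatibility rule are invented for the example and are not ReShape's actual metadata format.

        from dataclasses import dataclass

        @dataclass(frozen=True)
        class Port:
            name: str
            direction: str   # "in" or "out"
            protocol: str    # e.g. "axi_stream" or "memory_map" (illustrative labels)
            width: int       # data width in bits

        @dataclass
        class ModuleMeta:
            name: str
            ports: tuple

        def is_compatible_replacement(slot: ModuleMeta, candidate: ModuleMeta) -> bool:
            """A candidate may replace the module in a slot if, for every port the slot expects,
            it offers a port with the same name, direction, protocol and width."""
            offered = {p.name: p for p in candidate.ports}
            return all(offered.get(p.name) == p for p in slot.ports)

        slot = ModuleMeta("packet_filter_v1", (
            Port("s_axis", "in", "axi_stream", 64),
            Port("m_axis", "out", "axi_stream", 64),
        ))
        candidate = ModuleMeta("packet_filter_v2", (
            Port("s_axis", "in", "axi_stream", 64),
            Port("m_axis", "out", "axi_stream", 64),
            Port("stats", "out", "memory_map", 32),
        ))
        print(is_compatible_replacement(slot, candidate))   # True: extra ports are allowed in this toy check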