    Multistage Software Routers in a Virtual Environment

    Design and implementation of a prototype of the entity Control Element (CE) of the Architecture ForCES

    This paper presents the design and implementation of a prototype based on the Forwarding and Control Element Separation (ForCES) architecture, which allows each element to be improved separately and then interconnected through the ForCES protocol, even across remote locations. The Control Element (CE) is the logical entity in the control plane that is responsible for managing the Forwarding Elements (FEs) of the data plane. The ForCES architecture makes these two elements (the CE and the FE, communicating through the ForCES protocol) appear as a single Network Element (NE), even when each is located at a remote site. To demonstrate this principle, a network testbed was implemented based on two Local Area Networks (LANs): LAN 1 for the CEs and LAN 2 for the FEs. Once the two sides were communicating through the ForCES protocol, different Logical Functional Block (LFB) configurations for the ARP, SNMP, and RIP protocols were used to demonstrate their operation.
Martínez Cordero, S.; Gonzalez Ramirez, PL.; Lloret, J.; Trujillo Arboleda, LC. (2017). Design and implementation of a prototype of the entity Control Element (CE) of the Architecture ForCES. Network Protocols and Algorithms. 9(3-4):1-30. https://doi.org/10.5296/npa.v9i3-4.12433

    Energy Saving and Virtualization Technologies in Switching

    Switching is the key function of many devices, such as electronic routers and switches, optical routers, and Networks on Chip (NoCs). Basically, switching is responsible for moving a data unit from one port/location to one or more other ports/locations. In past years, high capacity and low delay were the main concerns when designing high-end switching units. As new demands, requirements, and technologies emerge, flexibility and low power consumption have become as important as throughput and delay. On the one hand, highly flexible (i.e., programmable) switching can cope with the varying needs that stem from new applications (e.g., VoIP) and popular user behavior (e.g., P2P downloading); on the other hand, reducing the energy and power dissipated by switching not only lowers bills and contributes to a greener environment but also extends component lifetime. Many research efforts have been devoted to increasing switching flexibility and reducing its power cost. In the first part of this thesis, we exploit virtualization as the main technique for building flexible software routers; in the second part, we turn our attention to energy saving in the NoC (i.e., a switching fabric designed to handle on-chip data transmission) and in software routers. In the first part of the thesis, we consider virtualization inside Software Routers (SRs). SRs, i.e., routers running on commodity Personal Computers (PCs), have become an appealing alternative to traditional Proprietary Routing Devices (PRDs) for various reasons, such as cost (the multi-vendor hardware used by SRs can be cheap, while the equipment needed by PRDs is more expensive and their training cost is higher), openness (SRs can use a large number of open-source networking applications, while PRDs are more closed), and flexibility. However, the forwarding performance of SRs has been an obstacle to their deployment in real networks.
For this reason, we proposed aggregating multiple routing units into a more powerful SR, the Multistage Software Router (MSR), to overcome the performance limitations of a single SR. Our results show that throughput can increase almost linearly with the number of internal routing devices. However, other features related to flexibility (such as power saving, programmability, router migration, and ease of management) had previously received less attention than performance. We noticed that virtualization techniques have become practical thanks to the rapid development of PC architectures, which can now easily support several logical PCs running in parallel on the same hardware. Virtualization provides many flexible features, such as hardware and software decoupling, encapsulation of virtual machine state, failure recovery, and security, to name a few. Virtualization makes it possible to build multiple SRs inside one physical host, and a multistage architecture using only logical devices. By doing so, physical resources can be used more efficiently, energy-saving features (switching devices on and off when needed) can be introduced, and logical resources can be rented on demand instead of being owned. Since virtualization techniques are still difficult to deploy, several challenges must be faced when integrating them into routers. The main aim of the first part of this thesis is to assess the feasibility of the virtualization approach: to build and test a virtualized SR (VSR), to implement the MSR using logical (i.e., virtualized) resources, to analyze virtualized routing performance, and to propose techniques for improving the VSR and the virtual MSR (VMSR). More specifically, we considered different virtualization solutions, namely VMware, XEN, and KVM, to build the VSR and VMSR; VMware is a closed-source solution with higher performance, while XEN and KVM are open-source solutions.
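The abstract does not say how the MSR's front-end load balancer spreads packets over the back-end routers. A minimal sketch, assuming per-flow hashing (a common choice, not confirmed by the thesis): the load balancer hashes each packet's 5-tuple so that all packets of a flow reach the same back-end router and are not reordered, while different flows spread across routers. `FlowKey`, `pick_back_end`, and `distribute` are hypothetical names.

```python
import zlib
from collections import Counter
from typing import NamedTuple


class FlowKey(NamedTuple):
    """Hypothetical 5-tuple that identifies a flow."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: int


def pick_back_end(flow: FlowKey, n_back_ends: int) -> int:
    """Map a flow to a back-end router index (hypothetical per-flow hashing).

    CRC32 is deterministic across runs (unlike Python's salted str hash), so
    every packet of the same flow goes to the same back-end router.
    """
    key = "|".join(str(field) for field in flow)
    return zlib.crc32(key.encode()) % n_back_ends


def distribute(flows: list[FlowKey], n_back_ends: int) -> Counter:
    """Count how many flows the load balancer sends to each back-end router."""
    return Counter(pick_back_end(flow, n_back_ends) for flow in flows)


if __name__ == "__main__":
    flows = [FlowKey("10.0.0.1", "10.1.0.1", 1024 + i, 80, 6) for i in range(1000)]
    print(distribute(flows, 4))
```

Because each flow is pinned to one back-end router, aggregate throughput can grow with the number of back-end routers only when there are enough flows to spread; a single heavy flow still lands on one router.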
First, we built and tested each individual component of our multistage architecture (i.e., the back-end router and the load balancer) inside the virtual infrastructure; then we extended the performance experiments to more complex scenarios, such as multiple Back-end Routers (BRs) or Load Balancers (LBs) cooperating to route packets. Our results show that virtualization can introduce a 40% performance penalty compared with the hardware-only solution. With this performance limitation in mind, we developed the complete VMSR and, as expected, obtained low throughput with 64-byte packet flows. To increase VMSR throughput, two directions can be considered: improving the performance of the individual component (i.e., the VSR), or working from the topology point of view (i.e., the best allocation of VMs onto the hardware). For the first direction, we tuned the VSR inside KVM and closely studied factors such as the Linux driver, the scheduler, and the interconnection method, which can significantly affect performance depending on their configuration; for the second, we proposed two ways of allocating the VMs onto physical servers to enhance VMSR performance. Our results show that with good tuning and VM allocation, the virtualization penalty can be minimized, yielding reasonable throughput for SRs running inside a virtual infrastructure while making it easy to add flexibility features to SRs. In the second part of the thesis, we consider the problem of energy-efficient switching design, focusing on two main architectures: the NoC and the MSR. As many research works suggest, the energy cost of Information and Communication Technologies (ICT) is constantly increasing. Among the main ICT sectors, a large portion of energy consumption comes from the telecommunication infrastructure and its devices, e.g., routers, switches, cell phones, IPTV set-top boxes, storage, home gateways, etc.
More specifically, the line cards, links, and Systems on Chip (SoCs), including the transmitters/receivers in these various devices, are the main power-consuming units. We first present our work on reducing the power of data transmission in an SoC, which is carried out by the NoC. The NoC is an approach to designing the communication subsystem between the different Processing Elements (PEs) in an SoC. PEs can be different elements, such as CPUs, memories, and digital/analog signal processors. Each PE performs specific tasks depending on the applications running on the chip. Different tasks need to exchange data with each other, so the PEs generate flits (chopped packets with limited header information). The flits are injected into the NoC through the appropriate interface and routed until they reach the destination PEs. Throughout this procedure, the NoC behaves as a packet-switched network. Studies show that, in general, information processing in the PEs consumes only about 60% of the energy, while the remaining 40% is consumed by the NoC. More importantly, following current network design principles, the NoC capacity is dimensioned to handle the peak load, which leaves clear room for energy saving when the network load is low. In our work, we exploited Dynamic Voltage and Frequency Scaling (DVFS), which jointly decreases or increases the system voltage and frequency when necessary, i.e., it lowers voltage and frequency under low load to save energy and reduce power dissipation. More precisely, we studied two NoC architectures for energy saving, namely the single-plane and the multi-plane chip architectures. In both cases, we impose a very strict constraint: all the links and transmitters/receivers on the same plane must work at the same frequency/voltage to avoid synchronization problems.
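To see why the plane-wide constraint matters, here is a minimal sketch of DVFS level selection for one plane. It assumes hypothetical discrete frequency levels, link loads expressed as utilization at nominal frequency, and the textbook approximation that dynamic power scales as C * V^2 * f with voltage scaled in proportion to frequency (so roughly f^3). None of these values come from the thesis; `plane_frequency` and `plane_dynamic_power` are hypothetical names.

```python
from bisect import bisect_left

# Hypothetical normalized frequency levels (1.0 = nominal frequency).
FREQ_LEVELS = [0.25, 0.5, 0.75, 1.0]


def plane_frequency(link_loads: list[float], levels: list[float] = FREQ_LEVELS) -> float:
    """Return the lowest level that can carry the most loaded link on the plane.

    Loads are utilizations at nominal frequency (0.0 to 1.0). All links on a
    plane share one frequency, so the bottleneck link decides for all of them.
    """
    peak = max(link_loads)
    if peak > levels[-1]:
        raise ValueError("link load exceeds nominal capacity")
    return levels[bisect_left(levels, peak)]


def plane_dynamic_power(link_loads: list[float], levels: list[float] = FREQ_LEVELS) -> float:
    """Relative dynamic power of all links on the plane.

    Assumes P ~ C * V^2 * f with V proportional to f, so each link draws f**3
    (1.0 at nominal frequency), and every link runs at the shared frequency.
    """
    f = plane_frequency(link_loads, levels)
    return len(link_loads) * f ** 3


if __name__ == "__main__":
    print(plane_frequency([0.1, 0.2, 0.6]), plane_dynamic_power([0.1, 0.2, 0.6]))
    print(plane_frequency([0.1, 0.2, 0.2]), plane_dynamic_power([0.1, 0.2, 0.2]))
```

For link loads [0.1, 0.2, 0.6] the whole plane must run at level 0.75 because of the single 0.6 link, giving 3 × 0.75³ ≈ 1.27; if that link carried only 0.2, the plane could drop to 0.25 (≈ 0.047). This is the bottleneck effect that limits DVFS on a single plane.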
This is the main difference from many existing works in the literature, which usually assume that different links can work at different frequencies, something that is hard to implement in practice. For the single-plane NoC, we combined different routing schemes with DVFS to reduce the power of the whole chip. Our results have been compared with the optimal value obtained by formally modeling power saving as a quadratic programming problem. The results suggest that simply by using a load-balancing routing algorithm, we can save considerable energy in the single-chip NoC architecture. Furthermore, we noticed that in the single-plane NoC architecture, the bottleneck link can limit DVFS effectiveness. We then found that the multi-plane NoC architecture is fairly easy to implement and can help with energy saving. We therefore focused on the multi-plane architecture and found that DVFS can be more efficient when more traffic is concentrated onto one plane and the remaining flows are sent to the other planes. We compared load concentration and load balancing under different power models, and all simulation results show that load concentration outperforms load balancing for the multi-plane NoC architecture. Finally, we also present one of the energy-efficient MSR design techniques, an online energy-saving algorithm that allows the MSR to follow the day-night traffic pattern more efficiently.
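The comparison between load concentration and load balancing across planes can be sketched in the same spirit. The power model here is hypothetical and not the one used in the thesis: each active plane draws a fixed static power plus f^3 at the lowest discrete level that carries its load, and a plane with no traffic is assumed to be put to sleep and draw nothing. `balance` spreads the offered load evenly; `concentrate` fills planes one at a time. All names and constants are illustrative.

```python
from bisect import bisect_left

FREQ_LEVELS = [0.25, 0.5, 0.75, 1.0]  # hypothetical normalized frequency levels
STATIC_POWER = 0.5                    # hypothetical per-plane static power


def plane_power(load: float) -> float:
    """Relative power of one plane carrying `load` (0.0 to 1.0 of nominal capacity).

    Hypothetical model: an idle plane sleeps and draws nothing; an active plane
    draws STATIC_POWER plus f**3 at the lowest level that carries its load.
    """
    if load > FREQ_LEVELS[-1]:
        raise ValueError("plane load exceeds nominal capacity")
    if load == 0:
        return 0.0
    f = FREQ_LEVELS[bisect_left(FREQ_LEVELS, load)]
    return STATIC_POWER + f ** 3


def balance(total: float, planes: int) -> list[float]:
    """Load balancing: spread the offered load evenly over all planes."""
    return [total / planes] * planes


def concentrate(total: float, planes: int) -> list[float]:
    """Load concentration: fill planes one at a time up to full capacity."""
    if total > planes:
        raise ValueError("offered load exceeds total capacity")
    loads = []
    for _ in range(planes):
        share = min(total, 1.0)
        loads.append(share)
        total -= share
    return loads


def chip_power(loads: list[float]) -> float:
    """Total relative power of all planes for a given per-plane load split."""
    return sum(plane_power(load) for load in loads)


if __name__ == "__main__":
    for total in (0.5, 1.5):
        print(total, chip_power(balance(total, 2)), chip_power(concentrate(total, 2)))
```

Under these assumptions, concentration wins at a total load of 0.5 on two planes (0.625 vs 1.03125) but loses at 1.5 (2.125 vs 1.84375). Without static power and a sleep state, the convex f^3 term alone would tend to favor balancing, so which strategy comes out ahead depends on the power model used.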

    Implémentation d'une mémoire cache supportant la recherche IP (Implementation of a cache memory supporting IP lookup)

    The growth of Internet traffic has been accompanied by exponential growth in the bandwidth of the transmission links of data-processing equipment, namely routers, made possible by the deployment of optical fiber. Routers have therefore become the main bottleneck in packet-processing speed across the network. Hardware implementation allows routers to meet the performance requirements of high-speed routing by exploiting the abundant parallelism available in packet processing. ASICs (Application-Specific Integrated Circuits) are specialized integrated circuits used for their performance, but they come at a cost: since routers are built from specialized hardware, they are fixed-function devices that cannot be programmed. Much effort has gone into making ASICs more programmable; however, this is still insufficient to express many algorithms. Due to the need for better control of network operations and the constant demand for new features, router programmability has become as important as performance. Hardware specialization is the only way to meet the performance requirements of high-speed routers; however, some contexts involve high computing requirements with lower link speeds.
Software routers are considered more appropriate where programmability takes precedence over performance, and they can still achieve attractive performance, as exemplified by network processors. Network processors use hardware accelerators to implement specific functions, such as TCAM (Ternary Content-Addressable Memory) for the LPM (Longest Prefix Match) lookups required for packet forwarding. TCAM meets the speed required by LPM, which, owing to its complexity, is the most limiting performance factor in packet forwarding, and it has become the de facto industry standard. However, TCAMs have serious disadvantages: high power consumption, which is critical in routers with limited power budgets; poor flexibility and scalability (slow update operations); and a higher cost per bit than other memory types. The motivation of our work is to explore the concept of a generalized cache memory that can support LPM. The on-chip memory of a network processor acts as a cache and is implemented in SRAM (Static Random-Access Memory), which is much less expensive than TCAM. To speed up LPM lookups, the cache stores recently consulted prefixes, reducing access time to the routing table. In this thesis, we propose an architecture that relies on associative memories implemented with hash functions.
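The abstract describes caching recently consulted prefixes in hash-based associative memories but gives no further details. As a minimal software sketch of the underlying idea, and not the proposed hardware architecture, the code below performs LPM with one hash table per prefix length (probing from the longest length down) and places a small LRU cache in front of it. `HashLPM` and `LRULookupCache` are hypothetical names. For simplicity the cache is keyed by destination address rather than by prefix: naively caching a matched prefix could return a shorter prefix for an address that also matches a longer prefix not yet in the cache.

```python
import ipaddress
from collections import OrderedDict
from typing import Optional


class HashLPM:
    """Longest-prefix match with one hash table (dict) per prefix length."""

    def __init__(self) -> None:
        # tables[length] maps a masked network address (int) to a next hop.
        self.tables: dict[int, dict[int, str]] = {}

    def insert(self, prefix: str, next_hop: str) -> None:
        net = ipaddress.IPv4Network(prefix)
        self.tables.setdefault(net.prefixlen, {})[int(net.network_address)] = next_hop

    def lookup(self, addr: str) -> Optional[str]:
        """Probe lengths from longest to shortest; the first hit is the LPM."""
        a = int(ipaddress.IPv4Address(addr))
        for length in sorted(self.tables, reverse=True):
            mask = (0xFFFFFFFF << (32 - length)) & 0xFFFFFFFF
            hop = self.tables[length].get(a & mask)
            if hop is not None:
                return hop
        return None


class LRULookupCache:
    """Hypothetical LRU cache of address -> next-hop results in front of a HashLPM."""

    def __init__(self, table: HashLPM, capacity: int = 4) -> None:
        self.table = table
        self.capacity = capacity
        self.entries: OrderedDict[str, Optional[str]] = OrderedDict()
        self.hits = 0
        self.misses = 0

    def lookup(self, addr: str) -> Optional[str]:
        if addr in self.entries:
            self.entries.move_to_end(addr)  # mark as most recently used
            self.hits += 1
            return self.entries[addr]
        self.misses += 1
        hop = self.table.lookup(addr)
        self.entries[addr] = hop
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return hop


if __name__ == "__main__":
    table = HashLPM()
    table.insert("10.0.0.0/8", "A")
    table.insert("10.1.0.0/16", "B")
    table.insert("0.0.0.0/0", "default")
    cache = LRULookupCache(table, capacity=2)
    for addr in ("10.1.2.3", "10.2.0.1", "10.1.2.3"):
        print(addr, cache.lookup(addr))
    print("hits:", cache.hits, "misses:", cache.misses)
```

With capacity 2, looking up 10.1.2.3, 10.2.0.1, and 10.1.2.3 again yields one hit and two misses; a subsequent lookup of a third address evicts 10.2.0.1, the least recently used entry. A real design would also need to invalidate cached entries when the routing table changes.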