70 research outputs found

    Load sharing for multiprocessor network nodes

    Get PDF
    This thesis discusses techniques for sharing the processing load among multiple processing units within systems that act as nodes in a data communications network. Load-sharing techniques have been explored in the field of computer science for many years and their benefits are well known, including better utilization of processing capacity and enhanced system fault tolerance. We discuss deploying such methods in the specifics of the networking environment. We concentrate particularly on the data plane, or the data packet-processing tasks. After reviewing the main results in the fields of load sharing and multiprocessor networking systems architectures, we conduct a preparatory optimization study of a router system to gain better understanding of the optimization issues in a particular multiprocessor system. The main contribution of this thesis, the adaptive load-sharing method, is presented next. We first formulate the optimization problem of mapping packets to processors. The goal is to minimize the likelihood of flow reordering, while respecting certain system constraints, such as the acceptable probability of a packet loss. As we show that the task is an NP-complete problem, we propose a heuristic method that uses an adaptive hash-based mapping to assign packets to processors. We demonstrate its advantages and prove that the method adaptation policy possesses the key minimal disruption property with respect to the mapping. In other words, the adaptation results in a minimum number of flows being moved among processing units. Further on, the method is validated in an extensive set of simulations designed to imitate the networking environment. Finally, two sample applications, an architecture of a multiprotocol router and an implementation of a server load balancer on a network processor demonstrate the applicability of the method

    Classification of networks-on-chip in the context of analysis of promising self-organizing routing algorithms

    Full text link
    This paper contains a detailed analysis of the current state of the network-on-chip (NoC) research field, based on which the authors propose the new NoC classification that is more complete in comparison with previous ones. The state of the domain associated with wireless NoC is investigated, as the transition to these NoCs reduces latency. There is an assumption that routing algorithms from classical network theory may demonstrate high performance. So, in this article, the possibility of the usage of self-organizing algorithms in a wireless NoC is also provided. This approach has a lot of advantages described in the paper. The results of the research can be useful for developers and NoC manufacturers as specific recommendations, algorithms, programs, and models for the organization of the production and technological process.Comment: 10 p., 5 fig. Oral presentation on APSSE 2021 conferenc

    Adaptive Load Balancing Scheme For Data Center Networks Using Software Defined Network

    Get PDF
    A new adaptive load balancing scheme for data center networks is proposed in this paper by utilizing the characteristics of Software Defined Networks. Mininet was utilized for the purpose of emulating and evaluating the proposed design, Miniedit was utilized as a GUI tool. In order to obtain a similar environment to the data center network, Fat-Tree topology was utilized. Different scenarios and traffic distributions were applied in order to cover as much cases of the real traffic as possible. The suggested design showed superiority over the traditional scheme in term of throughput and loss rate for all the evaluated scenarios. Two scenarios were implemented; the proposed algorithm presented a loss-free performance compared to 15% to 31% loss rate in the traditional scheme for the first scenario. The proposed scheme showed up to 81% improvement in the loss rate in the second scenario. In term of throughput, the proposed scheme maintained the same level of throughput in the first scenario compared to an average of 5Mbps reduction in the throughput when using the traditional scheme. While in the second scenario, the new scheme outperformed the traditional scheme by showing an improvement of up to 16.6% in the throughput value

    Energy Saving and Virtualization Technologies in Switching

    Get PDF
    Switching is the key functionality for many devices like electronic Router and Switch, optical Router, Network on Chips (NoCs) and so on. Basically, switching is responsible for moving data unit from one port/location to another (or multiple) port(s)/location(s). In past years, the high capacity, low delay were the main concerns when designing high-end switching unit. As new demands, requests and technologies emerge, flexibility and low power cost switching design become to weight the same as throughput and delay. On one hand, highly flexible (i.e, programming ability) switching can cope with variable needs stem from new applications (i.e, VoIP) and popular user behavior (i.e, p2p downloading); on the other hand, reduce the energy and power dissipation for switching could not only save bills and build echo system but also expand components life time. Many research efforts have been devoted to increase switching flexibility and reduce its power cost. In this thesis work, we consider to exploit virtualization as the main technique to build flexible software router in the first part, then in the second part we draw our attention on energy saving in NoC (i.e, a switching fabric designed to handle the on chip data transmission) and software router. In the first part of the thesis, we consider the virtualization inside Software Routers (SRs). SR, i.e, routers running in commodity Personal Computers (PCs), become an appealing solution compared to traditional Proprietary Routing Devices (PRD) for various reasons such as cost (the multi-vendor hardware used by SRs can be cheap, while the equipment needed by PRDs is more expensive and their training cost is higher), openness (SRs can make use of a large number of open source networking applications, while PRDs are more closed) and flexibility. The forwarding performance provided by SRs has been an obstacle to their deployment in real networks. For this reason, we proposed to aggregate multiple routing units that form an powerful SR known as the Multistage Software Router (MSR) to overcome the performance limitation for a single SR. Our results show that the throughput can increase almost linearly as the number of the internal routing devices. But some other features related to flexibility (such as power saving, programmability, router migration or easy management) have been investigated less than performance previously. We noticed that virtualization techniques become reality thanks to the quick development of the PC architectures, which are now able to easily support several logical PCs running in parallel on the same hardware. Virtualization could provide many flexible features like hardware and software decoupling, encapsulation of virtual machine state, failure recovery and security, to name a few. Virtualization permits to build multiple SRs inside one physical host and a multistage architecture exploiting only logical devices. By doing so, physical resources can be used in a more efficient way, energy savings features (switching on and off device when needed) can be introduced and logical resources could be rented on-demand instead of being owned. Since virtualization techniques are still difficult to deploy, several challenges need to be faced when trying to integrate them into routers. The main aim of the first part in this thesis is to find out the feasibility of the virtualization approach, to build and test virtualized SR (VSR), to implement the MSR exploiting logical, i.e. virtualized, resources, to analyze virtualized routing performance and to propose improvement techniques to VSR and virtual MSR (VMSR). More specifically, we considered different virtualization solutions like VMware, XEN, KVM to build VSR and VMSR, being VMware a closed source solution but with higher performance and XEN/KVM open source solutions. Firstly we built and tested each single component of our multistage architecture (i.e, back-end router, load balancer )inside the virtual infrastructure, then and we extended the performance experiments with more complex scenarios like multiple Back-end Router (BR) or Load Balancer (LB) which cooperate to route packets. Our results show that virtualization could introduce 40~\% performance penalty compare with the hardware only solution. Keep the performance limitation in mind, we developed the whole VMSR and we obtained low throughput with 64B packet flow as expected. To increase the VMSR throughput, two directions could be considered, the first one is to improve the single component ( i.e, VSR) performance and the other is to work from the topology (i.e, best allocation of the VMs into the hardware ) point of view. For the first method, we considered to tune the VSR inside the KVM and we studied closely such as Linux driver, scheduler, interconnect methodology which could impact the performance significantly with proper configuration; then we proposed two ways for the VMs allocation into physical servers to enhance the VMSR performance. Our results show that with good tuning and allocation of VMs, we could minimize the virtualization penalty and get reasonable throughput for running SRs inside virtual infrastructure and add flexibility functionalities into SRs easily. In the second part of the thesis, we consider the energy efficient switching design problem and we focus on two main architecture, the NoC and MSR. As many research works suggest, the energy cost in the Communication Technologies ( ICT ) is constantly increasing. Among the main ICT sectors, a large portion of the energy consumption is contributed by the telecommunication infrastructure and their devices, i.e, router, switch, cell phone, ip TV settle box, storage home gateway etc. More in detail, the linecards, links, System on Chip (SoC) including the transmitter/receiver on these variate devices are the main power consuming units. We firstly present the work on the power reduction of the data transmission in SoC, which is carried out by the NoC. NoC is an approach to design the communication subsystem between different Processing Units (PEs) in a SoC. PEs could be different elements such as CPU, memory, digital signal/analog signal processor etc. Different PEs performs specific tasks depending on the applications running on the chip. Different tasks need to exchange data information among each other, thus flits ( chopped packet with limited header information ) are generated by PEs. The flits are injected into the NoC by the proper interface and routed until reach the destination PEs. For the whole procedure, the NoC behaves as a packet switch network. Studies show that in general the information processing in the PEs only consume 60~\% energy while the remaining 40~\% are consumed by the NoC. More importantly, as the current network designing principle, the NoC capacity is devised to handle the peak load. This is a clear sign for energy saving when the network load is low. In our work, we considered to exploit Dynamic Voltage and Frequency Scaling (DVFS) technique, which can jointly decrease or increase the system voltage and frequency when necessary, i.e, decrease the voltage and frequency at low load scenario to save energy and reduce power dissipation. More precisely, we studied two different NoC architectures for energy saving, namely single plane chip and multi-plane chip architecture. In both cases we have a very strict constraint to be that all the links and transmitter/receivers on the same plane work at the same frequency/voltage to avoid synchronization problem. This is the main difference with many existing works in the literature which usually assume different links can work at different frequency, that is hard to be implemented in reality. For the single plane NoC, we exploited different routing schemas combined with DVFS to reduce the power for the whole chip. Our results haven been compared with the optimal value obtained by modeling the power saving formally as a quadratic programming problem. Results suggest that just by using simple load balancing routing algorithm, we can save considerable energy for the single chip NoC architecture. Furthermore, we noticed that in the single plane NoC architecture, the bottleneck link could limit the DVFS effectiveness. Then we discovered that multiplane NoC architecture is fairly easy to be implemented and it could help with the energy saving. Thus we focus on the multiplane architecture and we found out that DVFS could be more efficient when we concentrate more traffic into one plane and send the remaining flows to other planes. We compared load concentration and load balancing with different power modeling and all simulation results show that load concentration is better compared with load balancing for multiplan NoC architecture. Finally, we also present one of the the energy efficient MSR design technique, which permits the MSR to follow the day-night traffic pattern more efficiently with our on-line energy saving algorithm

    Load balancing for parallel forwarding

    Full text link

    Energy-Efficient and Reliable Computing in Dark Silicon Era

    Get PDF
    Dark silicon denotes the phenomenon that, due to thermal and power constraints, the fraction of transistors that can operate at full frequency is decreasing in each technology generation. Moore’s law and Dennard scaling had been backed and coupled appropriately for five decades to bring commensurate exponential performance via single core and later muti-core design. However, recalculating Dennard scaling for recent small technology sizes shows that current ongoing multi-core growth is demanding exponential thermal design power to achieve linear performance increase. This process hits a power wall where raises the amount of dark or dim silicon on future multi/many-core chips more and more. Furthermore, from another perspective, by increasing the number of transistors on the area of a single chip and susceptibility to internal defects alongside aging phenomena, which also is exacerbated by high chip thermal density, monitoring and managing the chip reliability before and after its activation is becoming a necessity. The proposed approaches and experimental investigations in this thesis focus on two main tracks: 1) power awareness and 2) reliability awareness in dark silicon era, where later these two tracks will combine together. In the first track, the main goal is to increase the level of returns in terms of main important features in chip design, such as performance and throughput, while maximum power limit is honored. In fact, we show that by managing the power while having dark silicon, all the traditional benefits that could be achieved by proceeding in Moore’s law can be also achieved in the dark silicon era, however, with a lower amount. Via the track of reliability awareness in dark silicon era, we show that dark silicon can be considered as an opportunity to be exploited for different instances of benefits, namely life-time increase and online testing. We discuss how dark silicon can be exploited to guarantee the system lifetime to be above a certain target value and, furthermore, how dark silicon can be exploited to apply low cost non-intrusive online testing on the cores. After the demonstration of power and reliability awareness while having dark silicon, two approaches will be discussed as the case study where the power and reliability awareness are combined together. The first approach demonstrates how chip reliability can be used as a supplementary metric for power-reliability management. While the second approach provides a trade-off between workload performance and system reliability by simultaneously honoring the given power budget and target reliability

    Accuracy and Dynamics of Hash-Based Load Balancing Algorithms for Multipath Internet Routing

    Get PDF
    This paper studies load balancing for multipath Internet routing. We focus on hash-based load balancing algorithms that work on the flow level to avoid packet reordering which is detrimental for the throughput of transport layer protocols like TCP. We propose a classification of hash-based load balancing algorithms, review existing ones and suggest new ones. Dynamic algorithms can actively react to load imbalances which causes route changes for some flows and thereby again packet reordering. Therefore, we investigate the load balancing accuracy and flow reassignment rate of load balancing algorithms. Our exhaustive simulation experiments show that these performance measures depend significantly on the traffic properties and on the algorithms themselves. As a consequence, our results should be taken into account for the application of load balancing in practice

    Adaptive Load Sharing for Network Processors

    Get PDF
    A novel scheme for processing packets in a router is presented that provides load sharing among multiple network processors distributed within the router. It is complemented by a feedback control mechanism designed to prevent processor overload. Incoming traffic is scheduled to multiple processors based on a deterministic mapping. The mapping formula is derived from the robust hash routing (also known as the highest random weight - HRW) scheme, introduced in K.W.\ Ross, IEEE Network, 11(6), 1997, and D.G.\ Thaler et al., IEEE Trans.\ Networking, 6 (1), 1998. \emph{No state information} on individual flow mapping has to be stored, but for each packet, a mapping function is computed over an \emph{identifier vector}, a predefined set of fields in the packet. An \emph{adaptive extension} to the HRW scheme is provided to cope with biased traffic patterns. We prove that our adaptation possesses the \emph {minimal disruption property} with respect to the mapping and exploit that property to minimize the probability of flow reordering. Simulation results indicate that the scheme achieves significant improvements in processor utilization. A higher number of router interfaces can thus be supported with the same amount of processing power

    Network Processors and Next Generation Networks: Design, Applications, and Perspectives

    Get PDF
    Network Processors (NPs) are hardware platforms born as appealing solutions for packet processing devices in networking applications. Nowadays, a plethora of solutions exists, with no agreement on a common architecture. Each vendor has proposed its specific solution and no official standard still exists. The common features of all proposals are a hierarchy of processors, with a general purpose processor and several units specialized for packet processing, a series of memory devices with different sizes and latencies, a low-level programmability. The target is a platform for networking applications with low time to market and high time in market, thanks to a high flexibility and a programmability simpler than that of ASICs, for example. After about ten years since the "birth" of network processors, this research activity wants to make an analytical balance of their development and usage. Many authoritative opinions suggest that NPs have been "outdated" by multicore or manycore systems, which provide general purpose environments and some specialized cores. The main reasons of these negative opinions are the hard programmability of NPs, which often requires the knowledge of private microcode, or the excessive architectural limits, such as reduced memories and minimal instruction store. Our research shows that Network Processors can be appealing for different applications in networking area, and many interesting solutions can be obtained, which present very high performance, outscoring current solutions. However, the issues of hard programming and remarkable limits exist, and they could be alleviated only by providing almost a comprehensive programming environment and a proper design in terms of processing and memory resources. More e cient solutions can be surely provided, but the experience of network processors has produced an important legacy in developing packet processing engines. In this work, we have realized many devices for networking purposes based on NP platform, in order to understand the complexity of programming, the flexibility of design, the complexity of tasks that can be implemented, the maximum depth of packet processing, the performance of such devices, the real usefulness of NPs in network devices. All these features have been accurately analyzed and will be illustrated in this thesis. Many remarkable results have been obtained, which confirm the Network Processors as appealing solutions for network devices. Moreover, the research on NPs have lead us to analyze and solve more general issues, related for instance to multiprocessor systems or to processors with no big available memory. In particular, the latter issue lead us to design many interesting data structures for set representation and membership query, which are based on randomized techniques and allow for big memory savings

    An extensible design of a load-aware virtual router monitor in user space.

    Get PDF
    Choi, Fu Wing.Thesis (M.Phil.)--Chinese University of Hong Kong, 2011.Includes bibliographical references (p. 54-57).Abstracts in English and Chinese.Chapter 1 --- Introduction --- p.2Chapter 2 --- Overview --- p.5Chapter 2.1 --- Summary of our Router Virtualization Architecture --- p.6Chapter 3 --- LVRM Design --- p.9Chapter 3.1 --- Socket Adapter --- p.9Chapter 3.2 --- VR Monitor --- p.11Chapter 3.3 --- VRI Monitor --- p.14Chapter 3.4 --- VRI Adapter --- p.16Chapter 3.5 --- Inter-Process Communication (IPC) Queue --- p.17Chapter 3.6 --- LVRM Adapter for VRI --- p.17Chapter 3.7 --- VRI --- p.18Chapter 3.8 --- Interfacing Between LVRM and VRs --- p.18Chapter 4 --- Experiments --- p.20Chapter 4.1 --- Experimental Setup --- p.20Chapter 4.2 --- Performance Overhead of LVRM --- p.23Chapter 4.3 --- Core Allocation --- p.31Chapter 4.4 --- Load Balancing --- p.38Chapter 4.5 --- Scalability --- p.43Chapter 4.6 --- Lessons Learned --- p.47Chapter 5 --- Related Work --- p.50Chapter 6 --- Conclusions --- p.5
    • …
    corecore