689 research outputs found

    Electronic and photonic switching in the atm era

    Get PDF
    Broadband networks require high-capacity switches in order to properly manage large amounts of traffic fluxes. Electronic and photonic technologies are being used to achieve this objective both allowing different multiplexing and switching techniques. Focusing on the asynchronous transfer mode (ATM), the inherent different characteristics of electronics and photonics makes different architectures feasible. In this paper, different switching structures are described, several ATM switching architectures which have been recently implemented are presented and the implementation characteristics discussed. Three diverse points of view are given from the electronic research, the photonic research and the commercial switches. Although all the architectures where successfully tested, they should also follow different market requirements in order to be commercialised. The characteristics are presented and the architectures projected over them to evaluate their commercial capabilities.Peer ReviewedPostprint (published version

    Optical architectures for high performance switching and routing

    Get PDF
    This thesis investigates optical interconnection networks for high performance switching and routing. Two main topics are studied. The first topic regards the use of silicon microring resonators for short reach optical interconnects. Photonic technologies can help to overcome the intrinsic limitations of electronics when used in interconnects, short-distance transmissions and switching operations. This thesis considers the peculiarasymmetric losses of microring resonators since they pose unprecedented challenges for the design of the architecture and for the routing algorithms. It presents new interconnection architectures, proposes modifications on classical routing algorithms and achieves a better performance in terms of fabric complexity and scalability with respect to the state of the art. Subsequently, this thesis considers wavelength dimension capabilities of microring resonators in which wavelength reuse (i.e. crosstalk accumulation) presents impairments on the system performance. To this aim, it presents different crosstalk reduction techniques, a feasibility analysis for the design of microring resonators and a novel wavelength-agile routing matrix. The second topic regards flexible resource allocation with adaptable infrastructure for elastic optical networks. In particular, it focus on Architecture on Demand (AoD), whereby optical node architectures can be reconfigured on the fly according to traffic requirements. This thesis includes results on the first flexible-grid optical spectrum networking field trial, carried out in a collaboration with University of Essex. Finally, it addresses several challenges that present the novel concept AoD by means of modeling and simulation. This thesis proposes an algorithm to perform automatic architecture synthesis, reports AoD scalability and power consumption results working under the proposed synthesis algorithm. Such results validate AoD as a flexible node concept that provides power efficiency and high switching capacity

    Directory Based Cache Coherency Protocols for Shared Memory Multiprocessors

    Get PDF
    Directory based cache coherency protocols can be used to build large scale, weakly ordered, shared memory multiprocessors. The salient feature of these protocols is that they are interconnection network independent, making them more scaleable than snoopy bus protocols. The major criticisms of previously defined directory protocols point to the size of memory heeded to store the directory and the amount of communication across the interconnection network required to maintain coherence. This thesis tries solving these problems by changing the entry format of the global table, altering the architecture of the global table, and developing new protocols. Some alternative directory entry formats are described, including a special entry format for implementing queueing semaphores. Evaluation of the various entry formats is done with probabilistic models of shared cache blocks and software simulation. A variable length global table organization is presented which can be used to reduce the size of the global table, regardless of the entry format. Its performance is analyzed using software simulation. A protocol which maintains a linked list of processors which have a particular block cached is presented. Several variations of this protocol induce less interconnection network traffic than traditional protocols

    Quarc: an architecture for efficient on-chip communication

    Get PDF
    The exponential downscaling of the feature size has enforced a paradigm shift from computation-based design to communication-based design in system on chip development. Buses, the traditional communication architecture in systems on chip, are incapable of addressing the increasing bandwidth requirements of future large systems. Networks on chip have emerged as an interconnection architecture offering unique solutions to the technological and design issues related to communication in future systems on chip. The transition from buses as a shared medium to networks on chip as a segmented medium has given rise to new challenges in system on chip realm. By leveraging the shared nature of the communication medium, buses have been highly efficient in delivering multicast communication. The segmented nature of networks, however, inhibits the multicast messages to be delivered as efficiently by networks on chip. Relying on extensive research on multicast communication in parallel computers, several network on chip architectures have offered mechanisms to perform the operation, while conforming to resource constraints of the network on chip paradigm. Multicast communication in majority of these networks on chip is implemented by establishing a connection between source and all multicast destinations before the message transmission commences. Establishing the connections incurs an overhead and, therefore, is not desirable; in particular in latency sensitive services such as cache coherence. To address high performance multicast communication, this research presents Quarc, a novel network on chip architecture. The Quarc architecture targets an area-efficient, low power, high performance implementation. The thesis covers a detailed representation of the building blocks of the architecture, including topology, router and network interface. The cost and performance comparison of the Quarc architecture against other network on chip architectures reveals that the Quarc architecture is a highly efficient architecture. Moreover, the thesis introduces novel performance models of complex traffic patterns, including multicast and quality of service-aware communication

    Energy Saving and Virtualization Technologies in Switching

    Get PDF
    Switching is the key functionality for many devices like electronic Router and Switch, optical Router, Network on Chips (NoCs) and so on. Basically, switching is responsible for moving data unit from one port/location to another (or multiple) port(s)/location(s). In past years, the high capacity, low delay were the main concerns when designing high-end switching unit. As new demands, requests and technologies emerge, flexibility and low power cost switching design become to weight the same as throughput and delay. On one hand, highly flexible (i.e, programming ability) switching can cope with variable needs stem from new applications (i.e, VoIP) and popular user behavior (i.e, p2p downloading); on the other hand, reduce the energy and power dissipation for switching could not only save bills and build echo system but also expand components life time. Many research efforts have been devoted to increase switching flexibility and reduce its power cost. In this thesis work, we consider to exploit virtualization as the main technique to build flexible software router in the first part, then in the second part we draw our attention on energy saving in NoC (i.e, a switching fabric designed to handle the on chip data transmission) and software router. In the first part of the thesis, we consider the virtualization inside Software Routers (SRs). SR, i.e, routers running in commodity Personal Computers (PCs), become an appealing solution compared to traditional Proprietary Routing Devices (PRD) for various reasons such as cost (the multi-vendor hardware used by SRs can be cheap, while the equipment needed by PRDs is more expensive and their training cost is higher), openness (SRs can make use of a large number of open source networking applications, while PRDs are more closed) and flexibility. The forwarding performance provided by SRs has been an obstacle to their deployment in real networks. For this reason, we proposed to aggregate multiple routing units that form an powerful SR known as the Multistage Software Router (MSR) to overcome the performance limitation for a single SR. Our results show that the throughput can increase almost linearly as the number of the internal routing devices. But some other features related to flexibility (such as power saving, programmability, router migration or easy management) have been investigated less than performance previously. We noticed that virtualization techniques become reality thanks to the quick development of the PC architectures, which are now able to easily support several logical PCs running in parallel on the same hardware. Virtualization could provide many flexible features like hardware and software decoupling, encapsulation of virtual machine state, failure recovery and security, to name a few. Virtualization permits to build multiple SRs inside one physical host and a multistage architecture exploiting only logical devices. By doing so, physical resources can be used in a more efficient way, energy savings features (switching on and off device when needed) can be introduced and logical resources could be rented on-demand instead of being owned. Since virtualization techniques are still difficult to deploy, several challenges need to be faced when trying to integrate them into routers. The main aim of the first part in this thesis is to find out the feasibility of the virtualization approach, to build and test virtualized SR (VSR), to implement the MSR exploiting logical, i.e. virtualized, resources, to analyze virtualized routing performance and to propose improvement techniques to VSR and virtual MSR (VMSR). More specifically, we considered different virtualization solutions like VMware, XEN, KVM to build VSR and VMSR, being VMware a closed source solution but with higher performance and XEN/KVM open source solutions. Firstly we built and tested each single component of our multistage architecture (i.e, back-end router, load balancer )inside the virtual infrastructure, then and we extended the performance experiments with more complex scenarios like multiple Back-end Router (BR) or Load Balancer (LB) which cooperate to route packets. Our results show that virtualization could introduce 40~\% performance penalty compare with the hardware only solution. Keep the performance limitation in mind, we developed the whole VMSR and we obtained low throughput with 64B packet flow as expected. To increase the VMSR throughput, two directions could be considered, the first one is to improve the single component ( i.e, VSR) performance and the other is to work from the topology (i.e, best allocation of the VMs into the hardware ) point of view. For the first method, we considered to tune the VSR inside the KVM and we studied closely such as Linux driver, scheduler, interconnect methodology which could impact the performance significantly with proper configuration; then we proposed two ways for the VMs allocation into physical servers to enhance the VMSR performance. Our results show that with good tuning and allocation of VMs, we could minimize the virtualization penalty and get reasonable throughput for running SRs inside virtual infrastructure and add flexibility functionalities into SRs easily. In the second part of the thesis, we consider the energy efficient switching design problem and we focus on two main architecture, the NoC and MSR. As many research works suggest, the energy cost in the Communication Technologies ( ICT ) is constantly increasing. Among the main ICT sectors, a large portion of the energy consumption is contributed by the telecommunication infrastructure and their devices, i.e, router, switch, cell phone, ip TV settle box, storage home gateway etc. More in detail, the linecards, links, System on Chip (SoC) including the transmitter/receiver on these variate devices are the main power consuming units. We firstly present the work on the power reduction of the data transmission in SoC, which is carried out by the NoC. NoC is an approach to design the communication subsystem between different Processing Units (PEs) in a SoC. PEs could be different elements such as CPU, memory, digital signal/analog signal processor etc. Different PEs performs specific tasks depending on the applications running on the chip. Different tasks need to exchange data information among each other, thus flits ( chopped packet with limited header information ) are generated by PEs. The flits are injected into the NoC by the proper interface and routed until reach the destination PEs. For the whole procedure, the NoC behaves as a packet switch network. Studies show that in general the information processing in the PEs only consume 60~\% energy while the remaining 40~\% are consumed by the NoC. More importantly, as the current network designing principle, the NoC capacity is devised to handle the peak load. This is a clear sign for energy saving when the network load is low. In our work, we considered to exploit Dynamic Voltage and Frequency Scaling (DVFS) technique, which can jointly decrease or increase the system voltage and frequency when necessary, i.e, decrease the voltage and frequency at low load scenario to save energy and reduce power dissipation. More precisely, we studied two different NoC architectures for energy saving, namely single plane chip and multi-plane chip architecture. In both cases we have a very strict constraint to be that all the links and transmitter/receivers on the same plane work at the same frequency/voltage to avoid synchronization problem. This is the main difference with many existing works in the literature which usually assume different links can work at different frequency, that is hard to be implemented in reality. For the single plane NoC, we exploited different routing schemas combined with DVFS to reduce the power for the whole chip. Our results haven been compared with the optimal value obtained by modeling the power saving formally as a quadratic programming problem. Results suggest that just by using simple load balancing routing algorithm, we can save considerable energy for the single chip NoC architecture. Furthermore, we noticed that in the single plane NoC architecture, the bottleneck link could limit the DVFS effectiveness. Then we discovered that multiplane NoC architecture is fairly easy to be implemented and it could help with the energy saving. Thus we focus on the multiplane architecture and we found out that DVFS could be more efficient when we concentrate more traffic into one plane and send the remaining flows to other planes. We compared load concentration and load balancing with different power modeling and all simulation results show that load concentration is better compared with load balancing for multiplan NoC architecture. Finally, we also present one of the the energy efficient MSR design technique, which permits the MSR to follow the day-night traffic pattern more efficiently with our on-line energy saving algorithm

    High Performance and Power Efficient On-Chip Network Designs through Multiple Injection Ports

    Full text link
    Las redes dentro de un chip se están convirtiendo en el elemento principal de los sistemas multiprocesador. A medida que aumenta la escala de integración, más elementos de cómputo (procesadores) se incluyen en el mismo chip. Estos componentes se interconectan con una red dentro del chip que debe ofrecer latencias de transmisión ultra bajas (orden de nanosegundos) y anchos de banda elevados. El diseño, pues, de una red eficiente dentro del chip juega un papel fundamental. En la presente tesis se analizan diferentes alternativas de diseño de las redes en el chip. En particular, se hace uso de la posibilidad de utilizar diferentes puertos de inyección desde los procesadores con el fin de obtener diferentes mejoras. En primer lugar, las prestaciones aumentan al tener procesadores con distintas alternativas de inyección de tráfico. En segundo lugar, además aumenta la tolerancia a fallos frente a defectos de fabricación (mas importantes conforme avanza la tecnología). Y en tercer lugar, permite una política de apagado de componentes más agresiva que nos permita un ahorro significativo de energía. Hemos evaluado diferentes topologías derivadas del mecanismo de inyección en términos de prestaciones, coste de implementación, y ahorro de consumo. Además, hemos desarrollado simuladores específicos para las distintas técnicas utilizadas. Cada topología diseñada supone una mejora respecto a la anterior, y por supuesto, teniendo en cuenta las topologías existentes. En resumen, nuestro esfuerzo se centra en conseguir un excelente compromiso entre prestaciones, consumo y tolerancia a fallos dentro de una red en chip. Para la primera propuesta (topología NR-Mesh), se alcanzan mejoras en prestaciones de un 7\% y hasta de un 75\% en reducción de consumo de media, comparado con la malla 2D o malla de 2 dimensiones. Para la siguiente propuesta, la malla concentrada paralela (PC-Mesh), el beneficio en prestaciones que se obtiene es de hasta un 20\%, así cómo de un 60\% en reducción deCamacho Villanueva, J. (2012). High Performance and Power Efficient On-Chip Network Designs through Multiple Injection Ports [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/18235Palanci

    The Design, modeling and simulation of switching fabrics: For an ATM network switch

    Get PDF
    The requirements of today\u27s telecommunication systems to support high bandwidth and added flexibility brought about the expansion of (Asynchronous Transfer Mode) ATM as a new method of high-speed data transmission. Various analytical and simulation methods may be used to estimate the performance of ATM switches. Analytical methods considerably limit the range of parameters to be evaluated due to extensive formulae used and time consuming iterations. They are not as effective for large networks because of excessive computations that do not scale linearly with network size. One the other hand, simulation-based methods allow determining a bigger range of performance parameters in a shorter amount of time even for large networks. A simulation model, however, is more elaborate in terms of implementation. Instead of using formulae to obtain results, it has to operate software or hardware modules requiring a certain amount of effort to create. In this work simulation is accomplished by utilizing the ATM library - an object oriented software tool, which uses software chips for building ATM switches. The distinguishing feature of this approach is cut-through routing realized on the bit level abstraction treating ATM protocol data units, called cells, as groups of 424 bits. The arrival events of cells to the system are not instantaneous contrary to commonly used methods of simulation that consider cells as instant messages. The simulation was run for basic multistage interconnection network types with varying source arrival rate and buffer sizes producing a set of graphs of cell delays, throughput, cell loss probability, and queue sizes. The techniques of rearranging and sorting were considered in the simulation. The results indicate that better performance is always achieved by bringing additional stages of elements to the switching system

    Low-Memory Techniques for Routing and Fault-Tolerance on the Fat-Tree Topology

    Full text link
    Actualmente, los clústeres de PCs están considerados como una alternativa eficiente a la hora de construir supercomputadores en los que miles de nodos de computación se conectan mediante una red de interconexión. La red de interconexión tiene que ser diseñada cuidadosamente, puesto que tiene una gran influencia sobre las prestaciones globales del sistema. Dos de los principales parámetros de diseño de las redes de interconexión son la topología y el encaminamiento. La topología define la interconexión de los elementos de la red entre sí, y entre éstos y los nodos de computación. Por su parte, el encaminamiento define los caminos que siguen los paquetes a través de la red. Las prestaciones han sido tradicionalmente la principal métrica a la hora de evaluar las redes de interconexión. Sin embargo, hoy en día hay que considerar dos métricas adicionales: el coste y la tolerancia a fallos. Las redes de interconexión además de escalar en prestaciones también deben hacerlo en coste. Es decir, no sólo tienen que mantener su productividad conforme aumenta el tamaño de la red, sino que tienen que hacerlo sin incrementar sobremanera su coste. Por otra parte, conforme se incrementa el número de nodos en las máquinas de tipo clúster, la red de interconexión debe crecer en concordancia. Este incremento en el número de elementos de la red de interconexión aumenta la probabilidad de aparición de fallos, y por lo tanto, la tolerancia a fallos es prácticamente obligatoria para las redes de interconexión actuales. Esta tesis se centra en la topología fat-tree, ya que es una de las topologías más comúnmente usadas en los clústeres. El objetivo de esta tesis es aprovechar sus características particulares para proporcionar tolerancia a fallos y un algoritmo de encaminamiento capaz de equilibrar la carga de la red proporcionando una buena solución de compromiso entre las prestaciones y el coste.Gómez Requena, C. (2010). Low-Memory Techniques for Routing and Fault-Tolerance on the Fat-Tree Topology [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/8856Palanci
    corecore