Search CORE

73 research outputs found

Recommended from our members

Reconfigurable Optically Interconnected Systems

Author: Shen Yiwen
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2020
Field of study

With the immense growth of data consumption in today's data centers and high-performance computing systems driven by the constant influx of new applications, the network infrastructure supporting this demand is under increasing pressure to enable higher bandwidth, latency, and flexibility requirements. Optical interconnects, able to support high bandwidth wavelength division multiplexed signals with extreme energy efficiency, have become the basis for long-haul and metro-scale networks around the world, while photonic components are being rapidly integrated within rack and chip-scale systems. However, optical and photonic interconnects are not a direct replacement for electronic-based components. Rather, the integration of optical interconnects with electronic peripherals allows for unique functionalities that can improve the capacity, compute performance and flexibility of current state-of-the-art computing systems. This requires physical layer methodologies for their integration with electronic components, as well as system level control planes that incorporates the optical layer characteristics. This thesis explores various network architectures and the associated control plane, hardware infrastructure, and other supporting software modules needed to integrate silicon photonics and MEMS based optical switching into conventional datacom network systems ranging from intra-data center and high-performance computing systems to the metro-scale layer networks between data centers. In each of these systems, we demonstrate dynamic bandwidth steering and compute resource allocation capabilities to enable significant performance improvements. The key accomplishments of this thesis are as follows. In Part 1, we present high-performance computing network architectures that integrate silicon photonic switches for optical bandwidth steering, enabling multiple reconfigurable topologies that results in significant system performance improvements. As high-performance systems rely on increased parallelism by scaling up to greater numbers of processor nodes, communication between these nodes grows rapidly and the interconnection network becomes a bottleneck to the overall performance of the system. It has been observed that many scientific applications operating on high-performance computing systems cause highly skewed traffic over the network, congesting only a small percentage of the total available links while other links are underutilized. This mismatch of the traffic and the bandwidth allocation of the physical layer network presents the opportunity to optimize the bandwidth resource utilization of the system by using silicon photonic switches to perform bandwidth steering. This allows the individual processors to perform at their maximum compute potential and thereby improving the overall system performance. We show various testbeds that integrates both microring resonator and Mach-Zehnder based silicon photonic switches within Dragonfly and Fat-Tree topology networks built with conventional equipment, and demonstrate 30-60% reduction in execution time of real high-performance benchmark applications. Part 2 presents a flexible network architecture and control plane that enables autonomous bandwidth steering and IT resource provisioning capabilities between metro-scale geographically distributed data centers. It uses a software-defined control plane to autonomously provision both network and IT resources to support different quality of service requirements and optimizes resource utilization under dynamically changing load variations. By actively monitoring both the bandwidth utilization of the network and CPU or memory resources of the end hosts, the control plane autonomously provisions background or dynamic connections with different levels of quality of service using optical MEMS switching, as well as initializing live migrations of virtual machines to consolidate or distribute workload. Together these functionalities provide flexibility and maximize efficiency in processing and transferring data, and enables energy and cost savings by scaling down the system when resources are not needed. An experimental testbed of three data center nodes was built to demonstrate the feasibility of these capabilities. Part 3 presents Lightbridge, a communications platform specifically designed to provide a more seamless integration between processor nodes and an optically switched network. It addresses some of the crucial issues faced by the works presented in the previous chapters related to optical switching. When optical switches perform switching operations, they change the physical topology of the network, and they lack the capability to buffer packets, resulting in certain optical circuits being unavailable. This prompts the question of whether it is safe to transmit packets by end hosts at any given time. Lightbridge was developed to coordinate switching and routing of optical circuits across the network, by having the processors gain information about the current state of the optical network before transmitting packets, and being able to buffer packets when the optical circuit is not available. This part describes details of Lightbridge which is constituted by a loadable Linux kernel module along with other supporting modifications to the Linux kernel in order to achieve the necessary functionalities

Columbia University Academic Commons

On the Exploration of FPGAs and High-Level Synthesis Capabilities on Multi-Gigabit-per-Second Networks

Author: Ruiz Noguera Mario Daniel
Publication venue
Publication date: 24/01/2020
Field of study

Tesis doctoral inédita leída en la Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Tecnología Electrónica y de las Comunicaciones. Fecha de lectura: 24-01-2020Traffic on computer networks has faced an exponential grown in recent years. Both links and communication equipment had to adapt in order to provide a minimum quality of service required for current needs. However, in recent years, a few factors have prevented commercial off-the-shelf hardware from being able to keep pace with this growth rate, consequently, some software tools are struggling to fulfill their tasks, especially at speeds higher than 10 Gbit/s. For this reason, Field Programmable Gate Arrays (FPGAs) have arisen as an alternative to address the most demanding tasks without the need to design an application specific integrated circuit, this is in part to their flexibility and programmability in the field. Needless to say, developing for FPGAs is well-known to be complex. Therefore, in this thesis we tackle the use of FPGAs and High-Level Synthesis (HLS) languages in the context of computer networks. We focus on the use of FPGA both in computer network monitoring application and reliable data transmission at very high-speed. On the other hand, we intend to shed light on the use of high level synthesis languages and boost FPGA applicability in the context of computer networks so as to reduce development time and design complexity. In the first part of the thesis, devoted to computer network monitoring. We take advantage of the FPGA determinism in order to implement active monitoring probes, which consist on sending a train of packets which is later used to obtain network parameters. In this case, the determinism is key to reduce the uncertainty of the measurements. The results of our experiments show that the FPGA implementations are much more accurate and more precise than the software counterpart. At the same time, the FPGA implementation is scalable in terms of network speed — 1, 10 and 100 Gbit/s. In the context of passive monitoring, we leverage the FPGA architecture to implement algorithms able to thin cyphered traffic as well as removing duplicate packets. These two algorithms straightforward in principle, but very useful to help traditional network analysis tools to cope with their task at higher network speeds. On one hand, processing cyphered traffic bring little benefits, on the other hand, processing duplicate traffic impacts negatively in the performance of the software tools. In the second part of the thesis, devoted to the TCP/IP stack. We explore the current limitations of reliable data transmission using standard software at very high-speed. Nowadays, the network is becoming an important bottleneck to fulfill current needs, in particular in data centers. What is more, in recent years the deployment of 100 Gbit/s network links has started. Consequently, there has been an increase scrutiny of how networking functionality is deployed, furthermore, a wide range of approaches are currently being explored to increase the efficiency of networks and tailor its functionality to the actual needs of the application at hand. FPGAs arise as the perfect alternative to deal with this problem. For this reason, in this thesis we develop Limago an FPGA-based open-source implementation of a TCP/IP stack operating at 100 Gbit/s for Xilinx’s FPGAs. Limago not only provides an unprecedented throughput, but also, provides a tiny latency when compared to the software implementations, at least fifteen times. Limago is a key contribution in some of the hottest topic at the moment, for instance, network-attached FPGA and in-network data processing

Biblos-e Archivo

A Survey on Data Plane Programming with P4: Fundamentals, Advances, and Applied Research

Author: Frank Reinhard
Gurevich Vladimir
Hauser Frederik
Häberle Marco
Lindner Steffen
Menth Michael
Merling Daniel
Zeiger Florian
Publication venue
Publication date: 26/01/2021
Field of study

With traditional networking, users can configure control plane protocols to match the specific network configuration, but without the ability to fundamentally change the underlying algorithms. With SDN, the users may provide their own control plane, that can control network devices through their data plane APIs. Programmable data planes allow users to define their own data plane algorithms for network devices including appropriate data plane APIs which may be leveraged by user-defined SDN control. Thus, programmable data planes and SDN offer great flexibility for network customization, be it for specialized, commercial appliances, e.g., in 5G or data center networks, or for rapid prototyping in industrial and academic research. Programming protocol-independent packet processors (P4) has emerged as the currently most widespread abstraction, programming language, and concept for data plane programming. It is developed and standardized by an open community and it is supported by various software and hardware platforms. In this paper, we survey the literature from 2015 to 2020 on data plane programming with P4. Our survey covers 497 references of which 367 are scientific publications. We organize our work into two parts. In the first part, we give an overview of data plane programming models, the programming language, architectures, compilers, targets, and data plane APIs. We also consider research efforts to advance P4 technology. In the second part, we analyze a large body of literature considering P4-based applied research. We categorize 241 research papers into different application domains, summarize their contributions, and extract prototypes, target platforms, and source code availability.Comment: Submitted to IEEE Communications Surveys and Tutorials (COMS) on 2021-01-2

arXiv.org e-Print Archive

A Modular Approach to Adaptive Reactive Streaming Systems

Author: Neely Christopher E.
Publication venue: Scholar Commons
Publication date: 19/05/2012
Field of study

The latest generations of FPGA devices offer large resource counts that provide the headroom to implement large-scale and complex systems. However, there are increasing challenges for the designer, not just because of pure size and complexity, but also in harnessing effectively the flexibility and programmability of the FPGA. A central issue is the need to integrate modules from diverse sources to promote modular design and reuse. Further, the capability to perform dynamic partial reconfiguration (DPR) of FPGA devices means that implemented systems can be made reconfigurable, allowing components to be changed during operation. However, use of DPR typically requires low-level planning of the system implementation, adding to the design challenge. This dissertation presents ReShape: a high-level approach for designing systems by interconnecting modules, which gives a ‘plug and play’ look and feel to the designer, is supported by tools that carry out implementation and verification functions, and is carried through to support system reconfiguration during operation. The emphasis is on the inter-module connections and abstracting the communication patterns that are typical between modules – for example, the streaming of data that is common in many FPGA-based systems, or the reading and writing of data to and from memory modules. ShapeUp is also presented as the static precursor to ReShape. In both, the details of wiring and signaling are hidden from view, via metadata associated with individual modules. ReShape allows system reconfiguration at the module level, by supporting type checking of replacement modules and by managing the overall system implementation, via metadata associated with its FPGA floorplan. The methodology and tools have been implemented in a prototype for a broad domain-specific setting – networking systems – and have been validated on real telecommunications design projects

Scholar Commons - Santa Clara University

Analyse et amélioration de la qualité de services WEB multimédia et leurs mises en oeuvre sur ordinateur et sur FPGA

Author: Al-Canaan Amer
Publication venue: 'Universite de Sherbrooke'
Publication date: 01/01/2014
Field of study

Résumé : Les services Web, issus de l’avancée technologique dans le domaine des réseaux informatiques et des dispositifs de télécommunications portables et fixes, occupent une place primordiale dans la vie quotidienne des gens. La demande croissante sur des services Web multimédia (SWM), en particulier, augmente la charge sur les réseaux d’Internet, les fournisseurs de services et les serveurs Web. Cette charge est essentiellement due au fait que les SWM de haute qualité nécessitent des débits de transfert et des tailles de paquets importants. La qualité de service (par définition, telle que vue par l’utilisateur) est influencée par plusieurs facteurs de performance, comme le temps de traitement, le délai de propagation, le temps de réponse, la résolution d’images et l’efficacité de compression. Le travail décrit dans cette thèse est motivé par la demande continuellement croissante de nouveaux SWM et le besoin de maintenir et d’améliorer la qualité de ces services. Nous nous intéressons tout d’abord à la qualité de services (QdS) des SWM lorsqu’ils sont mis en œuvre sur des ordinateurs, tels que les ordinateurs de bureau ou les portables. Nous commençons par étudier les aspects de compatibilité afin d’obtenir des SWM fonctionnant de manière satisfaisante sur différentes plate-formes. Nous étudions ensuite la QdS des SWM lorsqu’ils sont mis en œuvre selon deux approches différentes, soit le protocole SOAP et le style RESTful. Nous étudions plus particulièrement le taux de compression qui est un des facteurs influençant la QdS. Après avoir considéré sous différents angles les SWM avec mise en œuvre sur des ordinateurs, nous nous intéressons à la QdS des SWM lorsqu’ils sont mis en œuvre sur FPGA. Nous effectuons alors une étude et une mise en œuvre qui permet d’identifier les avantages à mettre en œuvre des SWM sur FPGA. Les contributions se définissent en cinq volets comme suit : 1. Nous introduisons des méthodes de création, c’est-à-dire conception et mise en œuvre, de SWM sur des plate-formes logicielles hétérogènes dans différents environnements tels que Windows, OS X et Solaris. Un objectif que nous visons est de proposer une approche permettant d’ajouter de nouveaux SWM tout en garantissant la compatibilité entre les plate-formes, dans le sens où nous identifions les options nous permettant d’offrir un ensemble riche et varié de SWM pouvant fonctionner sur les différentes plate-formes. 2. Nous identifions une liste de paramètres pertinents influençant la QdS des SWM mis en œuvre selon le protocole SOAP et selon le style REST. 3. Nous développons un environnement d’analyse pour quantifier les impacts de chaque paramètre identifié sur la QdS de SWM. Pour cela, nous considérons les SWM mis en œuvre selon le protocole SOAP et aussi selon style REST. Les QdS obtenues avec SOAP et REST sont comparées objectivement. Pour faciliter la comparaison, la même gamme d’images (dans l’analyse de SWM SOAP) a été réutilisée et les mêmes plate-formes logicielles. 4. Nous développons une procédure d’analyse qui permet de déterminer une corrélation entre la dimension d’une image et le taux de compression adéquat. Les résultats obtenus confirment cette contribution propre à cette thèse qui confirme que le taux de compression peut être optimisé lorsque les dimensions de l’image ont la propriété suivante : le rapport entre la longueur et la largeur est égal au nombre d’or connu dans la nature. Trois libraires ont été utilisées à savoir JPEG, JPEG2000 et DjVu. 5. Dans un volet complémentaire aux quatre volets précédents, qui concernent les SWM sur ordinateurs, nous étudions ainsi la conception et la mise en œuvre de SWM sur FPGA. Nous justifions l’option de FPGA en identifiant ses avantages par rapport à deux autres options : ordinateurs et ASICs. Afin de confirmer plusieurs avantages identifiés, un SWM de QdS élevée et de haute performance est créé sur FPGA, en utilisant des outils de conception gratuits, du code ouvert (open-source) et une méthode fondée uniquement sur HDL. Notre approche facilitera l’ajout d’autres modules de gestions et d’orchestration de SWM. 6. La mise à jour et l’adaptation du code open-source et de la documentation du module Ethernet IP Core pour la communication entre le FPGA et le port Ethernet sur la carte Nexys3. Ceci a pour effet de faciliter la mise en œuvre de SWM sur la carte Nexys3. // Abstract : Web services, which are the outcome of the technological advancements in IT networks and hand-held mobile devices for telecommunications, occupy an important role in our daily life. The increasing demand on multimedia Web services (MWS), in particular, augments the load on the Internet, on service providers and Web servers. This load is mainly due to the fact that the high-quality multimedia Web services necessitate high data transfer rates and considerable payload sizes. The quality of service (QoS, by definition as it is perceived by the user) is influenced by several factors, such as processing time, propagation delay, response time, image resolution and compression efficacy. The research work in this thesis is motivated by the persistent demand on new MWS, and the need to maintain and improve the QoS. Firstly, we focus on the QoS of MWS when they are implemented on desktop and laptop computers. We start with studying the compatibility aspects in order to obtain MWS functioning satisfactorily on different platforms. Secondly, we study the QoS for MWS implemented according to the SOAP protocol and the RESTful style. In particular, we study the compression rate, which is one of the pertinent factors influencing the QoS. Thirdly, after the study of MWS when implemented on computers, we proceed with the study of QoS of MWS when implemented on hardware, in particular on FPGAs. We achieved thus comprehensive study and implementations that show and compare the advantages of MWS on FPGAs. The contributions of this thesis can be resumed as follows: 1. We introduce methods of design and implementation of MWS on heterogeneous platforms, such as Windows, OS X and Solaris. One of our objectives is to propose an approach that facilitates the integration of new MWS while assuring the compatibility amongst involved platforms. This means that we identify the options that enable offering a set of rich and various MWS that can run on different platforms. 2. We determine a list of relevant parameters that influence the QoS of MWS. 3. We build an analysis environment that quantifies the impact of each parameter on the QoS of MWS implemented on both SOAP protocol and RESTful style. Both QoS for SOAP and REST are objectively compared. The analysis has been held on a large scale of different images, which produces a realistic point of view describing the behaviour of real MWS. 4. We develop an analysis procedure to determine the correlation between the aspect ratio of an image and its compression ratio. Our results confirm that the compression ratio can be improved and optimised when the aspect ratio of iiiiv an image is close to the golden ratio, which exists in nature. Three libraries of compression schemes have been used, namely: JPEG, JPEG2000 and DjVu. 5. Complementary to the four contributions mentioned above, which concern the MWS on computers, we study also the design and implementation of MWS on FPGA. This is justified by the numerous advantages that are offered by FPGAs, compared to the other technologies such as computers and ASICs. In order to highlight the advantages of implementing MWS on FPGA, we developed on FPGA a MWS of high performance and high level of QoS. To achieve our goal, we utilised freely available design utilities, open-source code and a method based only on HDL. This approach is adequate for future extensions and add-on modules for MWS orchestration

Savoirs UdeS

Xdense: A Mesh Grid Sensor Network for Extreme Dense Sensing

Author: João Fernandes Loureiro
Publication venue
Publication date: 24/01/2020
Field of study

Repositório Aberto da Universidade do Porto

High Performance Network Evaluation and Testing

Author: Pilimon Artur
Publication venue: DTU - Department of Photonics Engineering
Publication date: 01/01/2018
Field of study

Online Research Database In Technology

On the simulation and design of manycore CMPs

Author: Thompson Christopher Callum
Publication venue: The University of Edinburgh
Publication date: 26/11/2015
Field of study

The progression of Moore’s Law has resulted in both embedded and performance computing systems which use an ever increasing number of processing cores integrated in a single chip. Commercial systems are now available which provide hundreds of cores, and academics have proposed architectures for up to 1024 cores. Embedded multicores are increasingly popular as it is easier to guarantee hard-realtime constraints using individual cores dedicated for tasks, than to use traditional time-multiplexed processing. However, finding the optimal hardware configuration to meet these requirements at minimum cost requires extensive trial and error approaches to investigate the design space. This thesis tackles the problems encountered in the design of these large scale multicore systems by first addressing the problem of fast, detailed micro-architectural simulation. Initially addressing embedded systems, this work exploits the lack of hardware cache-coherence support in many deeply embedded systems to increase the available parallelism in the simulation. Then, through partitioning the NoC and using packet counting and cycle skipping reduces the amount of computation required to accurately model the NoC interconnect. In combination, this enables simulation speeds significantly higher than the state of the art, while maintaining less error, when compared to real hardware, than any similar simulator. Simulation speeds reach up to 370MIPS (Million (target) Instructions Per Second), or 110MHz, which is better than typical FPGA prototypes, and approaching final ASIC production speeds. This is achieved while maintaining an error of only 2.1%, significantly lower than other similar simulators. The thesis continues by scaling the simulator past large embedded systems up to 64-1024 core processors, adding support for coherent architectures using the same packet counting techniques along with low overhead context switching to enable the simulation of such large systems with stricter synchronisation requirements. The new interconnect model was partitioned to enable parallel simulation to further improve simulation speeds in a manner which did not sacrifice any accuracy. These innovations were leveraged to investigate significant novel energy saving optimisations to the coherency protocol, processor ISA, and processor micro-architecture. By introducing a new instruction, with the name wait-on-address, the energy spent during spin-wait style synchronisation events can be significantly reduced. This functions by putting the core into a low-power idle state while the cache line of the indicated address is monitored for coherency action. Upon an update or invalidation (or traditional timer or external interrupts) the core will resume execution, but the active energy of running the core pipeline and repeatedly accessing the data and instruction caches is effectively reduced to static idle power. The thesis also shows that existing combined software-hardware schemes to track data regions which do not require coherency can adequately address the directory-associativity problem, and introduces a new coherency sharer encoding which reduces the energy consumed by sharer invalidations when sharers are grouped closely together, such as would be the case with a system running many tasks with a small degree of parallelism in each. The research concludes by using the extremely fast simulation speeds developed to produce a large set of training data, collecting various runtime and energy statistics for a wide range of embedded applications on a huge diverse range of potential MPSoC designs. This data was used to train a series of machine learning based models which were then evaluated on their capacity to predict performance characteristics of unseen workload combinations across the explored MPSoC design space, using only two sample simulations, with promising results from some of the machine learning techniques. The models were then used to produce a ranking of predicted performance across the design space, and on average Random Forest was able to predict the best design within 89% of the runtime performance of the actual best tested design, and better than 93% of the alternative design space. When predicting for a weighted metric of energy, delay and area, Random Forest on average produced results within 93% of the optimum result. In summary this thesis improves upon the state of the art for cycle accurate multicore simulation, introduces novel energy saving changes the the ISA and microarchitecture of future multicore processors, and demonstrates the viability of machine learning techniques to significantly accelerate the design space exploration required to bring a new manycore design to market

Edinburgh Research Archive

Arquitectura de un sistema integrado para diseño dirigido por modelos en el contexto de internet de las cosas con aplicaciones en medicina

Author: Henares Vilaboa Kevin
Publication venue: 'Universidad Complutense de Madrid (UCM)'
Publication date: 29/03/2022
Field of study

Tesis inédita de la Universidad Complutense de Madrid, Facultad de Informática, Departamento de Arquitectura de Computadores y Automática, leída el 14-10-20222Over the past few years, we have seen how processing and storage architectures become cheaper and more efficient, communication infrastructures become faster and more scalable, and many new ways of interacting with the world around us are being developed. Every day more devices are connected to the network, and the generation of data worldwide is growing exponentially. In this context, the Internet of Things promises to be the new technological revolution, as was the introduction of the network of networks or universal mobile accessibility in tis day...A lo largo de los últimos años hemos visto cómo las arquitecturas de procesamiento y almacenamiento se vuelven más baratas y eficientes, las infraestructuras de comunicación se hacen más rápidas y escalables, y se desarrollan multitud de nuevas formas de interactuar con el mundo que nos rodea. Cada día más dispositivos se conectan a la red, y la generación de datos a nivel mundal está creciendo exponencialmente. En este contexto, el Internet de las cosas promete ser la nueva revolución tecnológica, como en su día lo fue la introducción de la red de redes o la accesibilidad móvil universal...Fac. de InformáticaTRUEunpu

Docta Complutense