On the Exploration of FPGAs and High-Level Synthesis Capabilities on Multi-Gigabit-per-Second Networks
Unpublished doctoral thesis read at the Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Tecnología Electrónica y de las Comunicaciones. Defense date: 24-01-2020.
Traffic on computer networks has grown exponentially in recent years.
Both links and communication equipment have had to adapt in order to provide the minimum quality of service required for current needs. However, in recent years, several factors have prevented commercial off-the-shelf hardware from keeping pace with this growth rate; consequently, some software tools are struggling to fulfill their tasks, especially at speeds higher than 10 Gbit/s. For this reason, Field Programmable Gate Arrays (FPGAs) have arisen as an alternative for addressing the most demanding tasks without the need to design an application-specific integrated circuit, thanks in part to their flexibility and in-field programmability. Needless to say, developing for FPGAs is notoriously complex. Therefore, in this thesis we tackle
the use of FPGAs and High-Level Synthesis (HLS) languages in the context of computer
networks. We focus on the use of FPGAs both in computer network monitoring applications and in reliable data transmission at very high speed. Furthermore, we intend to shed light on the use of high-level synthesis languages and to boost FPGA applicability in the context of computer networks so as to reduce development time and design complexity.
The first part of the thesis is devoted to computer network monitoring. We take advantage of FPGA determinism to implement active monitoring probes, which consist of sending a train of packets that is later used to obtain network parameters. In this case, determinism is key to reducing the uncertainty of the measurements.
The results of our experiments show that the FPGA implementations are considerably more accurate and more precise than their software counterparts. At the same time, the FPGA implementation is scalable in terms of network speed: 1, 10 and 100 Gbit/s. In the context of passive monitoring, we leverage the FPGA architecture to implement algorithms able to thin encrypted traffic as well as to remove duplicate packets. These two algorithms are straightforward in principle, but very useful for helping traditional network analysis tools cope with their task at higher network speeds: on the one hand, processing encrypted traffic brings little benefit; on the other hand, processing duplicate traffic negatively impacts the performance of the software tools.
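The packet-train idea behind the active probes can be sketched as a simple dispersion-based capacity estimate. This is a pure-Python illustration of the principle only, not the thesis's FPGA implementation; the function name and interface are hypothetical:

```python
def estimate_capacity(timestamps, packet_size_bits):
    """Estimate bottleneck capacity from a received packet train.

    Uses the classic packet-dispersion idea: packets sent back to back
    arrive spaced by the bottleneck link's serialization time, so
    capacity is approximately packet size divided by the arrival gap.
    """
    # Inter-arrival gaps between consecutive packets of the train.
    gaps = [t2 - t1 for t1, t2 in zip(timestamps, timestamps[1:])]
    avg_gap = sum(gaps) / len(gaps)  # seconds per packet
    return packet_size_bits / avg_gap  # estimated bits per second
```

The measurement uncertainty the thesis refers to enters through the timestamps: software timestamping adds jitter to each gap, which is exactly what a deterministic FPGA probe avoids.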
The second part of the thesis is devoted to the TCP/IP stack. We explore the current limitations of reliable data transmission using standard software at very high speed. Nowadays, the network is becoming an important bottleneck in fulfilling current needs, particularly in data centers. What is more, in recent years the deployment of 100 Gbit/s network links has started. Consequently, there has been increased scrutiny of how networking functionality is deployed, and a wide range of approaches are currently being explored to increase the efficiency of networks and tailor their functionality to the actual needs of the application at hand. FPGAs arise as the perfect alternative to
deal with this problem. For this reason, in this thesis we develop Limago, an FPGA-based open-source implementation of a TCP/IP stack operating at 100 Gbit/s for Xilinx's FPGAs. Limago not only provides unprecedented throughput, but also a latency at least fifteen times lower than that of software implementations. Limago is a key contribution to some of the hottest topics at the moment, such as network-attached FPGAs and in-network data processing.
TCP/IP acceleration in a cloud-based mobile network
Mobile traffic rates are in constant growth. The currently used technology, long-term evolution (LTE), is already in a mature state and receives only small incremental improvements. However, a new major paradigm shift is needed to support future development. Together with the transition to the fifth generation of mobile telecommunications, companies are moving towards network function virtualization (NFV). By decoupling network functions from the hardware it is possible to achieve lower development and management costs as well as better scalability.
A major change from dedicated hardware to the cloud does not take place without issues. One key challenge is building telecommunications-grade, ultra-low-latency, low-jitter data storage for call session data. Once this challenge is overcome, it enables new ways to build much simpler stateless radio applications.
There are many technologies that can be used to achieve lower latencies in cloud infrastructure. In the future, technologies such as memory-centric computing may revolutionize the whole infrastructure and provide nanosecond latencies. In the short term, however, viable solutions are purely software-based. Examples of these are databases and transport layer protocols optimized for latency. Traffic processing can also be accelerated by using libraries and drivers such as the Data Plane Development Kit (DPDK). However, DPDK does not have transport layer support, so additional frameworks are needed to unleash the potential of Transmission Control Protocol/Internet Protocol (TCP/IP) acceleration.
In this thesis, TCP/IP acceleration is studied as a method for providing ultra-low-latency and low-jitter communications for call session data storage. Two major frameworks, namely VPP and F-Stack, were selected for evaluation. The major finding is that the frameworks are not as mature as expected, and thus they failed to deliver production-ready performance. Building a robust interface for applications to use was recognized as a common problem in the market.
Multilayer Environment and Toolchain for Holistic NetwOrk Design and Analysis
The recent developments and research in distributed ledger technologies and
blockchain have contributed to the increasing adoption of distributed systems.
To collect relevant insights into systems' behavior, many evaluation frameworks focus mainly on the throughput of the system under test. However, these frameworks often lack comprehensiveness and generality, particularly in adopting a cross-layer approach to distributed applications. This work analyses in detail the requirements for distributed systems assessment. We summarize
these findings into a structured methodology and experimentation framework
called METHODA. Our approach emphasizes setting up and assessing a broader
spectrum of distributed systems and addresses a notable research gap. We
showcase the effectiveness of the framework by evaluating four distinct systems
and their interaction, leveraging a diverse set of eight carefully selected
metrics and 12 essential parameters. Through experimentation and analysis, we demonstrate the framework's capability to provide valuable insights across various use cases. For instance, we identify that combining Trusted Execution Environments with the threshold signature scheme FROST introduces minimal performance overhead, with an average latency of around 40 ms. We also showcase that emulating realistic system behavior, e.g., Maximal Extractable Value, is possible and could be used to further model such dynamics. The METHODA framework enables a deeper understanding of distributed systems and is a powerful tool for researchers and practitioners navigating the complex landscape of modern computing infrastructures.
Survey on System I/O Hardware Transactions and Impact on Latency, Throughput, and Other Factors
Computer system I/O has evolved with processor and memory technologies in terms of reducing latency, increasing bandwidth, and other factors. As requirements increase for I/O, such as networking, storage, and video, descriptor-based DMA transactions have become more important in high-performance systems to move data between I/O adapters and system memory buffers. DMA transactions are done with hardware engines below the software protocol abstraction layers in all systems other than rudimentary embedded controllers. CPUs can switch to other tasks by offloading hardware DMA transfers to the I/O adapters. Each I/O interface has one or more separately instantiated descriptor-based DMA engines optimized for a given I/O port. I/O transactions are optimized by accelerator functions to reduce latency, improve throughput, and reduce CPU overhead. This chapter surveys the current state of high-performance I/O architecture advances and explores benefits and limitations. With the proliferation of CPU multi-cores within a system, multi-GB/s ports, and on-die integration of system functions, changes beyond the techniques surveyed may be needed for optimal I/O architecture performance.
This is an author's peer-reviewed final manuscript, as accepted by the publisher. The published article/chapter is copyrighted by Elsevier and can be found at: http://www.elsevier.com/books/advances-in-computers/hurson/978-0-12-420232-0.
Keywords: memory, controllers, processors, DMA, input/output, latency, power, throughput
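The descriptor-ring mechanism the survey describes can be illustrated with a toy software model: software posts descriptors at a head index, and the (here simulated) DMA engine completes them from the tail. All names are hypothetical; real engines use device registers, physical addresses, and doorbell writes rather than Python objects:

```python
from dataclasses import dataclass

@dataclass
class Descriptor:
    addr: int = 0       # buffer address (placeholder value)
    length: int = 0     # bytes to transfer
    done: bool = False  # completion flag written back by the engine

class DescriptorRing:
    """Toy model of a descriptor-based DMA ring with head/tail indices."""
    def __init__(self, size=8):
        self.ring = [Descriptor() for _ in range(size)]
        self.head = 0  # next slot software fills
        self.tail = 0  # next slot the engine completes

    def post(self, addr, length):
        # Software produces a descriptor; one slot is kept free so
        # head == tail unambiguously means "empty".
        nxt = (self.head + 1) % len(self.ring)
        if nxt == self.tail:
            raise BufferError("ring full")
        self.ring[self.head] = Descriptor(addr, length)
        self.head = nxt

    def complete_one(self):
        # Simulates the engine finishing the oldest posted descriptor.
        if self.tail == self.head:
            return None  # nothing outstanding
        d = self.ring[self.tail]
        d.done = True
        self.tail = (self.tail + 1) % len(self.ring)
        return d
```

The CPU-offload benefit in the chapter corresponds to the gap between `post` and `complete_one`: software returns immediately after posting, and only inspects completions later (typically on an interrupt or by polling the `done` flag).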
ACCL+: an FPGA-Based Collective Engine for Distributed Applications
FPGAs are increasingly prevalent in cloud deployments, serving as Smart NICs
or network-attached accelerators. Despite their potential, developing
distributed FPGA-accelerated applications remains cumbersome due to the lack of
appropriate infrastructure and communication abstractions. To facilitate the
development of distributed applications with FPGAs, in this paper we propose
ACCL+, an open-source versatile FPGA-based collective communication library.
Portable across different platforms and supporting UDP, TCP, as well as RDMA,
ACCL+ empowers FPGA applications to initiate direct FPGA-to-FPGA collective
communication. Additionally, it can serve as a collective offload engine for
CPU applications, freeing the CPU from networking tasks. It is user-extensible,
allowing new collectives to be implemented and deployed without having to
re-synthesize the FPGA circuit. We evaluated ACCL+ on an FPGA cluster with 100
Gb/s networking, comparing its performance against software MPI over RDMA. The
results demonstrate ACCL+'s significant advantages for FPGA-based distributed
applications and highly competitive performance for CPU applications. We
showcase ACCL+'s dual role with two use cases: seamlessly integrating as a
collective offload engine to distribute CPU-based vector-matrix multiplication,
and serving as a crucial and efficient component in designing fully FPGA-based
distributed deep-learning recommendation inference.
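To illustrate the kind of collective primitive a library like this exposes, here is a toy all-reduce across simulated ranks. This is a pure-Python stand-in for the semantics only, not the ACCL+ API; the function name is hypothetical:

```python
def allreduce_sum(buffers):
    """Naive all-reduce: every rank ends with the elementwise sum.

    `buffers` holds one equal-length list per rank. A real collective
    engine pipelines this over the network (e.g., a ring schedule);
    here the reduce and broadcast phases are done in plain loops.
    """
    n = len(buffers[0])
    total = [0] * n
    for buf in buffers:              # reduce phase: sum across ranks
        for i, v in enumerate(buf):
            total[i] += v
    return [list(total) for _ in buffers]  # broadcast phase: copy to all
```

Offloading exactly this pattern to the FPGA is what frees the CPU from networking tasks in the paper's collective-offload use case: the host only supplies its local buffer and later reads back the reduced result.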
Enabling the use of embedded and mobile technologies for high-performance computing
In the late 1990s, powerful economic forces led to the adoption of commodity desktop processors in High-Performance Computing (HPC). This transformation has been so effective that the November 2016 TOP500 list is still dominated by the x86 architecture.
In 2016, the largest commodity market in computing is not PCs or servers, but mobile computing, comprising smartphones and tablets, most of which are built with ARM-based Systems on Chips (SoCs). This suggests that once mobile SoCs deliver sufficient performance, they can help reduce the cost of HPC.
This thesis addresses this question in detail. We analyze the trend in mobile SoC performance, comparing it with the similar trend in the 1990s. Through the development of real system prototypes and their performance analysis, we assess the feasibility of building an HPC system based on mobile SoCs. Through simulation of future mobile SoCs, we identify the missing features and suggest improvements that would enable the use of future mobile SoCs in an HPC environment.
Thus, we present design guidelines for future generations of mobile SoCs, and for HPC systems built around them, enabling a new class of cheap supercomputers.
NetFPGA: status, uses, developments, challenges, and evaluation
The constant growth of the Internet, driven by the demand for timely access to data center networks, has meant that the technological platforms necessary to achieve this purpose are beyond current budgets. In order to make and validate relevant and timely contributions, a wider community needs access to evaluation, experimentation, and demonstration environments with specifications that can be compared with existing networking solutions. This article introduces the NetFPGA, a platform for developing reconfigurable network hardware and for rapid prototyping. It presents the platform's application areas in high-performance networks and its advantages for traffic analysis, packet flow, hardware acceleration, power consumption, and real-time parallel processing. Likewise, it presents the advantages of the platform for research, education, and innovation, as well as its future trends. Finally, we present a performance evaluation of the tool called OSNT (Open-Source Network Tester), showing that OSNT achieves 95% timestamp accuracy with a resolution of 10 ns for the generation of TCP traffic, and 90% efficiency when capturing packets at 10 Gbit/s full line rate.
A cross-stack, network-centric architectural design for next-generation datacenters
This thesis proposes a full-stack, cross-layer datacenter architecture based on in-network computing and near-memory processing paradigms. The proposed datacenter architecture is built atop two principles: (1) utilizing commodity, off-the-shelf hardware (i.e., processor, DRAM, and network devices) with minimal changes to their architecture, and (2) providing a standard interface to the programmers for using the novel hardware. More specifically, the proposed datacenter architecture enables a smart network adapter to collectively compress/decompress data exchange between distributed DNN training nodes and assist the operating system in performing aggressive processor power management. It also deploys specialized memory modules in the servers, capable of performing general-purpose computation and network connectivity.
This thesis unlocks the potential of hardware and operating system co-design in architecting application-transparent, near-data processing hardware for improving datacenter performance, energy efficiency, and scalability. We evaluate the proposed datacenter architecture using a combination of full-system simulation, FPGA prototyping, and real-system experiments.