136 research outputs found
On the Exploration of FPGAs and High-Level Synthesis Capabilities on Multi-Gigabit-per-Second Networks
Unpublished doctoral thesis, read at the Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Tecnología Electrónica y de las Comunicaciones. Defense date: 24-01-2020.
Traffic on computer networks has grown exponentially in recent years.
Both links and communication equipment have had to adapt in order to provide
the minimum quality of service that current needs require. However, in recent
years, several factors have prevented commercial off-the-shelf hardware from
keeping pace with this growth rate; consequently, some software tools are
struggling to fulfill their tasks, especially at speeds above 10 Gbit/s. For this reason,
Field Programmable Gate Arrays (FPGAs) have arisen as an alternative for addressing the
most demanding tasks without the need to design an application-specific integrated
circuit, thanks in part to their flexibility and in-field programmability. Needless to say,
developing for FPGAs is notoriously complex. Therefore, in this thesis we tackle
the use of FPGAs and High-Level Synthesis (HLS) languages in the context of computer
networks. We focus on the use of FPGAs both in computer network monitoring
and in reliable data transmission at very high speed. We also intend to shed
light on the use of high-level synthesis languages and to boost FPGA applicability in the
context of computer networks, so as to reduce development time and design complexity.
The first part of the thesis is devoted to computer network monitoring. We take advantage
of FPGA determinism to implement active monitoring probes, which
send a train of packets that is later used to derive network parameters.
In this case, determinism is key to reducing the uncertainty of the measurements.
The results of our experiments show that the FPGA implementations are significantly more
accurate and more precise than their software counterparts. At the same time, the FPGA
implementation is scalable in terms of network speed: 1, 10 and 100 Gbit/s. In the context of passive monitoring, we leverage the FPGA architecture to implement algorithms
that thin encrypted traffic and remove duplicate packets. These two algorithms are
straightforward in principle, but very useful in helping traditional network analysis tools
cope with their task at higher network speeds: on the one hand, processing encrypted traffic
brings little benefit; on the other hand, processing duplicate traffic negatively impacts
the performance of the software tools.
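The duplicate-removal idea can be sketched in software. Below is a minimal Python model that hashes each packet and drops it if the same hash was seen within a sliding window of recent packets; the window size, the hash function, and the class name are illustrative assumptions, not details of the thesis's FPGA design:

```python
import hashlib
from collections import OrderedDict

class PacketDeduplicator:
    """Drop packets whose payload hash was seen within the last `window`
    packets. A host-side sketch of the duplicate-removal idea; the thesis
    implements the equivalent logic in FPGA hardware."""

    def __init__(self, window: int = 1024):
        self.window = window
        self._seen = OrderedDict()  # insertion-ordered set of recent hashes

    def is_duplicate(self, packet: bytes) -> bool:
        digest = hashlib.sha1(packet).digest()
        if digest in self._seen:
            return True                      # seen within the window: drop
        self._seen[digest] = None
        if len(self._seen) > self.window:    # evict the oldest hash
            self._seen.popitem(last=False)
        return False
```

A bounded window keeps memory constant, which mirrors the fixed-size tables an FPGA implementation would use.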
The second part of the thesis is devoted to the TCP/IP stack. We explore the current
limitations of reliable data transmission using standard software at very high speed.
Nowadays, the network is becoming an important bottleneck to fulfilling current needs,
particularly in data centers. What is more, in recent years the deployment of 100 Gbit/s
network links has begun. Consequently, there has been increased scrutiny of how
networking functionality is deployed, and a wide range of approaches are
currently being explored to increase the efficiency of networks and tailor their functionality
to the actual needs of the application at hand. FPGAs arise as a strong alternative to
deal with this problem. For this reason, in this thesis we develop Limago, an FPGA-based
open-source implementation of a TCP/IP stack operating at 100 Gbit/s on Xilinx FPGAs.
Limago not only provides unprecedented throughput, but also a latency at least fifteen
times lower than that of software implementations. Limago is a key
contribution to some of the hottest topics of the moment, for instance, network-attached
FPGAs and in-network data processing.
Impact of Network Sharing in Multi-core Architectures
As commodity components continue to dominate the realm of high-end computing, two hardware trends have emerged as major contributors to this trend: high-speed networking technologies and multi-core architectures. Communication middleware such as the Message Passing Interface (MPI) uses the network technology for communicating between processes that reside on different physical nodes, while using shared memory for communicating between processes on different cores within the same node. Thus, two conflicting possibilities arise: (i) with the advent of multi-core architectures, the number of processes that reside on the same physical node, and hence share the same physical network, can potentially increase significantly, resulting in increased network usage; and (ii) given the increase in intra-node shared-memory communication for processes residing on the same node, network usage can potentially reduce significantly.
In this paper, we address these two conflicting possibilities and study the behavior of network usage in multi-core environments with sample scientific applications. Specifically, we analyze trends that result in an increase or decrease of network usage and derive insights on application performance based on them. We also study the sharing of different resources in the system in multi-core environments and identify the network's contribution to this mix. Finally, we study different process allocation strategies and analyze their impact on such network sharing.
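The intra-node versus inter-node split that the paper studies can be illustrated with a small placement model. The sketch below counts, for a given rank-to-node placement, how many communicating pairs stay on-node (shared memory) versus cross the network; the all-to-all and nearest-neighbour-ring patterns and the "block"/"cyclic" placements are illustrative assumptions, not the paper's actual workloads:

```python
from itertools import combinations

def communication_split(placement):
    """placement: {rank: node}. Count pairs that stay on-node (shared
    memory) vs. cross the network, assuming every rank talks to every
    other rank (all-to-all)."""
    intra = inter = 0
    for a, b in combinations(placement, 2):
        if placement[a] == placement[b]:
            intra += 1
        else:
            inter += 1
    return intra, inter

def ring_split(placement):
    """Same count for a nearest-neighbour ring: rank r talks to r+1 mod P."""
    ranks = sorted(placement)
    intra = inter = 0
    for i, r in enumerate(ranks):
        nxt = ranks[(i + 1) % len(ranks)]
        if placement[r] == placement[nxt]:
            intra += 1
        else:
            inter += 1
    return intra, inter

# Eight ranks on two quad-core nodes: packed ("block") vs. round-robin ("cyclic")
block = {r: r // 4 for r in range(8)}
cyclic = {r: r % 2 for r in range(8)}
```

For an all-to-all pattern the split depends only on group sizes, but for the ring pattern block placement keeps most traffic on-node while cyclic placement pushes every message onto the network, which is the kind of allocation-strategy effect the paper analyzes.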
Autonomous Configuration of Network Parameters in Operating Systems using Evolutionary Algorithms
By default, the Linux network stack is not configured for high-speed, large
file transfer; the reason behind this is to save memory resources. It is
possible to tune the Linux network stack by increasing the network buffer sizes
for the high-speed networks that connect server systems, in order to handle more
network packets. However, there are also several other TCP/IP parameters that
can be tuned in an Operating System (OS). In this paper, we leverage Genetic
Algorithms (GAs) to devise a system that learns from the history of the
network traffic and uses this knowledge to optimize current performance by
adjusting the parameters. This can be done for a standard Linux kernel using
sysctl or /proc. For a Virtual Machine (VM), virtually any type of OS can be
installed, and an image can swiftly be compiled and deployed. Because a VM is a
sandboxed environment, risky configurations can be tested without the danger of
harming the system. Different scenarios for network parameter configurations
are thoroughly tested, and an increase of up to 65% in throughput is
achieved compared to the default Linux configuration. Comment: ACM RACS 201
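The GA loop described above can be sketched as follows. This is a minimal, self-contained illustration, not the paper's system: the two tunables, their ranges, and the synthetic fitness function are assumptions standing in for real sysctl parameters (e.g. net.core.rmem_max) and a real throughput benchmark:

```python
import random

# Hypothetical tunables and ranges; a real system would set them via sysctl.
PARAM_RANGES = {"rmem_max": (4096, 1 << 24), "wmem_max": (4096, 1 << 24)}

def fitness(genome):
    """Stand-in for a measured transfer benchmark: a synthetic score that
    peaks at large buffers. A real system would run a file transfer and
    report the observed throughput."""
    target = 1 << 24
    return -sum(abs(genome[k] - target) for k in genome)

def evolve(pop_size=20, generations=30, seed=1):
    rng = random.Random(seed)
    def random_genome():
        return {k: rng.randint(lo, hi) for k, (lo, hi) in PARAM_RANGES.items()}
    pop = [random_genome() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]                        # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            child = {k: rng.choice((a[k], b[k])) for k in a}  # uniform crossover
            if rng.random() < 0.3:                            # random-reset mutation
                k = rng.choice(list(child))
                lo, hi = PARAM_RANGES[k]
                child[k] = rng.randint(lo, hi)
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)
```

Keeping the parent half of each generation gives the loop elitism, so the best configuration found is never lost between generations.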
Survey on System I/O Hardware Transactions and Impact on Latency, Throughput, and Other Factors
Computer system I/O has evolved with processor and memory technologies in terms of reducing latency, increasing bandwidth, and other factors. As requirements increase for I/O, such as networking, storage, and video, descriptor-based DMA transactions have become more important in high-performance systems for moving data between I/O adapters and system memory buffers. DMA transactions are done with hardware engines below the software protocol abstraction layers in all systems other than rudimentary embedded controllers. CPUs can switch to other tasks by offloading hardware DMA transfers to the I/O adapters. Each I/O interface has one or more separately instantiated descriptor-based DMA engines optimized for a given I/O port. I/O transactions are optimized by accelerator functions to reduce latency, improve throughput, and reduce CPU overhead. This chapter surveys the current state of high-performance I/O architecture advances and explores benefits and limitations. With the proliferation of CPU multi-cores within a system, multi-GB/s ports, and on-die integration of system functions, changes beyond the techniques surveyed may be needed for optimal I/O architecture performance. This is an author's peer-reviewed final manuscript, as accepted by the publisher. The published article/chapter is copyrighted by Elsevier and can be found at: http://www.elsevier.com/books/advances-in-computers/hurson/978-0-12-420232-0. Keywords: memory, controllers, processors, DMA, input/output, latency, power, throughput
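The descriptor-based DMA mechanism the chapter surveys can be modeled in a few lines. The sketch below shows a driver/device descriptor ring with an ownership bit; the field layout, ring size, and method names are illustrative and do not correspond to any specific adapter:

```python
from dataclasses import dataclass

@dataclass
class Descriptor:
    """One entry of a simplified DMA descriptor ring: buffer address,
    length, and an ownership bit toggled between driver and device."""
    addr: int = 0
    length: int = 0
    owned_by_hw: bool = False

class DescriptorRing:
    def __init__(self, size=8):
        self.ring = [Descriptor() for _ in range(size)]
        self.head = 0   # next slot the driver posts
        self.tail = 0   # next slot the device completes

    def post(self, addr, length):
        """Driver side: hand a buffer to the device (False if ring full)."""
        d = self.ring[self.head]
        if d.owned_by_hw:
            return False
        d.addr, d.length, d.owned_by_hw = addr, length, True
        self.head = (self.head + 1) % len(self.ring)
        return True

    def complete(self):
        """Device side: consume the oldest posted descriptor, release it."""
        d = self.ring[self.tail]
        if not d.owned_by_hw:
            return None
        d.owned_by_hw = False
        self.tail = (self.tail + 1) % len(self.ring)
        return (d.addr, d.length)
```

The ownership bit is what lets the CPU queue work and move on: the device walks the ring asynchronously, which is the offload behavior the chapter describes.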
Multimedia Data Flow Traffic Classification Using Intelligent Models Based on Traffic Patterns
Nowadays, there is high interest in modeling the type of multimedia traffic with the purpose of estimating the network resources required to guarantee the quality delivered to the user. In this work we propose a multimedia traffic classification model based on patterns that allows us to differentiate the type of traffic by using video streaming and network characteristics as input parameters. We show that there is low correlation between network parameters and the delivered video quality. Because of this, in addition to network parameters, we also add video streaming parameters in order to improve the efficiency of our system. Finally, it should be noted that, based on the objective video quality received by the user, we have extracted traffic patterns that we use to develop the classification model. This work has been supported by the Ministerio de Economia y Competitividad in the Programa Estatal de Fomento de la Investigacion Cientifica y Tecnica de Excelencia, Subprograma Estatal de Generacion de Conocimiento, within the Project with reference TIN2017-84802-C2-1-P. Canovas Solbes, A.; Jimenez, JM.; Romero Martínez, JO.; Lloret, J. (2018). Multimedia Data Flow Traffic Classification Using Intelligent Models Based on Traffic Patterns. IEEE Network, 32(6):100-107. doi:10.1109/MNET.2018.1800121
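A pattern-based classifier of the kind described can be illustrated with a nearest-centroid model over combined network and video features. The features and labels below are illustrative stand-ins (the paper's actual parameters and intelligent models are not reproduced here):

```python
import math

def train_centroids(samples):
    """samples: list of (feature_vector, label). Returns one mean vector
    per label. Features might be, e.g., bitrate in Mbit/s and packet rate
    (illustrative choices, not the paper's parameter set)."""
    sums, counts = {}, {}
    for vec, label in samples:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lbl: [v / counts[lbl] for v in acc] for lbl, acc in sums.items()}

def classify(centroids, vec):
    """Assign the label whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda lbl: math.dist(centroids[lbl], vec))
```

Adding video-streaming features alongside network features simply widens the vectors, which is how the low network/quality correlation the paper reports can be compensated for.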
Integrated System Architectures for High-Performance Internet Servers
Ph.D., Computer Science and Engineering, University of Michigan. http://deepblue.lib.umich.edu/bitstream/2027.42/90845/1/binkert-thesis.pd
A cross-stack, network-centric architectural design for next-generation datacenters
This thesis proposes a full-stack, cross-layer datacenter architecture based on in-network computing and near-memory processing paradigms. The proposed datacenter architecture is built atop two principles: (1) utilizing commodity, off-the-shelf hardware (i.e., processor, DRAM, and network devices) with minimal changes to their architecture, and (2) providing a standard interface to the programmers for using the novel hardware. More specifically, the proposed datacenter architecture enables a smart network adapter to collectively compress/decompress data exchange between distributed DNN training nodes and assist the operating system in performing aggressive processor power management. It also deploys specialized memory modules in the servers, capable of performing general-purpose computation and network connectivity.
This thesis unlocks the potential of hardware and operating system co-design in architecting application-transparent, near-data processing hardware for improving the datacenter's performance, energy efficiency, and scalability. We evaluate the proposed datacenter architecture using a combination of full-system simulation, FPGA prototyping, and real-system experiments.
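The smart-NIC compression of DNN training exchanges can be pictured with a host-side sketch. The thesis offloads this to the network adapter; here it is ordinary Python using zlib, and the function names and serialization choice are illustrative assumptions:

```python
import pickle
import zlib

def pack_update(update, level=6):
    """Compress a gradient update before it crosses the network.
    Returns the compressed blob plus raw/compressed sizes so the
    bandwidth saving can be inspected."""
    raw = pickle.dumps(update)
    comp = zlib.compress(raw, level)
    return comp, len(raw), len(comp)

def unpack_update(blob):
    """Receiver side: decompress and deserialize the update."""
    return pickle.loads(zlib.decompress(blob))
```

Moving exactly this pair of functions into the NIC keeps the programmer-visible interface unchanged while the bytes on the wire shrink, which matches the thesis's principle of a standard interface over modified hardware.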