136 research outputs found
On the Exploration of FPGAs and High-Level Synthesis Capabilities on Multi-Gigabit-per-Second Networks
Unpublished doctoral thesis, read at the Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Tecnología Electrónica y de las Comunicaciones. Defense date: 24-01-2020.
Traffic on computer networks has grown exponentially in recent years.
Both links and communication equipment have had to adapt in order to provide
the minimum quality of service that current needs require. However, in recent
years, several factors have prevented commercial off-the-shelf hardware from
keeping pace with this growth rate; consequently, some software tools are
struggling to fulfill their tasks, especially at speeds above 10 Gbit/s. For this reason,
Field Programmable Gate Arrays (FPGAs) have arisen as an alternative for addressing the
most demanding tasks without the need to design an application-specific integrated
circuit, thanks in part to their flexibility and in-field programmability. Needless to say,
developing for FPGAs is notoriously complex. Therefore, in this thesis we tackle
the use of FPGAs and High-Level Synthesis (HLS) languages in the context of computer
networks. We focus on the use of FPGAs both in computer network monitoring
and in reliable data transmission at very high speed. We also intend to shed
light on the use of high-level synthesis languages and to boost FPGA applicability in the
context of computer networks, so as to reduce development time and design complexity.
The first part of the thesis is devoted to computer network monitoring. We take advantage
of FPGA determinism to implement active monitoring probes, which
send a train of packets that is later used to derive network parameters.
In this case, determinism is key to reducing the uncertainty of the measurements.
The results of our experiments show that the FPGA implementations are significantly more
accurate and more precise than their software counterparts. At the same time, the FPGA
implementation is scalable in terms of network speed: 1, 10 and 100 Gbit/s. In the context of passive monitoring, we leverage the FPGA architecture to implement algorithms
that thin encrypted traffic and remove duplicate packets. These two algorithms are
straightforward in principle, but very useful in helping traditional network analysis tools
cope with their task at higher network speeds: on the one hand, processing encrypted traffic
brings little benefit; on the other hand, processing duplicate traffic negatively impacts
the performance of the software tools.
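The duplicate-removal idea can be sketched in software. Below is a minimal Python model that hashes each packet and drops it if the same hash was seen within a sliding window of recent packets; the window size, the hash function, and the class name are illustrative assumptions, not details of the thesis's FPGA design:

```python
import hashlib
from collections import OrderedDict

class PacketDeduplicator:
    """Drop packets whose payload hash was seen within the last `window`
    packets. A host-side sketch of the duplicate-removal idea; the thesis
    implements the equivalent logic in FPGA hardware."""

    def __init__(self, window: int = 1024):
        self.window = window
        self._seen = OrderedDict()  # insertion-ordered set of recent hashes

    def is_duplicate(self, packet: bytes) -> bool:
        digest = hashlib.sha1(packet).digest()
        if digest in self._seen:
            return True                      # seen within the window: drop
        self._seen[digest] = None
        if len(self._seen) > self.window:    # evict the oldest hash
            self._seen.popitem(last=False)
        return False
```

A bounded window keeps memory constant, which mirrors the fixed-size tables an FPGA implementation would use.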
The second part of the thesis is devoted to the TCP/IP stack. We explore the current
limitations of reliable data transmission using standard software at very high speed.
Nowadays, the network is becoming an important bottleneck to fulfilling current needs,
particularly in data centers. What is more, in recent years the deployment of 100 Gbit/s
network links has begun. Consequently, there has been increased scrutiny of how
networking functionality is deployed, and a wide range of approaches are
currently being explored to increase the efficiency of networks and tailor their functionality
to the actual needs of the application at hand. FPGAs arise as a strong alternative to
deal with this problem. For this reason, in this thesis we develop Limago, an FPGA-based
open-source implementation of a TCP/IP stack operating at 100 Gbit/s on Xilinx FPGAs.
Limago not only provides unprecedented throughput, but also a latency at least fifteen
times lower than that of software implementations. Limago is a key
contribution to some of the hottest topics of the moment, for instance, network-attached
FPGAs and in-network data processing.
Impact of Network Sharing in Multi-core Architectures
As commodity components continue to dominate the realm of high-end computing, two hardware trends have emerged as major contributors to this trend: high-speed networking technologies and multi-core architectures. Communication middleware such as the Message Passing Interface (MPI) uses the network technology for communicating between processes that reside on different physical nodes, while using shared memory for communicating between processes on different cores within the same node. Thus, two conflicting possibilities arise: (i) with the advent of multi-core architectures, the number of processes that reside on the same physical node, and hence share the same physical network, can potentially increase significantly, resulting in increased network usage; and (ii) given the increase in intra-node shared-memory communication for processes residing on the same node, network usage can potentially reduce significantly.
In this paper, we address these two conflicting possibilities and study the behavior of network usage in multi-core environments with sample scientific applications. Specifically, we analyze trends that result in an increase or decrease of network usage and derive insights on application performance based on them. We also study the sharing of different resources in the system in multi-core environments and identify the network's contribution to this mix. Finally, we study different process allocation strategies and analyze their impact on such network sharing.
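The intra-node versus inter-node split that the paper studies can be illustrated with a small placement model. The sketch below counts, for a given rank-to-node placement, how many communicating pairs stay on-node (shared memory) versus cross the network; the all-to-all and nearest-neighbour-ring patterns and the "block"/"cyclic" placements are illustrative assumptions, not the paper's actual workloads:

```python
from itertools import combinations

def communication_split(placement):
    """placement: {rank: node}. Count pairs that stay on-node (shared
    memory) vs. cross the network, assuming every rank talks to every
    other rank (all-to-all)."""
    intra = inter = 0
    for a, b in combinations(placement, 2):
        if placement[a] == placement[b]:
            intra += 1
        else:
            inter += 1
    return intra, inter

def ring_split(placement):
    """Same count for a nearest-neighbour ring: rank r talks to r+1 mod P."""
    ranks = sorted(placement)
    intra = inter = 0
    for i, r in enumerate(ranks):
        nxt = ranks[(i + 1) % len(ranks)]
        if placement[r] == placement[nxt]:
            intra += 1
        else:
            inter += 1
    return intra, inter

# Eight ranks on two quad-core nodes: packed ("block") vs. round-robin ("cyclic")
block = {r: r // 4 for r in range(8)}
cyclic = {r: r % 2 for r in range(8)}
```

For an all-to-all pattern the split depends only on group sizes, but for the ring pattern block placement keeps most traffic on-node while cyclic placement pushes every message onto the network, which is the kind of allocation-strategy effect the paper analyzes.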
Autonomous Configuration of Network Parameters in Operating Systems using Evolutionary Algorithms
By default, the Linux network stack is not configured for high-speed, large
file transfer; the reason behind this is to save memory resources. It is
possible to tune the Linux network stack by increasing the network buffer sizes
for the high-speed networks that connect server systems, in order to handle more
network packets. However, there are also several other TCP/IP parameters that
can be tuned in an Operating System (OS). In this paper, we leverage Genetic
Algorithms (GAs) to devise a system that learns from the history of the
network traffic and uses this knowledge to optimize current performance by
adjusting the parameters. This can be done for a standard Linux kernel using
sysctl or /proc. For a Virtual Machine (VM), virtually any type of OS can be
installed, and an image can swiftly be compiled and deployed. Because a VM is a
sandboxed environment, risky configurations can be tested without the danger of
harming the system. Different scenarios for network parameter configurations
are thoroughly tested, and an increase of up to 65% in throughput is
achieved compared to the default Linux configuration. Comment: ACM RACS 201
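The GA loop described above can be sketched as follows. This is a minimal, self-contained illustration, not the paper's system: the two tunables, their ranges, and the synthetic fitness function are assumptions standing in for real sysctl parameters (e.g. net.core.rmem_max) and a real throughput benchmark:

```python
import random

# Hypothetical tunables and ranges; a real system would set them via sysctl.
PARAM_RANGES = {"rmem_max": (4096, 1 << 24), "wmem_max": (4096, 1 << 24)}

def fitness(genome):
    """Stand-in for a measured transfer benchmark: a synthetic score that
    peaks at large buffers. A real system would run a file transfer and
    report the observed throughput."""
    target = 1 << 24
    return -sum(abs(genome[k] - target) for k in genome)

def evolve(pop_size=20, generations=30, seed=1):
    rng = random.Random(seed)
    def random_genome():
        return {k: rng.randint(lo, hi) for k, (lo, hi) in PARAM_RANGES.items()}
    pop = [random_genome() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]                        # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            child = {k: rng.choice((a[k], b[k])) for k in a}  # uniform crossover
            if rng.random() < 0.3:                            # random-reset mutation
                k = rng.choice(list(child))
                lo, hi = PARAM_RANGES[k]
                child[k] = rng.randint(lo, hi)
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)
```

Keeping the parent half of each generation gives the loop elitism, so the best configuration found is never lost between generations.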
Survey on System I/O Hardware Transactions and Impact on Latency, Throughput, and Other Factors
Computer system I/O has evolved with processor and memory technologies in terms of reducing latency, increasing bandwidth, and other factors. As requirements increase for I/O, such as networking, storage, and video, descriptor-based DMA transactions have become more important in high-performance systems for moving data between I/O adapters and system memory buffers. DMA transactions are done with hardware engines below the software protocol abstraction layers in all systems other than rudimentary embedded controllers. CPUs can switch to other tasks by offloading hardware DMA transfers to the I/O adapters. Each I/O interface has one or more separately instantiated descriptor-based DMA engines optimized for a given I/O port. I/O transactions are optimized by accelerator functions to reduce latency, improve throughput, and reduce CPU overhead. This chapter surveys the current state of high-performance I/O architecture advances and explores benefits and limitations. With the proliferation of CPU multi-cores within a system, multi-GB/s ports, and on-die integration of system functions, changes beyond the techniques surveyed may be needed for optimal I/O architecture performance. This is an author's peer-reviewed final manuscript, as accepted by the publisher. The published article/chapter is copyrighted by Elsevier and can be found at: http://www.elsevier.com/books/advances-in-computers/hurson/978-0-12-420232-0. Keywords: memory, controllers, processors, DMA, input/output, latency, power, throughput
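The descriptor-based DMA mechanism the chapter surveys can be modeled in a few lines. The sketch below shows a driver/device descriptor ring with an ownership bit; the field layout, ring size, and method names are illustrative and do not correspond to any specific adapter:

```python
from dataclasses import dataclass

@dataclass
class Descriptor:
    """One entry of a simplified DMA descriptor ring: buffer address,
    length, and an ownership bit toggled between driver and device."""
    addr: int = 0
    length: int = 0
    owned_by_hw: bool = False

class DescriptorRing:
    def __init__(self, size=8):
        self.ring = [Descriptor() for _ in range(size)]
        self.head = 0   # next slot the driver posts
        self.tail = 0   # next slot the device completes

    def post(self, addr, length):
        """Driver side: hand a buffer to the device (False if ring full)."""
        d = self.ring[self.head]
        if d.owned_by_hw:
            return False
        d.addr, d.length, d.owned_by_hw = addr, length, True
        self.head = (self.head + 1) % len(self.ring)
        return True

    def complete(self):
        """Device side: consume the oldest posted descriptor, release it."""
        d = self.ring[self.tail]
        if not d.owned_by_hw:
            return None
        d.owned_by_hw = False
        self.tail = (self.tail + 1) % len(self.ring)
        return (d.addr, d.length)
```

The ownership bit is what lets the CPU queue work and move on: the device walks the ring asynchronously, which is the offload behavior the chapter describes.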
Multimedia Data Flow Traffic Classification Using Intelligent Models Based on Traffic Patterns
Nowadays, there is high interest in modeling the type of multimedia traffic with the purpose of estimating the network resources required to guarantee the quality delivered to the user. In this work we propose a multimedia traffic classification model based on patterns that allows us to differentiate the type of traffic by using video streaming and network characteristics as input parameters. We show that there is low correlation between network parameters and the delivered video quality. Because of this, in addition to network parameters, we also add video streaming parameters in order to improve the efficiency of our system. Finally, it should be noted that, based on the objective video quality received by the user, we have extracted traffic patterns that we use to develop the classification model. This work has been supported by the Ministerio de Economia y Competitividad in the Programa Estatal de Fomento de la Investigacion Cientifica y Tecnica de Excelencia, Subprograma Estatal de Generacion de Conocimiento, within the Project with reference TIN2017-84802-C2-1-P. Canovas Solbes, A.; Jimenez, JM.; Romero Martínez, JO.; Lloret, J. (2018). Multimedia Data Flow Traffic Classification Using Intelligent Models Based on Traffic Patterns. IEEE Network, 32(6):100-107. doi:10.1109/MNET.2018.1800121
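A pattern-based classifier of the kind described can be illustrated with a nearest-centroid model over combined network and video features. The features and labels below are illustrative stand-ins (the paper's actual parameters and intelligent models are not reproduced here):

```python
import math

def train_centroids(samples):
    """samples: list of (feature_vector, label). Returns one mean vector
    per label. Features might be, e.g., bitrate in Mbit/s and packet rate
    (illustrative choices, not the paper's parameter set)."""
    sums, counts = {}, {}
    for vec, label in samples:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lbl: [v / counts[lbl] for v in acc] for lbl, acc in sums.items()}

def classify(centroids, vec):
    """Assign the label whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda lbl: math.dist(centroids[lbl], vec))
```

Adding video-streaming features alongside network features simply widens the vectors, which is how the low network/quality correlation the paper reports can be compensated for.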
Integrated System Architectures for High-Performance Internet Servers
Ph.D., Computer Science and Engineering, University of Michigan. http://deepblue.lib.umich.edu/bitstream/2027.42/90845/1/binkert-thesis.pd
A cross-stack, network-centric architectural design for next-generation datacenters
This thesis proposes a full-stack, cross-layer datacenter architecture based on in-network computing and near-memory processing paradigms. The proposed datacenter architecture is built atop two principles: (1) utilizing commodity, off-the-shelf hardware (i.e., processor, DRAM, and network devices) with minimal changes to their architecture, and (2) providing a standard interface to the programmers for using the novel hardware. More specifically, the proposed datacenter architecture enables a smart network adapter to collectively compress/decompress data exchange between distributed DNN training nodes and assist the operating system in performing aggressive processor power management. It also deploys specialized memory modules in the servers, capable of performing general-purpose computation and network connectivity.
This thesis unlocks the potential of hardware and operating system co-design in architecting application-transparent, near-data processing hardware for improving the datacenter's performance, energy efficiency, and scalability. We evaluate the proposed datacenter architecture using a combination of full-system simulation, FPGA prototyping, and real-system experiments.
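The smart-NIC compression of DNN training exchanges can be pictured with a host-side sketch. The thesis offloads this to the network adapter; here it is ordinary Python using zlib, and the function names and serialization choice are illustrative assumptions:

```python
import pickle
import zlib

def pack_update(update, level=6):
    """Compress a gradient update before it crosses the network.
    Returns the compressed blob plus raw/compressed sizes so the
    bandwidth saving can be inspected."""
    raw = pickle.dumps(update)
    comp = zlib.compress(raw, level)
    return comp, len(raw), len(comp)

def unpack_update(blob):
    """Receiver side: decompress and deserialize the update."""
    return pickle.loads(zlib.decompress(blob))
```

Moving exactly this pair of functions into the NIC keeps the programmer-visible interface unchanged while the bytes on the wire shrink, which matches the thesis's principle of a standard interface over modified hardware.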