73 research outputs found

    On the Exploration of FPGAs and High-Level Synthesis Capabilities on Multi-Gigabit-per-Second Networks

    Unpublished doctoral thesis defended at the Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Tecnología Electrónica y de las Comunicaciones. Date of defense: 24-01-2020. Traffic on computer networks has grown exponentially in recent years. Both links and communication equipment have had to adapt in order to provide the minimum quality of service required for current needs. However, in recent years a few factors have prevented commercial off-the-shelf hardware from keeping pace with this growth rate. Consequently, some software tools are struggling to fulfill their tasks, especially at speeds higher than 10 Gbit/s. For this reason, Field Programmable Gate Arrays (FPGAs) have arisen as an alternative for the most demanding tasks without the need to design an application-specific integrated circuit, in part due to their flexibility and programmability in the field. Nevertheless, developing for FPGAs is well known to be complex. Therefore, in this thesis we tackle the use of FPGAs and High-Level Synthesis (HLS) languages in the context of computer networks. We focus on the use of FPGAs both in computer network monitoring applications and in reliable data transmission at very high speed. In addition, we intend to shed light on the use of high-level synthesis languages and to boost FPGA applicability in the context of computer networks so as to reduce development time and design complexity. In the first part of the thesis, devoted to computer network monitoring, we take advantage of FPGA determinism to implement active monitoring probes, which consist of sending a train of packets that is later used to obtain network parameters. In this case, determinism is key to reducing the uncertainty of the measurements. The results of our experiments show that the FPGA implementations are much more accurate and more precise than their software counterpart. 
At the same time, the FPGA implementation scales with network speed: 1, 10 and 100 Gbit/s. In the context of passive monitoring, we leverage the FPGA architecture to implement algorithms that thin ciphered traffic and remove duplicate packets. These two algorithms are straightforward in principle, but very useful in helping traditional network analysis tools cope with their task at higher network speeds. On the one hand, processing ciphered traffic brings little benefit; on the other hand, processing duplicate traffic degrades the performance of the software tools. In the second part of the thesis, devoted to the TCP/IP stack, we explore the current limitations of reliable data transmission using standard software at very high speed. Nowadays, the network is becoming an important bottleneck, in particular in data centers. What is more, the deployment of 100 Gbit/s network links has started in recent years. Consequently, there has been increased scrutiny of how networking functionality is deployed, and a wide range of approaches are currently being explored to increase the efficiency of networks and tailor their functionality to the actual needs of the application at hand. FPGAs arise as a well-suited alternative for dealing with this problem. For this reason, in this thesis we develop Limago, an FPGA-based open-source implementation of a TCP/IP stack operating at 100 Gbit/s for Xilinx FPGAs. Limago not only provides unprecedented throughput, but also a latency at least fifteen times lower than that of software implementations. Limago is a key contribution to some of the hottest topics at the moment, for instance network-attached FPGAs and in-network data processing.
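The abstract above does not say how duplicate packets are detected, so the following is only a minimal software sketch of one common approach: remember a digest of each recently seen packet and drop any packet whose digest is still in that bounded window. The function name `deduplicate`, the `window` parameter and the choice of SHA-1 are assumptions made for illustration; they are not taken from the thesis, and an FPGA design would likely use a cheaper hash and a fixed-size on-chip table.

```python
import hashlib
from collections import OrderedDict


def deduplicate(packets, window=1024):
    """Yield packets whose digest was not seen among the last `window` kept packets.

    `packets` is an iterable of bytes objects. Both the name and the window
    size are hypothetical choices for this sketch.
    """
    seen = OrderedDict()  # digest -> None, ordered from oldest to newest
    for pkt in packets:
        digest = hashlib.sha1(pkt).digest()
        if digest in seen:
            continue  # duplicate inside the window: drop it
        seen[digest] = None
        if len(seen) > window:
            seen.popitem(last=False)  # evict the oldest digest
        yield pkt


if __name__ == "__main__":
    trace = [b"pkt-a", b"pkt-b", b"pkt-a", b"pkt-c"]
    print(list(deduplicate(trace)))
```

The bounded window is the design choice that matters: it caps memory use, at the cost of letting through a duplicate that arrives after its original has been evicted.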

    A Survey on Data Plane Programming with P4: Fundamentals, Advances, and Applied Research

    With traditional networking, users can configure control plane protocols to match the specific network configuration, but cannot fundamentally change the underlying algorithms. With SDN, users may provide their own control plane, which can control network devices through their data plane APIs. Programmable data planes allow users to define their own data plane algorithms for network devices, including appropriate data plane APIs that may be leveraged by user-defined SDN control. Thus, programmable data planes and SDN offer great flexibility for network customization, be it for specialized commercial appliances, e.g., in 5G or data center networks, or for rapid prototyping in industrial and academic research. Programming protocol-independent packet processors (P4) has emerged as the currently most widespread abstraction, programming language, and concept for data plane programming. It is developed and standardized by an open community and is supported by various software and hardware platforms. In this paper, we survey the literature from 2015 to 2020 on data plane programming with P4. Our survey covers 497 references, of which 367 are scientific publications. We organize our work into two parts. In the first part, we give an overview of data plane programming models, the programming language, architectures, compilers, targets, and data plane APIs. We also consider research efforts to advance P4 technology. In the second part, we analyze a large body of literature on P4-based applied research. We categorize 241 research papers into different application domains, summarize their contributions, and extract prototypes, target platforms, and source code availability. Comment: Submitted to IEEE Communications Surveys and Tutorials (COMS) on 2021-01-2

    A Modular Approach to Adaptive Reactive Streaming Systems

    The latest generations of FPGA devices offer large resource counts that provide the headroom to implement large-scale and complex systems. However, there are increasing challenges for the designer, not just because of pure size and complexity, but also in harnessing effectively the flexibility and programmability of the FPGA. A central issue is the need to integrate modules from diverse sources to promote modular design and reuse. Further, the capability to perform dynamic partial reconfiguration (DPR) of FPGA devices means that implemented systems can be made reconfigurable, allowing components to be changed during operation. However, use of DPR typically requires low-level planning of the system implementation, adding to the design challenge. This dissertation presents ReShape: a high-level approach for designing systems by interconnecting modules, which gives a ‘plug and play’ look and feel to the designer, is supported by tools that carry out implementation and verification functions, and is carried through to support system reconfiguration during operation. The emphasis is on the inter-module connections and abstracting the communication patterns that are typical between modules – for example, the streaming of data that is common in many FPGA-based systems, or the reading and writing of data to and from memory modules. ShapeUp is also presented as the static precursor to ReShape. In both, the details of wiring and signaling are hidden from view, via metadata associated with individual modules. ReShape allows system reconfiguration at the module level, by supporting type checking of replacement modules and by managing the overall system implementation, via metadata associated with its FPGA floorplan. The methodology and tools have been implemented in a prototype for a broad domain-specific setting – networking systems – and have been validated on real telecommunications design projects

    Analyse et amélioration de la qualité de services WEB multimédia et leurs mises en oeuvre sur ordinateur et sur FPGA

    Web services, which are the outcome of technological advances in computer networks and in portable and fixed telecommunication devices, play an important role in daily life. The increasing demand for multimedia Web services (MWS), in particular, increases the load on the Internet, on service providers and on Web servers. This load is mainly due to the fact that high-quality MWS require high data transfer rates and considerable payload sizes. The quality of service (QoS, by definition as perceived by the user) is influenced by several factors, such as processing time, propagation delay, response time, image resolution and compression efficiency. The research in this thesis is motivated by the persistent demand for new MWS and the need to maintain and improve their QoS. First, we focus on the QoS of MWS implemented on desktop and laptop computers. We start by studying compatibility aspects in order to obtain MWS that function satisfactorily on different platforms. Second, we study the QoS of MWS implemented according to the SOAP protocol and the RESTful style. In particular, we study the compression ratio, which is one of the factors influencing the QoS. Third, after studying MWS implemented on computers, we study the QoS of MWS implemented on hardware, in particular on FPGAs. We carry out a study and an implementation that identify the advantages of implementing MWS on FPGAs. The contributions of this thesis can be summarised as follows: 1. We introduce methods for the design and implementation of MWS on heterogeneous software platforms in different environments, such as Windows, OS X and Solaris. One of our objectives is to propose an approach that facilitates the addition of new MWS while ensuring compatibility among the platforms involved; that is, we identify the options that allow a rich and varied set of MWS to run on the different platforms. 2. We identify a list of relevant parameters that influence the QoS of MWS implemented according to the SOAP protocol and the REST style. 3. We build an analysis environment that quantifies the impact of each identified parameter on the QoS of MWS implemented with both SOAP and REST. The QoS obtained with SOAP and with REST are compared objectively; to ease the comparison, the same range of images and the same software platforms were used for both. The analysis was carried out on a wide range of different images, which gives a realistic view of the behaviour of real MWS. 4. We develop an analysis procedure to determine the correlation between the aspect ratio of an image and its compression ratio. Our results indicate that the compression ratio can be optimised when the aspect ratio of an image is close to the golden ratio. Three compression libraries were used: JPEG, JPEG2000 and DjVu. 5. Complementary to the four contributions above, which concern MWS on computers, we also study the design and implementation of MWS on FPGA. We justify the choice of FPGA by identifying its advantages over two other options: computers and ASICs. To confirm several of these advantages, we developed a high-performance, high-QoS MWS on FPGA using freely available design tools, open-source code and a method based only on HDL. This approach will facilitate the addition of further modules for MWS management and orchestration. 6. We update and adapt the open-source code and documentation of the Ethernet IP Core module for communication between the FPGA and the Ethernet port on the Nexys3 board, which makes it easier to implement MWS on the Nexys3 board.

    High Performance Network Evaluation and Testing


    On the simulation and design of manycore CMPs

    The progression of Moore’s Law has resulted in both embedded and performance computing systems that use an ever-increasing number of processing cores integrated in a single chip. Commercial systems are now available that provide hundreds of cores, and academics have proposed architectures for up to 1024 cores. Embedded multicores are increasingly popular because it is easier to guarantee hard real-time constraints using individual cores dedicated to tasks than to use traditional time-multiplexed processing. However, finding the optimal hardware configuration to meet these requirements at minimum cost requires extensive trial-and-error approaches to investigate the design space. This thesis tackles the problems encountered in the design of these large-scale multicore systems by first addressing the problem of fast, detailed micro-architectural simulation. Initially addressing embedded systems, this work exploits the lack of hardware cache-coherence support in many deeply embedded systems to increase the available parallelism in the simulation. Then, partitioning the NoC and using packet counting and cycle skipping reduces the amount of computation required to accurately model the NoC interconnect. In combination, this enables simulation speeds significantly higher than the state of the art, while maintaining less error, when compared to real hardware, than any similar simulator. Simulation speeds reach up to 370 MIPS (Million (target) Instructions Per Second), or 110 MHz, which is better than typical FPGA prototypes and approaches final ASIC production speeds. This is achieved while maintaining an error of only 2.1%, significantly lower than other similar simulators. 
The thesis continues by scaling the simulator past large embedded systems up to 64-1024 core processors, adding support for coherent architectures using the same packet counting techniques along with low overhead context switching to enable the simulation of such large systems with stricter synchronisation requirements. The new interconnect model was partitioned to enable parallel simulation to further improve simulation speeds in a manner which did not sacrifice any accuracy. These innovations were leveraged to investigate significant novel energy saving optimisations to the coherency protocol, processor ISA, and processor micro-architecture. By introducing a new instruction, with the name wait-on-address, the energy spent during spin-wait style synchronisation events can be significantly reduced. This functions by putting the core into a low-power idle state while the cache line of the indicated address is monitored for coherency action. Upon an update or invalidation (or traditional timer or external interrupts) the core will resume execution, but the active energy of running the core pipeline and repeatedly accessing the data and instruction caches is effectively reduced to static idle power. The thesis also shows that existing combined software-hardware schemes to track data regions which do not require coherency can adequately address the directory-associativity problem, and introduces a new coherency sharer encoding which reduces the energy consumed by sharer invalidations when sharers are grouped closely together, such as would be the case with a system running many tasks with a small degree of parallelism in each. The research concludes by using the extremely fast simulation speeds developed to produce a large set of training data, collecting various runtime and energy statistics for a wide range of embedded applications on a huge diverse range of potential MPSoC designs. 
This data was used to train a series of machine-learning-based models, which were then evaluated on their capacity to predict performance characteristics of unseen workload combinations across the explored MPSoC design space, using only two sample simulations, with promising results from some of the machine learning techniques. The models were then used to produce a ranking of predicted performance across the design space, and on average Random Forest was able to predict the best design within 89% of the runtime performance of the actual best tested design, and better than 93% of the alternative design space. When predicting for a weighted metric of energy, delay and area, Random Forest on average produced results within 93% of the optimum result. In summary, this thesis improves upon the state of the art for cycle-accurate multicore simulation, introduces novel energy-saving changes to the ISA and microarchitecture of future multicore processors, and demonstrates the viability of machine learning techniques to significantly accelerate the design space exploration required to bring a new manycore design to market.
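The "within 89% of the actual best" and "better than 93% of the alternative design space" figures can be read as two scores for the design a model ranks first. The sketch below shows one plausible way to compute them. It is an interpretation, not the thesis's stated procedure: the function name `score_predicted_best`, the runtime-ratio definition and the toy numbers are all assumptions made for illustration.

```python
def score_predicted_best(predicted, actual):
    """Score the design that a model predicts to be fastest.

    `predicted` and `actual` map design name -> runtime (lower is better).
    Returns (pick, relative_performance, fraction_beaten), where
    relative_performance = best actual runtime / actual runtime of the pick
    (assumed definition), and fraction_beaten is the share of the other
    designs that the pick actually outperforms.
    """
    pick = min(predicted, key=predicted.get)   # model's top-ranked design
    best = min(actual, key=actual.get)         # true best design
    relative_performance = actual[best] / actual[pick]
    others = [d for d in actual if d != pick]
    beaten = sum(1 for d in others if actual[d] > actual[pick])
    fraction_beaten = beaten / len(others) if others else 1.0
    return pick, relative_performance, fraction_beaten


if __name__ == "__main__":
    # Hypothetical runtimes in seconds for three candidate designs.
    predicted = {"A": 1.0, "B": 1.2, "C": 2.0}
    actual = {"A": 1.25, "B": 1.0, "C": 2.0}
    print(score_predicted_best(predicted, actual))
```

In this toy case the model picks design A, which reaches 80% of the performance of the true best design (B) and beats half of the alternatives.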

    Arquitectura de un sistema integrado para diseño dirigido por modelos en el contexto de internet de las cosas con aplicaciones en medicina

    Unpublished thesis of the Universidad Complutense de Madrid, Facultad de Informática, Departamento de Arquitectura de Computadores y Automática, defended on 14-10-2022. Over the past few years, we have seen how processing and storage architectures have become cheaper and more efficient, communication infrastructures have become faster and more scalable, and many new ways of interacting with the world around us have been developed. Every day more devices are connected to the network, and the generation of data worldwide is growing exponentially. In this context, the Internet of Things promises to be the new technological revolution, as was the introduction of the network of networks or universal mobile accessibility in its day...