Reconfigurable Optically Interconnected Systems
With the immense growth of data consumption in today's data centers and high-performance computing systems, driven by the constant influx of new applications, the network infrastructure supporting this demand is under increasing pressure to meet requirements for higher bandwidth, lower latency, and greater flexibility. Optical interconnects, able to support high-bandwidth wavelength-division-multiplexed signals with extreme energy efficiency, have become the basis for long-haul and metro-scale networks around the world, while photonic components are being rapidly integrated within rack- and chip-scale systems. However, optical and photonic interconnects are not a direct replacement for electronic components. Rather, the integration of optical interconnects with electronic peripherals allows for unique functionalities that can improve the capacity, compute performance, and flexibility of current state-of-the-art computing systems. This requires physical-layer methodologies for their integration with electronic components, as well as system-level control planes that incorporate the optical-layer characteristics. This thesis explores various network architectures and the associated control plane, hardware infrastructure, and other supporting software modules needed to integrate silicon photonics and MEMS-based optical switching into conventional datacom network systems, ranging from intra-data-center and high-performance computing systems to the metro-scale networks between data centers. In each of these systems, we demonstrate dynamic bandwidth steering and compute resource allocation capabilities that enable significant performance improvements. The key accomplishments of this thesis are as follows.
In Part 1, we present high-performance computing network architectures that integrate silicon photonic switches for optical bandwidth steering, enabling multiple reconfigurable topologies that result in significant system performance improvements. As high-performance systems rely on increased parallelism by scaling up to greater numbers of processor nodes, communication between these nodes grows rapidly and the interconnection network becomes a bottleneck to the overall performance of the system. It has been observed that many scientific applications running on high-performance computing systems cause highly skewed traffic over the network, congesting only a small percentage of the total available links while other links are underutilized. This mismatch between the traffic and the bandwidth allocation of the physical-layer network presents an opportunity to optimize the bandwidth resource utilization of the system by using silicon photonic switches to perform bandwidth steering. This allows the individual processors to perform at their maximum compute potential and thereby improves the overall system performance. We show various testbeds that integrate both microring resonator and Mach-Zehnder based silicon photonic switches within Dragonfly and Fat-Tree topology networks built with conventional equipment, and demonstrate a 30-60% reduction in execution time of real high-performance benchmark applications.
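The idea of steering bandwidth toward congested links can be illustrated with a minimal sketch. It is not the thesis's algorithm: the function name `steer_bandwidth`, the single baseline link per node pair, and the greedy policy are all assumptions made for illustration.

```python
from collections import Counter

def steer_bandwidth(traffic, spare_links):
    """Greedily grant spare optical links to the most loaded node pairs.

    traffic: dict mapping (src, dst) pairs to offered load (arbitrary units).
    spare_links: number of links the optical switch can redirect (assumed).
    Returns a Counter with the number of extra links granted per pair.
    """
    extra = Counter()
    for _ in range(spare_links):
        # Assumption: every pair starts with one baseline link, so the load
        # per link is the offered load divided by (1 + extra links so far).
        pair = max(traffic, key=lambda p: traffic[p] / (1 + extra[p]))
        extra[pair] += 1
    return extra

# Hypothetical skewed traffic matrix: one pair carries most of the load.
skewed = {("A", "B"): 90.0, ("A", "C"): 10.0, ("B", "C"): 5.0}
allocation = steer_bandwidth(skewed, spare_links=2)
```

With a skewed matrix like the one above, both spare links go to the hot pair, which is the mismatch between traffic and static bandwidth allocation that the abstract describes.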
Part 2 presents a flexible network architecture and control plane that enable autonomous bandwidth steering and IT resource provisioning between metro-scale, geographically distributed data centers. It uses a software-defined control plane to autonomously provision both network and IT resources to support different quality of service requirements and to optimize resource utilization under dynamically changing load. By actively monitoring both the bandwidth utilization of the network and the CPU or memory resources of the end hosts, the control plane autonomously provisions background or dynamic connections with different levels of quality of service using optical MEMS switching, and initiates live migrations of virtual machines to consolidate or distribute workload. Together these functionalities provide flexibility, maximize efficiency in processing and transferring data, and enable energy and cost savings by scaling down the system when resources are not needed. An experimental testbed of three data center nodes was built to demonstrate the feasibility of these capabilities.
Part 3 presents Lightbridge, a communications platform specifically designed to provide a more seamless integration between processor nodes and an optically switched network. It addresses some of the crucial issues related to optical switching faced by the work presented in the previous chapters. When optical switches perform switching operations, they change the physical topology of the network, and because they cannot buffer packets, certain optical circuits are temporarily unavailable. This raises the question of whether it is safe for end hosts to transmit packets at any given time. Lightbridge was developed to coordinate switching and routing of optical circuits across the network by having the processors obtain information about the current state of the optical network before transmitting packets, and by buffering packets when the optical circuit is not available. This part describes the details of Lightbridge, which consists of a loadable Linux kernel module along with other supporting modifications to the Linux kernel that provide the necessary functionality.
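The behaviour described for end hosts (check the circuit state before sending, buffer while the circuit is down) can be sketched in user space as follows. This is only an illustration of the idea, not Lightbridge itself, which the abstract describes as a Linux kernel module; the class `CircuitAwareSender` and its methods are hypothetical names.

```python
from collections import deque

class CircuitAwareSender:
    """Buffers packets while the optical circuit to a destination is down."""

    def __init__(self):
        self.circuit_up = {}   # destination -> bool, as reported by a control plane
        self.pending = {}      # destination -> deque of buffered packets
        self.sent = []         # (destination, packet) pairs actually transmitted

    def set_circuit(self, dst, up):
        """Record a circuit state change and flush the buffer if it came up."""
        self.circuit_up[dst] = up
        if up:
            queue = self.pending.get(dst, deque())
            while queue:
                self.sent.append((dst, queue.popleft()))

    def send(self, dst, packet):
        """Transmit immediately if the circuit is up, otherwise buffer."""
        if self.circuit_up.get(dst, False):
            self.sent.append((dst, packet))
        else:
            self.pending.setdefault(dst, deque()).append(packet)

sender = CircuitAwareSender()
sender.send("node2", "p1")          # circuit state unknown: packet is buffered
sender.set_circuit("node2", True)   # circuit established: buffer is flushed in order
sender.send("node2", "p2")          # circuit up: packet is sent directly
```

Buffering at the end host is what compensates for the optical switch's inability to hold packets during reconfiguration.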
On the Exploration of FPGAs and High-Level Synthesis Capabilities on Multi-Gigabit-per-Second Networks
Unpublished doctoral thesis defended at the Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Tecnología Electrónica y de las Comunicaciones. Defense date: 24-01-2020.
Traffic on computer networks has grown exponentially in recent years.
Both links and communication equipment have had to adapt in order to provide
the minimum quality of service required for current needs. However, in recent
years, several factors have prevented commercial off-the-shelf hardware from
keeping pace with this growth rate. Consequently, some software tools are
struggling to fulfill their tasks, especially at speeds higher than 10 Gbit/s. For this reason,
Field Programmable Gate Arrays (FPGAs) have arisen as an alternative to address the
most demanding tasks without the need to design an application-specific integrated
circuit, thanks in part to their flexibility and programmability in the field. That said,
developing for FPGAs is well known to be complex. Therefore, in this thesis we tackle
the use of FPGAs and High-Level Synthesis (HLS) languages in the context of computer
networks. We focus on the use of FPGAs both in computer network monitoring applications
and in reliable data transmission at very high speed. In addition, we intend to shed
light on the use of high-level synthesis languages and boost FPGA applicability in the
context of computer networks so as to reduce development time and design complexity.
The first part of the thesis is devoted to computer network monitoring. We take advantage
of FPGA determinism to implement active monitoring probes, which
consist of sending a train of packets that is later used to obtain network parameters.
In this case, determinism is key to reducing the uncertainty of the measurements.
The results of our experiments show that the FPGA implementations are much more
accurate and more precise than their software counterparts. At the same time, the FPGA
implementation is scalable in terms of network speed: 1, 10 and 100 Gbit/s. In the context of passive monitoring, we leverage the FPGA architecture to implement algorithms
able to thin encrypted traffic as well as remove duplicate packets. These two algorithms
are straightforward in principle, but very useful in helping traditional network analysis tools
cope with their task at higher network speeds. On the one hand, processing encrypted traffic
brings little benefit; on the other hand, processing duplicate traffic degrades
the performance of the software tools.
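Duplicate removal of the kind mentioned above can be sketched in software as a bounded window of packet digests. This is an illustration under stated assumptions, not the thesis's FPGA design: the class `DuplicateFilter`, the SHA-1 digest, and the window size are all hypothetical choices.

```python
import hashlib
from collections import OrderedDict

class DuplicateFilter:
    """Drops packets whose digest matches one of the last `window` accepted packets."""

    def __init__(self, window=1024):
        self.window = window
        self.seen = OrderedDict()   # digest -> None, oldest entry first

    def accept(self, packet: bytes) -> bool:
        """Return True if the packet should be forwarded to the analysis tool."""
        digest = hashlib.sha1(packet).digest()
        if digest in self.seen:
            return False                    # duplicate inside the window: drop
        self.seen[digest] = None
        if len(self.seen) > self.window:
            self.seen.popitem(last=False)   # evict the oldest digest
        return True

dedup = DuplicateFilter(window=4)
forwarded = [p for p in (b"a", b"b", b"a", b"c") if dedup.accept(p)]
```

A bounded window keeps memory use fixed, which is the kind of constraint a hardware implementation would also face; the trade-off is that a duplicate arriving after its original has been evicted is not detected.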
The second part of the thesis is devoted to the TCP/IP stack. We explore the current
limitations of reliable data transmission using standard software at very high speed.
Nowadays, the network is becoming an important bottleneck, in
particular in data centers. What is more, in recent years the deployment of 100 Gbit/s
network links has started. Consequently, there has been increased scrutiny of how
networking functionality is deployed. Furthermore, a wide range of approaches are
currently being explored to increase the efficiency of networks and tailor their functionality
to the actual needs of the application at hand. FPGAs arise as the perfect alternative to
deal with this problem. For this reason, in this thesis we develop Limago, an FPGA-based
open-source implementation of a TCP/IP stack operating at 100 Gbit/s for Xilinx FPGAs.
Limago not only provides unprecedented throughput, but also offers latency
at least fifteen times lower than software implementations. Limago is a key
contribution to some of the hottest topics at the moment, for instance network-attached
FPGAs and in-network data processing.
A Survey on Data Plane Programming with P4: Fundamentals, Advances, and Applied Research
With traditional networking, users can configure control plane protocols to
match the specific network configuration, but without the ability to
fundamentally change the underlying algorithms. With SDN, the users may provide
their own control plane, that can control network devices through their data
plane APIs. Programmable data planes allow users to define their own data plane
algorithms for network devices including appropriate data plane APIs which may
be leveraged by user-defined SDN control. Thus, programmable data planes and
SDN offer great flexibility for network customization, be it for specialized,
commercial appliances, e.g., in 5G or data center networks, or for rapid
prototyping in industrial and academic research. Programming
protocol-independent packet processors (P4) has emerged as the currently most
widespread abstraction, programming language, and concept for data plane
programming. It is developed and standardized by an open community and it is
supported by various software and hardware platforms. In this paper, we survey
the literature from 2015 to 2020 on data plane programming with P4. Our survey
covers 497 references of which 367 are scientific publications. We organize our
work into two parts. In the first part, we give an overview of data plane
programming models, the programming language, architectures, compilers,
targets, and data plane APIs. We also consider research efforts to advance P4
technology. In the second part, we analyze a large body of literature
considering P4-based applied research. We categorize 241 research papers into
different application domains, summarize their contributions, and extract
prototypes, target platforms, and source code availability.
Comment: Submitted to IEEE Communications Surveys and Tutorials (COMS) on
2021-01-2
A Modular Approach to Adaptive Reactive Streaming Systems
The latest generations of FPGA devices offer large resource counts that provide the headroom to implement large-scale and complex systems. However, there are increasing challenges for the designer, not just because of pure size and complexity, but also in harnessing effectively the flexibility and programmability of the FPGA. A central issue is the need to integrate modules from diverse sources to promote modular design and reuse. Further, the capability to perform dynamic partial reconfiguration (DPR) of FPGA devices means that implemented systems can be made reconfigurable, allowing components to be changed during operation. However, use of DPR typically requires low-level planning of the system implementation, adding to the design challenge. This dissertation presents ReShape: a high-level approach for designing systems by interconnecting modules, which gives a ‘plug and play’ look and feel to the designer, is supported by tools that carry out implementation and verification functions, and is carried through to support system reconfiguration during operation. The emphasis is on the inter-module connections and abstracting the communication patterns that are typical between modules – for example, the streaming of data that is common in many FPGA-based systems, or the reading and writing of data to and from memory modules. ShapeUp is also presented as the static precursor to ReShape. In both, the details of wiring and signaling are hidden from view, via metadata associated with individual modules. ReShape allows system reconfiguration at the module level, by supporting type checking of replacement modules and by managing the overall system implementation, via metadata associated with its FPGA floorplan. The methodology and tools have been implemented in a prototype for a broad domain-specific setting – networking systems – and have been validated on real telecommunications design projects
Analyse et amélioration de la qualité de services WEB multimédia et leurs mises en oeuvre sur ordinateur et sur FPGA
Summary: Web services, a product of technological advances in computer networks and in portable and fixed telecommunication devices, occupy a central place in people's daily lives. The growing demand for multimedia Web services (MWS), in particular, increases the load on Internet networks, service providers, and Web servers. This load is mainly due to the fact that high-quality MWS require high transfer rates and large packet sizes. Quality of service (by definition, as perceived by the user) is influenced by several performance factors, such as processing time, propagation delay, response time, image resolution, and compression efficiency.
The work described in this thesis is motivated by the continually growing demand for new MWS and the need to maintain and improve the quality of these services. We first address the quality of service (QoS) of MWS implemented on computers, such as desktops or laptops. We begin by studying compatibility aspects in order to obtain MWS that run satisfactorily on different platforms. We then study the QoS of MWS implemented according to two different approaches, namely the SOAP protocol and the RESTful style. In particular, we study the compression ratio, which is one of the factors influencing QoS.
After considering MWS implemented on computers from several angles, we turn to the QoS of MWS implemented on FPGA. We carry out a study and an implementation that identify the advantages of implementing MWS on FPGA.
The contributions fall into five parts, as follows:
1. We introduce methods for creating, that is, designing and implementing, MWS on heterogeneous software platforms in different environments such as Windows, OS X, and Solaris. One of our objectives is to propose an approach for adding new MWS while guaranteeing compatibility between platforms, in the sense that we identify the options that allow us to offer a rich and varied set of MWS able to run on the different platforms.
2. We identify a list of relevant parameters that influence the QoS of MWS implemented according to the SOAP protocol and according to the REST style.
3. We develop an analysis environment to quantify the impact of each identified parameter on the QoS of MWS. To this end, we consider MWS implemented according to the SOAP protocol and also according to the REST style. The QoS obtained with SOAP and with REST are compared objectively. To ease the comparison, the same range of images (used in the analysis of SOAP MWS) and the same software platforms were reused.
4. We develop an analysis procedure that determines a correlation between the dimensions of an image and the appropriate compression ratio. The results obtained support this contribution, which is specific to this thesis: the compression ratio can be optimized when the ratio of the image's length to its width equals the golden ratio found in nature. Three libraries were used, namely JPEG, JPEG2000, and DjVu.
5. In a part complementary to the four preceding ones, which concern MWS on computers, we also study the design and implementation of MWS on FPGA. We justify the choice of FPGA by identifying its advantages over two other options: computers and ASICs. To confirm several of the identified advantages, a high-QoS, high-performance MWS is created on FPGA using free design tools, open-source code, and a method based solely on HDL. Our approach will ease the addition of other MWS management and orchestration modules.
6. The update and adaptation of the open-source code and documentation of the Ethernet IP Core module for communication between the FPGA and the Ethernet port on the Nexys3 board. This eases the implementation of MWS on the Nexys3 board. // Abstract : Web services, which are the outcome of the technological advancements in IT networks
and hand-held mobile devices for telecommunications, occupy an important role in our
daily life. The increasing demand for multimedia Web services (MWS), in particular,
increases the load on the Internet, on service providers and Web servers. This load
is mainly due to the fact that the high-quality multimedia Web services necessitate
high data transfer rates and considerable payload sizes. The quality of service (QoS,
by definition as it is perceived by the user) is influenced by several factors, such as
processing time, propagation delay, response time, image resolution and compression
efficacy.
The research work in this thesis is motivated by the persistent demand for new MWS,
and the need to maintain and improve the QoS. Firstly, we focus on the QoS of MWS
when they are implemented on desktop and laptop computers. We start with studying
the compatibility aspects in order to obtain MWS functioning satisfactorily on different
platforms. Secondly, we study the QoS for MWS implemented according to the SOAP
protocol and the RESTful style. In particular, we study the compression rate, which is
one of the pertinent factors influencing the QoS.
Thirdly, after the study of MWS when implemented on computers, we proceed with the
study of QoS of MWS when implemented on hardware, in particular on FPGAs. We
thus carried out a comprehensive study and implementations that show and compare the
advantages of MWS on FPGAs.
The contributions of this thesis can be summarised as follows:
1. We introduce methods of design and implementation of MWS on heterogeneous
platforms, such as Windows, OS X and Solaris. One of our objectives is to
propose an approach that facilitates the integration of new MWS while assuring
the compatibility amongst involved platforms. This means that we identify the
options that make it possible to offer a rich and varied set of MWS that can run on different
platforms.
2. We determine a list of relevant parameters that influence the QoS of MWS.
3. We build an analysis environment that quantifies the impact of each parameter on
the QoS of MWS implemented with both the SOAP protocol and the RESTful style. The
QoS for SOAP and REST are objectively compared. The analysis was carried out on
a wide range of different images, which gives a realistic view of
the behaviour of real MWS.
4. We develop an analysis procedure to determine the correlation between the
aspect ratio of an image and its compression ratio. Our results confirm that
the compression ratio can be improved and optimised when the aspect ratio of
an image is close to the golden ratio, which exists in nature. Three libraries of
compression schemes have been used, namely: JPEG, JPEG2000 and DjVu.
5. Complementary to the four contributions mentioned above, which concern
MWS on computers, we also study the design and implementation of MWS on
FPGA. This is justified by the numerous advantages offered by FPGAs
compared to other technologies such as computers and ASICs. In order to
highlight the advantages of implementing MWS on FPGA, we developed a
high-performance, high-QoS MWS on FPGA. To achieve our goal, we utilised
freely available design utilities, open-source code and a method based only on
HDL. This approach is suitable for future extensions and add-on modules for
MWS orchestration.
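Contribution 4 relates an image's aspect ratio to the golden ratio, φ = (1 + √5)/2 ≈ 1.618. A minimal sketch of how that closeness could be measured is given below; the function name `golden_ratio_gap` and the choice of a relative distance are assumptions for illustration and are not taken from the thesis.

```python
import math

GOLDEN_RATIO = (1 + math.sqrt(5)) / 2   # approximately 1.618

def golden_ratio_gap(width, height):
    """Relative distance between an image's aspect ratio and the golden ratio.

    The longer side is divided by the shorter one, so portrait and
    landscape orientations give the same result.
    """
    aspect = max(width, height) / min(width, height)
    return abs(aspect - GOLDEN_RATIO) / GOLDEN_RATIO

near = golden_ratio_gap(1618, 1000)    # aspect ratio very close to the golden ratio
square = golden_ratio_gap(1000, 1000)  # aspect ratio 1.0, far from the golden ratio
```

A measure like this only quantifies how close an image is to the golden ratio; whether compression actually improves near that ratio is the empirical claim of the thesis and is not demonstrated by the sketch.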
On the simulation and design of manycore CMPs
The progression of Moore’s Law has resulted in both embedded and performance
computing systems which use an ever increasing number of processing cores integrated
in a single chip. Commercial systems are now available which provide hundreds
of cores, and academics have proposed architectures for up to 1024 cores. Embedded
multicores are increasingly popular as it is easier to guarantee hard-realtime constraints
using individual cores dedicated for tasks, than to use traditional time-multiplexed processing.
However, finding the optimal hardware configuration to meet these requirements
at minimum cost requires extensive trial and error approaches to investigate the
design space.
This thesis tackles the problems encountered in the design of these large scale multicore
systems by first addressing the problem of fast, detailed micro-architectural simulation.
Initially addressing embedded systems, this work exploits the lack of hardware
cache-coherence support in many deeply embedded systems to increase the available
parallelism in the simulation. Then, partitioning the NoC and using packet
counting and cycle skipping reduce the amount of computation required to accurately
model the NoC interconnect. In combination, this enables simulation speeds significantly
higher than the state of the art, while maintaining less error, when compared
to real hardware, than any similar simulator. Simulation speeds reach up to 370 MIPS
(Million (target) Instructions Per Second), or 110 MHz, which is better than typical
FPGA prototypes, and approaching final ASIC production speeds. This is achieved
while maintaining an error of only 2.1%, significantly lower than other similar simulators.
The thesis continues by scaling the simulator past large embedded systems up to
64-1024 core processors, adding support for coherent architectures using the same
packet counting techniques along with low overhead context switching to enable the
simulation of such large systems with stricter synchronisation requirements. The new
interconnect model was partitioned to enable parallel simulation to further improve
simulation speeds in a manner which did not sacrifice any accuracy.
These innovations were leveraged to investigate significant novel energy saving optimisations
to the coherency protocol, processor ISA, and processor micro-architecture.
By introducing a new instruction, with the name wait-on-address, the energy spent during
spin-wait style synchronisation events can be significantly reduced. This functions
by putting the core into a low-power idle state while the cache line of the indicated
address is monitored for coherency action. Upon an update or invalidation (or traditional
timer or external interrupts) the core will resume execution, but the active
energy of running the core pipeline and repeatedly accessing the data and instruction
caches is effectively reduced to static idle power. The thesis also shows that existing
combined software-hardware schemes to track data regions which do not require coherency
can adequately address the directory-associativity problem, and introduces a
new coherency sharer encoding which reduces the energy consumed by sharer invalidations
when sharers are grouped closely together, such as would be the case with a
system running many tasks with a small degree of parallelism in each.
The research concludes by using the extremely fast simulation speeds developed to
produce a large set of training data, collecting various runtime and energy statistics for
a wide range of embedded applications on a large and diverse range of potential MPSoC
designs. This data was used to train a series of machine learning based models which
were then evaluated on their capacity to predict performance characteristics of unseen
workload combinations across the explored MPSoC design space, using only two sample
simulations, with promising results from some of the machine learning techniques.
The models were then used to produce a ranking of predicted performance across the
design space, and on average Random Forest was able to predict the best design within
89% of the runtime performance of the actual best tested design, and better than 93%
of the alternative design space. When predicting for a weighted metric of energy, delay
and area, Random Forest on average produced results within 93% of the optimum
result.
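One way to read figures such as "within 89% of the runtime performance of the actual best tested design" and "better than 93% of the alternative design space" is sketched below. The two metric definitions and the function names `pick_quality` and `beaten_fraction` are assumptions made for illustration; the thesis may define its metrics differently.

```python
def pick_quality(predicted, actual):
    """Ratio of the true best runtime to the runtime of the design the model picks.

    predicted, actual: dicts mapping design name -> runtime (lower is better).
    A value of 1.0 means the model picked the true best design.
    """
    chosen = min(predicted, key=predicted.get)
    best = min(actual.values())
    return best / actual[chosen]

def beaten_fraction(predicted, actual):
    """Fraction of the other designs that are actually slower than the chosen one."""
    chosen = min(predicted, key=predicted.get)
    others = [d for d in actual if d != chosen]
    return sum(actual[d] > actual[chosen] for d in others) / len(others)

# Hypothetical runtimes for three candidate designs (made-up numbers).
predicted = {"d1": 10.0, "d2": 12.0, "d3": 20.0}
actual = {"d1": 11.0, "d2": 9.9, "d3": 25.0}
quality = pick_quality(predicted, actual)
beaten = beaten_fraction(predicted, actual)
```

In this toy case the model picks `d1`, which is not the true best design, so the first metric is below 1.0 and the second shows that one of the two alternatives was in fact faster.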
In summary, this thesis improves upon the state of the art for cycle-accurate multicore
simulation, introduces novel energy-saving changes to the ISA and microarchitecture
of future multicore processors, and demonstrates the viability of machine
learning techniques to significantly accelerate the design space exploration required to
bring a new manycore design to market.
Arquitectura de un sistema integrado para diseño dirigido por modelos en el contexto de internet de las cosas con aplicaciones en medicina
Unpublished thesis of the Universidad Complutense de Madrid, Facultad de Informática, Departamento de Arquitectura de Computadores y Automática, defended on 14-10-2022. Over the past few years, we have seen how processing and storage architectures become cheaper and more efficient, communication infrastructures become faster and more scalable, and many new ways of interacting with the world around us are being developed. Every day more devices are connected to the network, and the generation of data worldwide is growing exponentially. In this context, the Internet of Things promises to be the new technological revolution, as was the introduction of the network of networks or universal mobile accessibility in its day...