Search CORE

82 research outputs found

Canary: Congestion-Aware In-Network Allreduce Using Dynamic Trees

Author: De Sensi Daniele
Di Girolamo Salvatore
Hoefler Torsten
Molero Edgar Costa
Vanbever Laurent
Publication date: 28/09/2023

The allreduce operation is an essential building block for many distributed applications, ranging from the training of deep learning models to scientific computing. In an allreduce operation, data from multiple hosts is aggregated and then broadcast to each host participating in the operation. Allreduce performance can be improved by a factor of two by aggregating the data directly in the network. Switches aggregate data coming from multiple ports before forwarding the partially aggregated result to the next hop. In all existing solutions, each switch needs to know the ports from which it will receive the data to aggregate. However, this forces packets to traverse a predefined set of switches, making these solutions prone to congestion. For this reason, we design Canary, the first congestion-aware in-network allreduce algorithm. Canary uses load-balancing algorithms to forward packets on the least congested paths. Because switches do not know from which ports they will receive the data to aggregate, they use timeouts to aggregate the data in a best-effort way. We develop a P4 Canary prototype and evaluate it on a Tofino switch. We then validate Canary through simulations on large networks, showing performance improvements of up to 40% compared to the state of the art.

arXiv.org e-Print Archive
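
The Canary abstract above describes allreduce as aggregation followed by a broadcast, with switches combining whatever data reaches them before a timeout. The following is a minimal Python sketch of that idea only, not the authors' P4 implementation: the function names, the element-wise sum as the reduction, and the timeout model (a window measured from the first packet's arrival) are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def reduce_sum(vectors: List[Vector]) -> Vector:
    """Element-wise sum of equally sized vectors."""
    return [sum(column) for column in zip(*vectors)]


def allreduce(host_data: Dict[str, Vector]) -> Dict[str, Vector]:
    """Aggregate every host's vector, then give each host a copy of the result."""
    total = reduce_sum(list(host_data.values()))
    return {host: list(total) for host in host_data}


def aggregate_with_timeout(
    arrivals: List[Tuple[float, Vector]], timeout: float
) -> List[Vector]:
    """Hypothetical best-effort switch (an assumption, not Canary's protocol).

    Sums the vectors that arrive within `timeout` of the first packet and
    forwards any later vectors unaggregated.
    """
    if not arrivals:
        return []
    ordered = sorted(arrivals, key=lambda item: item[0])
    deadline = ordered[0][0] + timeout
    on_time = [vector for time, vector in ordered if time <= deadline]
    late = [vector for time, vector in ordered if time > deadline]
    return [reduce_sum(on_time)] + late


hosts = {"h0": [1.0, 2.0], "h1": [3.0, 4.0], "h2": [5.0, 6.0]}
result = allreduce(hosts)

# One switch sees h0 and h1 in time and h2 late; the next hop finishes the sum.
forwarded = aggregate_with_timeout(
    [(0.0, hosts["h0"]), (0.1, hosts["h1"]), (0.9, hosts["h2"])], timeout=0.5
)
final = reduce_sum(forwarded)
```

Because addition is associative and commutative, the final result does not depend on where the partial sums are formed. In this sketch a late packet costs only an extra forwarded packet, not correctness.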

Cascade: A Platform for Delay-Sensitive Edge Intelligence

Author: Birman Ken
Garrett Thiago
Liu Mingzhao
Merlina Andrea
Rosa Lorenzo
Song Weijia
Tremel Edward
Vitenberg Roman
Yang Yuting
Publication date: 28/11/2023

Interactive intelligent computing applications are increasingly prevalent, creating a need for AI/ML platforms optimized to reduce per-event latency while maintaining high throughput and efficient resource management. Yet many intelligent applications run on AI/ML platforms that optimize for high throughput even at the cost of high tail-latency. Cascade is a new AI/ML hosting platform intended to untangle this puzzle. Innovations include a legacy-friendly storage layer that moves data with minimal copying and a "fast path" that collocates data and computation to maximize responsiveness. Our evaluation shows that Cascade reduces latency by orders of magnitude with no loss of throughput.
Comment: 14 pages, 12 figures

arXiv.org e-Print Archive

A Survey of Software-Defined Networks-on-Chip: Motivations, Challenges and Opportunities

Author: Gómez Rodríguez José Ricardo
Ibarra Delgado Salvador
Parra Michel Ramon
Rodríguez Abdalá Viktor Iván
Sandoval Arechiga Remberto
Vázquez Avila José Luis
Publication venue: MDPI AG
Publication date: 12/02/2021

Current computing platforms encourage the integration of thousands of processing cores, and their interconnections, into a single chip. Smartphones, IoT, embedded devices, desktops, and data centers use Many-Core Systems-on-Chip (SoCs) to exploit their compute power and parallelism to meet dynamic workload requirements. Networks-on-Chip (NoCs) provide scalable connectivity for diverse applications with distinct traffic patterns and data dependencies. However, when the system executes various applications in traditional NoCs (optimized and fixed at synthesis time), the mismatch between the interconnection and the different applications’ requirements limits performance. In the literature, NoC designs have embraced the Software-Defined Networking (SDN) strategy to evolve into an adaptable interconnection solution for future chips. However, the works surveyed implement a partial Software-Defined Network-on-Chip (SDNoC) approach, leaving aside the SDN layered architecture that brings interoperability in conventional networking. This paper explores the SDNoC literature and classifies it according to the desired SDN features that each work presents. Then, we describe the challenges and opportunities identified in the literature survey. Moreover, we explain the motivation for an SDNoC approach, and we present both SDN and SDNoC concepts and architectures. We observe that works in the literature employ an incomplete layered SDNoC approach. This creates various fertile areas in the SDNoC architecture where researchers may contribute to Many-Core SoC designs.

Caxcan Repositorio Institucional de la Universidad Autónoma de Zacatecas

Roadmapping the Next Generation of Silicon Photonics

Author: Bogaerts Wim
Bowers John E.
Chrostowski Lukas
Hochberg Michael
Shastri Bhavin J.
Shekhar Sudip
Soref Richard
Publication date: 19/01/2024

Silicon photonics has developed into a mainstream technology driven by advances in optical communications. The current generation has led to a proliferation of integrated photonic devices from thousands to millions, mainly in the form of communication transceivers for data centers. Products in many exciting applications, such as sensing and computing, are around the corner. What will it take to increase the proliferation of silicon photonics from millions to billions of units shipped? What will the next generation of silicon photonics look like? What are the common threads in the integration and fabrication bottlenecks that silicon photonic applications face, and which emerging technologies can solve them? This perspective article is an attempt to answer such questions. We chart the generational trends in silicon photonics technology, drawing parallels from the generational definitions of CMOS technology. We identify the crucial challenges that must be solved to make giant strides in CMOS-foundry-compatible devices, circuits, integration, and packaging. We identify challenges critical to the next generation of systems and applications in communication, signal processing, and sensing. By identifying and summarizing such challenges and opportunities, we aim to stimulate further research on devices, circuits, and systems for the silicon photonics ecosystem.

arXiv.org e-Print Archive

MaxMem: Colocation and Performance for Big Data Applications on Tiered Main Memory Servers

Author: Erez Mattan
Kamath Aditya K.
Mansoorshahi Kayvan
Peter Simon
Raybuck Amanda
Zhang Wei
Publication date: 01/12/2023

We present MaxMem, a tiered main memory management system that aims to maximize Big Data application colocation and performance. MaxMem uses an application-agnostic and lightweight memory occupancy control mechanism based on fast memory miss ratios to provide application QoS under increasing colocation. By relying on memory access sampling and binning to quickly identify per-process memory heat gradients, MaxMem maximizes performance for many applications sharing tiered main memory simultaneously. MaxMem is designed as a user-space memory manager to be easily modifiable and extensible, without complex kernel code development. On a system with tiered main memory consisting of DRAM and Intel Optane persistent memory modules, our evaluation confirms that MaxMem provides 11% and 38% better throughput and up to 80% and an order of magnitude lower 99th percentile latency than HeMem and Linux AutoNUMA, respectively, with a Big Data key-value store in dynamic colocation scenarios.
Comment: 12 pages, 10 figures

arXiv.org e-Print Archive
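
The MaxMem abstract above mentions access sampling, binning, and fast memory miss ratios without giving details. As a rough illustration of those three notions, here is a small Python sketch. It is not MaxMem's algorithm: the power-of-two bin boundaries, the function names, and the sample format are assumptions chosen only to make the idea concrete.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def heat_bins(sampled_pages: Iterable[int], num_bins: int = 4) -> Dict[int, List[int]]:
    """Group pages into bins by how often they appear in the access samples.

    Assumed layout: bin i holds pages with a sample count in [2**i, 2**(i+1));
    the last bin is open-ended and collects the hottest pages.
    """
    counts = Counter(sampled_pages)
    bins: Dict[int, List[int]] = {index: [] for index in range(num_bins)}
    for page, count in sorted(counts.items()):
        index = min(count.bit_length() - 1, num_bins - 1)
        bins[index].append(page)
    return bins


def fast_memory_miss_ratio(samples: Iterable[Tuple[int, bool]]) -> float:
    """Fraction of sampled accesses that were NOT served from the fast tier.

    Each sample is (page, in_fast_tier); this format is hypothetical.
    """
    sample_list = list(samples)
    if not sample_list:
        return 0.0
    misses = sum(1 for _, in_fast_tier in sample_list if not in_fast_tier)
    return misses / len(sample_list)


bins = heat_bins([7, 7, 7, 7, 7, 3, 3, 9])
ratio = fast_memory_miss_ratio([(7, True), (7, True), (3, False), (9, False)])
```

In this sketch, page 7 (five samples) lands in a hotter bin than page 3 (two samples) or page 9 (one sample). A manager could use such an ordering to decide which pages to keep in the fast tier, and a per-process miss ratio to decide how much fast memory each process receives.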

Traffic generation for benchmarking data centre networks

Author: Benjamin JL
Parsonson CWF
Zervas G
Publication venue: Elsevier BV
Publication date: 01/11/2022

Benchmarking is commonly used in research fields, such as computer architecture design and machine learning, as a powerful paradigm for rigorously assessing, comparing, and developing novel technologies. However, the data centre network (DCN) community lacks a standard open-access and reproducible traffic generation framework for benchmark workload generation. Driving factors behind this include the proprietary nature of traffic traces, the limited detail and quantity of open-access network-level data sets, the high cost of real-world experimentation, and the poor reproducibility and fidelity of synthetically generated traffic. This is curtailing the community's understanding of existing systems and hindering the development, comparison, and testing of novel technologies such as optical DCNs. We present TrafPy, an open-access framework for generating both realistic and custom DCN traffic traces. TrafPy is compatible with any simulation, emulation, or experimentation environment, and can be used for standardised benchmarking and for investigating the properties and limitations of network systems such as schedulers, switches, routers, and resource managers. We give an overview of the TrafPy traffic generation framework, and provide a brief demonstration of its efficacy through an investigation into the sensitivity of some canonical scheduling algorithms to varying traffic trace characteristics in the context of optical DCNs. TrafPy is open-sourced via GitHub, and all data associated with this manuscript are available via RDR.

UCL Discovery

Intratumoural Delivery of mRNA Loaded on a Cationic Hyper-Branched Cyclodextrin-Based Polymer Induced an Anti-Tumour Immunological Response in Melanoma

Author: Cecone Claudio
Conde João
Hayashi Tomoya
Ishii Ken J
Khazaei Monfared Yousef
Mahmoudian Mohammad
Matencio Adrián
Trotta Francesco
Zakeri-Milani Parvin
Publication date: 24/07/2023

Funding text 1: This work is the result of a contract for the University of Turin (Italy) for Training (for Y.K.M.) and for A.M. and a RTDA contract from the D.M 1062/2021 (Ministero dell’Università e della Ricerca) for the University of Turin. This research acknowledges support from the Project CH4.0 under the MIUR program “Dipartimenti di Eccellenza 2023–2027”. J.C. acknowledges that they have a contract with the European Research Council (ERC Starting Grant 848325) for financial support.
Funding text 2: This research was partially funded by The Italian Ministry of Enterprises and Made in Italy (project acronym CN-RNA) under the PNRR among the initiatives aimed towards creating an integrated system of research and innovation infrastructures (PNRR M4C2 PROJECTS).

mRNA technology has demonstrated potential for use as an effective cancer immunotherapy. However, inefficient in vivo mRNA delivery and the requirements for immune co-stimulation present major hurdles to achieving anti-tumour therapeutic efficacy. Therefore, we used a cationic hyper-branched cyclodextrin-based polymer to increase mRNA delivery in both in vitro and in vivo melanoma cancer. We found that the transfection efficacy of the mRNA-EGFP-loaded Ppoly system was significantly higher than that of lipofectamine and free mRNA in both 2D and 3D melanoma cancer cells; also, this delivery system did not show cytotoxicity. In addition, the biodistribution results revealed time-dependent and significantly higher mEGFP expression in complexes with Ppoly compared to free mRNA. We then checked the anti-tumour effect of intratumourally injected free mRNA-OVA, a foreign antigen, and loaded Ppoly; the results showed a considerable decrease in both tumour size and weight in the group treated with OVA-mRNA in loaded Ppoly compared to other formulations with an efficient adaptive immune response by dramatically increasing most leukocyte subtypes and OVA-specific CD8+ T cells in both the spleen and tumour tissues. Collectively, our findings suggest that the local delivery of cationic cyclodextrin-based polymer complexes containing foreign mRNA antigens might be a good and reliable concept for cancer immunotherapy.

Repositório da Universidade Nova de Lisboa