
    High performance communication on reconfigurable clusters

    High Performance Computing (HPC) has matured to the point where it is an essential third pillar, alongside theory and experiment, in most domains of science and engineering. Communication latency is a key factor limiting HPC performance; it can be addressed by integrating communication into accelerators. This integration allows accelerators to communicate with each other without CPU interaction, and even to bypass the network stack. Field Programmable Gate Arrays (FPGAs) are the accelerators that currently best integrate communication with computation. The large number of Multi-gigabit Transceivers (MGTs) on most high-end FPGAs can provide high-bandwidth, low-latency inter-FPGA connections. Additionally, the reconfigurable FPGA fabric enables tight coupling between the computation kernel and the network interface. Our thesis is that an application-aware communication infrastructure for a multi-FPGA system makes substantial progress in solving the HPC communication bottleneck. This dissertation aims to provide an application-aware communication infrastructure for FPGA-centric clusters. Specifically, our solution demonstrates application-awareness across multiple levels of the network stack, including low-level link protocols, router microarchitectures, routing algorithms, and applications. We start by investigating the low-level link protocol and the impact of its latency variance on performance. Our results demonstrate that, although some link jitter is always present, we can still assume near-synchronous communication on an FPGA cluster. This provides the necessary condition for statically-scheduled routing. We then propose two novel router microarchitectures for two different kinds of workloads: a wormhole Virtual Channel (VC)-based router for workloads with dynamic communication, and a statically-scheduled Virtual Output Queueing (VOQ)-based router for workloads with static communication. For the first (VC-based) router, we propose a framework that generates application-aware router configurations. Our results show that, by adding application-awareness to the router configuration, the network performance of FPGA clusters can be substantially improved. For the second (VOQ-based) router, we propose a novel offline collective routing algorithm, which shows a significant advantage over a state-of-the-art collective routing algorithm. We apply our communication infrastructure to a critical strong-scaling HPC kernel, the 3D FFT. The experimental results demonstrate that our design is faster than CPU and GPU implementations by at least an order of magnitude, achieving strong scaling for the target applications. Surprisingly, the FPGA cluster's performance is similar to that of an ASIC cluster. We also implement the 3D FFT on another multi-FPGA platform, the Microsoft Catapult II cloud; its performance is likewise comparable or superior to CPU and GPU HPC clusters. The second application we investigate is Molecular Dynamics simulation (MD). We model MD on both FPGA clouds and clusters. We find that combining processing and general communication in the same device leads to extremely promising performance and the prospect of MD simulations well into the µs/day range on a commodity cloud.
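
    As a minimal illustrative sketch of the statically-scheduled idea (our own Python reconstruction, not the dissertation's router), a known all-to-all pattern can be compiled offline into a per-timeslot table in which every slot is a permutation of sources to destinations; no output is contended within a slot, so the online datapath of a VOQ router reduces to a table lookup with no dynamic arbitration:

        def build_all_to_all_schedule(n):
            """schedule[slot][src] == dst that src sends to in that slot."""
            # Slot s realizes the shift permutation dst = (src + s + 1) mod n,
            # so each destination is targeted by exactly one source per slot.
            return [{src: (src + shift) % n for src in range(n)}
                    for shift in range(1, n)]

        for slot, sends in enumerate(build_all_to_all_schedule(4)):
            print(f"slot {slot}: {sends}")
        # slot 0: {0: 1, 1: 2, 2: 3, 3: 0}
        # slot 1: {0: 2, 1: 3, 2: 0, 3: 1} ... n-1 slots cover all-to-all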

    Graph-based 3D Collision-distance Estimation Network with Probabilistic Graph Rewiring

    We aim to solve the problem of data-driven collision-distance estimation for 3-dimensional (3D) geometries. Conventional algorithms suffer from low accuracy due to their reliance on limited representations, such as point clouds. In contrast, our previous graph-based model, GraphDistNet, achieves high accuracy using edge information, but incurs message-passing costs that grow with graph size, limiting its applicability to 3D geometries. To overcome these challenges, we propose GDN-R, a novel 3D graph-based estimation network. GDN-R employs a layer-wise probabilistic graph-rewiring algorithm leveraging the differentiable Gumbel-top-K relaxation. Our method accurately infers minimum distances through iterative graph rewiring and updating of the relevant embeddings. The probabilistic rewiring enables fast and robust embedding with respect to unforeseen categories of geometries. Across 41,412 random benchmark tasks with 150 pairs of 3D objects, GDN-R outperforms state-of-the-art baseline methods in terms of accuracy and generalizability. We also show that the proposed rewiring improves update performance while reducing the size of the estimation model. Finally, we demonstrate its batch-prediction and auto-differentiation capabilities for trajectory optimization in both simulated and real-world scenarios. Comment: 7 pages, 6 figures
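
    As a hedged sketch of the selection primitive named above (a NumPy illustration; GDN-R's actual layers, shapes, and names are not given in the abstract), Gumbel-top-K perturbs edge scores with Gumbel noise, keeps the K highest perturbed scores as the rewired edge set, and uses a temperature-controlled softmax as the differentiable relaxation:

        import numpy as np

        def gumbel_top_k(logits, k, tau=0.5, rng=np.random.default_rng(0)):
            # Gumbel(0,1) noise turns the hard top-k into a sample, not an argmax.
            gumbel = -np.log(-np.log(rng.uniform(1e-12, 1.0, size=logits.shape)))
            perturbed = logits + gumbel
            hard_idx = np.argsort(perturbed)[::-1][:k]   # sampled edge set
            soft = np.exp(perturbed / tau)
            soft /= soft.sum()                           # relaxed selection weights
            return hard_idx, soft

        edge_logits = np.array([2.0, 0.5, 1.2, -0.3, 0.9])  # one score per candidate edge
        idx, w = gumbel_top_k(edge_logits, k=2)
        print(idx, w.round(3))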

    Adaptive Dispatching of Tasks in the Cloud

    The increasingly wide application of Cloud Computing enables the consolidation of tens of thousands of applications in shared infrastructures. Meeting the quality-of-service requirements of so many diverse applications in such shared resource environments has therefore become a real challenge, especially since the characteristics and workloads of applications differ widely and may change over time. This paper presents an experimental system that can exploit a variety of online QoS-aware adaptive task allocation schemes; three such schemes are designed and compared. These are: a measurement-driven algorithm that uses reinforcement learning; a "sensible" allocation algorithm that assigns jobs to the sub-systems observed to provide lower response times; and an algorithm that splits the job arrival stream into sub-streams at rates computed from the hosts' processing capabilities. All of these schemes are compared via measurements, against each other and against a simple round-robin scheduler, on two experimental test-beds with homogeneous and heterogeneous hosts of different processing capacities. Comment: 10 pages, 9 figures
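
    As an illustration of the second ("sensible") scheme (a hedged reconstruction; the paper's measurement and smoothing details may differ), the dispatcher can weight each host by the inverse of its observed response time, so faster sub-systems receive proportionally more jobs while slower ones are still probed occasionally:

        import random

        class SensibleDispatcher:
            def __init__(self, hosts):
                self.resp = {h: 1.0 for h in hosts}  # smoothed response-time estimates

            def dispatch(self):
                # Favor hosts with lower observed response time.
                weights = {h: 1.0 / r for h, r in self.resp.items()}
                total = sum(weights.values())
                return random.choices(list(weights),
                                      [w / total for w in weights.values()])[0]

            def observe(self, host, response_time, alpha=0.2):
                # Exponentially smoothed measurements drive future decisions.
                self.resp[host] = (1 - alpha) * self.resp[host] + alpha * response_time

        d = SensibleDispatcher(["hostA", "hostB"])
        d.observe("hostA", 0.4)
        d.observe("hostB", 1.6)
        print(d.dispatch())  # "hostA" is returned more often than "hostB"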

    Management And Security Of Multi-Cloud Applications

    Single-cloud management platform technology has reached maturity and is quite successful in information technology applications. Enterprises and application service providers are increasingly adopting a multi-cloud strategy to reduce the risk of cloud service provider lock-in and cloud blackouts while, at the same time, gaining benefits such as competitive pricing, flexible resource provisioning, and better points of presence. Another class of applications in which cloud service providers are increasingly interested is carriers' virtualized network services. However, virtualized carrier services require high levels of availability and performance and impose stringent requirements on cloud services; they necessitate multi-cloud management and innovative techniques for placement and performance management. We consider two classes of distributed applications – virtual network services and the next generation of healthcare – that would benefit immensely from deployment over multiple clouds. This thesis deals with the design and development of new processes and algorithms to enable these classes of applications. We have evolved a method for the optimization of multi-cloud platforms that paves the way for obtaining optimized placement for both classes of services. The approach we follow for placement is predictive, cost-optimized, latency-controlled virtual resource placement for both types of applications. To improve the availability of virtual network services, we make innovative use of machine and deep learning to develop a framework for fault detection and localization. Finally, to secure patient data flowing through the wide expanse of sensors, the cloud hierarchy, the virtualized network, and the visualization domain, we have evolved hierarchical autoencoder models for data in motion between the IoT domain and the multi-cloud domain and within the multi-cloud hierarchy.
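
    As a minimal sketch of the reconstruction-error principle behind autoencoder-based protection of data in motion (purely illustrative; the thesis's hierarchical models, features, and thresholds are not reproduced here), a tied-weight linear autoencoder is fit to "normal" records, and any record that reconstructs poorly is flagged:

        import numpy as np

        rng = np.random.default_rng(1)
        normal = rng.normal(0.0, 1.0, size=(500, 8))  # stand-in "normal" feature vectors

        W = rng.normal(0, 0.1, size=(8, 3))           # encoder W, decoder W.T (tied weights)
        for _ in range(200):                          # gradient descent on ||X W W^T - X||^2
            err = normal @ W @ W.T - normal
            W -= 1e-4 * (normal.T @ (err @ W) + err.T @ (normal @ W))

        def anomaly_score(x):
            return float(np.sum((x @ W @ W.T - x) ** 2))

        threshold = np.percentile([anomaly_score(x) for x in normal], 99)
        probe = rng.normal(5.0, 1.0, size=8)          # out-of-distribution record
        print(anomaly_score(probe) > threshold)       # flagged as anomalous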

    Impact of network interconnection in cloud computing environments for high-performance computing applications

    The availability of computational resources has changed significantly due to the use of the cloud computing paradigm. Aiming at potential advantages, such as cost savings through the pay-per-use model and scalable/elastic resource allocation, we have witnessed efforts to execute high-performance computing (HPC) applications in the cloud. Due to the distributed nature of these environments, performance is highly dependent on two primary components of the system: processing power and network interconnection. While allocating more powerful hardware theoretically increases performance, it also increases the allocation cost. Allocation exclusivity guarantees space for memory, storage, and CPU. This is not the case for the network interconnection, since several simultaneous instances (multi-tenants) share the same communication channel, making the network a bottleneck. Therefore, this dissertation aims to analyze the impact of network interconnection on the execution of workloads from the HPC domain. We carried out two different assessments. The first concentrates on different network interconnections (GbE and InfiniBand) in the Microsoft Azure public cloud and the costs related to their use. The second focuses on different network configurations using NIC aggregation methodologies in a controlled private-cloud environment. The results show that network interconnection is a crucial aspect and can significantly impact the performance of HPC applications executed in the cloud. In the Azure public cloud, the accelerated networking approach, which gives an instance a high-performance interconnection without additional charges, yields significant performance improvements for HPC applications with better cost efficiency. Finally, in the private cloud environment, the NIC aggregation approach outperformed the baseline in up to ≈98% of the executions with applications that make intensive use of the network. Also, the Balance Round-Robin aggregation mode performed better than the 802.3ad aggregation mode in the majority of the executions.
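
    The gap between the two aggregation modes has a simple mechanical explanation, sketched below in Python (an illustration of general Linux bonding behavior, not of the paper's testbed): balance-rr stripes the packets of a single flow across every slave NIC, so one flow can exceed a single NIC's bandwidth, whereas an 802.3ad-style hash pins each flow to one slave, avoiding reordering but capping per-flow throughput at one NIC:

        def balance_rr(packets, n_nics):
            # mode 0 (balance-rr): per-packet striping across slave NICs
            return [seq % n_nics for seq, _ in enumerate(packets)]

        def lacp_style_hash(packets, n_nics):
            # mode 4 (802.3ad): a per-flow hash keeps a flow's packets on one NIC
            return [hash(flow) % n_nics for flow, _ in packets]

        one_flow = [("flowA", seq) for seq in range(6)]
        print(balance_rr(one_flow, 2))       # [0, 1, 0, 1, 0, 1] -> both NICs carry the flow
        print(lacp_style_hash(one_flow, 2))  # e.g. [1, 1, 1, 1, 1, 1] -> bound to one NIC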

    Resource orchestration strategies with retrials for latency-sensitive network slicing over distributed telco clouds

    The new radio technologies (i.e., 5G and beyond) will enable a new generation of innovative services operated by vertical industries (e.g., robotic cloud, autonomous vehicles) with more stringent QoS requirements, especially in terms of end-to-end latency. Other technological changes, such as Network Function Virtualization (NFV) and Software-Defined Networking (SDN), will bring unique service capabilities to networks by enabling flexible network slicing that can be tailored to the needs of vertical services. However, effective orchestration strategies need to be put in place to minimize latency while also maximizing resource utilization, so that telco providers can address vertical requirements and increase their revenue. With this objective, this paper addresses a latency-sensitive orchestration problem by proposing different strategies for the coordinated selection of virtual resources (network, computational, and storage resources) in distributed DCs while meeting vertical requirements (e.g., bandwidth demand) for network slicing. Three orchestration strategies are presented to minimize latency or blocking probability through effective resource utilization. To further reduce slice-request blocking, the orchestration strategies also encompass a retrial mechanism applied to rejected slice requests. Regarding latency, two components are considered, namely processing latency and network latency. An extensive set of simulations was carried out over a wide and composite telco cloud infrastructure in which different types of data centers coexist, characterized by different network locations, sizes, and processing capacities. The results compare the behavior of the strategies in addressing latency minimization and service request fulfillment, also considering the impact of the retrial mechanism. This work was supported in part by the Department of Excellence in Robotics and Artificial Intelligence by Ministero dell'Istruzione, dell'Università e della Ricerca (MIUR) to Scuola Superiore Sant'Anna, and in part by the Project 5GROWTH under Agreement 856709.
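
    As a toy illustration of the retrial mechanism (our own minimal event-driven model; the paper's strategies, latency components, and parameters are far richer), a slice request that cannot be placed is re-queued with a backoff and only counts as blocked once its retries are exhausted:

        import heapq

        def simulate(arrivals, capacity, hold=10.0, backoff=5.0, max_retries=2):
            # event = (time, kind, tries, demand); kind 0 = arrival/retry, 1 = release
            events = [(t, 0, 0, d) for t, d in arrivals]
            heapq.heapify(events)
            used, accepted, blocked = 0.0, 0, 0
            while events:
                t, kind, tries, demand = heapq.heappop(events)
                if kind == 1:                    # a slice terminated: free its resources
                    used -= demand
                elif used + demand <= capacity:  # placement succeeded
                    used += demand
                    accepted += 1
                    heapq.heappush(events, (t + hold, 1, 0, demand))
                elif tries < max_retries:        # rejected: schedule a retrial
                    heapq.heappush(events, (t + backoff, 0, tries + 1, demand))
                else:
                    blocked += 1                 # retries exhausted
            return accepted, blocked

        # Three requests of demand 4 against capacity 8: the third is rejected at
        # first but succeeds on retry once the first slice releases its resources.
        print(simulate([(0, 4), (1, 4), (2, 4)], capacity=8))  # -> (3, 0)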