3,052 research outputs found

    Generalizable Resource Allocation in Stream Processing via Deep Reinforcement Learning

    This paper considers the problem of resource allocation in stream processing, where continuous data flows must be processed in real time in a large distributed system. To maximize system throughput, the resource allocation strategy that partitions the computation tasks of a stream processing graph onto computing devices must simultaneously balance workload distribution and minimize communication. Since this graph partitioning problem is known to be NP-complete yet crucial to practical streaming systems, many heuristic-based algorithms have been developed to find reasonably good solutions. In this paper, we present a graph-aware encoder-decoder framework to learn a generalizable resource allocation strategy that can properly distribute the computation tasks of stream processing graphs unseen in the training data. We propose, for the first time, to leverage graph embedding to learn the structural information of stream processing graphs. Jointly trained with the graph-aware decoder using deep reinforcement learning, our approach can effectively find optimized solutions for unseen graphs. Our experiments show that the proposed model outperforms both METIS, a state-of-the-art graph partitioning algorithm, and an LSTM-based encoder-decoder model, in about 70% of the test cases.
    Comment: Accepted by AAAI 202
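    As a rough sketch of the general approach described above (not the paper's code; the architecture, reward, and all names here are hypothetical simplifications), a graph embedding of the task graph can feed a per-task device classifier trained with REINFORCE against a user-supplied throughput-style reward:

```python
# Minimal sketch: encode a stream-processing task graph with one round of
# mean-neighbour message passing, decode a device per task, and update the
# policy with REINFORCE. The reward function is left to the caller (e.g. a
# load-balance score minus a communication penalty on cut edges).
import torch
import torch.nn as nn

class GraphEncoder(nn.Module):
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.lin_self = nn.Linear(in_dim, hid_dim)
        self.lin_nbr = nn.Linear(in_dim, hid_dim)

    def forward(self, x, adj):
        # x: (n_tasks, in_dim) node features, adj: (n_tasks, n_tasks) float adjacency
        nbr = (adj @ x) / adj.sum(dim=1, keepdim=True).clamp(min=1)
        return torch.relu(self.lin_self(x) + self.lin_nbr(nbr))

class DevicePolicy(nn.Module):
    def __init__(self, hid_dim, n_devices):
        super().__init__()
        self.head = nn.Linear(hid_dim, n_devices)

    def forward(self, h):
        # one categorical distribution over devices per task
        return torch.distributions.Categorical(logits=self.head(h))

def reinforce_step(encoder, policy, optimizer, x, adj, reward_fn):
    dist = policy(encoder(x, adj))
    assignment = dist.sample()                  # one device index per task
    reward = reward_fn(assignment)              # scalar quality of this placement
    loss = -dist.log_prob(assignment).sum() * reward
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return assignment, reward
```

    Because the encoder only aggregates over the graph structure, the same trained weights can be applied to graphs of different sizes, which is one reason such a learnt strategy can generalise to unseen stream-processing graphs.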

    MSRL: Distributed Reinforcement Learning with Dataflow Fragments

    Reinforcement learning (RL) trains many agents, which is resource-intensive and must scale to large GPU clusters. Different RL training algorithms offer different opportunities for distributing and parallelising the computation. Yet, current distributed RL systems tie the definition of RL algorithms to their distributed execution: they hard-code particular distribution strategies and only accelerate specific parts of the computation (e.g. policy network updates) on GPU workers. Fundamentally, current systems lack abstractions that decouple RL algorithms from their execution. We describe MindSpore Reinforcement Learning (MSRL), a distributed RL training system that supports distribution policies that govern how RL training computation is parallelised and distributed on cluster resources, without requiring changes to the algorithm implementation. MSRL introduces the new abstraction of a fragmented dataflow graph, which maps Python functions from an RL algorithm's training loop to parallel computational fragments. Fragments are executed on different devices by translating them to low-level dataflow representations, e.g. computational graphs as supported by deep learning engines, CUDA implementations or multi-threaded CPU processes. We show that MSRL subsumes the distribution strategies of existing systems, while scaling RL training to 64 GPUs.
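    The decoupling that MSRL advocates can be illustrated with a tiny hypothetical sketch (none of these names are MSRL's actual API): the training loop is written as ordinary Python functions registered as fragments, and a separate distribution policy decides where each fragment executes, so the algorithm code never mentions devices or workers.

```python
# Hypothetical illustration of a fragmented training loop: the algorithm is a
# set of registered Python functions, and placement is decided elsewhere.
from dataclasses import dataclass
from typing import Callable, Dict

FRAGMENTS: Dict[str, Callable] = {}

def fragment(name: str):
    """Register a training-loop function as a distributable fragment."""
    def register(fn):
        FRAGMENTS[name] = fn
        return fn
    return register

@fragment("collect")
def collect(params, env_batch):
    # roll out the current policy on a batch of environments (stub trajectories)
    return [("obs", "act", 1.0) for _ in env_batch]

@fragment("update")
def update(params, trajectories):
    # stand-in for a gradient step on the policy network
    return params + 1

@dataclass
class DistributionPolicy:
    placement: Dict[str, str]  # fragment name -> device / worker pool

    def run(self, name, *args):
        # a real system would translate the fragment to a computational graph,
        # CUDA kernel or CPU process on placement[name]; here we run locally
        print(f"running '{name}' on {self.placement[name]}")
        return FRAGMENTS[name](*args)

policy = DistributionPolicy({"collect": "cpu-workers", "update": "gpu:0"})
params = 0
for _ in range(2):
    trajectories = policy.run("collect", params, range(4))
    params = policy.run("update", params, trajectories)
```

    Changing the distribution strategy then only means changing the placement map (or the executor behind `run`), not the algorithm's functions, which is the property the fragmented dataflow graph abstraction aims for.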

    Optimisation for Optical Data Centre Switching and Networking with Artificial Intelligence

    Cloud and cluster computing platforms have become standard across almost every domain of business, and their scale is quickly approaching O(10^6) servers in a single warehouse. However, the tier-based, opto-electronically packet-switched network infrastructure that is standard across these systems suffers from several scalability bottlenecks, including resource fragmentation and high energy requirements. Experimental results show that optical circuit-switched networks are a promising alternative that could avoid these bottlenecks, but optimality challenges arise at realistic commercial scales. Where exhaustive optimisation techniques are inapplicable to problems at the scale of Cloud computer networks, and expert-designed heuristics are performance-limited and typically biased in their design, artificial intelligence can discover more scalable and better-performing optimisation strategies. This thesis demonstrates these benefits through experimental and theoretical work spanning the component-, system- and commercial-level optimisation problems that stand in the way of practical Cloud-scale computer network systems. Firstly, optical components are optimised to gate in ≈500 ps and are demonstrated in a proof-of-concept switching architecture for optical data centres with better wavelength and component scalability than previous demonstrations. Secondly, network-aware resource allocation schemes for optically composable data centres are learnt end-to-end with deep reinforcement learning and graph neural networks, requiring 3× fewer networking resources to achieve the same resource efficiency as conventional methods. Finally, a deep reinforcement learning based method for optimising PID-control parameters is presented, which generates tailored parameters for unseen devices in O(10^-3) s. This method is demonstrated on a market-leading optical switching product based on piezoelectric actuation, where switching speed is improved by >20% with no compromise in optical loss, and the manufacturing yield of the actuators is improved. The method has been licensed to and integrated within the manufacturing pipeline of this company; as such, crucial public and private infrastructure utilising these products will benefit from this work.
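    The third contribution lends itself to a small hedged sketch (the plant model, network size and training loop below are all assumptions, not the thesis implementation): a policy network maps features of a device to candidate PID gains, and a REINFORCE update scores them against a simulated closed-loop step response.

```python
# Hypothetical sketch of learning PID gains with RL: the 'plant' is a toy
# first-order system with actuator saturation, and the reward is the negative
# mean absolute error over the tail of a simulated unit-step response.
import numpy as np
import torch
import torch.nn as nn

def closed_loop_error(gains, tau=0.02, dt=1e-4, steps=500):
    kp, ki, kd = gains
    y, integ, prev_err = 0.0, 0.0, 1.0
    errs = []
    for _ in range(steps):
        err = 1.0 - y                              # unit step reference
        integ += err * dt
        deriv = (err - prev_err) / dt
        u = kp * err + ki * integ + kd * deriv
        u = max(-10.0, min(10.0, u))               # actuator saturation
        y += dt * (u - y) / tau                    # first-order plant response
        prev_err = err
        errs.append(abs(err))
    return float(np.mean(errs[-100:]))             # settling-quality proxy

policy = nn.Sequential(nn.Linear(3, 32), nn.ReLU(), nn.Linear(32, 3), nn.Softplus())
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for _ in range(200):
    features = torch.rand(3)                       # stand-in for measured device features
    dist = torch.distributions.Normal(policy(features), 0.1)
    sample = dist.sample()
    reward = -closed_loop_error(sample.clamp(min=1e-3).tolist())
    loss = -dist.log_prob(sample).sum() * reward   # plain REINFORCE, no baseline
    opt.zero_grad(); loss.backward(); opt.step()
```

    Once trained across many simulated devices, inference is a single forward pass, which is what makes millisecond-scale generation of tailored parameters for an unseen device plausible.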

    Towards Scalable Circuit Partitioning for Multi-Core Quantum Architectures with Deep Reinforcement Learning

    Quantum computing holds immense potential for solving classically intractable problems by leveraging the unique properties of qubits. However, the scalability of quantum architectures remains a significant challenge. To address this issue, multi-core quantum architectures have been proposed. Yet, the realization of such multi-core architectures poses multiple challenges in hardware, algorithms, and the interface between them. In particular, one of these challenges is how to optimally partition the algorithms to fit within the cores of a multi-core quantum computer. This thesis presents a novel approach for scalable circuit partitioning on multi-core quantum architectures using Deep Reinforcement Learning. The objective is to surpass existing meta-heuristic algorithms, such as the FGP-rOEE partitioning algorithm, in terms of accuracy and scalability. This research contributes to the advancement of both quantum computing and graph partitioning techniques, offering new insights into the optimization of quantum systems. By addressing the challenges associated with scaling quantum computers, we pave the way for their practical implementation in solving computationally challenging problems.
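    As a simplified illustration of the objective such a partitioner has to optimise (a static qubit-to-core assignment rather than the per-timeslice assignments used in practice, with a hypothetical penalty weight), the cost of a candidate partition can be scored by counting inter-core two-qubit gates and core-capacity violations:

```python
# Toy cost model for multi-core circuit partitioning: every two-qubit gate whose
# qubits sit on different cores needs an inter-core operation, and cores must
# not hold more qubits than their capacity.
from collections import Counter

def partition_cost(timeslices, assignment, n_cores, core_capacity):
    """timeslices: list of lists of (q1, q2) gates; assignment: qubit -> core."""
    inter_core = sum(
        1
        for gates in timeslices
        for q1, q2 in gates
        if assignment[q1] != assignment[q2]
    )
    load = Counter(assignment.values())
    overflow = sum(max(0, load[c] - core_capacity) for c in range(n_cores))
    return inter_core + 10 * overflow          # penalty weight chosen arbitrarily

# toy 4-qubit circuit on 2 cores of capacity 2
circuit = [[(0, 1), (2, 3)], [(1, 2)], [(0, 3)]]
print(partition_cost(circuit, {0: 0, 1: 0, 2: 1, 3: 1}, 2, 2))  # -> 2
print(partition_cost(circuit, {0: 0, 1: 1, 2: 0, 3: 1}, 2, 2))  # -> 4
```

    A deep reinforcement learning agent, like the one this thesis develops, would learn a policy that proposes assignments minimising a cost of this kind, rather than relying on a hand-crafted heuristic such as FGP-rOEE.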