
    Centralized and Distributed Sparsification for Low-Complexity Message Passing Algorithm in C-RAN Architectures

    Cloud radio access network (C-RAN) is a promising technology for fifth-generation (5G) cellular systems. However, the burden imposed by the huge amount of data to be collected (in the uplink) from the remote radio heads (RRHs) and processed at the baseband unit (BBU) poses serious challenges. In order to reduce the computational effort of the minimum mean square error (MMSE) receiver at the BBU, Gaussian message passing (MP) together with a suitable sparsification of the channel matrix can be used. In this paper we propose two sets of solutions, either centralized or distributed. In the centralized solutions, we propose different approaches to sparsify the channel matrix in order to reduce the complexity of MP. However, these approaches still require that all signals reaching the RRHs be conveyed to the BBU, so the communication requirements among the backbone network devices are unaltered. In the distributed solutions, instead, we aim at reducing both the complexity of MP at the BBU and the requirements on the RRH-BBU communication links by pre-processing the signals at the RRHs and conveying a reduced set of signals to the BBU. Comment: Accepted for publication in IEEE VTC 201
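    The complexity savings from sparsification come from the fact that Gaussian MP updates scale with the number of nonzero channel-matrix entries rather than with the full matrix dimensions. As a minimal illustrative sketch (a generic magnitude-based rule, not necessarily the sparsification approaches proposed in the paper), a centralized scheme could keep only the strongest links per RRH:

```python
import numpy as np

def sparsify_channel(H, keep_per_row=2):
    """Hypothetical magnitude-based sparsification: for each RRH (row),
    keep the keep_per_row largest-magnitude coefficients and zero the
    rest, so each Gaussian MP iteration touches only surviving edges."""
    H_sparse = np.zeros_like(H)
    for i, row in enumerate(H):
        idx = np.argsort(np.abs(row))[-keep_per_row:]  # strongest links
        H_sparse[i, idx] = row[idx]
    return H_sparse

# Toy example: 8 RRHs x 6 users with Rayleigh-fading coefficients.
rng = np.random.default_rng(0)
H = (rng.normal(size=(8, 6)) + 1j * rng.normal(size=(8, 6))) / np.sqrt(2)
print("nonzeros:", np.count_nonzero(H), "->", np.count_nonzero(sparsify_channel(H)))
```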

    Message passing algorithms for cloud radio access network of cellular systems

    Cloud Radio Access Network is a cellular network architecture that is promising for the 5G standard. The main problem of decoding via message-passing algorithms lies in their computational complexity. We therefore propose two different approaches to address the problem: a centralized one, based on sparsification of the channel, and a distributed one, in which the complexity reduction is implemented through pre-coding at the cells.

    Linear state estimation via 5G C-RAN cellular networks using Gaussian belief propagation

    Machine-type communications and large-scale information processing architectures are among the key (r)evolutionary enhancements of emerging fifth-generation (5G) mobile cellular networks. Massive data acquisition and processing will make the 5G network an ideal platform for large-scale system monitoring and control, with applications in future smart transportation, connected industry, power grids, etc. In this work, we investigate the capability of such a 5G network architecture to provide the state estimate of an underlying linear system from the input obtained via large-scale deployment of measurement devices. Assuming that the measurements are communicated via a densely deployed cloud radio access network (C-RAN), we formulate and solve the problem of estimating the system state from the set of signals collected at C-RAN base stations. Our solution, based on the Gaussian belief propagation (GBP) framework, allows for large-scale and distributed deployment within the emerging 5G information processing architectures. The presented numerical study demonstrates the accuracy, convergence behavior, and scalability of the proposed GBP-based solution to the large-scale state estimation problem.
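    For a linear measurement model z = Hx + e with Gaussian noise covariance R, the state estimate solves the normal equations (H^T R^-1 H) x = H^T R^-1 z, and GBP obtains it by exchanging scalar precision/mean messages over the edges of the corresponding graph. A minimal sketch of the classical GaBP update rules for a symmetric system Ax = b with a synchronous schedule (the paper's distributed C-RAN formulation is more elaborate; convergence holds e.g. for diagonally dominant A):

```python
import numpy as np

def gabp_solve(A, b, iters=50):
    """Gaussian belief propagation for Ax = b (A symmetric).
    Each directed edge (i, j) carries a precision P[i, j] and a mean U[i, j];
    at convergence the marginal means equal the solution x = A^-1 b."""
    n = len(b)
    P, U = np.zeros((n, n)), np.zeros((n, n))
    nbr = [np.flatnonzero((A[i] != 0) & (np.arange(n) != i)) for i in range(n)]
    for _ in range(iters):
        P_new, U_new = np.zeros_like(P), np.zeros_like(U)
        for i in range(n):
            for j in nbr[i]:
                # Combine node i's prior with all incoming messages except j's.
                p = A[i, i] + sum(P[k, i] for k in nbr[i] if k != j)
                u = (b[i] + sum(P[k, i] * U[k, i] for k in nbr[i] if k != j)) / p
                P_new[i, j] = -A[i, j] ** 2 / p
                U_new[i, j] = p * u / A[i, j]
        P, U = P_new, U_new
    # Marginals: combine each node's own potential with all incoming messages.
    prec = [A[i, i] + sum(P[k, i] for k in nbr[i]) for i in range(n)]
    return np.array([(b[i] + sum(P[k, i] * U[k, i] for k in nbr[i])) / prec[i]
                     for i in range(n)])

A = np.array([[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
print(gabp_solve(A, b), np.linalg.solve(A, b))  # the two should agree
```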

    FusionAI: Decentralized Training and Deploying LLMs with Massive Consumer-Level GPUs

    The rapid growth of the memory and computation requirements of large language models (LLMs) has outpaced the development of hardware, hindering people who lack large-scale high-end GPUs from training or deploying LLMs. However, consumer-level GPUs, which constitute a larger market share, are typically overlooked for LLM workloads due to their weaker computing performance, smaller storage capacity, and lower communication bandwidth. Additionally, users may have privacy concerns when interacting with remote LLMs. In this paper, we envision a decentralized system unlocking the vast untapped potential of consumer-level GPUs for pre-training, inference, and fine-tuning of LLMs with privacy protection. However, this system faces critical challenges, including limited CPU and GPU memory, low network bandwidth, peer variability, and device heterogeneity. To address these challenges, our system design incorporates: 1) a broker with a backup pool to support dynamic joining and quitting of computing providers; 2) task scheduling based on hardware performance to improve system efficiency; 3) abstracting ML procedures into directed acyclic graphs (DAGs) to achieve model and task universality; 4) abstracting the intermediate representation and execution planes to ensure compatibility across devices and deep learning (DL) frameworks. Our performance analysis demonstrates that 50 RTX 3080 GPUs can achieve throughput comparable to that of 4 H100 GPUs, which are significantly more expensive.
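    The DAG abstraction (point 3) is what lets heterogeneous consumer devices cooperate: a task becomes runnable as soon as all of its predecessors finish, and a scheduler can then match runnable tasks to devices by measured performance (point 2). A minimal sketch with hypothetical task names and throughput scores (not FusionAI's actual scheduler):

```python
from graphlib import TopologicalSorter

# Hypothetical training step expressed as a DAG: task -> prerequisites.
dag = {
    "load_shard_0": [],
    "load_shard_1": [],
    "forward_0": ["load_shard_0"],
    "forward_1": ["load_shard_1"],
    "aggregate": ["forward_0", "forward_1"],
}
# Hypothetical devices with relative throughput scores (higher = faster).
devices = {"rtx3080_a": 1.0, "rtx3080_b": 0.9, "laptop_gpu": 0.4}

ts = TopologicalSorter(dag)
ts.prepare()
while ts.is_active():
    ready = list(ts.get_ready())           # tasks whose deps are all done
    fastest = sorted(devices, key=devices.get, reverse=True)
    for task, dev in zip(ready, fastest):  # greedy: fastest device first
        print(f"run {task!r} on {dev}")
    ts.done(*ready)
```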

    Communication-Efficient Distributed Deep Learning: A Comprehensive Survey

    Distributed deep learning has become very common as a way to reduce overall training time by exploiting multiple computing devices (e.g., GPUs/TPUs) as the size of deep models and data sets increases. However, data communication between computing devices can become a bottleneck that limits system scalability. How to address the communication problem in distributed deep learning has recently become a hot research topic. In this paper, we provide a comprehensive survey of communication-efficient distributed training algorithms, covering both system-level and algorithmic-level optimizations. At the system level, we demystify the system design and implementation choices that reduce the communication cost. At the algorithmic level, we compare different algorithms with theoretical convergence bounds and communication complexity. Specifically, we first propose a taxonomy of data-parallel distributed training algorithms with four main dimensions: communication synchronization, system architectures, compression techniques, and the parallelism of communication and computing. We then discuss studies addressing problems along these four dimensions and compare their communication cost. We further compare the convergence rates of different algorithms, which shows how fast they converge to the solution in terms of iterations. Based on the system-level communication cost analysis and the theoretical convergence speed comparison, we help readers understand which algorithms are more efficient under specific distributed environments and extrapolate potential directions for further optimization.
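    Among the compression techniques such a survey covers, top-k gradient sparsification is a representative example: each worker transmits only the k largest-magnitude gradient entries and keeps the remainder locally as an error-feedback residual, so no update is permanently discarded. A minimal single-worker sketch:

```python
import numpy as np

class TopKCompressor:
    """Top-k sparsification with error feedback (local residual memory)."""

    def __init__(self, size, k):
        self.residual = np.zeros(size)  # error accumulated across steps
        self.k = k

    def compress(self, grad):
        g = grad + self.residual                # fold in past error
        idx = np.argsort(np.abs(g))[-self.k:]   # k largest entries
        sparse = np.zeros_like(g)
        sparse[idx] = g[idx]                    # only these are transmitted
        self.residual = g - sparse              # keep the rest locally
        return sparse

rng = np.random.default_rng(1)
comp = TopKCompressor(size=1000, k=10)          # ~1% of entries communicated
for step in range(3):
    sent = comp.compress(rng.normal(size=1000))
    print(f"step {step}: nonzeros sent = {np.count_nonzero(sent)}")
```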

    Distributed Symmetry Breaking on Power Graphs via Sparsification

    In this paper, we present efficient distributed algorithms for classical symmetry breaking problems, maximal independent sets (MIS) and ruling sets, in power graphs. We work in the standard CONGEST model of distributed message passing, where the communication network is abstracted as a graph G. Typically, the problem instance in CONGEST is identical to the communication network G; that is, we perform the symmetry breaking in G. In this work, we consider a setting where the problem instance corresponds to a power graph G^k, in which each node of the communication network G is connected to all of its k-hop neighbors. Our main contribution is a deterministic polylogarithmic-time algorithm for computing k-ruling sets of G^k, which (for k > 1) improves exponentially on the current state-of-the-art runtimes. The main technical ingredient for this result is a deterministic sparsification procedure which may be of independent interest. On top of being a natural family of problems, ruling sets (in power graphs) are well motivated through their applications in the powerful shattering framework [BEPS JACM'16, Ghaffari SODA'19] (and others). We present randomized algorithms for computing maximal independent sets and ruling sets of G^k in essentially the same time as they can be computed in G. We also revisit the shattering algorithm for MIS [BEPS JACM'16] and present different approaches for the post-shattering phase. Our solutions are algorithmically and analytically simpler (also in the LOCAL model) than existing solutions and obtain the same runtime as [Ghaffari SODA'16].
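    For context, the classical round-based pattern for randomized symmetry breaking (which shattering-style algorithms build on) is Luby-style: in each synchronous round every still-active node draws a random priority, joins the independent set if it beats all active neighbors, and then it and its neighbors drop out. A minimal centralized simulation of those rounds (not the paper's power-graph or deterministic algorithms):

```python
import random

def luby_mis(adj, seed=0):
    """Simulate synchronous Luby-style rounds for MIS on adjacency dict adj.
    Each round, an active node joins the MIS if its random priority is a
    local minimum; winners and their neighbors then deactivate. With high
    probability this finishes in O(log n) rounds."""
    rng = random.Random(seed)
    active, mis = set(adj), set()
    while active:
        r = {v: rng.random() for v in active}
        winners = {v for v in active
                   if all(r[v] < r[u] for u in adj[v] if u in active)}
        mis |= winners
        active -= winners | {u for v in winners for u in adj[v]}
    return mis

# 5-cycle: nodes 0..4, each adjacent to its two ring neighbors.
adj = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
print(sorted(luby_mis(adj)))  # a maximal independent set of the cycle
```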