6 research outputs found

    HongTu: Scalable Full-Graph GNN Training on Multiple GPUs (via communication-optimized CPU data offloading)

    Full text link
    Full-graph training of graph neural networks (GNNs) has emerged as a promising training method due to its effectiveness, but it requires extensive memory and computation resources. To accelerate this training process, researchers have proposed employing multi-GPU processing. However, the scalability of existing frameworks is limited because they must maintain the training data for every layer in GPU memory. To efficiently train on large graphs, we present HongTu, a scalable full-graph GNN training system running on GPU-accelerated platforms. HongTu stores vertex data in CPU memory and offloads training to GPUs. It employs a memory-efficient full-graph training framework that reduces runtime memory consumption through partition-based training and recomputation-caching-hybrid intermediate data management. To address the increased host-GPU communication caused by duplicated neighbor access among partitions, HongTu employs a deduplicated communication framework that converts redundant host-GPU communication into efficient inter/intra-GPU data access. Further, HongTu uses a cost-model-guided graph reorganization method to minimize communication overhead. Experimental results on a 4×A100 GPU server show that HongTu effectively supports billion-scale full-graph GNN training while reducing host-GPU data communication by 25%-71%. Compared to the full-graph GNN system DistGNN running on 16 CPU nodes, HongTu achieves speedups ranging from 7.8X to 20.2X. For small graphs whose training data fits into the GPUs, HongTu achieves performance comparable to existing GPU-based GNN systems.
    Comment: 28 pages, 11 figures, SIGMOD202
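    The core of the deduplicated communication idea, fetching each vertex's data from host memory once even when several GPU partitions request it, can be illustrated with a minimal sketch (function and variable names are hypothetical, not HongTu's API):

```python
import numpy as np

def dedup_transfer(partition_requests):
    """Given per-partition lists of requested neighbor vertex IDs, return the
    unique IDs to fetch from host memory once, plus a per-partition index into
    the fetched buffer (an intra-GPU gather replaces repeated host-GPU copies)."""
    all_ids = np.concatenate(partition_requests)
    unique_ids, inverse = np.unique(all_ids, return_inverse=True)
    # Split the inverse mapping back into one index array per partition.
    offsets = np.cumsum([len(r) for r in partition_requests])[:-1]
    per_partition_index = np.split(inverse, offsets)
    return unique_ids, per_partition_index

# Two partitions both need vertices 3 and 7; each is fetched from the host once.
reqs = [np.array([3, 7, 9]), np.array([3, 7, 11])]
uniq, idx = dedup_transfer(reqs)
print(len(np.concatenate(reqs)), "requests ->", len(uniq), "host-GPU transfers")
```

    The same idea extends across GPUs: a vertex already resident on a peer GPU can be read over NVLink instead of being transferred from the host again.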

    NeutronOrch: Rethinking Sample-based GNN Training under CPU-GPU Heterogeneous Environments

    Full text link
    Graph Neural Networks (GNNs) have demonstrated outstanding performance in various applications. Existing frameworks utilize CPU-GPU heterogeneous environments to train GNN models and integrate mini-batch and sampling techniques to overcome the GPU memory limitation. In CPU-GPU heterogeneous environments, sample-based GNN training can be divided into three steps: sample, gather, and train. Existing GNN systems use different task orchestrating methods to place each step on the CPU or GPU. After extensive experiments and analysis, we find that existing task orchestrating methods fail to fully utilize the heterogeneous resources, limited by inefficient CPU processing or GPU resource contention. In this paper, we propose NeutronOrch, a system for sample-based GNN training that incorporates a layer-based task orchestrating method and ensures balanced utilization of the CPU and GPU. NeutronOrch decouples the training process by layer and pushes down the training task of the bottom layer to the CPU. This significantly reduces the computational load and memory footprint of GPU training. To avoid inefficient CPU processing, NeutronOrch only offloads the training of frequently accessed vertices to the CPU and lets the GPU reuse their embeddings with bounded staleness. Furthermore, NeutronOrch provides a fine-grained pipeline design for the layer-based task orchestrating method, fully overlapping different tasks on heterogeneous resources while strictly guaranteeing bounded staleness. The experimental results show that compared with state-of-the-art GNN systems, NeutronOrch can achieve up to 11.51x performance speedup.
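    The bounded-staleness reuse described above can be sketched as a small cache that serves a CPU-computed embedding only while its age in iterations stays under a bound (a minimal sketch with hypothetical names, not NeutronOrch's implementation):

```python
class StaleEmbeddingCache:
    """Reuse CPU-computed embeddings for frequently accessed (hot) vertices,
    but only while their age in training iterations is within a staleness bound."""

    def __init__(self, staleness_bound):
        self.bound = staleness_bound
        self.cache = {}  # vertex_id -> (embedding, iteration_computed)

    def put(self, vid, emb, it):
        self.cache[vid] = (emb, it)

    def get(self, vid, current_it):
        entry = self.cache.get(vid)
        if entry is None:
            return None
        emb, it = entry
        if current_it - it > self.bound:
            return None  # too stale: the GPU must recompute instead of reusing
        return emb

cache = StaleEmbeddingCache(staleness_bound=2)
cache.put(42, [0.1, 0.2], it=0)
print(cache.get(42, current_it=2))  # within bound -> embedding is reused
print(cache.get(42, current_it=4))  # exceeds bound -> None, recompute
```

    The staleness bound is what makes the CPU-GPU overlap safe: reuse saves GPU work, while the bound limits how far the reused embeddings can drift from the current model parameters.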

    NeutronStream: A Dynamic GNN Training Framework with Sliding Window for Graph Streams

    Full text link
    Existing Graph Neural Network (GNN) training frameworks have been designed to help developers easily create performant GNN implementations. However, most existing GNN frameworks assume that the input graphs are static, ignoring that most real-world graphs are constantly evolving. Though many dynamic GNN models have emerged to learn from evolving graphs, the training process of these dynamic GNNs is dramatically different from that of traditional GNNs in that it captures both the spatial and temporal dependencies of graph updates. This poses new challenges for designing dynamic GNN training frameworks. First, the traditional batched training method fails to capture real-time structural evolution information. Second, the time-dependent nature makes parallel training hard to design. Third, existing frameworks lack system support for users to efficiently implement dynamic GNNs. In this paper, we present NeutronStream, a framework for training dynamic GNN models. NeutronStream abstracts the input dynamic graph into a chronologically updated stream of events and processes the stream with an optimized sliding window to incrementally capture the spatial-temporal dependencies of events. Furthermore, NeutronStream provides a parallel execution engine to tackle the sequential event processing challenge and achieve high performance. NeutronStream also integrates a built-in graph storage structure that supports dynamic updates and provides a set of easy-to-use APIs that allow users to express their dynamic GNNs. Our experimental results demonstrate that, compared to state-of-the-art dynamic GNN implementations, NeutronStream achieves speedups ranging from 1.48X to 5.87X and an average accuracy improvement of 3.97%.
    Comment: 12 pages, 15 figures
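    The event-stream abstraction can be sketched as a generator that orders graph-update events chronologically and yields overlapping windows for incremental processing (a minimal sketch with hypothetical names; NeutronStream's actual window is adaptively optimized rather than fixed-size):

```python
from dataclasses import dataclass

@dataclass
class Event:
    """A single graph update: an edge (src, dst) arriving at a timestamp."""
    timestamp: float
    src: int
    dst: int

def sliding_windows(events, window_size, stride):
    """Yield chronologically ordered windows of graph-update events.
    Overlap (stride < window_size) lets later windows see spatial-temporal
    context established by earlier events."""
    events = sorted(events, key=lambda e: e.timestamp)
    start = 0
    while start < len(events):
        yield events[start:start + window_size]
        start += stride

stream = [Event(t, t % 5, (t + 1) % 5) for t in range(10)]
wins = list(sliding_windows(stream, window_size=4, stride=2))
print(len(wins), "windows; first covers timestamps",
      [e.timestamp for e in wins[0]])
```

    Events inside one window that touch disjoint vertex neighborhoods have no spatial dependency on each other, which is the opening a parallel execution engine can exploit.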

    Structure evolution and performance of poly (vinyl alcohol) fibers with controllable cross-section fabricated using a combination of melt-spinning and stretching

    No full text
    Profiled fibers have gained potential applications in both functional textiles and cement reinforcement. However, commercial poly(vinyl alcohol) (PVA) fibers are generally produced by solution spinning, which makes it difficult to prepare profiled PVA fibers with controllable cross-sections because of bidirectional mass transfer in the coagulation bath. In this work, based on intermolecular complexation and plasticization, water was adopted to improve the thermal processability of PVA, and special-shaped PVA fibers with triangular, trilobal, cruciform, and wavy cross-sections were prepared by melt-spinning. The crystallinity of the profiled as-spun PVA fibers was lower than 32%, and their elongation at break was higher than 400%, giving good stretchability for hot-drawing. During hot-drawing of the profiled PVA fibers, the tendency of the characteristic sections toward a circular shape could be controlled, and the reduction in profile degree was lower than 9.4%. Hot-drawing improved the crystalline structure of PVA and increased the crystallinity, orientation, tensile strength, and melting point of the profiled fibers. At a draw ratio of 9, the tensile strengths of the triangular, cruciform, trilobal, and wavy fibers were 585 MPa, 559 MPa, 481 MPa, and 585 MPa, respectively. This work proposes a new processing technology for special-shaped PVA fibers and enriches the variety of special-shaped fibers

    Lipid Metabolism-Related Gene Signature Predicts Prognosis and Indicates Immune Microenvironment Infiltration in Advanced Gastric Cancer

    No full text
    Objective. Abnormal lipid metabolism is known to influence the malignant behavior of gastric cancer. However, the underlying mechanism remains elusive. In this study, we comprehensively analyzed the biological significance of genes involved in lipid metabolism in advanced gastric cancer (AGC). Methods. We obtained gene expression profiles from The Cancer Genome Atlas (TCGA) database for early and advanced gastric cancer samples and performed differential expression analysis to identify specific lipid metabolism-related genes in AGC. We then used consensus cluster analysis to classify AGC patients into molecular subtypes based on lipid metabolism and constructed a prognostic model using least absolute shrinkage and selection operator (LASSO)-Cox regression analysis and Gene Set Enrichment Analysis (GSEA). We evaluated the discriminative ability and clinical significance of the model using the Kaplan-Meier (KM) curve, ROC curve, DCA curve, and nomogram. We also estimated immune levels based on immune microenvironment expression, immune checkpoints, and immune cell infiltration, and obtained hub genes by weighted gene co-expression network analysis (WGCNA) of differential genes from the two molecular subtypes. Results. We identified 6 lipid metabolism genes that were associated with the prognosis of AGC and used consensus clustering to classify AGC patients into two subgroups with significantly different overall survival and immune microenvironments. Our risk model successfully classified patients in the training and validation sets into high-risk and low-risk groups. A high risk score predicted poor prognosis and indicated a low degree of immune infiltration. Subgroup analysis showed that the risk model was an independent predictor of prognosis in AGC. Furthermore, our results indicated that most chemotherapeutic agents are more effective for AGC patients in the low-risk group than in the high-risk group, and risk scores for AGC are strongly correlated with drug sensitivity. Finally, we performed qRT-PCR experiments to verify the relevant results. Conclusion. Our findings suggest that lipid metabolism-related genes play an important role in predicting the prognosis of AGC and regulating immune infiltration. These results have important implications for the development of targeted therapies for AGC patients.
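    The risk stratification used in signatures of this kind is typically a linear score: each patient's expression of the signature genes is weighted by the LASSO-Cox coefficients and summed, and the cohort is split at the median score. A minimal sketch (gene names and coefficient values are illustrative placeholders, not the paper's 6-gene signature):

```python
# Hypothetical coefficients standing in for the LASSO-Cox-derived weights.
coefficients = {"GENE_A": 0.8, "GENE_B": -0.5, "GENE_C": 0.3}

def risk_score(expression):
    """Linear risk score: sum of Cox coefficients times gene expression."""
    return sum(coefficients[g] * expression[g] for g in coefficients)

def stratify(patients):
    """Split patients into high/low risk at the cohort median score."""
    scores = {pid: risk_score(expr) for pid, expr in patients.items()}
    median = sorted(scores.values())[len(scores) // 2]
    return {pid: ("high" if s > median else "low") for pid, s in scores.items()}

patients = {
    "P1": {"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": 0.5},
    "P2": {"GENE_A": 0.5, "GENE_B": 2.0, "GENE_C": 1.0},
    "P3": {"GENE_A": 1.5, "GENE_B": 0.5, "GENE_C": 2.0},
}
print(stratify(patients))
```

    Because the split point is learned from the training cohort, the same threshold (not a recomputed median) is normally applied to the validation set when assessing the model's discriminative ability.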