PipeFisher: Efficient Training of Large Language Models Using Pipelining and Fisher Information Matrices
Pipeline parallelism enables efficient training of Large Language Models
(LLMs) on large-scale distributed accelerator clusters. Yet, pipeline bubbles
during startup and tear-down reduce the utilization of accelerators. Although
efficient pipeline schemes with micro-batching and bidirectional pipelines have
been proposed to maximize utilization, a significant number of bubbles cannot
be filled using synchronous forward and backward passes. To address this
problem, we suggest that extra work be assigned to the bubbles to gain
auxiliary benefits in LLM training. As an example in this direction, we propose
PipeFisher, which assigns the work of K-FAC, a second-order optimization method
based on the Fisher information matrix, to the bubbles to accelerate
convergence. In Phase 1 pretraining of BERT-Base and -Large models, PipeFisher
reduces the (simulated) training time to 50-75% of that with a first-order
optimizer by greatly improving accelerator utilization and benefiting from the
improved convergence of K-FAC.
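The scheduling idea lends itself to a small illustration. Below is a minimal, hypothetical Python sketch (not the authors' implementation) of filling the idle bubble slots in one pipeline stage's schedule with K-FAC curvature tasks; the stage schedule, slot names, and task list are illustrative assumptions only.

```python
# Minimal sketch (assumptions, not PipeFisher's actual schedule): replace idle
# "bubble" slots in a per-stage pipeline schedule with auxiliary K-FAC work
# such as curvature accumulation and factor inversion.

from collections import deque

def fill_bubbles(schedule, kfac_tasks):
    """Return a schedule where bubble slots run K-FAC tasks instead of idling."""
    tasks = deque(kfac_tasks)
    filled = []
    for slot in schedule:
        if slot == "bubble" and tasks:
            filled.append(tasks.popleft())  # do useful work instead of idling
        else:
            filled.append(slot)
    return filled

# One stage of a 1F1B-style schedule with startup and tear-down bubbles.
stage_schedule = ["bubble", "bubble", "F0", "F1", "B0", "F2", "B1", "B2",
                  "bubble", "bubble"]
kfac_work = ["curvature(A)", "curvature(G)", "invert(A)", "invert(G)"]

print(fill_bubbles(stage_schedule, kfac_work))
```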
Neural Graph Databases
Graph databases (GDBs) enable processing and analysis of unstructured,
complex, rich, and usually vast graph datasets. Despite the large significance
of GDBs in both academia and industry, little effort has been made to
integrate them with the predictive power of graph neural networks (GNNs). In
this work, we show how to seamlessly combine nearly any GNN model with the
computational capabilities of GDBs. For this, we observe that the majority of
these systems are based on, or support, a graph data model called the Labeled
Property Graph (LPG), where vertices and edges can have arbitrarily complex
sets of labels and properties. We then develop LPG2vec, an encoder that
transforms an arbitrary LPG dataset into a representation that can be directly
used with a broad class of GNNs, including convolutional, attentional,
message-passing, and even higher-order or spectral models. In our evaluation,
we show that the rich information represented as LPG labels and properties is
properly preserved by LPG2vec, and that it increases prediction accuracy by up
to 34% compared to graphs with no LPG labels/properties, regardless of the
targeted learning task or the GNN model used. In general, LPG2vec enables
combining the predictive power of the most powerful GNNs with the full scope of
information encoded in the LPG model, paving the way for neural graph
databases, a class of systems where the vast complexity of maintained data will
benefit from modern and future graph machine learning methods.
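As a rough illustration of the encoding step, the following hypothetical Python sketch one-hot encodes LPG vertex labels and appends numeric properties to form a GNN-ready node-feature matrix; the toy graph, label names, and property keys are assumptions for the example, not the LPG2vec code.

```python
# Minimal sketch (illustrative assumptions, not the LPG2vec implementation):
# encode LPG vertex labels as one-hot indicators and append numeric properties
# so that any GNN can consume the result as node features.

import numpy as np

vertices = {
    0: {"labels": {"Person"},           "props": {"age": 34.0}},
    1: {"labels": {"Person", "Author"}, "props": {"age": 51.0}},
    2: {"labels": {"Paper"},            "props": {"age": 0.0}},
}

label_vocab = sorted({l for v in vertices.values() for l in v["labels"]})
prop_keys   = sorted({k for v in vertices.values() for k in v["props"]})

def encode_vertex(v):
    label_vec = [1.0 if l in v["labels"] else 0.0 for l in label_vocab]
    prop_vec  = [float(v["props"].get(k, 0.0)) for k in prop_keys]
    return np.array(label_vec + prop_vec, dtype=np.float32)

# Node-feature matrix X usable by convolutional, attentional, or other GNNs.
X = np.stack([encode_vertex(vertices[i]) for i in sorted(vertices)])
print(X.shape)  # (num_vertices, num_labels + num_properties)
```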
The whole blood transcriptional regulation landscape in 465 COVID-19 infected samples from Japan COVID-19 Task Force
[Japan COVID-19 Task Force] Comprehensive analysis of gene expression in blood cells from COVID-19 patients -- individual differences in the human genome sequence influence severity-dependent changes in gene expression. Kyoto University press release, 2022-08-23.
Coronavirus disease 2019 (COVID-19) is a recently emerged infectious disease that has caused millions of deaths, and a comprehensive understanding of its disease mechanisms has yet to be established. In particular, studies of gene expression dynamics and the regulatory landscape in COVID-19-infected individuals are limited. Here, we report a thorough analysis of whole blood RNA-seq data from 465 genotyped samples from the Japan COVID-19 Task Force, including 359 severe and 106 non-severe COVID-19 cases. We discover 1169 putative causal expression quantitative trait loci (eQTLs), including 34 possible colocalizations with biobank fine-mapping results of hematopoietic traits in a Japanese population, 1549 putative causal splice QTLs (sQTLs; e.g., two independent sQTLs at TOR1AIP1), as well as biologically interpretable trans-eQTL examples (e.g., REST and STING1), all fine-mapped at single-variant resolution. We perform differential gene expression analysis to elucidate 198 genes with increased expression in severe COVID-19 cases, enriched for innate immune-related functions. Finally, we evaluate the limited but non-zero effect of COVID-19 phenotype on eQTL discovery, and highlight the presence of COVID-19 severity-interaction eQTLs (ieQTLs; e.g., CLEC4C and MYBL2). Our study provides a comprehensive catalog of whole blood regulatory variants in Japanese individuals, as well as a reference for transcriptional landscapes in response to COVID-19 infection.
DOCK2 is involved in the host genetics and biology of severe COVID-19
[Japan COVID-19 Task Force] Elucidating the mechanism by which the COVID-19 susceptibility gene DOCK2 drives severe disease -- a therapeutic target for COVID-19 discovered in Asia's largest biorepository. Kyoto University press release, 2022-08-10.
Identifying the host genetic factors underlying severe COVID-19 is an emerging challenge. Here we conducted a genome-wide association study (GWAS) involving 2,393 cases of COVID-19 in a cohort of Japanese individuals collected during the initial waves of the pandemic, with 3,289 unaffected controls. We identified a variant on chromosome 5 at 5q35 (rs60200309-A), close to the dedicator of cytokinesis 2 gene (DOCK2), which was associated with severe COVID-19 in patients less than 65 years of age. This risk allele was prevalent in East Asian individuals but rare in Europeans, highlighting the value of genome-wide association studies in non-European populations. RNA-sequencing analysis of 473 bulk peripheral blood samples identified decreased expression of DOCK2 associated with the risk allele in these younger patients. DOCK2 expression was suppressed in patients with severe cases of COVID-19. Single-cell RNA-sequencing analysis (n = 61 individuals) identified cell-type-specific downregulation of DOCK2 and a COVID-19-specific decreasing effect of the risk allele on DOCK2 expression in non-classical monocytes. Immunohistochemistry of lung specimens from patients with severe COVID-19 pneumonia showed suppressed DOCK2 expression. Moreover, inhibition of DOCK2 function with CPYPP increased the severity of pneumonia in a Syrian hamster model of SARS-CoV-2 infection, characterized by weight loss, lung oedema, enhanced viral loads, impaired macrophage recruitment and dysregulated type I interferon responses. We conclude that DOCK2 has an important role in the host immune response to SARS-CoV-2 infection and the development of severe COVID-19, and could be further explored as a potential biomarker and/or therapeutic target.
Efficient Quantized Sparse Matrix Operations on Tensor Cores
The exponentially growing model size drives the continued success of deep learning, but it brings prohibitive computation and memory costs. From the algorithm perspective, model sparsification and quantization have been studied to alleviate the problem. From the architecture perspective, hardware vendors provide Tensor Cores for acceleration. However, it is very challenging to gain practical speedups from sparse, low-precision matrix operations on Tensor Cores because of the strict requirements for data layout and the lack of support for efficiently manipulating low-precision integers. We propose Magicube, a high-performance sparse-matrix library for low-precision integers on Tensor Cores. Magicube supports SpMM and SDDMM, two major sparse operations in deep learning, with mixed precision. Experimental results on an NVIDIA A100 GPU show that Magicube achieves on average a 1.44x (up to 2.37x) speedup over the vendor-optimized library for sparse kernels, and a 1.43x speedup over the state of the art with comparable accuracy for end-to-end sparse Transformer inference.
Published in Proceedings of the 2022 International Conference for High Performance Computing, Networking, Storage and Analysis (SC'22), Article 37, pages 1-15, Best Paper Finalist: https://dl.acm.org/doi/10.5555/3571885.3571934
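For intuition about the mixed-precision pattern the library targets, here is a hypothetical Python/NumPy sketch of SpMM with int8 operands and int32 accumulation over a CSR matrix; it only illustrates the arithmetic, not Magicube's Tensor Core kernels, and the CSR layout and toy shapes are assumptions for the example.

```python
# Minimal sketch (illustrative, not Magicube's kernels): C = A_sparse @ B_dense
# with int8 inputs quantized from floats and int32 accumulators, the
# mixed-precision pattern used for low-precision sparse operations.

import numpy as np

def quantize_int8(x, scale):
    """Uniformly quantize floats to int8 with a fixed scale."""
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

def spmm_csr_int8(indptr, indices, values_i8, dense_i8):
    """SpMM over a CSR sparse matrix with int8 values and int32 accumulation."""
    m, n = len(indptr) - 1, dense_i8.shape[1]
    out = np.zeros((m, n), dtype=np.int32)
    for row in range(m):
        for k in range(indptr[row], indptr[row + 1]):
            col = indices[k]
            out[row] += values_i8[k].astype(np.int32) * dense_i8[col].astype(np.int32)
    return out

# Toy 3x4 sparse matrix (CSR) times a 4x2 dense matrix.
indptr  = np.array([0, 2, 3, 5])
indices = np.array([0, 2, 1, 0, 3])
vals    = quantize_int8(np.array([0.5, -1.2, 2.0, 0.7, -0.3]), scale=0.05)
dense   = quantize_int8(np.random.randn(4, 2), scale=0.05)

print(spmm_csr_int8(indptr, indices, vals, dense))
```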