52 research outputs found

    ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer

    Full text link
    Vision Transformers (ViTs) have shown impressive performance and have become a unified backbone for multiple vision tasks. However, both the attention and multi-layer perceptron (MLP) modules in ViTs are not efficient enough due to dense multiplications, resulting in costly training and inference. To this end, we propose to reparameterize a pre-trained ViT with a mixture of multiplication primitives, e.g., bitwise shifts and additions, towards a new type of multiplication-reduced model, dubbed ShiftAddViT, which aims for end-to-end inference speedups on GPUs without the need to train from scratch. Specifically, all MatMuls among queries, keys, and values are reparameterized by additive kernels, after mapping queries and keys to binary codes in Hamming space. The remaining MLPs or linear layers are then reparameterized by shift kernels. We use TVM to implement and optimize these customized kernels for practical hardware deployment on GPUs. We find that such a reparameterization of (quadratic or linear) attention maintains model accuracy, while it inevitably leads to accuracy drops when applied to MLPs. To marry the best of both worlds, we further propose a new mixture-of-experts (MoE) framework to reparameterize MLPs, taking multiplication or its primitives, e.g., multiplication and shift, as experts and designing a new latency-aware load-balancing loss. This loss helps train a generic router that assigns a dynamic number of input tokens to different experts according to their latency; in principle, the faster an expert runs, the more input tokens it is assigned. Extensive experiments consistently validate the effectiveness of our proposed ShiftAddViT, achieving up to 5.18x latency reductions on GPUs and 42.9% energy savings, while maintaining accuracy comparable to that of original or efficient ViTs. Comment: Accepted by NeurIPS 202
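
    As a toy illustration of the shift primitive mentioned above, the NumPy sketch below quantizes a dense linear-layer weight to signed powers of two, so that each multiplication could in principle be replaced by a sign flip plus a bit shift. The layer size and quantization rule are illustrative assumptions only, not the paper's TVM kernels or reparameterization recipe.

        import numpy as np

        def quantize_to_powers_of_two(w):
            # Round each weight to the nearest signed power of two; in fixed-point
            # hardware, multiplying by 2**k amounts to a k-bit shift plus a sign flip.
            sign = np.sign(w)
            exponent = np.round(np.log2(np.abs(w) + 1e-12))
            return sign * 2.0 ** exponent

        rng = np.random.default_rng(0)
        w = rng.normal(scale=0.5, size=(64, 64))    # hypothetical MLP / linear-layer weight
        x = rng.normal(size=64)                     # one token's input features

        y_dense = w @ x                             # dense multiplications
        y_shift = quantize_to_powers_of_two(w) @ x  # shift-only approximation
        rel_err = np.linalg.norm(y_dense - y_shift) / np.linalg.norm(y_dense)
        print(f"relative error of the shift-reparameterized layer: {rel_err:.3f}")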

    Instant-3D: Instant Neural Radiance Field Training Towards On-Device AR/VR 3D Reconstruction

    Full text link
    Neural Radiance Field (NeRF) based 3D reconstruction is highly desirable for immersive Augmented and Virtual Reality (AR/VR) applications, but achieving instant (i.e., < 5 seconds) on-device NeRF training remains a challenge. In this work, we first identify the inefficiency bottleneck: the need to interpolate NeRF embeddings up to 200,000 times from a 3D embedding grid during each training iteration. To alleviate this, we propose Instant-3D, an algorithm-hardware co-design acceleration framework that achieves instant on-device NeRF training. Our algorithm decomposes the embedding grid representation in terms of color and density, enabling computational redundancy to be squeezed out by adopting different (1) grid sizes and (2) update frequencies for the color and density branches. Our hardware accelerator further reduces the dominant memory accesses for embedding grid interpolation by (1) mapping the memory read requests of multiple nearby points into one during the feed-forward process, (2) merging embedding grid updates from the same sliding time window during back-propagation, and (3) fusing different computation cores to support the different grid sizes needed by the color and density branches of the Instant-3D algorithm. Extensive experiments validate the effectiveness of Instant-3D, achieving a large training time reduction of 41x to 248x while maintaining the same reconstruction quality. Excitingly, Instant-3D has enabled instant 3D reconstruction for AR/VR, requiring a reconstruction time of only 1.6 seconds per scene and meeting the AR/VR power consumption constraint of 1.9 W. Comment: Accepted by ISCA'2
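
    The NumPy sketch below illustrates, under assumed grid sizes and an assumed 4:1 update ratio, the decomposition idea described above: separate color and density embedding grids with different resolutions and update frequencies. It is a toy 1D sketch for intuition only, not the paper's algorithm or accelerator.

        import numpy as np

        rng = np.random.default_rng(0)
        color_grid   = rng.normal(size=(128, 8))    # finer grid, assumed for the color branch
        density_grid = rng.normal(size=(32, 4))     # coarser grid, assumed for the density branch

        def interp(grid, t):
            # Linearly interpolate a feature row at normalized coordinate t in [0, 1).
            pos = t * (grid.shape[0] - 1)
            lo = int(np.floor(pos))
            frac = pos - lo
            return (1.0 - frac) * grid[lo] + frac * grid[lo + 1]

        for step in range(8):
            t = rng.random()                                  # a sampled point along a ray
            feat = np.concatenate([interp(color_grid, t), interp(density_grid, t)])
            # ... feed `feat` to the NeRF MLPs and back-propagate (omitted) ...
            grad_color = rng.normal(size=color_grid.shape)    # stand-in for the real gradient
            color_grid -= 1e-2 * grad_color                   # color grid updated every step
            if step % 4 == 0:                                 # density grid updated less frequently
                density_grid -= 1e-2 * rng.normal(size=density_grid.shape)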

    ViTCoD: Vision Transformer Acceleration via Dedicated Algorithm and Accelerator Co-Design

    Full text link
    Vision Transformers (ViTs) have achieved state-of-the-art performance on various vision tasks. However, ViTs' self-attention module is still arguably a major bottleneck, limiting their achievable hardware efficiency. Meanwhile, existing accelerators dedicated to NLP Transformers are not optimal for ViTs, because there is a large difference between the two: ViTs have a relatively fixed number of input tokens, whose attention maps can be pruned by up to 90% even with fixed sparse patterns, while NLP Transformers need to handle input sequences of varying numbers of tokens and rely on on-the-fly predictions of dynamic sparse attention patterns for each input to achieve a decent sparsity (e.g., >= 50%). To this end, we propose a dedicated algorithm and accelerator co-design framework, dubbed ViTCoD, for accelerating ViTs. Specifically, on the algorithm level, ViTCoD prunes and polarizes the attention maps to have either denser or sparser fixed patterns, regularizing two levels of workloads without hurting accuracy; this largely reduces the attention computations while leaving room for alleviating the remaining dominant data movements. On top of that, we further integrate a lightweight and learnable auto-encoder module to enable trading the dominant high-cost data movements for lower-cost computations. On the hardware level, we develop a dedicated accelerator to simultaneously coordinate the enforced denser/sparser workloads and the encoder/decoder engines for boosted hardware utilization. Extensive experiments and ablation studies validate that ViTCoD largely reduces the dominant data movement costs, achieving speedups of up to 235.3x, 142.9x, and 86.0x over general computing platforms (CPUs, EdgeGPUs, and GPUs) and of up to 10.1x and 6.8x over the prior-art Transformer accelerators SpAtten and Sanger, respectively, under an attention sparsity of 90%. Comment: Accepted to HPCA 202
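
    The NumPy sketch below gives one plausible reading of the prune-and-polarize step described above: a handful of globally attended token columns are kept dense, while the remainder of the map is pruned to a fixed sparse pattern at a target overall sparsity. The scoring rule, the number of dense columns, and the 90% target are illustrative assumptions, not the paper's exact procedure.

        import numpy as np

        def prune_and_polarize(avg_attn, num_dense_cols=8, target_sparsity=0.90):
            # Columns attended to by most queries are kept fully dense; the rest of the
            # map is pruned to a fixed sparse pattern that meets the target sparsity.
            mask = np.zeros_like(avg_attn, dtype=bool)
            dense_idx = np.argsort(avg_attn.sum(axis=0))[-num_dense_cols:]
            mask[:, dense_idx] = True
            budget = int((1.0 - target_sparsity) * avg_attn.size) - mask.sum()
            if budget > 0:
                rest = np.where(mask, -np.inf, avg_attn)      # exclude already-kept entries
                keep = np.argsort(rest, axis=None)[-budget:]  # strongest remaining entries
                rows, cols = np.unravel_index(keep, mask.shape)
                mask[rows, cols] = True
            return mask                                       # fixed pattern reused at inference

        rng = np.random.default_rng(0)
        avg_attn = rng.random((196, 196))                     # e.g., 14 x 14 patch tokens
        mask = prune_and_polarize(avg_attn)
        print("kept fraction of attention entries:", mask.mean())  # ~0.10 at 90% sparsity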

    Retrospective evaluation of whole exome and genome mutation calls in 746 cancer samples

    No full text
    Funder: NCI U24CA211006. Abstract: The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC) curated consensus somatic mutation calls using whole exome sequencing (WES) and whole genome sequencing (WGS), respectively. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole genome sequencing data from 2,658 cancers across 38 tumour types, we compare WES and WGS side-by-side for 746 TCGA samples, finding that ~80% of mutations overlap in covered exonic regions. We estimate that low variant allele fraction (VAF < 15%) and clonal heterogeneity contribute up to 68% of private WGS mutations and 71% of private WES mutations. We observe that ~30% of private WGS mutations trace to mutations identified by a single variant caller in WES consensus efforts. WGS captures both ~50% more variation in exonic regions and unobserved mutations in loci with variable GC-content. Together, our analysis highlights technological divergences between two reproducible somatic variant detection efforts.
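
    As a minimal sketch of the kind of side-by-side comparison described above (not the consortium's actual pipeline), the snippet below keys mutations by (chromosome, position, ref, alt), splits them into shared and private calls, and counts private WGS calls with VAF < 15%. All variants and VAF values are invented for illustration.

        # Hypothetical call sets: mutation key -> variant allele fraction (VAF).
        wes_calls = {("chr1", 10101, "G", "T"): 0.42, ("chr2", 20202, "C", "T"): 0.08,
                     ("chr3", 30303, "A", "G"): 0.35}
        wgs_calls = {("chr1", 10101, "G", "T"): 0.40, ("chr3", 30303, "A", "G"): 0.33,
                     ("chr4", 40404, "C", "A"): 0.12}

        shared      = wes_calls.keys() & wgs_calls.keys()
        private_wes = wes_calls.keys() - wgs_calls.keys()
        private_wgs = wgs_calls.keys() - wes_calls.keys()

        overlap = len(shared) / len(wes_calls.keys() | wgs_calls.keys())
        low_vaf_private_wgs = sum(1 for k in private_wgs if wgs_calls[k] < 0.15)
        print(f"shared: {len(shared)}  private WES: {len(private_wes)}  private WGS: {len(private_wgs)}")
        print(f"overlap fraction: {overlap:.2f}  private WGS calls with VAF < 15%: {low_vaf_private_wgs}")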

    Sequential deprotonation of meso-(p-hydroxyphenyl)porphyrins in DMF: from hyperporphyrins to sodium porphyrin complexes

    No full text
    Sequential deprotonation of meso-(p-hydroxyphenyl)porphyrins (p-OHTPPH2) in a DMF + H2O (V/V = 1:1) mixture has been verified to result in the appearance of hyperporphyrin spectra. However, when the deprotonation of these p-OHTPPH2 is carried out in DMF, the spectral changes differ considerably from those in the mixture mentioned above. At low [OH-], the optical spectra in the visible region still show the characteristics of hyperporphyrin spectra. Further deprotonation at much higher basicity yields three-banded optical spectra similar to those in acidic solution. To clarify the molecular origins of these changes, UV-vis, resonance Raman (RR), and proton nuclear magnetic resonance (1H NMR) experiments were carried out. Our data give evidence that p-OHTPPH2 in DMF can be further deprotonated at the pyrrolic N-H by more highly concentrated NaOH, because an aprotic medium like DMF effectively weakens the basicity of the porphyrin relative to that of the NaOH, and that it coordinates with two sodium ions (in addition to the sodium ions that interact with the peripheral phenoxide anions) to form the sodium complexes of p-OHTPPH2 (denoted Na2P to emphasize the sodium ions that coordinate with the central nitrogen atoms), which can be regarded as porphyrin anions perturbed by the sodium cations owing to their highly ionic character.

    Author Correction: A retrospective clinical analysis of pediatric paragonimiasis in a Chinese children’s hospital from 2011 to 2019

    No full text
    An amendment to this paper has been published and can be accessed via a link at the top of the paper.

    A path analysis model suggesting the association of information and beliefs with self-efficacy in osteoporosis prevention among middle-aged and older community residents in urban Shanghai, China.

    No full text
    BACKGROUND: Osteoporosis is a chronic disease whose prevention is more effective than treatment, but changing people's self-efficacy may be necessary to prevent this condition. This article aimed to study the pathways among information, beliefs, and self-efficacy in osteoporosis prevention, and to support further intervention. METHODS: A cross-sectional study was conducted among community residents over 40 years old from two volunteer communities in urban Shanghai, China. Of 450 middle-aged and older community residents who volunteered to participate in the study, 421 (93.5%) effectively completed the field survey. RESULTS: 62.9% of the residents were female. Their mean age was 64.4 ± 11.2 years. The residents showed low knowledge of osteoporosis-related information, with a mean percentage of correct responses of just 61.2%. In univariate analysis, information (univariate β = 0.27, 95% CI = 0.15-0.38) and beliefs (univariate β = 0.31, 95% CI = 0.25-0.38) were associated with self-efficacy. Multivariate analysis showed that information (multiple β = 0.19, 95% CI = 0.09-0.36) and beliefs (multiple β = 0.30, 95% CI = 0.23-0.36) remained significant, and in the path analysis, self-efficacy was significantly predicted by beliefs (β = 0.81, p < 0.001). CONCLUSIONS: The study highlighted the urgency of conducting osteoporosis-preventive health promotion among middle-aged and older people, given their lack of information and low levels of beliefs and self-efficacy about osteoporosis prevention. Future interventions should focus on improving beliefs about osteoporosis prevention in this group, especially perceived benefits, perceived threats, and cues to action.
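
    The NumPy sketch below, run on synthetic data rather than the study's dataset, shows the kind of univariate and multivariate standardized regression coefficients reported above; the assumed effect sizes and correlation structure are arbitrary and are not meant to reproduce the paper's estimates.

        import numpy as np

        rng = np.random.default_rng(0)
        n = 421                                                # effective sample size from the abstract
        information = rng.normal(size=n)
        beliefs     = 0.4 * information + rng.normal(size=n)   # assumed correlation structure
        self_eff    = 0.2 * information + 0.3 * beliefs + rng.normal(size=n)

        def z(v):                                              # standardize to mean 0, SD 1
            return (v - v.mean()) / v.std()

        y = z(self_eff)
        X = np.column_stack([z(information), z(beliefs)])
        beta_uni = [np.corrcoef(x, y)[0, 1] for x in X.T]      # univariate standardized betas
        beta_multi, *_ = np.linalg.lstsq(X, y, rcond=None)     # multivariate (adjusted) betas
        print("univariate betas  (information, beliefs):", np.round(beta_uni, 2))
        print("multivariate betas (information, beliefs):", np.round(beta_multi, 2))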