52 research outputs found
ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer
Vision Transformers (ViTs) have shown impressive performance and have become
a unified backbone for multiple vision tasks. But both attention and
multi-layer perceptrons (MLPs) in ViTs are not efficient enough due to dense
multiplications, resulting in costly training and inference. To this end, we
propose to reparameterize the pre-trained ViT with a mixture of multiplication
primitives, e.g., bitwise shifts and additions, towards a new type of
multiplication-reduced model, dubbed ShiftAddViT, which aims for
end-to-end inference speedups on GPUs without the need of training from
scratch. Specifically, all MatMuls among queries, keys, and values
are reparameterized by additive kernels, after mapping queries and keys to
binary codes in Hamming space. The remaining MLPs or linear layers are then
reparameterized by shift kernels. We utilize TVM to implement and optimize
those customized kernels for practical hardware deployment on GPUs. We find
that such a reparameterization on (quadratic or linear) attention maintains
model accuracy, while inevitably leading to accuracy drops when being applied
to MLPs. To marry the best of both worlds, we further propose a new mixture of
experts (MoE) framework to reparameterize MLPs by taking multiplication or its
primitives as experts, e.g., multiplication and shift, and designing a new
latency-aware load-balancing loss. Such a loss helps to train a generic router
for assigning a dynamic amount of input tokens to different experts according
to their latency. In principle, the faster an expert runs, the more input
tokens it is assigned. Extensive experiments consistently validate the
effectiveness of our proposed ShiftAddViT, achieving up to
5.18x latency reductions on GPUs and 42.9% energy
savings, while maintaining accuracy comparable to that of the original or efficient ViTs.
Comment: Accepted by NeurIPS 202
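The two multiplication primitives named in the ShiftAddViT abstract can be illustrated with a minimal sketch. It is not the authors' TVM kernels: the rounding rule in `to_power_of_two`, the bit-packing convention in `binary_dot`, and all function names are assumptions made here for illustration. The first pair of functions shows how a weight rounded to a signed power of two turns a multiplication into a bit shift; the last shows how a dot product of two {-1, +1} codes reduces to a Hamming distance, which needs only XOR and counting.

```python
import math


def to_power_of_two(w: float) -> tuple[int, int]:
    """Round a weight to sign * 2**p and return (sign, p).

    Assumption: nearest power of two in the log domain; the abstract
    does not state which rounding rule the paper uses.
    """
    if w == 0:
        return 0, 0
    sign = 1 if w > 0 else -1
    p = round(math.log2(abs(w)))
    return sign, p


def shift_mul(x: int, sign: int, p: int) -> int:
    """Multiply integer x by sign * 2**p using a shift instead of a multiply."""
    if sign == 0:
        return 0
    shifted = x << p if p >= 0 else x >> -p
    return shifted if sign > 0 else -shifted


def binary_dot(q_bits: int, k_bits: int, d: int) -> int:
    """Dot product of two d-dimensional {-1, +1} codes packed into integers.

    Hypothetical packing: bit value 1 encodes +1 and bit value 0 encodes -1.
    Matching positions contribute +1 and differing positions -1, so the
    result is d - 2 * (Hamming distance).
    """
    hamming = bin(q_bits ^ k_bits).count("1")
    return d - 2 * hamming


if __name__ == "__main__":
    sign, p = to_power_of_two(3.7)           # 3.7 is rounded to +2**2
    print(shift_mul(5, sign, p))              # 5 * 4 computed as 5 << 2
    print(binary_dot(0b1011, 0b1001, d=4))    # codes differ in one position
```

Rounding to a power of two changes the weight value (3.7 becomes 4 above), which is consistent with the abstract's remark that applying shift kernels to MLPs costs accuracy.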
Instant-3D: Instant Neural Radiance Field Training Towards On-Device AR/VR 3D Reconstruction
Neural Radiance Field (NeRF) based 3D reconstruction is highly desirable for
immersive Augmented and Virtual Reality (AR/VR) applications, but achieving
instant (i.e., < 5 seconds) on-device NeRF training remains a challenge. In
this work, we first identify the inefficiency bottleneck: the need to
interpolate NeRF embeddings up to 200,000 times from a 3D embedding grid during
each training iteration. To alleviate this, we propose Instant-3D, an
algorithm-hardware co-design acceleration framework that achieves instant
on-device NeRF training. Our algorithm decomposes the embedding grid
representation in terms of color and density, enabling computational redundancy
to be squeezed out by adopting different (1) grid sizes and (2) update
frequencies for the color and density branches. Our hardware accelerator
further reduces the dominant memory accesses for embedding grid interpolation
by (1) mapping multiple nearby points' memory read requests into one during the
feed-forward process, (2) merging embedding grid updates from the same sliding
time window during back-propagation, and (3) fusing different computation cores
to support the different grid sizes needed by the color and density branches of
the Instant-3D algorithm. Extensive experiments validate the effectiveness of
Instant-3D, achieving a large training time reduction of 41x - 248x while
maintaining the same reconstruction quality. Excitingly, Instant-3D has enabled
instant 3D reconstruction for AR/VR, requiring a reconstruction time of only
1.6 seconds per scene and meeting the AR/VR power consumption constraint of 1.9
W.
Comment: Accepted by ISCA'2
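The idea of mapping multiple nearby points' memory read requests into one can be sketched in software. This is only an illustration of why merging helps, not the Instant-3D accelerator design: it assumes each sample point reads the 8 integer grid vertices of its enclosing voxel (the abstract does not specify the interpolation scheme), and the function names are hypothetical.

```python
import math
from itertools import product


def corner_indices(point):
    """Return the 8 integer grid vertices of the voxel enclosing a 3D point."""
    base = tuple(math.floor(c) for c in point)
    return [
        tuple(b + o for b, o in zip(base, offsets))
        for offsets in product((0, 1), repeat=3)
    ]


def merged_reads(points):
    """Count vertex reads before and after merging duplicate requests."""
    requests = [v for p in points for v in corner_indices(p)]
    return len(requests), len(set(requests))


if __name__ == "__main__":
    # Two points in the same voxel plus one point in the neighbouring voxel.
    samples = [(0.2, 0.3, 0.4), (0.6, 0.1, 0.9), (1.5, 0.5, 0.5)]
    total, unique = merged_reads(samples)
    print(total, unique)
```

Points that fall in the same or adjacent voxels share vertices, so the number of distinct reads grows more slowly than the number of points.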
ViTCoD: Vision Transformer Acceleration via Dedicated Algorithm and Accelerator Co-Design
Vision Transformers (ViTs) have achieved state-of-the-art performance on
various vision tasks. However, ViTs' self-attention module is still arguably a
major bottleneck, limiting their achievable hardware efficiency. Meanwhile,
existing accelerators dedicated to NLP Transformers are not optimal for ViTs.
This is because there is a large difference between ViTs and NLP Transformers:
ViTs have a relatively fixed number of input tokens, whose attention maps can
be pruned by up to 90% even with fixed sparse patterns; while NLP Transformers
need to handle input sequences of varying numbers of tokens and rely on
on-the-fly predictions of dynamic sparse attention patterns for each input to
achieve a decent sparsity (e.g., >=50%). To this end, we propose a dedicated
algorithm and accelerator co-design framework dubbed ViTCoD for accelerating
ViTs. Specifically, on the algorithm level, ViTCoD prunes and polarizes the
attention maps to have either denser or sparser fixed patterns for regularizing
two levels of workloads without hurting the accuracy, largely reducing the
attention computations while leaving room for alleviating the remaining
dominant data movements; on top of that, we further integrate a lightweight and
learnable auto-encoder module to enable trading the dominant high-cost data
movements for lower-cost computations. On the hardware level, we develop a
dedicated accelerator to simultaneously coordinate the enforced denser/sparser
workloads and encoder/decoder engines for boosted hardware utilization.
Extensive experiments and ablation studies validate that ViTCoD largely reduces
the dominant data movement costs, achieving speedups of up to 235.3x, 142.9x,
86.0x, 10.1x, and 6.8x over general computing platforms CPUs, EdgeGPUs, GPUs,
and prior-art Transformer accelerators SpAtten and Sanger under an attention
sparsity of 90%, respectively.
Comment: Accepted to HPCA 202
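The pruning and polarizing steps described in the ViTCoD abstract can be pictured with a small sketch. It is a simplified stand-in, not the paper's algorithm: it assumes the fixed mask is obtained by keeping the largest entries of an averaged attention map, and that "polarizing" means grouping key columns by how many entries they retain. The names `fixed_mask`, `polarize`, `keep_ratio`, and `dense_threshold` are hypothetical.

```python
def fixed_mask(avg_attn, keep_ratio):
    """Keep roughly the largest `keep_ratio` fraction of entries (1 = kept).

    Entries tied with the threshold value are all kept, so the mask can
    retain slightly more than the requested fraction.
    """
    flat = sorted((v for row in avg_attn for v in row), reverse=True)
    k = max(1, round(keep_ratio * len(flat)))
    threshold = flat[k - 1]
    return [[1 if v >= threshold else 0 for v in row] for row in avg_attn]


def polarize(mask, dense_threshold):
    """Split key columns into a denser and a sparser group.

    A column is 'dense' when the fraction of kept entries in it is at
    least `dense_threshold`.
    """
    n_rows = len(mask)
    n_cols = len(mask[0])
    density = [sum(row[j] for row in mask) / n_rows for j in range(n_cols)]
    dense = [j for j, d in enumerate(density) if d >= dense_threshold]
    sparse = [j for j, d in enumerate(density) if d < dense_threshold]
    return dense, sparse


if __name__ == "__main__":
    # Toy averaged attention map: every query attends strongly to token 0.
    avg_attn = [
        [0.9, 0.1, 0.0, 0.0],
        [0.8, 0.0, 0.2, 0.0],
        [0.7, 0.0, 0.0, 0.3],
        [0.6, 0.4, 0.0, 0.0],
    ]
    mask = fixed_mask(avg_attn, keep_ratio=0.5)
    print(mask)
    print(polarize(mask, dense_threshold=0.75))
```

Separating the columns this way yields two regular workloads, one dense and one sparse, which is the property the abstract says the accelerator exploits.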
Retrospective evaluation of whole exome and genome mutation calls in 746 cancer samples
Funder: NCI U24CA211006
Abstract: The Cancer Genome Atlas (TCGA) and International Cancer Genome Consortium (ICGC) curated consensus somatic mutation calls using whole exome sequencing (WES) and whole genome sequencing (WGS), respectively. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole genome sequencing data from 2,658 cancers across 38 tumour types, we compare WES and WGS side-by-side from 746 TCGA samples, finding that ~80% of mutations overlap in covered exonic regions. We estimate that low variant allele fraction (VAF < 15%) and clonal heterogeneity contribute up to 68% of private WGS mutations and 71% of private WES mutations. We observe that ~30% of private WGS mutations trace to mutations identified by a single variant caller in WES consensus efforts. WGS captures both ~50% more variation in exonic regions and un-observed mutations in loci with variable GC-content. Together, our analysis highlights technological divergences between two reproducible somatic variant detection efforts.
Sequential deprotonation of meso-(p-hydroxyphenyl)porphyrins in DMF: from hyperporphyrins to sodium porphyrin complexes
Sequential deprotonations of meso-(p-hydroxyphenyl)porphyrins (p-OHTPPH2) in a DMF + H2O (V/V = 1:1) mixture have been verified to result in the appearance of hyperporphyrin spectra. However, when the deprotonations of these p-OHTPPH2 are carried out in DMF, the spectral changes differ considerably from those in the mixture mentioned above. At low [OH-], the optical spectra in the visible region are still considered to have the characteristics of hyperporphyrin spectra. Further deprotonation at much higher basicity turns the optical spectra into three-banded spectra similar to those in acidic solution. To clarify the molecular origins of these changes, UV-vis, resonance Raman (RR), and proton nuclear magnetic resonance (1H NMR) experiments are carried out. Our data give evidence that p-OHTPPH2 in DMF can lose its pyrrolic protons at higher NaOH concentrations, because an aprotic medium such as DMF effectively weakens the basicity of the porphyrin relative to that of NaOH. The deprotonated porphyrin then coordinates with two sodium ions (apart from the sodium ions that interact with the peripheral phenoxide anions) to form the sodium complexes of p-OHTPPH2 (written Na2P to emphasize the sodium ions that coordinate with the central nitrogen atoms), which can be regarded as porphyrin anions perturbed by the sodium cations owing to their highly ionic character.
Author Correction: A retrospective clinical analysis of pediatric paragonimiasis in a Chinese children’s hospital from 2011 to 2019
An amendment to this paper has been published and can be accessed via a link at the top of the paper
A path analysis model suggesting the association of information and beliefs with self-efficacy in osteoporosis prevention among middle-aged and older community residents in urban Shanghai, China.
BACKGROUND: Osteoporosis is a chronic disease whose prevention is more effective than treatment, but it may be necessary to change people's self-efficacy to prevent this condition. This article aimed to study the pathway among information, beliefs and self-efficacy in osteoporosis prevention, and to support further intervention. METHODS: A cross-sectional study was conducted among community residents over 40 years old from two volunteer communities in urban Shanghai, China. Of 450 middle-aged and older community residents who volunteered to participate in the study, 421 (93.5%) completed the field survey effectively. RESULTS: 62.9% of the residents were female. Their mean age was 64.4 ± 11.2 years. The residents showed low knowledge of osteoporosis-related information, and the mean percentage of correct responses was just 61.2%. In univariate analysis, information (univariate β = 0.27, 95% CI = 0.15-0.38) and beliefs (univariate β = 0.31, 95% CI = 0.25-0.38) were associated with self-efficacy. Multivariate analysis showed that information (multiple β = 0.19, 95% CI = 0.09-0.36) and beliefs (multiple β = 0.30, 95% CI = 0.23-0.36) remained significant. In the path analysis, self-efficacy was significantly predicted by beliefs (β = 0.81, p<0.001). CONCLUSIONS: The study highlighted the urgency of conducting osteoporosis-preventive health promotion among middle-aged and older people, given their lack of information and low levels of beliefs and self-efficacy about osteoporosis prevention. Future interventions should focus on improving beliefs, especially perceived benefits, perceived threats, and action cues, regarding osteoporosis prevention in this group.