1,763 research outputs found
Computational linear algebra over finite fields
We present here algorithms for efficient computation of linear algebra
problems over finite fields
Synthesis and Optimization of Reversible Circuits - A Survey
Reversible logic circuits have been historically motivated by theoretical
research in low-power electronics as well as practical improvement of
bit-manipulation transforms in cryptography and computer graphics. Recently,
reversible circuits have attracted interest as components of quantum
algorithms, as well as in photonic and nano-computing technologies where some
switching devices offer no signal gain. Research in generating reversible logic
distinguishes between circuit synthesis, post-synthesis optimization, and
technology mapping. In this survey, we review algorithmic paradigms ---
search-based, cycle-based, transformation-based, and BDD-based --- as well as
specific algorithms for reversible synthesis, both exact and heuristic. We
conclude the survey by outlining key open challenges in synthesis of reversible
and quantum logic, as well as most common misconceptions.Comment: 34 pages, 15 figures, 2 table
ViTCoD: Vision Transformer Acceleration via Dedicated Algorithm and Accelerator Co-Design
Vision Transformers (ViTs) have achieved state-of-the-art performance on
various vision tasks. However, ViTs' self-attention module is still arguably a
major bottleneck, limiting their achievable hardware efficiency. Meanwhile,
existing accelerators dedicated to NLP Transformers are not optimal for ViTs.
This is because there is a large difference between ViTs and NLP Transformers:
ViTs have a relatively fixed number of input tokens, whose attention maps can
be pruned by up to 90% even with fixed sparse patterns; while NLP Transformers
need to handle input sequences of varying numbers of tokens and rely on
on-the-fly predictions of dynamic sparse attention patterns for each input to
achieve a decent sparsity (e.g., >=50%). To this end, we propose a dedicated
algorithm and accelerator co-design framework dubbed ViTCoD for accelerating
ViTs. Specifically, on the algorithm level, ViTCoD prunes and polarizes the
attention maps to have either denser or sparser fixed patterns for regularizing
two levels of workloads without hurting the accuracy, largely reducing the
attention computations while leaving room for alleviating the remaining
dominant data movements; on top of that, we further integrate a lightweight and
learnable auto-encoder module to enable trading the dominant high-cost data
movements for lower-cost computations. On the hardware level, we develop a
dedicated accelerator to simultaneously coordinate the enforced denser/sparser
workloads and encoder/decoder engines for boosted hardware utilization.
Extensive experiments and ablation studies validate that ViTCoD largely reduces
the dominant data movement costs, achieving speedups of up to 235.3x, 142.9x,
86.0x, 10.1x, and 6.8x over general computing platforms CPUs, EdgeGPUs, GPUs,
and prior-art Transformer accelerators SpAtten and Sanger under an attention
sparsity of 90%, respectively.Comment: Accepted to HPCA 202
Analysis and Synthesis of Digital Dyadic Sequences
We explore the space of matrix-generated (0, m, 2)-nets and (0, 2)-sequences
in base 2, also known as digital dyadic nets and sequences. In computer
graphics, they are arguably leading the competition for use in rendering. We
provide a complete characterization of the design space and count the possible
number of constructions with and without considering possible reorderings of
the point set. Based on this analysis, we then show that every digital dyadic
net can be reordered into a sequence, together with a corresponding algorithm.
Finally, we present a novel family of self-similar digital dyadic sequences, to
be named -sequences, that spans a subspace with fewer degrees of freedom.
Those -sequences are extremely efficient to sample and compute, and we
demonstrate their advantages over the classic Sobol (0, 2)-sequence.Comment: 17 pages, 11 figures. Minor improvement of exposition; references to
earlier proofs of Theorems 3.1 and 3.3 adde
- …