Successive Cancellation List Polar Decoder using Log-likelihood Ratios
The successive cancellation list (SCL) decoding algorithm is a powerful method
that can help polar codes achieve excellent error-correcting performance.
However, current SCL algorithms and decoders are based on likelihood or
log-likelihood forms, which result in high hardware complexity. In this paper, we
propose a log-likelihood-ratio (LLR)-based SCL (LLR-SCL) decoding algorithm,
which requires only half the computation and storage complexity of the
conventional one. Then, based on the proposed algorithm, we develop
low-complexity VLSI architectures for LLR-SCL decoders. Analysis results show
that the proposed LLR-SCL decoder achieves 50% reduction in hardware and 98%
improvement in hardware efficiency.
Comment: accepted by the 2014 Asilomar Conference on Signals, Systems, and Computers.
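To make the LLR formulation concrete, here is a minimal software sketch of the two standard LLR update rules used in successive cancellation decoding (the min-sum check-node update f and the partial-sum-conditioned update g). The function names and toy numbers are illustrative, not taken from the paper:

```python
import numpy as np

def f_minsum(a, b):
    # Check-node (upper-branch) LLR update, min-sum approximation:
    # f(a, b) ~= sign(a) * sign(b) * min(|a|, |b|)
    return np.sign(a) * np.sign(b) * np.minimum(np.abs(a), np.abs(b))

def g_update(a, b, u):
    # Variable-node (lower-branch) LLR update, conditioned on the
    # already-decided partial-sum bit u: g(a, b, u) = (1 - 2u) * a + b
    return (1 - 2 * u) * a + b

# Toy example: one butterfly stage on a pair of channel LLRs
a, b = 1.5, -0.8
llr_upper = f_minsum(a, b)          # decode the first bit from this
u = 0 if llr_upper >= 0 else 1      # hard decision on the first bit
llr_lower = g_update(a, b, u)       # then decode the second bit
```

Working in the LLR domain keeps every intermediate value a single real number, which is the source of the storage savings the abstract describes.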
A Low-Latency FFT-IFFT Cascade Architecture
This paper addresses the design of a partly-parallel cascaded FFT-IFFT
architecture that does not require any intermediate buffer. Folding can be used
to design partly-parallel architectures for FFT and IFFT. While many cascaded
FFT-IFFT architectures can be designed using various folding sets for the FFT
and the IFFT, for a specified folded FFT architecture, there exists a unique
folding set to design the IFFT architecture that does not require an
intermediate buffer. Such a folding set is designed by processing the output of
the FFT as soon as possible (ASAP) in the folded IFFT. Elimination of the
intermediate buffer reduces latency and saves area. The proposed approach is
also extended to interleaved processing of multi-channel time-series. The
proposed FFT-IFFT cascade architecture saves about N/2 memory elements and N/4
clock cycles of latency compared to a design with identical folding sets. For
the 2-interleaved FFT-IFFT cascade, the memory and latency savings are,
respectively, N/2 units and N/2 clock cycles, compared to a design with
identical folding sets.
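The folding result above concerns hardware scheduling, but the function the cascade computes can be sketched in a few lines. The `per_bin` hook below is a hypothetical stand-in for whatever frequency-domain processing sits between the transforms; with identity processing the cascade must reconstruct its input:

```python
import numpy as np

# Functional model of the FFT-IFFT cascade: forward FFT, optional
# per-bin processing, inverse FFT. This models only the arithmetic,
# not the folded partly-parallel architecture.
def fft_ifft_cascade(x, per_bin=lambda X: X):
    X = np.fft.fft(x)
    return np.fft.ifft(per_bin(X))

N = 16
x = np.random.randn(N)
y = fft_ifft_cascade(x)          # identity per-bin processing
assert np.allclose(y.real, x)    # the cascade reconstructs its input
```

The architectural question the paper answers is how to schedule these two transforms on shared hardware so that FFT outputs are consumed by the IFFT as soon as they are produced, removing the intermediate buffer.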
A Gradient-Interleaved Scheduler for Energy-Efficient Backpropagation for Training Neural Networks
This paper addresses the design of accelerators using systolic architectures for
training neural networks via a novel gradient-interleaving approach.
Training the neural network involves backpropagation of error and computation
of gradients with respect to the activation functions and weights. It is shown
that the gradient with respect to the activation function can be computed using
a weight-stationary systolic array while the gradient with respect to the
weights can be computed using an output-stationary systolic array. The novelty
of the proposed approach lies in interleaving the computations of these two
gradients on the same configurable systolic array. This results in reuse of the
variables from one computation to the other and eliminates unnecessary memory
accesses. The proposed approach yields 1.4-2.2x savings in the number of
cycles, along with savings in memory accesses. Thus,
the proposed accelerator reduces latency and energy consumption.
Comment: Proc. 2020 IEEE International Symposium on Circuits and Systems (ISCAS).
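The two gradients described above can be written out explicitly. A minimal NumPy sketch for one fully connected ReLU layer (all names and shapes are illustrative) shows why the two computations share operands and thus benefit from interleaving on one array:

```python
import numpy as np

# One fully connected layer: z = W @ a_in, a_out = relu(z).
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))
a_in = rng.standard_normal((3, 1))
z = W @ a_in
grad_a_out = rng.standard_normal((4, 1))   # error from the layer above

grad_z = grad_a_out * (z > 0)              # backprop through ReLU

# Gradient w.r.t. the activations (propagated to the previous layer):
# W stays fixed across the matrix-vector product, so this maps to a
# weight-stationary systolic array.
grad_a_in = W.T @ grad_z                   # shape (3, 1)

# Gradient w.r.t. the weights: an outer product that accumulates at
# each output position, so this maps to an output-stationary array.
grad_W = grad_z @ a_in.T                   # shape (4, 3)
```

Both gradients consume the same `grad_z`, which is the variable reuse that interleaving the two dataflows on one configurable array exploits.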
Tensor Decomposition for Model Reduction in Neural Networks: A Review
Modern neural networks have revolutionized the fields of computer vision (CV)
and Natural Language Processing (NLP). They are widely used for solving complex
CV tasks and NLP tasks such as image classification, image generation, and
machine translation. Most state-of-the-art neural networks are
over-parameterized and require a high computational cost. One straightforward
solution is to replace the layers of the networks with their low-rank tensor
approximations using different tensor decomposition methods. This paper reviews
six tensor decomposition methods and illustrates their ability to compress
model parameters of convolutional neural networks (CNNs), recurrent neural
networks (RNNs), and Transformers. The accuracy of some compressed models can be
higher than that of the original versions. Evaluations indicate that tensor
decompositions can achieve significant reductions in model size, run-time and
energy consumption, and are well suited for implementing neural networks on
edge devices.
Comment: IEEE Circuits and Systems Magazine, 202
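As a minimal illustration of the layer-replacement idea, the sketch below applies truncated SVD, the simplest of the matrix/tensor factorizations such reviews cover, to a dense layer's weight matrix; the shapes and rank are arbitrary:

```python
import numpy as np

# Low-rank compression of a dense layer W (m x n) via truncated SVD:
# W ~= A @ B with A (m x r) and B (r x n), replacing one layer with
# two thinner ones: r*(m + n) parameters instead of m*n.
def low_rank_factor(W, r):
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :r] * s[:r]          # (m, r), singular values folded in
    B = Vt[:r, :]                 # (r, n)
    return A, B

m, n, r = 256, 512, 32
W = np.random.randn(m, n)
A, B = low_rank_factor(W, r)
params_before = m * n             # 131072
params_after = r * (m + n)        # 24576, about a 5.3x reduction
```

The decompositions surveyed in the review (CP, Tucker, tensor-train, and others) generalize this rank-truncation idea from matrices to the higher-order weight tensors of CNNs, RNNs, and Transformers.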