    Successive Cancellation List Polar Decoder using Log-likelihood Ratios

    The successive cancellation list (SCL) decoding algorithm is a powerful method that helps polar codes achieve excellent error-correcting performance. However, current SCL algorithms and decoders are based on likelihood or log-likelihood forms, which incur high hardware complexity. In this paper, we propose a log-likelihood-ratio (LLR)-based SCL (LLR-SCL) decoding algorithm, which requires only half the computation and storage complexity of the conventional one. Then, based on the proposed algorithm, we develop low-complexity VLSI architectures for LLR-SCL decoders. Analysis results show that the proposed LLR-SCL decoder achieves a 50% reduction in hardware and a 98% improvement in hardware efficiency. Comment: accepted by the 2014 Asilomar Conference on Signals, Systems, and Computers.
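
    To make the LLR-domain idea concrete, here is a minimal Python sketch (not taken from the paper) of the two node updates used in successive cancellation decoding, written directly on LLRs; the min-sum approximation of the check-node update and the function names are assumptions chosen for brevity.

        import numpy as np

        def f_llr(a, b):
            # Check-node (upper-branch) update in the LLR domain, using the
            # hardware-friendly min-sum approximation:
            # f(a, b) ~= sign(a) * sign(b) * min(|a|, |b|)
            return np.sign(a) * np.sign(b) * np.minimum(np.abs(a), np.abs(b))

        def g_llr(a, b, u):
            # Bit-node (lower-branch) update in the LLR domain, conditioned on
            # the previously decided bit u in {0, 1}: g(a, b, u) = (1 - 2u)*a + b
            return (1 - 2 * u) * a + b

        # Example: combine two channel LLRs through a length-2 polar kernel.
        llr_a, llr_b = 1.8, -0.7
        llr_u0 = f_llr(llr_a, llr_b)        # LLR used to decide u0
        u0 = 0 if llr_u0 >= 0 else 1
        llr_u1 = g_llr(llr_a, llr_b, u0)    # LLR used to decide u1

    In a list decoder these updates are applied along every surviving path; working on LLRs rather than likelihood pairs is what halves the storage per path.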

    A Low-Latency FFT-IFFT Cascade Architecture

    This paper addresses the design of a partly-parallel cascaded FFT-IFFT architecture that does not require any intermediate buffer. Folding can be used to design partly-parallel architectures for the FFT and the IFFT. While many cascaded FFT-IFFT architectures can be designed using various folding sets for the FFT and the IFFT, for a specified folded FFT architecture there exists a unique folding set that yields an IFFT architecture requiring no intermediate buffer. Such a folding set is derived by processing the output of the FFT as soon as possible (ASAP) in the folded IFFT. Eliminating the intermediate buffer reduces latency and saves area. The proposed approach is also extended to interleaved processing of multi-channel time series. The proposed FFT-IFFT cascade architecture saves about N/2 memory elements and N/4 clock cycles of latency compared to a design with identical folding sets. For the 2-interleaved FFT-IFFT cascade, the memory and latency savings are, respectively, N/2 units and N/2 clock cycles, compared to a design with identical folding sets.
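
    As a purely behavioral sketch of the computation the cascade implements, the NumPy snippet below runs two interleaved channels through an FFT followed immediately by an IFFT; it models only the data flow (the cascade is an identity up to numerical error), not the folded hardware, its folding sets, or the buffer elimination. The transform length and channel data are arbitrary assumptions for illustration.

        import numpy as np

        N = 16                                  # transform length (illustrative)
        ch0 = np.random.randn(N)                # two time-series channels to be
        ch1 = np.random.randn(N)                # processed in interleaved fashion

        # Interleave the two channels sample by sample, as a single input stream.
        stream = np.empty(2 * N)
        stream[0::2], stream[1::2] = ch0, ch1

        # De-interleave and pass each channel through the FFT-IFFT cascade.
        out0 = np.fft.ifft(np.fft.fft(stream[0::2]))
        out1 = np.fft.ifft(np.fft.fft(stream[1::2]))

        # The cascade reproduces the inputs up to floating-point error.
        assert np.allclose(out0.real, ch0) and np.allclose(out1.real, ch1)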

    A Gradient-Interleaved Scheduler for Energy-Efficient Backpropagation for Training Neural Networks

    This paper addresses the design of accelerators using systolic architectures for training neural networks with a novel gradient interleaving approach. Training a neural network involves backpropagation of error and computation of gradients with respect to the activation functions and the weights. It is shown that the gradient with respect to the activation function can be computed using a weight-stationary systolic array, while the gradient with respect to the weights can be computed using an output-stationary systolic array. The novelty of the proposed approach lies in interleaving the computations of these two gradients on the same configurable systolic array. This results in reuse of variables from one computation to the other and eliminates unnecessary memory accesses. The proposed approach leads to 1.4x to 2.2x savings in the number of cycles and 1.9x savings in memory accesses. Thus, the proposed accelerator reduces latency and energy consumption. Comment: Proc. 2020 IEEE International Symposium on Circuits and Systems (ISCAS).
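
    For reference, this is a small NumPy sketch of the two backpropagation gradients that the scheduler interleaves, for a single fully-connected layer; the layer sizes and variable names are assumptions for illustration, and the snippet computes the mathematics only, not the systolic-array mapping itself.

        import numpy as np

        # One fully-connected layer: y = W @ x, with upstream error delta = dL/dy.
        n_out, n_in = 4, 6
        W = np.random.randn(n_out, n_in)
        x = np.random.randn(n_in)           # layer input (activation)
        delta = np.random.randn(n_out)      # backpropagated error at the output

        # Gradient w.r.t. the activations: dL/dx = W^T @ delta.
        # The weights stay fixed throughout this computation, so it maps
        # naturally onto a weight-stationary systolic array.
        grad_x = W.T @ delta

        # Gradient w.r.t. the weights: dL/dW = outer(delta, x).
        # Each output element accumulates in place, so it maps naturally onto
        # an output-stationary systolic array.
        grad_W = np.outer(delta, x)

    Both computations consume the same delta (and x), which is why interleaving them on one configurable array lets those operands be reused instead of being re-fetched from memory.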

    Tensor Decomposition for Model Reduction in Neural Networks: A Review

    Modern neural networks have revolutionized the fields of computer vision (CV) and natural language processing (NLP). They are widely used for solving complex CV and NLP tasks such as image classification, image generation, and machine translation. Most state-of-the-art neural networks are over-parameterized and incur a high computational cost. One straightforward solution is to replace the layers of the networks with their low-rank tensor approximations using different tensor decomposition methods. This paper reviews six tensor decomposition methods and illustrates their ability to compress the model parameters of convolutional neural networks (CNNs), recurrent neural networks (RNNs), and Transformers. The accuracy of some compressed models can even exceed that of the original versions. Evaluations indicate that tensor decompositions can achieve significant reductions in model size, run-time, and energy consumption, and are well suited for implementing neural networks on edge devices. Comment: IEEE Circuits and Systems Magazine, 202
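
    As a minimal illustration of the underlying idea, the sketch below compresses a fully-connected layer's weight matrix with a truncated SVD, one of the simplest low-rank factorizations (the review covers richer tensor decompositions such as CP and Tucker). The matrix shape and target rank are arbitrary values chosen for the example, not figures from the review.

        import numpy as np

        # A fully-connected layer weight matrix (shape chosen for illustration).
        W = np.random.randn(256, 512)
        rank = 32                               # target rank after compression

        # Truncated SVD: W ~= A @ B, keeping only the top `rank` singular terms.
        U, s, Vt = np.linalg.svd(W, full_matrices=False)
        A = U[:, :rank] * s[:rank]              # 256 x 32 factor
        B = Vt[:rank, :]                        # 32 x 512 factor

        # One 256x512 layer becomes two smaller layers (512->32 and 32->256),
        # reducing parameters from 131072 to (256 + 512) * 32 = 24576.
        x = np.random.randn(512)
        y_full = W @ x
        y_low = A @ (B @ x)
        print(np.linalg.norm(y_full - y_low) / np.linalg.norm(y_full))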

    On oscillatory fourth order nonlinear neutral differential equations. I.
